Open Source 3D r/Place VR Visualizer [OC] by GregBahm in dataisbeautiful

[–]daniel 1 point

If only one of us had some way of knowing what we were talking about...

Open Source 3D r/Place VR Visualizer [OC] by GregBahm in dataisbeautiful

[–]daniel 11 points

[...] it's not just completely for fun.

It was, actually. I'm trying to find a way to twist this so it's blatantly nefarious and self-serving for reddit, but I've got nothing.

To assume that the data collected isn't used for advertising or sold to advertisers is a little naive, though.

Not at all, actually. Who would even buy that information?

Place Datasets (April Fools 2017) by Drunken_Economist in redditdata

[–]daniel 8 points

Yeah, I believe that's using the event stream (if you click through to the HN source, you can see it's referring to u/Drunken_Economist's initial data dump), which can have duplicate timestamps or slight ordering problems because the server-side timestamps differ. This dump is the de facto board state that was shown to all users at every point in time, and it could be used as synchronization "frames" for that event stream if someone wanted.
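For anyone wanting to combine the two, here's a rough sketch of using the per-minute board dumps as synchronization frames for the event stream. The data layouts (events as (ts, x, y, color) tuples, frames as (ts, board dict) pairs) are assumptions for illustration, not the dumps' actual formats.

```python
def replay_with_frames(events, frames):
    """Replay pixel events in timestamp order, snapping to each authoritative
    board frame so duplicate-timestamp or ordering glitches can't accumulate.
    events: iterable of (ts, x, y, color); frames: time-sorted [(ts, board_dict)]."""
    board = {}
    frame_iter = iter(frames)
    next_frame = next(frame_iter, None)
    for ts, x, y, color in sorted(events, key=lambda e: e[0]):
        # Once we pass a frame's timestamp, reset to that known-good board state.
        while next_frame is not None and next_frame[0] <= ts:
            board = dict(next_frame[1])
            next_frame = next(frame_iter, None)
        board[(x, y)] = color
        yield ts, board  # same dict object each time; copy it if you need history
```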

Place Datasets (April Fools 2017) by Drunken_Economist in redditdata

[–]daniel 8 points

Sorry, it's not a stupid question, but they're still in their raw binary format. If you're programmatically inclined, you can make them into PNGs. If not, I suspect someone else will shortly, given how great the community is at that sort of thing.
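A minimal sketch of that conversion, assuming each snapshot is the 1000×1000 board stored as 4-bit palette indices packed two pixels per byte (which matches the 500KB full-board size mentioned elsewhere in these comments). The file paths and the grayscale mapping are placeholders; the real palette and nibble order are in the readme.

```python
import numpy as np
from PIL import Image

WIDTH, HEIGHT = 1000, 1000

def board_to_png(path_in, path_out):
    raw = np.fromfile(path_in, dtype=np.uint8)   # 500,000 bytes per snapshot
    hi, lo = raw >> 4, raw & 0x0F                # two 4-bit color indices per byte
    pixels = np.empty(raw.size * 2, dtype=np.uint8)
    pixels[0::2], pixels[1::2] = hi, lo          # swap if the readme says low nibble first
    board = pixels[: WIDTH * HEIGHT].reshape(HEIGHT, WIDTH)
    # Quick grayscale view; substitute the real 16-color palette for true colors.
    Image.fromarray(board * 17, mode="L").save(path_out)

board_to_png("place_boards/0001.bin", "frames/0001.png")
```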

Place Datasets (April Fools 2017) by Drunken_Economist in redditdata

[–]daniel[A] 55 points

Here's a torrent of screenshots of the board taken once per minute from the very beginning. There's a description of the format in the readme, but I'm guessing that's been thoroughly described elsewhere on the site by now. This should help reconcile any events you see that are at the same timestamp.

Please seed.

How We Built r/Place by bsimpson in programming

[–]daniel 2 points

You're right, and it is on the roadmap and being actively worked on. Unfortunately, all we can really do right now is tell people "we're working on it."

How We Built r/Place by bsimpson in programming

[–]daniel 2 points

For the case of redis, just some custom scripts built to test it for our use case. Two things were tested: doing large batches of random pixel writes onto the board, and doing reads of the full canvas. That's what led us to realize reads would be a huge hit to our performance: we were network-bound on the instances, and since updating a pixel is under ~20 bytes while reading the full board is 500KB, one read of the board per second would cost us thousands of writes per second.

I think it theoretically would be fine, since we were way overoptimizing for the write rate we ended up getting (I wanted to support one pixel per user per 10 seconds, even though I knew we'd probably never go down that low), but I was also concerned that there would be some event that would cause all users to refresh and request the board at the same time -- a thundering herd that would saturate redis's network throughput. This is what led us down the road of caching. I suspect people were refreshing a lot during the websockets issues, so this was probably a life-saver.
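A rough sketch of that kind of load test, assuming the board is stored as a redis bitfield of 4-bit color values (1000×1000 pixels = 500,000 bytes, matching the 500KB full-board read above). The key name, batch size, and client usage are illustrative, not the actual test scripts.

```python
import random
import time
import redis

r = redis.Redis(host="localhost", port=6379)
BOARD_KEY = "place:board"
WIDTH = HEIGHT = 1000

def write_batch(n):
    """Set n random pixels via BITFIELD; each update is only a handful of bytes on the wire."""
    pipe = r.pipeline(transaction=False)
    for _ in range(n):
        offset = random.randrange(WIDTH * HEIGHT)   # pixel index
        color = random.randrange(16)                # 4-bit palette index
        pipe.execute_command("BITFIELD", BOARD_KEY, "SET", "u4", offset * 4, color)
    pipe.execute()

def read_board():
    """Fetch the full canvas (~500KB) in a single GET."""
    return r.get(BOARD_KEY)

start = time.time()
write_batch(10_000)
read_board()
print(f"10k writes + 1 full read took {time.time() - start:.3f}s")
```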

Edit: For cassandra, we already have a good sense of how much read and write capacity we can handle. The only load testing that was done there was in completely filling a board and trying to load a canvas. I think the article mentions it, but that's when we saw it took way too long to load and fully made the decision to go with redis for the board. We also did some napkin math of how much storage space storing pixel info would use and found it was relatively small compared to the rest of reddit.com, so we still used it for that.

How We Built r/Place by bsimpson in programming

[–]daniel 1 point

I disagree with this, bots and scripts took some worth out of the product.

I anticipated people making graphs, collecting data, and doing visualization, but not necessarily the group image-writing stuff. I can only really speak for myself there, though. I don't recall the image-writing stuff coming up in any discussions -- just that we were genuinely excited to see what the community would automate with it.

I'm not sure what the alternative is, though. People will reverse-engineer what you've written and make bots for it. I suppose we could have tried to obfuscate things enough to make that difficult within the lifespan of the project, but making it harder to access programmatically would also have hindered the people making benign bots (e.g. graphing which colors are winning) and doing data collection.

How We Built r/Place by bsimpson in programming

[–]daniel 1 point

Mostly just eyeballing numbers from previous April Fools' projects. Robin had a similar, if not slightly higher, peak concurrent user count (I'm going off memory here, though). I attribute that to users basically being forced to stick around and wait for the channels to merge, lest they be kicked and lose all their progress.

How We Built r/Place by bsimpson in programming

[–]daniel 1 point

Nice catch, especially from just reading an article. I was thinking about this recently as well. Maybe I'll do some analysis at some point and check it. I'm not 100% sure what you mean about Cassandra selecting the one with the most recent timestamp though. It should just be whoever wrote last to both redis and Cass.

My thinking on the problem is like this:

1. Jack posts a red pixel to place.
2. Jill posts a blue pixel to place.
3. Jack's update to redis goes through.
4. Jill's update to redis goes through.
5. Jill's update to Cassandra (where individual pixel info is stored) goes through.
6. Jack's update to Cassandra goes through.

Now they're out of sync. The worst that happens is that it now looks like Jack placed Jill's blue pixel.
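A toy illustration of that interleaving, with made-up stand-ins for the two stores:

```python
board = {}       # stands in for the redis bitfield: pixel -> color
pixel_info = {}  # stands in for Cassandra: pixel -> last placer

# Order seen by the board store: Jack, then Jill.
board[(10, 10)] = "red"    # Jack
board[(10, 10)] = "blue"   # Jill

# Order seen by the per-pixel store: Jill, then Jack.
pixel_info[(10, 10)] = "Jill"
pixel_info[(10, 10)] = "Jack"

# Combined view: a blue pixel attributed to Jack.
print(board[(10, 10)], pixel_info[(10, 10)])  # blue Jack
```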

Now that I'm thinking through it further, there's also obviously a problem with the websockets messages being mis-synced. My thinking on all of this is that it's much more likely to happen on hotly contested pixels, which also makes the problem more likely to be quickly corrected. Of course, Murphy's Law kicks in at scale, so we definitely saw this happen a couple of times.

How We Built r/Place by bsimpson in programming

[–]daniel 2 points

The initial idea was that we'd start streaming updates from websockets for longer than the TTL, collect those updates, request the canvas, and then overlay them on what we got. I'm not sure if we actually ended up doing it, but yes, it was something we were aware of.
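A minimal sketch of that client-side idea, assuming a canvas snapshot that may be up to one cache TTL stale and a stream of pixel-update messages. The function and field names are hypothetical, not reddit's actual client code.

```python
import time

def synced_canvas(fetch_canvas, update_stream, buffer_seconds):
    """Buffer live updates for longer than the cache TTL, then fetch the
    (possibly stale) canvas and replay the buffered updates on top of it."""
    buffered = []
    deadline = time.time() + buffer_seconds   # buffer_seconds > canvas cache TTL
    for update in update_stream:              # e.g. dicts like {"x": 1, "y": 2, "color": 5}
        buffered.append(update)
        if time.time() >= deadline:
            break
    canvas = fetch_canvas()                   # snapshot may be up to one TTL old
    for u in buffered:                        # overlay in arrival order; newer updates win
        canvas[(u["x"], u["y"])] = u["color"]
    return canvas
```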

r/place scripts were "artisanal" and "hand-crafted" by tvks in programmingcirclejerk

[–]daniel 27 points

actually a lot of people dont realize that one-liner was also gluten-free

How We Built r/Place by bsimpson in place

[–]daniel 31 points

But really, we just wanted to watch /new freak out. Also, I expected someone to notice the timeout being 10:03 at one point, but I don't recall seeing anything about it.

How We Built r/Place by bsimpson in programming

[–]daniel 92 points

It would move around to zones of high activity. u/madlee said he used some kind of serial killer algorithm.

How We Built r/Place by bsimpson in programming

[–]daniel 88 points

Maybe they just think highly of themselves?

I am complete by cooliochill in place

[–]daniel 13 points

Seconded. I want one.

The official history of the Green Lattice! by jojo6311 in place

[–]daniel[A] 33 points

I'd really like to see more groups do these.