Hi everybody! It’s been a while since I’ve had a chance to spend any time here on the forums, but there’s been a big reason for that. Since about October, the team and I have been working on a complete overhaul of Evolve’s telemetry system. I’m pleased to announce that Thunderchild, the new telemetry system, is now live in Evolve and (mostly!) working as intended!
I’m here to let you know a little about the design, development, and ultimately the “what cool shit can we do now” of Thunderchild. I hope to go into even more detail about the story in the future, but for now I’ll try to mostly stick to features. Or maybe I’ll just ramble incoherently until I run out of things to say.
Thunderchild started, like most things, around September/October-ish (the timing may not be 100% accurate because I honestly don’t remember), with me stomping my feet and saying/yelling “I’m going to tear the beating heart out of the telemetry system.” I’d had enough of the old system, which didn’t even have a cool name. So I started designing a new system, not knowing if it would ever get implemented, but knowing that if we ever had an opportunity to start from scratch, we’d have a plan. It started life as a technical doc and a PowerPoint pitch to our Technical Director.
As it turned out, all the stars aligned on this theoretical new system. It became very clear that in order to meet the long-term plan for Evolve, we needed this system now. Like… right now. Like, drop everything you’re doing, put a team together, and get it into the live game. The “you have until the end of December, don’t fuck this up” kind of now. No pressure.
So, as work began on “Telemetry 2.0,” I decided that “Telemetry 2.0” was a really lame name for this thing. I felt like a complete jackass at first, and mostly expected to get a lot of “oh, that guy… humor him…” kind of looks. But I gave the new system a name anyway. I called it Thunderchild, after Lennox’s suit, as you all know. It turned out to be a great decision, and when I started hearing other people refer to the system by name, I got a sense that we were really on to something. Giving it a name centralized the vision of this thing in everybody’s head. It wasn’t just a new telemetry system. It was Thunderchild, and it was going to be awesome. When announcing it to the whole studio, I even created this artistic representation of it giving a thumbs up to encourage people to help test the system:
I know, you’re impressed.
So, what is Thunderchild? Why is it different and better? How many buzzwords can Citizen fit into the next sentence?
Well, Thunderchild is a non-relational, schema-less, cloud-based storage and processing system that uses JSON records instead of relational events, and offers both real-time data transformation and Hadoop/Spark cluster-based Big Data analysis.
Here’s what that really means, though. First, the system is ridiculously resilient and able to recover from failure. For example, we broke it this morning in the live environment and were able to diagnose and fix the problem in just a few hours without any data loss. The system is very modular, so a failure in one piece shouldn’t cause cascading failures elsewhere. Because of how things are stored, we can quickly restore to any state, and unless the entire cloud catches fire, we’ll be good.
Second, it means we will be able to rapidly iterate on changes to data collection. Previously, our relational model of the event data was very sensitive to change: if we messed up a column name at some step of a chain somewhere, we would lose data. Changing anything in the old system was like pulling teeth. Being able to rapidly iterate is absolutely needed for the long-term future of Evolve, and this system lets us make those changes without worrying about breaking things.
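To give a feel for why a schema-less format is easier to iterate on, here’s a minimal sketch (not Thunderchild’s actual code). It assumes records arrive as JSON documents and that readers look fields up by name with a default. The field names `match_id`, `accuracy`, and `some_future_field` are hypothetical. Adding a new field to newer records doesn’t break anything that reads the older ones.

```python
import json

def summarize(raw: str) -> dict:
    """Read a record defensively: unknown fields are ignored, missing ones get defaults.

    The field names used here are hypothetical, for illustration only.
    """
    rec = json.loads(raw)
    return {
        "match_id": rec.get("match_id", "unknown"),
        # Hypothetical field added in a newer record version; older records lack it.
        "accuracy": rec.get("accuracy"),
    }

old_record = '{"match_id": "m-0001"}'
new_record = '{"match_id": "m-0002", "accuracy": 0.42, "some_future_field": true}'

print(summarize(old_record))  # {'match_id': 'm-0001', 'accuracy': None}
print(summarize(new_record))  # {'match_id': 'm-0002', 'accuracy': 0.42}
```

Compare that with a relational pipeline, where a renamed or added column has to line up across every table and join in the chain.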
Finally, and most importantly, we no longer have an event-based system; instead we have something I call a “record.” The old system worked by collecting events, unconnected moments in time. This was great as a first pass, but problems came up whenever we tried to join these events together, and errors in the old system would get magnified. We’d go through these long chains of table joins, only to find some hidden bug in the system. Now, we aggregate events before they get sent to us and define a collection period for each record. The record tracks everything it cares about until its collection period is over, then sends it all as one record.
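As a rough illustration of the record idea, here’s a minimal sketch, assuming a hypothetical `Record` class that folds events into running totals during its collection period and produces exactly one payload when the period closes. The class, method, and event names are made up for this example; they aren’t Thunderchild’s real API.

```python
from collections import Counter

class Record:
    """Hypothetical sketch: accumulate events locally, emit one record when the period ends."""

    def __init__(self, record_type: str, start: float):
        self.record_type = record_type
        self.start = start
        self.counts = Counter()
        self.closed = False

    def observe(self, event_name: str, amount: int = 1) -> None:
        # Events are folded into the record instead of being sent individually.
        if self.closed:
            raise RuntimeError("collection period is over")
        self.counts[event_name] += amount

    def close(self, end: float) -> dict:
        # The collection period ends here; only now does anything get sent.
        self.closed = True
        return {
            "record_type": self.record_type,
            "collection_start": self.start,
            "collection_end": end,
            "totals": dict(self.counts),
        }

# Usage with made-up event names and values.
rec = Record("dome", start=0.0)
rec.observe("damage_to_monster", 350)
rec.observe("damage_to_monster", 120)
rec.observe("hunter_incapped")
sent = rec.close(end=95.0)
print(sent)
```

The key design point is that nothing leaves the client mid-period, so whoever processes the data never has to stitch individual events back together.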
This seems like a small change, but it’s actually massive. Take matchmaking as an example. Before, we had matchmaking enter and matchmaking exit events, and then we’d go through contortions trying to make those sets of events line up correctly. Any error in the chain of events could invalidate the whole chain, and that was happening a lot more than it should have. With the record system, a record only ever gets sent if data collection is complete for the period the record is interested in. If something goes wrong, we don’t end up with some broken chain of events that we have no idea how to process, which prevents a lot of the biases that existed in the old system. Each record tells a complete story, and it’s been pretty awesome so far what can be done with that.
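Here’s a minimal sketch of the matchmaking case, assuming a hypothetical `MatchmakingRecord` that covers one session from entering the queue to leaving it. The field names and outcomes are illustrative only. A session that never sees its exit simply produces no record, instead of leaving an orphaned “enter” event for someone to pair up later.

```python
from typing import Optional

class MatchmakingRecord:
    """Hypothetical sketch: one record covering a single matchmaking session."""

    def __init__(self, player_id: str, entered_at: float):
        self.player_id = player_id
        self.entered_at = entered_at
        self.exited_at: Optional[float] = None
        self.outcome: Optional[str] = None

    def exit(self, exited_at: float, outcome: str) -> None:
        self.exited_at = exited_at
        self.outcome = outcome

    def to_send(self) -> Optional[dict]:
        # Only a complete record is ever sent; an incomplete session sends nothing.
        if self.exited_at is None:
            return None
        return {
            "player_id": self.player_id,
            "wait_seconds": self.exited_at - self.entered_at,
            "outcome": self.outcome,
        }

complete = MatchmakingRecord("p1", entered_at=10.0)
complete.exit(exited_at=52.5, outcome="match_found")
incomplete = MatchmakingRecord("p2", entered_at=11.0)  # e.g. the client crashed mid-queue

print(complete.to_send())    # {'player_id': 'p1', 'wait_seconds': 42.5, 'outcome': 'match_found'}
print(incomplete.to_send())  # None
```

Because the wait time is computed inside the record, there’s no enter/exit join to get wrong downstream.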
What about this actually changes your player experience, though? Well, other than the usual “more reliable data means better, more informed development,” there are a few specifics. First, you’ve probably noticed that in the past I had to be a bit dodgy about some topics. The measurements either weren’t that good or I didn’t trust them enough in the old system to talk about them as facts. A lot of that will change. I know people have asked for things like accuracy, DPS, skill-adjusted stats, and a lot of other stuff, and we can answer a lot more of those questions now. We even have a dedicated dome record that just tracks what happens in the dome, and we’re starting to come up with new measures like damage per incap for rounds.
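As an illustration of how a per-round measure falls out of self-contained dome records, here’s a minimal sketch. It assumes one plausible reading of “damage per incap” (damage dealt to hunters in a dome round divided by the number of hunter incaps in that round) and hypothetical field names; the records below are made-up values, not real data.

```python
from typing import Optional

# Made-up dome records; field names and values are illustrative only.
dome_records = [
    {"round": 1, "damage_to_hunters": 3000, "hunter_incaps": 2},
    {"round": 2, "damage_to_hunters": 1800, "hunter_incaps": 0},
    {"round": 3, "damage_to_hunters": 4500, "hunter_incaps": 3},
]

def damage_per_incap(record: dict) -> Optional[float]:
    """Damage per hunter incap in one dome round; None when nobody went down."""
    incaps = record["hunter_incaps"]
    if incaps == 0:
        return None
    return record["damage_to_hunters"] / incaps

for r in dome_records:
    print(r["round"], damage_per_incap(r))
```

Since each dome record already holds everything about its round, the metric is a per-record calculation with no joins involved.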
Also, ever noticed that our community challenges seem to draw from a relatively small pool of ideas? (Also, you all now know why I haven’t really been around to oversee a lot of the challenges for the last couple of months. Busy times.) Believe it or not, the first-ever community challenge I worked on, the St. Patty’s challenge, was not originally meant to be just “wins with Griffin.” It was actually that challenge that first hinted at where the old telemetry system was limited. The first pass of that challenge was going to be a count of successful dome captures, but we couldn’t do that in the old system because of some crazy joins. We can easily do that now, along with many other challenges that were impossible before.
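To show why something like a dome-capture count becomes easy, here’s a minimal sketch, assuming dome records arrive as one JSON document per line with hypothetical `record_type`, `trapper`, and `captured` fields. The records are made up for illustration. Counting is a single pass over the records with a filter, rather than a chain of joins.

```python
import json

# Made-up records, one JSON document per line; field names are illustrative only.
raw_lines = [
    '{"record_type": "dome", "trapper": "Griffin", "captured": true}',
    '{"record_type": "dome", "trapper": "Maggie", "captured": false}',
    '{"record_type": "dome", "trapper": "Griffin", "captured": true}',
    '{"record_type": "match", "winner": "hunters"}',
]

def count_dome_captures(lines, trapper=None):
    """Count successful dome captures, optionally for one trapper, in a single pass."""
    total = 0
    for line in lines:
        rec = json.loads(line)
        if rec.get("record_type") != "dome" or not rec.get("captured", False):
            continue
        if trapper is None or rec.get("trapper") == trapper:
            total += 1
    return total

print(count_dome_captures(raw_lines))                     # 2
print(count_dome_captures(raw_lines, trapper="Griffin"))  # 2
```

At real scale the same filter-and-count would presumably run on the Hadoop/Spark side, but the shape of the query stays this simple.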
Just to get ahead of this one, because I know someone will ask: we won’t have a public-facing API for Thunderchild, yet. It’s still my hope to get something like that going in the not-too-distant future. The old system would have been almost impossible to retrofit with a public-facing API, but Thunderchild is fully capable of having one added in the future. So, not now, and I can’t promise anything, but it’s still something I’m interested in, and many others are as well. It’s possible now, though still a bit of work.
Oh, and @mizx made the entire new internal website for Thunderchild, and it looks great and runs really well, so expect to see some pictures of that soon. He’s also been doing great work rapidly adding features to the new page, and we wouldn’t have been able to make this happen without him. I’m hoping he’ll be able to provide feedback in the future about that whole API thing as well.
And that’s about it. Many thanks to 2K and TRS for having a strong belief in the long-term future of Evolve. I’m looking forward to sharing more with you all as things progress, and I’ll lurk around the thread for a while to answer any questions I can.