Last week, I attended SREcon 2023. I was invited to attend the “next-gen delivery” working group meetings, due to my involvement with the Continuous Delivery Foundation (CDF). I used the opportunity to attend the rest of the conference as well. While I’ve never been an SRE by title, I’ve straddled the line between operations and software development for many years.

Conference makeup

The makeup of the conference attendees was different than I expected. Many of the folks appeared to be managers. An organizer I talked with believed this was due to the budget cuts happening across our industry: managers were attending to determine whether it was a high-value conference to send their reports to. For similar budgetary reasons, large tech companies were underrepresented. The only participants from many of them (Google, Dropbox, etc.) were those speaking. Instead of these companies, financial services and healthcare were well represented. Ultimately, this made the conference feel less mature than I suspect previous years might have felt.

The composition of the conference played heavily into the composition of the next-gen delivery working group. Many people were behind the curve on automation, handling large deploys across clusters manually. As such, there weren’t many practices there that we could learn from. One notable exception was the vendor prodvana.io, which offers a convergence loop mechanism for deploys. What this means is that you declare some expected outcome state and the system has goal-seeking behavior to make that happen. This is similar to how Kubernetes will attempt to achieve your stated definition. This is pretty interesting to me and I’m keen to experiment with building a convergence system to see where the idea has trouble.
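To make the convergence idea concrete, here is a minimal sketch in Python. It is not how prodvana.io or Kubernetes is implemented; `Cluster`, `plan`, and `converge` are hypothetical names, and the "deploy" is just an in-memory assignment. The only point is the shape of the loop: diff desired state against actual state, act on the difference, repeat until nothing is left to do.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a real deploy target."""
    name: str
    version: str


def plan(desired: dict[str, str], clusters: list[Cluster]) -> list[Cluster]:
    """Return the clusters whose running version differs from the desired one."""
    return [c for c in clusters if desired.get(c.name) not in (None, c.version)]


def converge(desired: dict[str, str], clusters: list[Cluster], max_rounds: int = 10) -> int:
    """Diff desired vs. actual state each round and fix one drifted cluster.

    Returns the number of rounds that had to take an action.
    """
    for round_number in range(max_rounds):
        drifted = plan(desired, clusters)
        if not drifted:
            return round_number
        # Act on a single cluster per round to mimic a cautious, staged rollout.
        # A real system would call a deploy API here and re-observe the result.
        target = drifted[0]
        target.version = desired[target.name]
    raise RuntimeError("did not converge within max_rounds")


clusters = [Cluster("us-east", "v1"), Cluster("eu-west", "v1"), Cluster("ap-south", "v2")]
desired = {"us-east": "v2", "eu-west": "v2", "ap-south": "v2"}
rounds = converge(desired, clusters)
```

Because the loop only ever looks at the gap between desired and actual state, rerunning it after a partial failure picks up where it left off. The hard parts (actions that fail, state that cannot be observed reliably, conflicting goals) are exactly what this toy leaves out.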

In the main conference, the talks were segmented across two tracks. One track seemed to be more focused on the social side of incidents, while the other was more technical. Both had some good content, though there wasn’t always a timeslot with something I found interesting.

My favorite talks

Scaling Kafka - Liz is a great speaker. This was good for me, as someone who’s never come close to having to maintain a Kafka cluster. One of the things I really liked in this talk was the discussion about spending “innovation tokens”. The premise is that you only get a fixed number of “innovation tokens” to spend on your product. They didn’t want to spend them on a Kafka replacement, as they were already spending on their data store. The unstated implication is that if they overspent, they would underserve their customers.

Writing an ultra-low-latency trading system w/ Go & Java - Fantastic talk that does a deep dive into what it looks like to run an extremely low-latency system on public cloud infrastructure. They’re running at ~1 microsecond computation time and < 50 microseconds end-to-end latency. It’s a look at every section of the stack.

Network Error Logging - TIL about NEL, which allows browsers to send failed (and even successful) network fetch telemetry. Super useful for diagnosing connectivity issues, as discussed in the talk from Wikimedia.
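For a sense of what opting in looks like, a site enables NEL by sending two response headers: `Report-To`, which names a reporting endpoint group, and `NEL`, which sets the policy. The sketch below builds both in Python. The field names follow my reading of the W3C Network Error Logging and Reporting API drafts; the helper `nel_headers`, the group name, and the collector URL are made up for illustration, and this is not Wikimedia's actual configuration.

```python
import json


def nel_headers(
    report_url: str,
    max_age: int = 86400,
    success_fraction: float = 0.0,
    failure_fraction: float = 1.0,
) -> dict[str, str]:
    """Build the Report-To and NEL response headers for a site (hypothetical helper)."""
    group = "network-errors"  # arbitrary label tying the two headers together
    report_to = {
        "group": group,
        "max_age": max_age,  # seconds the browser should remember the endpoint
        "endpoints": [{"url": report_url}],
    }
    nel = {
        "report_to": group,
        "max_age": max_age,  # seconds the browser should keep the policy
        "success_fraction": success_fraction,  # sampling rate for successful fetches
        "failure_fraction": failure_fraction,  # sampling rate for failed fetches
    }
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


# Hypothetical collector URL; report every failure and sample 1% of successes.
headers = nel_headers("https://reports.example.com/nel", success_fraction=0.01)
```

The two sampling fractions are what make the "even successful" part practical: failures can be reported in full while successes are sampled lightly to provide a baseline.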

There were several talks about diving into incident data. There was a shallow view from Spotify and a deep view from a researcher at Verica. The latter talk pulled from a repository of 10k+ public incident reports, which is a juicy data set to look through.

Planet-scale Prometheus - A great talk from eBay folks about a massively scaled Prometheus cluster. Lots of interest in it, which is exciting.

Adaptive concurrency control - A talk about how to recover from bursty workloads and preserve your services, tapping into some queueing theory. It’s a step beyond a simple rate limit. One thing I liked about this talk was its specific examples of how to implement it.
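I don’t have the talk’s examples to reproduce here, so the following is a generic sketch of one common approach, additive-increase/multiplicative-decrease (AIMD), rather than the speaker’s method. The class name, defaults, and latency target are all assumptions. The idea is that the limit on in-flight requests grows slowly while responses are fast and shrinks sharply when they are slow or failing.

```python
class AIMDLimiter:
    """Hypothetical AIMD concurrency limiter (illustrative only, not thread-safe)."""

    def __init__(self, initial_limit=10, min_limit=1, max_limit=100,
                 latency_target_s=0.1, backoff=0.5):
        self.limit = float(initial_limit)  # current allowed in-flight requests
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.latency_target_s = latency_target_s
        self.backoff = backoff  # multiplicative decrease factor
        self.in_flight = 0

    def try_acquire(self) -> bool:
        """Admit a request if there is room under the current limit, else shed it."""
        if self.in_flight >= int(self.limit):
            return False
        self.in_flight += 1
        return True

    def release(self, latency_s: float, ok: bool = True) -> None:
        """Record a finished request and adapt the limit to what was observed."""
        self.in_flight -= 1
        if ok and latency_s <= self.latency_target_s:
            # Healthy response: probe for more capacity, one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
        else:
            # Slow or failed response: back off quickly to let queues drain.
            self.limit = max(self.min_limit, self.limit * self.backoff)


limiter = AIMDLimiter(initial_limit=2)
if limiter.try_acquire():
    # ... handle the request and measure how long it took ...
    limiter.release(latency_s=0.05)
```

Unlike a fixed rate limit, nothing here needs to know the service’s capacity up front; the limit is discovered from observed latency and moves as conditions change.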

Not a talk per se, but I had some great conversations about risk estimation with John Benninghoff, an SRE manager in healthcare. He pointed me to the Society of Information Risk Analysts and a book on how to quantify risk.

Talks I want to watch afterwards

I wasn’t able to attend all the talks I wanted to, either due to the next-gen working group or other factors. These are on my list to watch later.

Conclusion

Interesting conference. There were gems and it certainly offered me some alternative viewpoints. I don’t think I would go back (working group aside), but I think folks more directly involved in SRE may benefit.