Back in 2016, I was working on an NCache-to-Redis migration. The catch? This wasn’t a clean, containerized world. Everything ran on dedicated servers — some literal metal boxes you could walk over to and kick.
Redis was fairly new at the time, and unless you had spare budget lying around, “cloud-hosted Redis” wasn’t really an option. My task: learn Redis from scratch and set up a production-ready cluster.
This was around the time Kubernetes was still young. Managed offerings from cloud providers were scarce, and the best you could get was a few virtual machines running Docker Swarm. Redis had also just introduced Redis Cluster, a gossip-based protocol that worked very differently from the older Sentinel-based failover, and I was trying to make sense of it all: wrangling the redis-trib.rb Ruby script, writing shell scripts to automate Redis installs, and figuring out what "replication lag" even meant.
Somewhere in that process, I stumbled upon Raft, the consensus algorithm Diego Ongaro and John Ousterhout introduced in their paper "In Search of an Understandable Consensus Algorithm." It was designed to be understandable, a direct response to the notorious complexity of Paxos. That simple promise, understandable distributed consensus, hooked me.
The Algorithm That Stuck
Over the years, I kept coming back to Raft. Not because I needed it for work, but because I wanted to understand distributed systems at a deeper level.
I’ve implemented Raft three times now — first in C#, then in TypeScript, and most recently in Go. Each time, I learned something new — not just about consensus or replication, but about the trade-offs between clarity, correctness, and complexity.
Why Raft Resonates
The genius of Raft isn’t in its novelty. It’s in its design for comprehension.
Where Paxos can feel like a riddle wrapped in an academic paper, Raft reads like a story:
- There are three roles: follower, candidate, and leader.
- There are two RPCs: RequestVote for leader elections and AppendEntries for log replication (both sketched below).
- There’s a shared log that everyone eventually agrees on.
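In Go, those two messages can be written down almost verbatim from Figure 2 of the paper. Here's a minimal sketch; the field names are the paper's, but the Go spelling is mine:

```go
// A minimal sketch of Raft's two RPC payloads. Field names follow
// Figure 2 of the Raft paper; the Go types are just one way to spell them.

// LogEntry is a single command in the replicated log.
type LogEntry struct {
	Term    int    // term in which the leader received the entry
	Command []byte // opaque state-machine command
}

// RequestVoteArgs is sent by candidates to gather votes.
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int // candidate requesting the vote
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

// AppendEntriesArgs is sent by the leader to replicate entries;
// with an empty Entries slice it doubles as a heartbeat.
type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderID     int        // so followers can redirect clients
	PrevLogIndex int        // index of the entry immediately preceding the new ones
	PrevLogTerm  int        // term of that preceding entry
	Entries      []LogEntry // empty for heartbeats
	LeaderCommit int        // leader's commit index
}
```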
That’s it. Elegant and approachable — until you start implementing it.
Because the moment theory meets code, simplicity evaporates into edge cases. Timeouts, term mismatches, dropped messages, log consistency — suddenly, that “easy to understand” algorithm becomes a minefield of subtle bugs.
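Here's one concrete taste of that minefield, building on the types above. The node fields and helper names are illustrative, not anyone's real API. A follower handling AppendEntries has to get the term checks and the log consistency check right, and even the final append hides a trap:

```go
// Sketch of a follower receiving AppendEntries, reusing the arg types
// from the earlier snippet. Field and method names are illustrative.
type node struct {
	currentTerm int
	log         []LogEntry // indexed from 0; PrevLogIndex == -1 means "empty log"
}

func (n *node) becomeFollower() { /* elided: reset election timer, drop any leadership */ }

func (n *node) handleAppendEntries(args AppendEntriesArgs) (term int, success bool) {
	// A stale leader from an older term gets rejected outright.
	if args.Term < n.currentTerm {
		return n.currentTerm, false
	}
	// A newer term means we're behind: adopt it and step down to follower.
	if args.Term > n.currentTerm {
		n.currentTerm = args.Term
		n.becomeFollower()
	}
	// Log consistency check: we must already hold the entry the leader
	// thinks precedes the new ones, with a matching term.
	if args.PrevLogIndex >= len(n.log) ||
		(args.PrevLogIndex >= 0 && n.log[args.PrevLogIndex].Term != args.PrevLogTerm) {
		return n.currentTerm, false
	}
	// Append, overwriting only where terms conflict. Truncating blindly
	// here is a classic subtle bug: a delayed, reordered RPC could wipe
	// out entries we already safely hold.
	for i, entry := range args.Entries {
		idx := args.PrevLogIndex + 1 + i
		if idx < len(n.log) && n.log[idx].Term == entry.Term {
			continue // already have this entry
		}
		n.log = append(n.log[:idx], args.Entries[i:]...)
		break
	}
	return n.currentTerm, true
}
```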
That tension — between conceptual simplicity and implementation complexity — is exactly what makes Raft such a rewarding problem to revisit.
The Latest Iteration: Raft in Go
This time, I approached my Go implementation with a single goal: make it as readable and approachable as possible — even if that meant breaking some of Go’s conventions.
Each state — follower, candidate, leader — lives in its own file, encapsulating its logic and the way it handles incoming RPCs. There’s a thin wrapper struct around them, containing the shared properties: the log, current term, and node ID.
When a node transitions between states, it simply swaps out the current state object. No massive if-else trees. No mysterious control flow. Each state knows how to respond to events in its own way.
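Sketched roughly, the shape looks like this. The names here are illustrative rather than the repository's actual identifiers, and LogEntry is as in the earlier sketch:

```go
// Illustrative shape of the state-per-type layout described above.
type event struct{} // an incoming RPC or a timer tick, elided here

// Each role implements one small interface and returns the next state.
type state interface {
	handle(n *raftNode, ev event) state
}

// raftNode is the thin wrapper holding what every role shares.
type raftNode struct {
	id          int
	currentTerm int
	log         []LogEntry
	current     state
}

type followerState struct{}
type candidateState struct{ votesReceived int }
type leaderState struct{ nextIndex map[int]int }

func (followerState) handle(n *raftNode, ev event) state {
	// For brevity, treat the only event as an election timeout:
	// bump the term, vote for ourselves, and hand control to a
	// fresh candidate.
	n.currentTerm++
	return candidateState{votesReceived: 1}
}

func (c candidateState) handle(n *raftNode, ev event) state { /* elided */ return c }
func (l leaderState) handle(n *raftNode, ev event) state    { /* elided */ return l }

// The event loop: a transition is just swapping the state object.
func (n *raftNode) step(ev event) {
	n.current = n.current.handle(n, ev)
}
```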
Every method and variable name is explicit, even verbose. Some might argue it goes against idiomatic Go. I’d argue it makes the code self-explanatory. Because the purpose here isn’t just to get a Raft cluster running — it’s to understand why it works.
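A hypothetical example of that flavor (not code from the repository):

```go
// Hypothetical, purely to illustrate the naming style: role, message,
// and intent are spelled out where idiomatic Go might settle for vote().
type follower struct{ currentTerm int }

func (f *follower) shouldGrantVoteToCandidate(args RequestVoteArgs, alreadyVotedThisTerm bool) bool {
	candidateTermIsAtLeastOurs := args.Term >= f.currentTerm
	// The real rule also compares log freshness (LastLogIndex/LastLogTerm); elided here.
	return candidateTermIsAtLeastOurs && !alreadyVotedThisTerm
}
```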
What Reimplementing Raft Taught Me
Over the years, I’ve realized that implementing Raft isn’t about building a production-grade consensus system.
It’s about developing an intuition for how distributed systems think:
- Failure is the default. A node being up is a temporary condition. Design around that.
- State machines are your friends. Modeling systems explicitly as states (follower, candidate, leader) makes logic clearer and bugs rarer.
- Clarity beats cleverness. Code that mirrors the whitepaper’s structure is easier to reason about — and debug.
- Understanding precedes optimization. You can’t scale what you don’t understand.
Reimplementing Raft, again and again, forced me to slow down and internalize these lessons. It’s like rereading a classic book — you notice new layers each time, because you’ve changed since the last read.
A Living Implementation
If you’d like to explore my Go implementation of Raft — designed to be readable, not necessarily production-ready — you can find it here: 👉 GitHub Repository
I’ve documented it extensively so others can follow the flow and, hopefully, learn the same lessons I did — not by reading about Raft, but by watching it come alive through code.
Closing Thoughts
Looking back, it’s funny how a Redis migration from nearly a decade ago led me down a path of learning distributed consensus.
Back then, I was wiring together shell scripts and hoping nothing crashed. Today, I find joy in understanding why things fail and how systems recover.
That, to me, is the essence of distributed systems: embracing failure, designing for it, and learning from it — one Raft implementation at a time.