When a Custom In-Memory Engine Outran DuckDB and MS SQL


A while back at a previous job, I spent some time working on a performance-critical query. The dataset wasn’t huge — somewhere between 10 and 50 million rows — but the query was complex enough that performance started to matter.

The use case was straightforward: compare two time periods, calculate growth across category and country combinations, and return the top results. The catch was in the permissions — clients could have access restricted by country or category, and those filters had to be applied before computing anything. That made preaggregation a non-starter.

At the time, we solved this with MS SQL Server. It worked. The queries were tuned, the indexes carefully chosen, and the system delivered acceptable performance. But it always felt like we were brushing up against the ceiling.

Recently, I revisited the same problem — not because I needed a new solution, but because I was curious. I wanted to see how much performance you could get out of a focused, purpose-built implementation, with no general-purpose overhead.

I tried DuckDB first. With the dataset loaded into a local columnar file, it ran the full comparison query in about 900ms on a 6-core, 16GB machine. That alone was impressive — faster than MS SQL in a much leaner setup. But I still wondered: how fast could this go?
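
To make that concrete, here is a minimal sketch of the kind of comparison query involved, wired up through the community go-duckdb driver. The table layout, period labels, and permission lists are invented for illustration, and the sketch reads a Parquet file directly rather than a pre-loaded database file, but the shape is the point: apply the client's country and category restrictions first, aggregate each period, join the two, and rank by growth.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb" // registers the "duckdb" driver
)

// Illustrative only: the table layout, period labels, and permission lists
// below are invented, not the real schema.
const comparisonQuery = `
WITH cur AS (
    SELECT category, country, SUM(amount) AS total
    FROM read_parquet('sales.parquet')
    WHERE period = '2024-Q2'
      AND country  IN ('DE', 'FR')       -- permission filter, applied before aggregating
      AND category IN ('bikes', 'parts')
    GROUP BY category, country
), prev AS (
    SELECT category, country, SUM(amount) AS total
    FROM read_parquet('sales.parquet')
    WHERE period = '2024-Q1'
      AND country  IN ('DE', 'FR')
      AND category IN ('bikes', 'parts')
    GROUP BY category, country
)
SELECT category,
       country,
       (cur.total - prev.total) / CAST(prev.total AS DOUBLE) AS growth
FROM cur
JOIN prev USING (category, country)
WHERE prev.total <> 0
ORDER BY growth DESC
LIMIT 20`

func main() {
	// An empty DSN gives an in-memory DuckDB instance; the Parquet file is
	// read by the query itself.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(comparisonQuery)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var category, country string
		var growth float64
		if err := rows.Scan(&category, &country, &growth); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s / %s: %+.1f%%\n", category, country, growth*100)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```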

So I wrote an in-memory version in Go. I streamed the 7GB dataset into RAM, stored it using compact columnar structs, and implemented just the logic I needed: match rows from two periods, compute the growth, sort the result. That’s it.
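
Stripped to its essentials, that logic looks roughly like the sketch below. The field names, dictionary-encoded IDs, and hard-coded sample rows are placeholders (the real version streams the 7GB dataset into memory at startup), but the moving parts are the ones just described: columnar slices, the permission filter applied before any aggregation, per-period totals, growth, and a final sort.

```go
package main

import (
	"fmt"
	"sort"
)

// Columnar storage for a single period: parallel slices instead of a slice of
// structs, so each column is one contiguous block of memory. Field names and
// the uint16 dictionary IDs are illustrative, not the real schema.
type periodColumns struct {
	category []uint16  // dictionary-encoded category IDs
	country  []uint16  // dictionary-encoded country IDs
	amount   []float64 // the measure being compared across periods
}

type groupKey struct {
	category uint16
	country  uint16
}

type result struct {
	key    groupKey
	growth float64
}

// topGrowth aggregates both periods per (category, country), computes the
// relative growth for groups present in both, and returns the top n. The
// allowed sets stand in for the per-client permission filter, which has to
// run before any aggregation.
func topGrowth(cur, prev periodColumns, allowedCountry, allowedCategory map[uint16]bool, n int) []result {
	aggregate := func(p periodColumns) map[groupKey]float64 {
		totals := make(map[groupKey]float64)
		for i := range p.amount {
			if !allowedCountry[p.country[i]] || !allowedCategory[p.category[i]] {
				continue // filter first, which is why preaggregation was off the table
			}
			totals[groupKey{p.category[i], p.country[i]}] += p.amount[i]
		}
		return totals
	}

	curTotals := aggregate(cur)
	prevTotals := aggregate(prev)

	results := make([]result, 0, len(curTotals))
	for k, c := range curTotals {
		if p, ok := prevTotals[k]; ok && p != 0 {
			results = append(results, result{key: k, growth: (c - p) / p})
		}
	}
	sort.Slice(results, func(i, j int) bool { return results[i].growth > results[j].growth })
	if len(results) > n {
		results = results[:n]
	}
	return results
}

func main() {
	// Tiny hard-coded sample; the real version fills its columns by streaming
	// the dataset into memory at startup.
	cur := periodColumns{
		category: []uint16{1, 1, 2},
		country:  []uint16{10, 11, 10},
		amount:   []float64{120, 80, 200},
	}
	prev := periodColumns{
		category: []uint16{1, 1, 2},
		country:  []uint16{10, 11, 10},
		amount:   []float64{100, 100, 150},
	}
	allowedCountries := map[uint16]bool{10: true, 11: true}
	allowedCategories := map[uint16]bool{1: true, 2: true}

	for _, r := range topGrowth(cur, prev, allowedCountries, allowedCategories, 10) {
		fmt.Printf("category=%d country=%d growth=%+.1f%%\n", r.key.category, r.key.country, r.growth*100)
	}
}
```

In a layout like this, dictionary-encoding the category and country columns into small integer IDs is what keeps the columns compact; the actual strings only need to be resolved for the handful of rows that survive the top-N cut.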

It ran in under 100ms and used around 546MB of memory.

Obviously, this isn’t scalable. It’s not flexible, not SQL-compatible, and definitely not something I’d reach for in a real-world multi-user system. But it was a fun experiment. And it shows what you can achieve when you eliminate everything but the essential logic and data structures.

General-purpose databases are incredibly powerful, and for most problems, they’re the right tool. But every now and then, it’s worth stepping outside the box — if only to remind yourself how much performance is hiding under all the abstraction.

If you’re curious, the full Go implementation is available on GitHub.


