backend engineer · distributed systems
I work at the intersection of correctness and scale — where the interesting problems live. Building distributed systems for enterprise network management.
I started at Tejas Networks as an embedded developer. Before my first day, an HR conversation rerouted me to a Java backend role. Before my second week, I was in a room with a lead, a manager, and an architect, being asked to evaluate Apache MINA against Netty for a 10k traps/second target.
I didn't know what Netty was. I said so. Then I went and figured it out.
That pattern stuck. I've since designed graph-backed metadata services for 25k+ device digital twins, built a Zero-Touch Provisioning controller using the transactional outbox pattern, and cut query latency in half by rethinking indexing from the ground up.
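The outbox piece is worth a sketch. This is illustrative rather than the production code: hypothetical `devices` and `provisioning_outbox` tables, plain JDBC, and the relay that drains the table to the broker left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

// Transactional outbox, write side: the state change and the event that
// announces it commit atomically in one local transaction. A separate relay
// polls the outbox table and publishes to the broker, so a broker outage can
// never leave the database and the event stream disagreeing.
public final class ZtpOutboxWriter {

    public void activateDevice(Connection conn, String deviceId) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement update = conn.prepareStatement(
                 "UPDATE devices SET state = 'ACTIVE' WHERE id = ?");
             PreparedStatement outbox = conn.prepareStatement(
                 "INSERT INTO provisioning_outbox (id, topic, payload) VALUES (?, ?, ?)")) {

            update.setString(1, deviceId);
            update.executeUpdate();

            outbox.setString(1, UUID.randomUUID().toString());
            outbox.setString(2, "device.activated");
            outbox.setString(3, "{\"deviceId\":\"" + deviceId + "\"}");
            outbox.executeUpdate();

            conn.commit(); // both rows or neither: no dual-write window
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```

The point of the pattern is what it removes: there is no moment where the database says one thing and the event stream says another.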
The work is unglamorous and complex, and it is exactly where I want to be. I'm drawn to the problems that require understanding the system before touching the code.
Coordination is the enemy of scale.
Every lock, every synchronous cross-boundary call, every leader dependency creates a ceiling. Good distributed design is about knowing where you can afford eventual consistency — and being honest where you can't.
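The same ceiling shows up in miniature inside a single JVM. A toy contrast, nothing more: `java.util.concurrent.atomic.LongAdder` stripes writes across cells and only pays for coordination at read time, which is the in-process version of the trade eventual consistency makes at system scale.

```java
import java.util.concurrent.atomic.LongAdder;

// A synchronized counter serializes every writer through one lock: a ceiling.
// LongAdder stripes the count across cells, so writers don't rendezvous and
// coordination happens only when someone asks for the total.
public final class Counters {

    private long locked;                        // one lock, one ceiling
    private final LongAdder striped = new LongAdder();

    public synchronized void incrementLocked() { locked++; }

    public void incrementStriped() { striped.increment(); } // contention-free

    public long readStriped() {
        // sum() folds the cells on demand: cheap writes, slightly stale reads.
        return striped.sum();
    }
}
```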
Most systems don't fail — they degrade quietly.
Crashes are the easy case. The hard failures are the ones where the system keeps running but slowly loses correctness. Idempotency, deduplication, and recovery paths aren't features — they're the real design.
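Here's the shape of that design in a toy Java sketch. The in-memory set stands in for a durable dedup store, and `messageId` is whatever identity your transport gives you; both are assumptions, not a real API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Idempotent consumer: at-least-once delivery makes duplicates a certainty,
// not an edge case. Recording processed IDs turns "did we already see this?"
// into a guarantee instead of a hope.
public final class IdempotentTrapHandler {

    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    public void handle(String messageId, Runnable sideEffect) {
        // add() returns false if the ID was already present, so a redelivered
        // message is acknowledged without re-running its side effect.
        if (!processed.add(messageId)) {
            return; // duplicate: safe to drop
        }
        sideEffect.run();
        // In a real system the ID write and the side effect must commit
        // together (e.g. via an outbox), or a crash between them reintroduces
        // the silent-degradation failure mode this is meant to close.
    }
}
```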
Operational complexity is technical debt with a delay.
A system that requires someone to know which node holds which state is a system waiting to fail at 2am. The best architectures are boring to operate.
Distributed job scheduler. 50M tasks/day. No external coordinator.
"The interesting problem wasn't scheduling — it was what happens to in-flight tasks during a Raft leader election. Most schedulers pretend this edge case doesn't exist."
Event-driven feed system exploring the write amplification vs. read latency tradeoff.
"Fanout-on-write is fast to read but expensive to write. The right answer depends on your read/write ratio — not on what Twitter did."
I'm open to the right conversation — engineering roles, hard problems, or anything in the distributed systems space worth thinking through.