
Why we rewrote the tunnel stack in Rust.

Every tunneling service on the market is built in Go. We picked Rust instead, put #![forbid(unsafe_code)] at every crate root, and denied unwrap, panic!, and todo! at compile time. Here's why — and what it cost us.

Every major tunneling service I've looked at is Go-powered: ngrok, Cloudflare Tunnel, frp, chisel. It's a sensible default. Go ships fast, cross-compiles trivially, and its networking ecosystem is exactly as mature as it needs to be for this kind of workload. The obvious choice.

I picked Rust anyway. This post is why, and what it cost us to have the most aggressive lint profile I could justify turned on from day one.

The itch

A tunnel server has three jobs: accept TLS connections from agents, accept public HTTP requests, route bytes between the two. The cost of getting it wrong is real: a tunnel that crashes on malformed input takes down every customer on the same node. A control-plane panic drops every session. A memory leak in a session handler compounds until the host OOMs.

None of these are hard problems in Go — you just need to be careful. My objection is that the language doesn't enforce "be careful" anywhere meaningful. A nil dereference is always one missed check away; a write to a nil map panics at runtime, not at compile time. A goroutine that panics takes down the whole process unless every go statement wraps its body in a deferred recover. I've written that code a dozen times. I wanted the compiler to help instead.

Why Rust, specifically

Two reasons the usual memory-safety arguments don't capture:

#![forbid(unsafe_code)]

Every crate root in our repo carries #![forbid(unsafe_code)]. Not deny, not warn: forbid, the strongest level, which can't be re-allowed per-function or per-module further down the tree. The count of unsafe blocks in our 5,000-line codebase is zero. The count of raw-pointer-dereference bugs is, therefore, also zero. We haven't merely avoided an entire class of bug; in our source, the class doesn't exist.

Can you do the same in Go? Not really: Go has unsafe too, and while most code never touches it, there's no compile-time switch to rule it out. In Rust, a one-line grep proves the property holds for the whole tree.
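The deny-versus-forbid distinction in a hedged sketch (the module and function names are illustrative, not from our codebase): under deny, a local #[allow] reopens the door; under forbid, that #[allow] line is itself a compile error.

```rust
// Illustrative only: deny can be re-allowed locally.
// Swap the inner attribute to #![forbid(unsafe_code)] and the
// #[allow] below becomes a hard compile error.
mod demo {
    #![deny(unsafe_code)]

    #[allow(unsafe_code)] // legal under deny; rejected under forbid
    pub fn read_first(bytes: &[u8]) -> Option<u8> {
        // contrived unsafe purely to make the lint fire
        bytes.first().map(|b| unsafe { *(b as *const u8) })
    }
}
```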

The lint wall

Rust's clippy is opinionated in a way that pays back. Our workspace lint section reads like a list of things we refuse to ship:

# Cargo.toml

[workspace.lints.rust]
unsafe_code         = "forbid"
missing_docs        = "warn"
unused_must_use     = "deny"

[workspace.lints.clippy]
all                 = { level = "deny", priority = -1 }
pedantic            = { level = "deny", priority = -1 }
unwrap_used         = "deny"
expect_used         = "deny"
panic               = "deny"
unreachable         = "deny"
todo                = "deny"
unimplemented       = "deny"
dbg_macro           = "deny"
print_stdout        = "deny"
print_stderr        = "deny"
exit                = "deny"
panic_in_result_fn  = "deny"
unwrap_in_result    = "deny"

What does that buy you in practice? Every place the code would blow up on bad input was flagged at compile time. Not caught — flagged. I had to go write the error-path response for every failure case before the binary would link. That sounds tedious, and it is, for about three days. After that the forced discipline pays compound interest: the failure paths work, the first time, every time, on the most inconvenient inputs you can throw at them.

The practical win: zero-downtime deploys from day one. A malformed webhook payload, a dropped database connection, a weird Accept-Encoding header — none of them crash the process, because the compiler refused to let the crash path exist.
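Here's the discipline in miniature: a hedged, illustrative sketch (not from the real codebase) of what a frame-header parser looks like when unwrap, panic!, and friends are denied. Every failure becomes a value the caller has to route somewhere.

```rust
#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort,
    BadVersion(u8),
}

/// Parse an illustrative 4-byte header: [version, flags, len_hi, len_lo].
/// With unwrap_used and panic denied, there is no way to write this that
/// crashes on short or malformed input.
fn parse_header(buf: &[u8]) -> Result<(u8, u16), FrameError> {
    if buf.len() < 4 {
        // the lint wall forces this arm to exist before the code compiles
        return Err(FrameError::TooShort);
    }
    let version = buf[0];
    if version != 1 {
        return Err(FrameError::BadVersion(version));
    }
    let len = u16::from_be_bytes([buf[2], buf[3]]);
    Ok((version, len))
}
```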

Why not QUIC

Our first prototype used Quinn, the reference Rust QUIC implementation. Quinn is excellent. It didn't work for our use case for three reasons:

  1. Middleboxes hate QUIC. UDP on anything other than port 443 is a coin flip behind corporate firewalls, mobile NAT pools, or any hotel Wi-Fi. TCP+TLS on 443 is what those networks expect to see. We tunnel from agents on laptops in coffee shops. Any amount of “but QUIC works through this one” is a bug report I don't want.
  2. Binary size. Quinn + rustls + our app logic was close to 14 MB. tokio-rustls + yamux + our app logic is ~6 MB. Agents get shipped everywhere; a 60% binary-size cut matters for edge devices.
  3. Debugging story. When TCP+TLS breaks, you can Wireshark it: sequence numbers, retransmits, and resets are all in the clear even though the payload isn't. When QUIC breaks, nearly everything — acks and loss signals included — is encrypted at the transport layer, so without endpoint key logs the capture tells you almost nothing. For a small team that hasn't standardized its tooling, that's a weekend lost per incident.

The migration was mostly mechanical: yamux handles multiplexing (QUIC gave it to us for free), tokio-rustls handles the TLS state machine, and our session driver is the same on either transport. 600 net lines of Rust disappeared.
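What multiplexing buys, in miniature: frames tagged with a stream id so many logical streams share one TLS connection. This is a hand-rolled sketch of the idea only — it is not yamux's actual wire format:

```rust
/// Illustrative frame: [stream_id: u32][len: u32][payload], big-endian.
/// Interleaving frames with different stream ids on one pipe is the
/// whole trick; yamux adds flow control and handshaking on top.
fn encode_frame(stream_id: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&stream_id.to_be_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns (stream_id, payload), or None if the buffer is truncated.
fn decode_frame(buf: &[u8]) -> Option<(u32, &[u8])> {
    let id = u32::from_be_bytes(buf.get(..4)?.try_into().ok()?);
    let len = u32::from_be_bytes(buf.get(4..8)?.try_into().ok()?) as usize;
    buf.get(8..8 + len).map(|payload| (id, payload))
}
```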

What the code looks like

Here's the whole agent-accept loop, minus logging boilerplate. It's the hot path on the server; every connection from every customer touches it:

async fn run_agent_accept_loop(
    listener: TcpListener,
    tls_config: Arc<ServerConfig>,
    authenticator: AgentAuthenticator,
    registry: Arc<SessionRegistry>,
    mut shutdown_rx: watch::Receiver<bool>,
) -> anyhow::Result<()> {
    let acceptor = TlsAcceptor::from(tls_config);
    loop {
        tokio::select! {
            // New agent connection: do the TLS handshake on its own task
            // so a slow peer can't stall the accept loop.
            accept = listener.accept() => {
                let (tcp, peer) = accept?;
                let acceptor = acceptor.clone();
                let registry = Arc::clone(&registry);
                let auth = authenticator.clone();
                tokio::spawn(async move {
                    let tls = acceptor.accept(tcp).await?;
                    session_driver::spawn(tls, peer, auth, registry).await
                });
            }
            // Cooperative shutdown: the watch channel flips to true exactly once.
            _ = shutdown_rx.changed() => {
                if *shutdown_rx.borrow() { break Ok(()); }
            }
        }
    }
}

Note what's not there: no unwrap, no .expect(), no panic!. Every error short-circuits with ?, every spawned task returns a Result. The shutdown channel is checked on every iteration. If the listener fails, we unwind to try_join! in main.rs and every other task winds down. No zombie goroutines.

What surprised us

Three things I expected to be hard were easier than anticipated, and one thing I expected to be easy was genuinely hard.

Easy: async ecosystem. Tokio + axum + sqlx is a production-grade stack. I expected “async in Rust” to still be the rough experience it was two years ago. It isn't. Writing a JSON handler that queries Postgres and returns a response is about the same number of characters as in Go.

Easy: cross-compile. cargo build --target aarch64-apple-darwin from Linux works out of the box with cross or Zig-as-linker. Same for musl, same for Windows. The agent ships to every arch that matters.

Easy: serialization. serde is magic. bincode makes our capability tokens a constant-size wire format. sqlx's query_as::<_, UserModel> with FromRow is the fastest I've ever wired up a Postgres-to-struct mapping.
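What "constant-size wire format" means here, hand-rolled with std so the shape is visible without the crate (the fields are illustrative; the real code derives Serialize/Deserialize and lets bincode produce the bytes):

```rust
/// Illustrative 12-byte capability token: 8-byte agent id + 4-byte expiry.
#[derive(Debug, PartialEq)]
struct Token {
    agent_id: u64,
    expires_at: u32, // unix seconds
}

impl Token {
    /// Always exactly 12 bytes, so tokens can be framed without a length prefix.
    fn encode(&self) -> [u8; 12] {
        let mut out = [0u8; 12];
        out[..8].copy_from_slice(&self.agent_id.to_be_bytes());
        out[8..].copy_from_slice(&self.expires_at.to_be_bytes());
        out
    }

    fn decode(buf: &[u8; 12]) -> Token {
        let mut id = [0u8; 8];
        id.copy_from_slice(&buf[..8]);
        let mut exp = [0u8; 4];
        exp.copy_from_slice(&buf[8..]);
        Token {
            agent_id: u64::from_be_bytes(id),
            expires_at: u32::from_be_bytes(exp),
        }
    }
}
```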

Hard: lifetimes in HTTP middleware. Extractor + middleware composition in axum does the right thing, but when it doesn't, the error messages are a wall of 'static and Send bounds. I spent two afternoons figuring out why a cloneable AppState wasn't satisfying a trait. Answer: it was, and the error was misleading. This is a real tax.

Honest trade-offs

Rust isn't the right choice for everything. Here's what we're paying:

  • Build times. Our release build is 4 minutes on a 2 vCPU VM. Go would be ~30 seconds. We use cargo watch for dev and accept the CI cost.
  • Hiring pool. Go engineers are easier to find than Rust engineers. This is a solo build, so it hasn't mattered yet; it will.
  • Ecosystem maturity. The Stripe SDK is more complete in Go. We rolled our own HTTP-direct Stripe client in 300 lines because async-stripe for Rust had a heavier dependency surface than we wanted. Not a big deal, but noting it.
  • The learning cliff. Three days of fighting the borrow checker before your first feature lands is real, and it's fine for a solo project, and it's a serious onboarding tax for a team.

What's open now

Everything. The repo is on GitHub, dual MIT + Apache-2.0 (the Rust convention), with the agent, server, dashboard, migrations, and design docs all in-tree. If you're curious how any specific decision looks in source, grep for it: we write for people who'll read the code, because from day one, that audience was the whole audience.

If you're building something adjacent — a Rust web service, a Postgres-backed multi-tenant SaaS, a tunneling service — the relevant bits to cherry-pick are the workspace lint wall in Cargo.toml, the agent accept loop and session driver, and the hand-rolled Stripe client.

The build log walks through how all of this came together in four days. And the comparison page tells you whether you should switch from ngrok today.