Rust Meets Serverless, Part 3: Anatomy of a Cold Start

In Part 2 I measured the cold start of a minimal Rust Lambda at 18 to 22 ms. This post breaks down what that init window is actually paying for, and which parts you can move.

The short version: of the four phases that make up a cold start, only two are under your control. The other two are AWS infrastructure that you inherit whether you like it or not.

The Four Phases

Lambda init phase breakdown for Rust

Lambda’s Init Duration field in the REPORT log covers four phases that run in sequence before your handler is allowed to answer a request.

1. Container provisioning
2. Runtime initialization
3. Function code loading
4. Handler initialization (your main before the first invocation)

Two of these (1 and 2) are AWS’s problem. Two of them (3 and 4) are yours.
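Init Duration is reported as one aggregate number, so you cannot read the four phases off it individually. Collecting it across many cold starts is still the first step to knowing whether a change helps. A minimal sketch for pulling the value out of a REPORT line, assuming the usual CloudWatch Logs text format (a field like "Init Duration: 18.61 ms"); init_duration_ms is a hypothetical helper, not part of any Lambda crate:

```rust
/// Extracts the Init Duration value (in ms) from a Lambda REPORT log line.
/// Returns None for warm invocations, whose REPORT line has no such field.
fn init_duration_ms(report_line: &str) -> Option<f64> {
    // Everything after the field label, e.g. " 18.61 ms" plus any later fields.
    let (_, rest) = report_line.split_once("Init Duration:")?;
    // Cut at the unit, leaving just the number.
    let (number, _) = rest.split_once("ms")?;
    number.trim().parse().ok()
}
```

Splitting on the label rather than on tabs keeps the sketch working even if a log viewer has turned the separators into spaces.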

Phase 1: Container Provisioning

Lambda runs each execution environment inside a Firecracker microVM and keeps that environment around to serve later, warm invocations. A cold start is what happens when a new one has to be created. Firecracker is a lightweight virtual machine monitor built by AWS specifically for serverless workloads. It boots a microVM in as little as 125 ms, uses under 5 MiB of memory overhead per VM, and lets a single host pack thousands of them.

A quick note on the numbers: Firecracker’s 125 ms boot is not part of the 18 to 22 ms Init Duration we measured in Part 2. AWS keeps a warm pool of pre-booted microVMs and allocates one on demand, so the VM is already running by the time the Init Duration clock starts. If the boot were inside your window, Init Duration could never come in under 125 ms. This is not publicly documented in precise terms, but the math only works if the pool exists.

What you control here: nothing. AWS picks the hardware, schedules the microVM, and pins it to your function. The only lever, and an indirect one, is memory size: Lambda allocates CPU in proportion to configured memory, so more memory makes every later CPU-bound phase run faster, at a higher per-ms price.
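To put rough numbers on the memory lever: AWS documents that a function gets the equivalent of one full vCPU at 1,769 MB, and up to 6 vCPUs at the 10,240 MB maximum. A back-of-envelope sketch, assuming allocation is simply linear in memory (a simplification; the documented top end sits slightly above what a straight line gives). approx_vcpus is a hypothetical name:

```rust
/// Memory (MB) at which AWS documents the equivalent of one full vCPU.
const MB_PER_VCPU: f64 = 1769.0;
/// Lambda's configurable memory range, in MB.
const MIN_MEMORY_MB: u32 = 128;
const MAX_MEMORY_MB: u32 = 10_240;

/// Rough vCPU share for a memory setting, assuming allocation is linear in
/// memory. A back-of-envelope model, not how AWS actually schedules CPU.
fn approx_vcpus(memory_mb: u32) -> f64 {
    let clamped = memory_mb.clamp(MIN_MEMORY_MB, MAX_MEMORY_MB);
    f64::from(clamped) / MB_PER_VCPU
}
```

At 128 MB this comes out to roughly 0.07 of a vCPU, so CPU-bound init work runs on a small slice of a core. Above about 1,769 MB the extra capacity arrives as additional vCPUs, which only help code that actually does work in parallel; a single-threaded init stops getting faster.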

Phase 2: Runtime Initialization

Rust has no managed runtime on Lambda. Lambda’s Rust support reached general availability in November 2025, but the execution model is still “bring your own binary”. You deploy against the OS-only runtime family, provided.al2023 at time of writing, which is Amazon Linux 2023 with no language runtime preinstalled.

Inside your deployment package, a binary named bootstrap plays the role of the runtime. It runs the loop every runtime interface client runs against the Lambda Runtime API: poll for the next invocation, hand it to your handler, post the response back, repeat. In Rust the lambda_runtime crate (which lambda_http builds on) implements that loop, and cargo lambda compiles and packages your binary under the bootstrap name.
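To make the loop concrete, here is a minimal std-only sketch of its shape. It is not the lambda_runtime crate’s API: RuntimeApi, MockApi, Invocation and run are hypothetical names, and the HTTP calls are replaced by an in-memory queue so the control flow stands on its own. Everything your main does before entering a loop like this is phase 4.

```rust
use std::collections::VecDeque;

/// One invocation as handed out by the runtime API (simplified).
struct Invocation {
    request_id: String,
    payload: String,
}

/// The two runtime API calls the loop needs. A real client issues HTTP
/// requests against the host named in AWS_LAMBDA_RUNTIME_API:
///   GET  /2018-06-01/runtime/invocation/next
///   POST /2018-06-01/runtime/invocation/{request_id}/response
trait RuntimeApi {
    /// A real client blocks here; returning None only lets the mock stop.
    fn next_invocation(&mut self) -> Option<Invocation>;
    fn post_response(&mut self, request_id: &str, body: String);
}

/// The bootstrap loop: poll, hand to the handler, post the response, repeat.
fn run<A: RuntimeApi>(api: &mut A, handler: impl Fn(&str) -> String) {
    while let Some(inv) = api.next_invocation() {
        let body = handler(&inv.payload);
        api.post_response(&inv.request_id, body);
    }
}

/// In-memory stand-in for the runtime API, for illustration only.
struct MockApi {
    queue: VecDeque<Invocation>,
    responses: Vec<(String, String)>,
}

impl RuntimeApi for MockApi {
    fn next_invocation(&mut self) -> Option<Invocation> {
        self.queue.pop_front()
    }

    fn post_response(&mut self, request_id: &str, body: String) {
        self.responses.push((request_id.to_string(), body));
    }
}
```

In the real protocol the request ID arrives in the Lambda-Runtime-Aws-Request-Id header of the "next" response, and handler failures go to a separate error endpoint; the sketch leaves both out.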

What you control here: the runtime choice itself (provided.al2023 vs the older provided.al2) and the fact that your bootstrap is a single native binary instead of a script plus an interpreter that has to start first. Both of these already favor you. There is not much more to squeeze.

Phase 3: Function Code Loading

In this phase Lambda fetches your ZIP from an internal S3 bucket, unpacks it into the execution environment (under /var/task), and makes the bootstrap binary available to run.

Binary size matters here, on two axes:

Transfer cost. Smaller ZIP means a shorter pull from the internal bucket.
Page-in cost. The OS has to fault pages of the binary into memory the first time they are touched.

The release profile from Part 2 is exactly the lever for this phase:

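```toml
[profile.release]
opt-level = 3       # optimize for speed (already the release default)
lto = "fat"         # whole-program link-time optimization across all crates
codegen-units = 1   # a single unit, so LLVM sees the whole crate at once
strip = true        # drop symbols and debug info from the final binary
```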

Link-time optimization and a single codegen unit let the compiler optimize across crate boundaries and discard code that is never reached. strip = true removes symbol tables and debug info, which the binary does not need at runtime.

With those flags, the Axum health handler from Part 2 produces a bootstrap binary of about 1.7 MB uncompressed. Zipped for upload it drops to 885.7 kB. That is the size that counts against Lambda’s code storage quota, and the size of the payload Lambda pulls during a cold start.

For reference, an equivalent Node.js function with node_modules can easily push tens of MB, and a JVM artifact is larger still. Rust stays under a MB zipped because there is no runtime or interpreter to ship.

What you control here: the release profile, and how many dependencies you actually need. Every crate with heavy generics or large macro output pushes the binary size up. Watch cargo bloat output when the binary starts growing.
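One way to notice growth before it reaches production is a size budget checked in CI. A minimal std-only sketch, assuming you point it at the zipped artifact your build produces; check_size_budget and the budget value are hypothetical, and the artifact path depends on your build setup:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the artifact size in bytes and whether it fits the budget.
/// The budget is a number you choose; Lambda itself does not enforce it.
fn check_size_budget(artifact: &Path, budget_bytes: u64) -> io::Result<(u64, bool)> {
    let size = fs::metadata(artifact)?.len();
    Ok((size, size <= budget_bytes))
}
```

A failed check is a prompt to run cargo bloat and see which crate grew, rather than a verdict on its own.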

Phase 4: Handler Initialization

Everything inside your main function before run(...) is awaited counts as init. This is where your secrets get fetched, your database pool opens, your HTTP clients spin up, your config is parsed.

It is also where you have the largest concrete wins. A careless main that warms up a connection you use in 10% of invocations pays for that work on every cold start. Some practical rules:

Open persistent clients at init, not per request. Anything cacheable between invocations (database pools, signed clients, tracing setup) belongs in init. A few ms added to cold start pays off on every warm invocation that follows.
Lazy-initialize what is not on the critical path for every request. Clients that are only used by some endpoints can be built on first use instead of at startup. You still amortize across warm invocations, but you do not pay the cost on cold starts that never touch that code path.
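A std-only sketch of both rules, with std::sync::OnceLock doing the lazy part. DbPool, ReportClient, AppState and the "/reports" path are hypothetical stand-ins; in a real service the state would be built in main before run(...) and shared with the handler (for example as Axum router state). The counters exist only so the sketch can show how often each client is built.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Construction counters, for illustration only.
static DB_POOLS_BUILT: AtomicUsize = AtomicUsize::new(0);
static REPORT_CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-ins for real clients (a DB pool, an HTTP client).
struct DbPool;
struct ReportClient;

impl DbPool {
    fn connect() -> Self {
        DB_POOLS_BUILT.fetch_add(1, Ordering::SeqCst);
        DbPool
    }
}

impl ReportClient {
    fn build() -> Self {
        REPORT_CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst);
        ReportClient
    }
}

/// Created once per execution environment and reused by every warm invocation.
struct AppState {
    db: DbPool,                      // eager: every request needs it
    reports: OnceLock<ReportClient>, // lazy: only one route needs it
}

impl AppState {
    /// Runs during init (phase 4), so only the eager client is paid for here.
    fn init() -> Self {
        AppState { db: DbPool::connect(), reports: OnceLock::new() }
    }

    fn handle(&self, path: &str) -> &'static str {
        let _db = &self.db; // used on every request
        match path {
            "/reports" => {
                // Built on first use, then cached for later warm invocations.
                let _client = self.reports.get_or_init(ReportClient::build);
                "report"
            }
            _ => "ok",
        }
    }
}
```

If building the lazy client is itself async (AWS SDK clients typically load their config asynchronously), tokio::sync::OnceCell offers an async get_or_init with the same shape.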

For the Part 2 health handler, this phase is near zero because there is nothing to initialize beyond Axum’s router. A service that opens a connection pool and loads secrets from Secrets Manager can easily add hundreds of ms here.

Billing: You Pay For All Of It

AWS bills the init phase as part of your function’s duration. This has always been the case for custom runtimes, which is what Rust uses. It was not the case for managed-runtime ZIP packages until August 1, 2025, when AWS standardized init-phase billing across all packaging formats.

For Rust specifically this changed nothing (init has always been billed), but it is worth knowing when comparing cold start economics across languages. Every ms you trim off phases 3 and 4 is money saved on top of latency saved.
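A quick sketch of what that means for a single cold invocation. It assumes the billed duration is init plus handler time rounded up to the next millisecond, and that 1 GB = 1,024 MB as in AWS’s pricing examples; multiply the result by your region’s per-GB-second price. The function names and the 1.5 ms handler time are illustrative:

```rust
/// Billed GB-seconds for one cold invocation, assuming billed duration is
/// init + handler time rounded up to the next whole millisecond.
fn billed_gb_seconds(memory_mb: u32, init_ms: f64, handler_ms: f64) -> f64 {
    let billed_ms = (init_ms + handler_ms).ceil();
    // AWS pricing examples treat 1 GB as 1,024 MB.
    (f64::from(memory_mb) / 1024.0) * (billed_ms / 1000.0)
}

/// Fraction of a cold invocation's measured time that is init.
fn init_share(init_ms: f64, handler_ms: f64) -> f64 {
    init_ms / (init_ms + handler_ms)
}
```

With a 20 ms init and a 1.5 ms handler at 128 MB, the invocation bills 22 ms (0.00275 GB-seconds), and about 93% of the measured time is init.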

What You Cannot Reach: SnapStart

SnapStart is AWS’s answer to the cold start tax of heavyweight managed runtimes. It takes a snapshot of an initialized execution environment and restores new environments from that snapshot instead of initializing each one from scratch.

SnapStart is currently limited to Java, Python, and .NET managed runtimes. It does not support custom runtimes, which means Rust on Lambda cannot use it today. For most Rust workloads this is fine, because a native binary was never paying the hundreds of ms of class loading, imports, and JIT warmup that SnapStart was designed to paper over. If even a cold start of around 20 ms is too much, the lever to reach for is provisioned concurrency, which I will cover in the next post.

Summary: Levers, Ranked

From most to least impactful:

1. Your main before run(...) (phase 4). Easiest place to accidentally add hundreds of ms, and also the easiest to fix.
2. Binary size (phase 3). Release profile, dependency hygiene, cargo bloat. Worth auditing once, revisiting rarely.
3. Memory allocation (phase 1, indirectly). More memory buys a larger vCPU share, which shortens every CPU-bound phase that follows. Lambda bills on GB-seconds, so raising the memory setting raises the per-ms price in exact proportion. Shorter duration at a higher rate can still come out cheaper, but it can also go the other way. Measure before turning the dial.
4. Runtime choice (phase 2). Keep provided.al2023, keep the OS-only runtime. Nothing to tune beyond that.
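The proportional pricing in the memory item gives a simple break-even test: if you multiply memory by some factor, duration has to shrink by more than that same factor for the bill to go down. A sketch of that check, ignoring the 1 ms rounding of billed duration; the function names are hypothetical, and the durations are whatever you measure at each setting:

```rust
/// Duration (ms) a new memory setting must beat to cost less per invocation.
/// Price per ms scales with memory, so break-even duration scales inversely.
fn break_even_duration_ms(current_mb: u32, current_ms: f64, new_mb: u32) -> f64 {
    current_ms * f64::from(current_mb) / f64::from(new_mb)
}

/// True if the duration measured at the new setting actually saves money.
fn saves_money(current_mb: u32, current_ms: f64, new_mb: u32, new_ms: f64) -> bool {
    new_ms < break_even_duration_ms(current_mb, current_ms, new_mb)
}
```

For example, moving from 512 MB at 100 ms to 1,024 MB only pays off if the new duration comes in under 50 ms.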

Cold start on Rust Lambda is not a single number to optimize. It is a pipeline where two stages are inherited and two are yours. Ship a small binary and keep your init lean and you are most of the way there.
