@jboner
Last active May 27, 2026 15:48
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns                         14x L1 cache
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns                         20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy              3,000 ns        3 us
Send 1K bytes over 1 Gbps network        10,000 ns       10 us
Read 4K randomly from SSD*              150,000 ns      150 us             ~1GB/sec SSD
Read 1 MB sequentially from memory      250,000 ns      250 us
Round trip within same datacenter       500,000 ns      500 us
Read 1 MB sequentially from SSD*      1,000,000 ns    1,000 us       1 ms  ~1GB/sec SSD, 4x memory
Disk seek                            10,000,000 ns   10,000 us      10 ms  20x datacenter roundtrip
Local LLM, generate 1 token          15,000,000 ns   15,000 us      15 ms  Small model on consumer GPU (2026)
Read 1 MB sequentially from disk     20,000,000 ns   20,000 us      20 ms  80x memory, 20x SSD
Frontier LLM, generate 1 token       20,000,000 ns   20,000 us      20 ms  Hosted model output (2026)
Local LLM, time to first token       75,000,000 ns   75,000 us      75 ms  Small model, short prompt (2026)
Local LLM (CPU), generate 1 token   100,000,000 ns  100,000 us     100 ms  Small model, no GPU (2026)
Send packet CA->Netherlands->CA     150,000,000 ns  150,000 us     150 ms
Fast LLM, time to first token       250,000,000 ns  250,000 us     250 ms  Specialized inference hardware (2026)
Frontier LLM, time to first token                                1,000 ms  Short prompt, no cache (2026)
Frontier LLM, short response                                     3,000 ms  ~100 output tokens (2026)
Frontier LLM, long context prefill                              10,000 ms  ~100K input tokens, no cache (2026)
Frontier LLM, reasoning response                                30,000 ms  Single call with thinking (2026)
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
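
The unit conversions above, and the ratios quoted in the table's right-hand column, can be checked with a short script. This is only an illustrative sketch: the names `latency_ns`, `humanize`, and `ratio` are made up for this example, and the values are copied from the table rather than measured. The last check reads "short response" as time to first token plus 100 generated tokens, which is one plausible reading of the table and not something the gist states.

```python
NS_PER_US = 1_000        # 1 us = 1,000 ns
NS_PER_MS = 1_000_000    # 1 ms = 1,000,000 ns

# A few rows copied from the table, all in nanoseconds.
latency_ns = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
}


def humanize(ns: float) -> str:
    """Render a nanosecond value in the largest unit that keeps it >= 1."""
    if ns >= NS_PER_MS:
        return f"{ns / NS_PER_MS:g} ms"
    if ns >= NS_PER_US:
        return f"{ns / NS_PER_US:g} us"
    return f"{ns:g} ns"


def ratio(slower: str, faster: str) -> float:
    """How many times longer `slower` takes than `faster`."""
    return latency_ns[slower] / latency_ns[faster]


print(humanize(latency_ns["Disk seek"]))                          # 10 ms
print(ratio("L2 cache reference", "L1 cache reference"))          # 14.0
print(ratio("Main memory reference", "L1 cache reference"))       # 200.0
print(ratio("Disk seek", "Round trip within same datacenter"))    # 20.0

# Assumed reading of the LLM rows: time to first token + 100 tokens at 20 ms each.
short_response_ms = 1_000 + 100 * 20
print(short_response_ms)                                          # 3000
```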
Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Contributions
-------------
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
Interactive Prezi version: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/latency.txt
@fujohnwang

This is misinformation being passed along; one of the numbers is clearly wrong.

@jdnichollsc

Wondering which chips and OS you are using here to calculate these latencies.

@Hawzen

Hawzen commented Oct 1, 2025

> Send 1K bytes over 1 Gbps network

1Gbps is a measure of bandwidth and is irrelevant to latency

@tris

tris commented Feb 23, 2026

@Hawzen

> 1Gbps is a measure of bandwidth and is irrelevant to latency

1 Gbps = 1,000,000,000 bits/sec
1 KB = 1024 bytes = 1024 x 8 = 8192 bits
8192 bits / 1,000,000,000 bps = 0.000008192 seconds (aka 8.192 µs, or 8192 ns)

That's before TCP, IPv4, Ethernet headers and other framing on the wire -- not quite 10,000 ns but close enough.
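
The same arithmetic as a small script, in case anyone wants to plug in other sizes or link speeds. This is a rough sketch: `serialization_delay_ns` is a name made up here, and the 78 bytes of overhead is an assumption for a single frame with minimum-size headers (20 B TCP + 20 B IPv4 + 18 B Ethernet header and FCS + 8 B preamble + 12 B inter-frame gap, no options or VLAN tag). It covers only the time to clock bits onto the wire, not propagation, queueing, or host processing.

```python
def serialization_delay_ns(payload_bytes: int, link_bps: float,
                           overhead_bytes: int = 0) -> float:
    """Time to put payload plus overhead onto the wire, in nanoseconds."""
    total_bits = (payload_bytes + overhead_bytes) * 8
    return total_bits * 1e9 / link_bps


LINK_BPS = 1e9  # 1 Gbps

payload_only = serialization_delay_ns(1024, LINK_BPS)

# Assumed per-frame overhead: TCP 20 + IPv4 20 + Ethernet 18 + preamble 8 + gap 12
with_headers = serialization_delay_ns(1024, LINK_BPS, overhead_bytes=78)

print(f"{payload_only:.0f} ns, {with_headers:.0f} ns")  # 8192 ns, 8816 ns
```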

Stuart Cheshire's Latency and the Quest for Interactivity paper is still excellent 30 years later. It digs into serialization delays a fair bit (albeit with much smaller numbers). Well worth a read.

@andyrosa2

@jboner
Author

jboner commented May 27, 2026

Thanks. Updated the gist.

Added LLM inference latencies in a fork.

https://gist.github.com/andyrosa2/e654892dbc023475a904944a5a70ccd2
