Last active
May 27, 2026 15:48
-
Star
(5,000+)
You must be signed in to star a gist -
Fork
(2,372)
You must be signed in to fork a gist
-
-
Save jboner/2841832 to your computer and use it in GitHub Desktop.
Latency Numbers Every Programmer Should Know
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Latency Comparison Numbers (~2012) | |
| ---------------------------------- | |
| L1 cache reference 0.5 ns | |
| Branch mispredict 5 ns | |
| L2 cache reference 7 ns 14x L1 cache | |
| Mutex lock/unlock 25 ns | |
| Main memory reference 100 ns 20x L2 cache, 200x L1 cache | |
| Compress 1K bytes with Zippy 3,000 ns 3 us | |
| Send 1K bytes over 1 Gbps network 10,000 ns 10 us | |
| Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD | |
| Read 1 MB sequentially from memory 250,000 ns 250 us | |
| Round trip within same datacenter 500,000 ns 500 us | |
| Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory | |
| Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip | |
| Local LLM, generate 1 token 15,000,000 ns 15,000 us 15 ms Small model on consumer GPU (2026) | |
| Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD | |
| Frontier LLM, generate 1 token 20,000,000 ns 20,000 us 20 ms Hosted model output (2026) | |
| Local LLM, time to first token 75,000,000 ns 75,000 us 75 ms Small model, short prompt (2026) | |
| Local LLM (CPU), generate 1 token 100,000,000 ns 100,000 us 100 ms Small model, no GPU (2026) | |
| Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms | |
| Fast LLM, time to first token 250,000,000 ns 250,000 us 250 ms Specialized inference hardware (2026) | |
| Frontier LLM, time to first token 1,000 ms Short prompt, no cache (2026) | |
| Frontier LLM, short response 3,000 ms ~100 output tokens (2026) | |
| Frontier LLM, long context prefill 10,000 ms ~100K input tokens, no cache (2026) | |
| Frontier LLM, reasoning response 30,000 ms Single call with thinking (2026) | |
| Notes | |
| ----- | |
| 1 ns = 10^-9 seconds | |
| 1 us = 10^-6 seconds = 1,000 ns | |
| 1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns | |
| Credit | |
| ------ | |
| By Jeff Dean: http://research.google.com/people/jeff/ | |
| Originally by Peter Norvig: http://norvig.com/21-days.html#answers | |
| Contributions | |
| ------------- | |
| 'Humanized' comparison: https://gist.github.com/hellerbarde/2843375 | |
| Visual comparison chart: http://i.imgur.com/k0t1e.png | |
| Interactive Prezi version: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/latency.txt |
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks. Updated the gist.