Napkin Math

Napkin Math #

Originally from https://github.com/sirupsen/napkin-math.

The goal of this project is to collect software, numbers, and techniques to quickly estimate the expected performance of systems from first-principles. For example, how quickly can you read 1 GB of memory? By composing these resources you should be able to answer interesting questions like: how much storage cost should you expect to pay for logging for an application with 100,000 RPS?

The best introduction to this skill is through my talk at SRECON.

The best way to practise napkin math in the grand domain of computers is to work on your own problems. The second-best is to subscribe to this newsletter where you’ll get a problem every few weeks to practise on. It should only take you a few minutes to solve each one as your facility with these techniques improve.

The archive of problems to practise with are here. The solution will be in the following newsletter.

Numbers #

Below are numbers that are rounded from runs on a GCP c2-standard-4 (Intel Cascade) and 2017 Macbook (2.8GHz, quad-core).

Note 1: Numbers have been rounded, which means they don’t line up perfectly. Note 2: Some throughput and latency numbers don’t line up (for ease of calculations see exact results e.g. here).

OperationLatencyThroughput1 MiB1 GiB
Sequential Memory R/W (64 bytes)5 ns10 GiB/s100 μs100 ms
Hashing, not crypto-safe (64 bytes)25 ns2 GiB/s500 μs500 ms
Random Memory R/W (64 bytes)50 ns1 GiB/s1 ms1 s
Fast Serialization [8] [9]N/A1 GiB/s1 ms1s
Fast Deserialization [8] [9]N/A1 GiB/s1 ms1s
System Call500 nsN/AN/AN/A
Hashing, crypto-safe (64 bytes)500 ns200 MiB/s10 ms10s
Sequential SSD read (8 KiB)1 μs4 GiB/s200 μs200 ms
Context Switch [1] [2]10 μsN/AN/AN/A
Sequential SSD write, -fsync (8KiB)10 μs1 GiB/s1 ms1 s
TCP Echo Server (32 KiB)10 μs4 GiB/s200 μs200 ms
Sequential SSD write, +fsync (8KiB)1 ms10 MiB/s100 ms2 min
Sorting (64-bit integers)N/A200 MiB/s5 ms5 s
Random SSD Seek (8 KiB)100 μs70 MiB/s15 ms15 s
Compression [3]N/A100 MiB/s10 ms10s
Decompression [3]N/A200 MiB/s5 ms5s
Serialization [8] [9]N/A100 MiB/s10 ms10s
Deserialization [8] [9]N/A100 MiB/s10 ms10s
Proxy: Envoy/ProxySQL/Nginx/HAProxy50 μs???
Network within same region [6]250 μs100 MiB/s10 ms10s
{MySQL, Memcached, Redis, ..} Query500 μs???
Random HDD Seek (8 KiB)10 ms70 MiB/s15 ms15 s
Network between regions [6]Varies25 MiB/s40 ms40s
Network NA East <-> West60 ms25 MiB/s40 ms40s
Network EU West <-> NA East80 ms25 MiB/s40 ms40s
Network NA West <-> Singapore180 ms25 MiB/s40 ms40s
Network EU West <-> Singapore160 ms25 MiB/s40 ms40s

†: “Fast serialization/deserialization” is typically a simple wire-protocol that just dumps bytes, or a very efficient environment. Typically standard serialization such as e.g. JSON will be of the slower kind. We include both here as serialization/deserialization is a very, very broad topic with extremely different performance characteristics depending on data and implementation.

You can run this with RUSTFLAGS='-C target-cpu=native' cargo run --release -- --help. You won’t get the right numbers when you’re compiling in debug mode. You can help this project by adding new suites and filling out the blanks.

I am aware of some inefficiencies in this suite. I intend to improve my skills in this area, in order to ensure the numbers are the upper-bound of performance you may be able to squeeze out in production. I find it highly unlikely any of them will be more than 2-3x off, which shouldn’t be a problem for most users.

Cost Numbers #

Approximate numbers that should be consistent between Cloud providers.

WhatAmount$ / Month$ / Hour
CPU1$10$0.02
Memory1 GB$1
SSD1 GB$0.1
Disk1 GB$0.01
S3, GCS, ..1 GB$0.01
Network1 GB$0.01

Compression Ratios #

This is sourced from a few sources. [3] [4] [5] Note that compression speeds (but generally not ratios) vary by an order of magnitude depending on the algorithm and the level of compression (which trades speed for compression).

I typically ballpark that another x in compression ratio decreases performance by 10x. E.g. we can get a 2x ratio on English Wikipedia at ~200 MiB/s, and 3x at ~20MiB/s, and 4x at 1MB/s.

WhatCompression Ratio
HTML2-3x
English2-4x
Source Code2-4x
Executables2-3x
RPC5-10x

Techniques #

  • Don’t overcomplicate. If you are basing your calculation on more than 6 assumptions, you’re likely making it harder than it should be.
  • Keep the units. They’re good checksumming. Wolframalpha has terrific support if you need a hand in converting e.g. KiB to TiB.
  • Calculate with exponents. A lot of back-of-the-envelope calculations are done with just coefficients and exponents, e.g. c * 10^e. Your goal is to get within an order of magnitude right–that’s just e. c matters a lot less. Only worrying about single-digit coefficients and exponents makes it much easier on a napkin (not to speak of all the zeros you avoid writing).
  • Perform Fermi decomposition. Write down things you can guess at until you can start to hint at an answer. When you want to know the cost of storage for logging, you’re going to want to know how big a log line is, how many of those you have per second, what that costs, and so on.

Resources #