Cache Awareness and Performance

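Modern CPUs have a hierarchy of L1, L2, and L3 caches that move data in fixed-size cache lines (typically 64 bytes). Sequential memory access is fast because the hardware prefetcher detects the pattern and fetches lines ahead of use. Random access defeats the prefetcher and causes cache misses; a miss that goes all the way to main memory costs on the order of 100x an L1 hit.

Implications for C: choose between arrays of structs and structs of arrays according to which fields the hot loop actually touches, order struct fields so that padding is minimal and fields used together share a cache line, and avoid pointer chasing (for example, linked-list traversal) in hot loops.

False sharing: when two threads write to different fields that sit in the same cache line, the line bounces between cores even though the threads share no data.

perf stat (aggregate hardware counters such as cache misses) and perf record (sampling, inspected with perf report) are the primary Linux profiling tools. -O2 is the standard optimization level; -O3 adds more aggressive optimizations. Auto-vectorization is enabled at -O3 in GCC and, since GCC 12, also at -O2 with a conservative cost model; Clang vectorizes at -O2.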
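The arrays-of-structs versus structs-of-arrays trade-off can be sketched as follows. This is a minimal illustration, not a benchmark: the particle types, the N capacity, and the sum_x_* helpers are hypothetical names chosen for the example. Both functions compute the same sum, but the array-of-structs loop pulls every field of each particle through the cache while using only x, whereas the struct-of-arrays loop reads x values packed back to back.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* hypothetical capacity, chosen for illustration */

/* Array of structs (AoS): one particle's fields are adjacent in memory. */
struct particle {
    float x, y, z;
    float mass;
};

/* Struct of arrays (SoA): each field is contiguous across all particles. */
struct particles_soa {
    float x[N];
    float y[N];
    float z[N];
    float mass[N];
};

/* Reads only x, but each cache line fetched also carries y, z, and mass,
 * so only a quarter of the bytes brought into the cache are used. */
static float sum_x_aos(const struct particle *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

/* Reads x values stored contiguously, so every byte of each fetched
 * cache line contributes to the result. */
static float sum_x_soa(const struct particles_soa *p, size_t n)
{
    assert(n <= N);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p->x[i];
    return sum;
}
```

If the hot loop instead used x, y, z, and mass of the same particle together, the array-of-structs layout would likely be the better fit, since all four fields arrive in one cache line.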
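One common way to avoid false sharing is to give each thread's data its own cache line. The sketch below shows only the memory layout, using C11 alignas. CACHE_LINE, NUM_THREADS, and the struct and function names are assumptions made for the example, and the 64-byte line size should be checked against the target CPU.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: typical line size, verify on the target */
#define NUM_THREADS 4   /* hypothetical number of worker threads */

/* Prone to false sharing: all four counters fit inside one cache line,
 * so writes from different threads contend for the same line. */
struct counters_packed {
    unsigned long hits[NUM_THREADS];
};

/* The aligned member raises the struct's alignment to CACHE_LINE, and
 * sizeof is rounded up to match, so array elements never share a line. */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long hits;
};

struct counters_padded {
    struct padded_counter per_thread[NUM_THREADS];
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");
static_assert(alignof(struct padded_counter) == CACHE_LINE,
              "each counter should start on a cache-line boundary");

/* Each worker passes its own index, so it writes only to its own line. */
static void record_hit(struct counters_padded *c, size_t thread_idx)
{
    assert(thread_idx < NUM_THREADS);
    c->per_thread[thread_idx].hits++;
}
```

The alignment is honored automatically for objects with static or automatic storage. Memory from malloc is only guaranteed to be aligned for ordinary types, so a heap-allocated counters_padded would need aligned_alloc or a similar allocator. The cost of this layout is memory: each 8-byte counter now occupies a full 64-byte line.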

Appears In

m10-going-below-c

C Systems Lab Wiki
