Cache Awareness and Performance
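Modern CPUs have a hierarchy of L1, L2, and L3 caches, and data moves between them and main memory in fixed-size cache lines (typically 64 bytes). Sequential memory access is fast for two reasons: every byte of each fetched line gets used, and the hardware prefetcher recognizes the pattern and loads upcoming lines before they are needed. Random access over a working set larger than the cache defeats both mechanisms, so most loads miss, and a miss that goes all the way to main memory can cost on the order of 100x an L1 hit.

For C code this has several implications. Choose between an array of structs and a struct of arrays based on which fields the hot loops actually touch. Order struct fields to minimize padding and to keep frequently used fields together. Avoid pointer chasing (linked lists, trees of individually allocated nodes) in hot loops: each load depends on the address returned by the previous one, and the nodes are often scattered across memory.

False sharing is the multithreaded form of the problem. When two threads write to different fields that happen to sit in the same cache line, the coherence protocol has to move that line between the cores on every write, even though the threads never touch the same data.

perf is the primary profiling tool on Linux: perf stat counts hardware events such as cycles, instructions, and cache misses over a whole run, and perf record samples where time is spent, for later inspection with perf report. With GCC and Clang, -O2 is the standard optimization level; -O3 enables more aggressive optimizations, including more aggressive auto-vectorization (Clang, and GCC since version 12, already vectorize some loops at -O2).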
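A minimal sketch of the array-of-structs versus struct-of-arrays choice, using a hypothetical particle type (the names are illustrative, not from any library). A loop that reads only the x field drags the unused fields of every particle through the cache in the AoS layout, while the SoA layout packs the x values together.

```c
#include <stddef.h>

/* Hypothetical particle record: 6 doubles = 48 bytes per element. */
struct particle {
    double x, y, z;    /* position */
    double vx, vy, vz; /* velocity */
};

/* Same data laid out as a struct of arrays: each field is contiguous. */
struct particles_soa {
    double *x, *y, *z;
    double *vx, *vy, *vz;
    size_t n;
};

/* AoS: the loop strides 48 bytes per element but uses only 8 of them,
   so most of every cache line fetched is wasted. */
double sum_x_aos(const struct particle *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* SoA: the x values are adjacent, so each 64-byte line delivers
   8 useful doubles and the access is purely sequential. */
double sum_x_soa(const struct particles_soa *p)
{
    double s = 0.0;
    for (size_t i = 0; i < p->n; i++)
        s += p->x[i];
    return s;
}
```

AoS remains the better fit when a loop touches most fields of each element, since every fetched line is then used in full.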
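Struct field ordering can be illustrated with two hypothetical header types holding the same fields. The sizes in the comments assume a typical 64-bit ABI such as x86-64 or AArch64, where uint64_t is 8-byte aligned; other ABIs may pad differently. Ordering fields from largest to smallest alignment removes the interior padding, so four compact headers fit in a 64-byte line versus fewer than three padded ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Declaration order forces padding (x86-64 / AArch64 layout assumed). */
struct header_padded {
    uint8_t  flags;  /* offset 0, followed by 7 bytes of padding */
    uint64_t id;     /* offset 8 */
    uint8_t  kind;   /* offset 16, followed by 7 bytes of tail padding */
};                   /* sizeof == 24 */

/* Same fields, largest alignment first: padding collapses to the tail. */
struct header_compact {
    uint64_t id;     /* offset 0 */
    uint8_t  flags;  /* offset 8 */
    uint8_t  kind;   /* offset 9, followed by 6 bytes of tail padding */
};                   /* sizeof == 16 */

/* Bytes saved across an array of n headers by reordering alone. */
size_t bytes_saved(size_t n)
{
    return n * (sizeof(struct header_padded) - sizeof(struct header_compact));
}
```

The same reasoning applies to grouping the fields a hot loop reads at the start of a struct, so they share the first cache line of each element.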
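A minimal sketch of the usual fix for false sharing, assuming a 64-byte cache line: pad each per-thread counter out to a full line so no two threads ever write to the same line. The counter type, thread count, and iteration count are hypothetical choices for illustration.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdlib.h>

#define CACHE_LINE 64        /* assumed line size, typical on x86-64 */
#define NTHREADS   4
#define ITERS      1000000L

/* alignas on the member raises the struct's alignment to 64, and the
   struct size is rounded up to a multiple of that, so each counter
   occupies its own cache line. Without it, the four longs would sit
   next to each other in one or two lines. volatile stops the compiler
   from collapsing the loop into a single add, so every iteration
   really stores to memory. */
struct padded_counter {
    alignas(CACHE_LINE) volatile long value;
};

static struct padded_counter counters[NTHREADS];

/* Each thread increments only its own counter: no data is shared. */
static void *worker(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

/* Runs NTHREADS workers and returns the sum of their counters. */
long run_counters(void)
{
    pthread_t tids[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        counters[t].value = 0;
        if (pthread_create(&tids[t], NULL, worker, &counters[t]) != 0)
            abort();
    }
    long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tids[t], NULL);  /* join makes the final value visible */
        total += counters[t].value;
    }
    return total;
}
```

Compile with -pthread. Timing this against a copy without the alignas, for example under perf stat, is one way to observe the effect; how large the difference is depends on the machine.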