A good day to trie-hard: saving compute 1% at a time

The Nugget

  • Optimizing small functions in high-traffic services can lead to significant CPU savings. By using a new Rust crate called trie-hard, Cloudflare successfully reduced CPU utilization by over 1% in its pingora-origin service.

Make it stick

  • 🚀 1.71% of CPU time was initially consumed by the function for clearing internal headers.
  • 🔧 The original function ran in 3.65µs; refined to 0.93µs with trie-hard.
  • 📈 1.28% overall reduction in CPU utilization achieved with new optimizations.
  • 🔍 Real-world metrics validated performance improvements through stack trace sampling.

Key insights

Background and Motivation

  • Cloudflare's global network handles over 60 million HTTP requests per second.
  • The pingora-origin service is critical for processing non-cached requests, contributing significantly to CPU time.

Optimization Process

  1. Initial Performance Benchmarking:

    • Original function clear_internal_headers consumed 1.7% of CPU time.
    • Average runtime was 3.65µs for the function.
  2. First Improvement - Reducing Reads:

    • Inverted header lookup reduced reads; improved runtime to 1.53µs.
    • Resulted in a calculated CPU reduction of 0.993%.
  3. Searching Data Structures:

    • Explored various data structures (HashMap, BTreeSet, regex) to find a better solution.
    • The trie data structure was identified as promising despite initial implementations being slower.

Development of trie-hard

  • Cloudflare developed trie-hard, a trie build optimized for their needs.
  • This implementation led to a runtime of less than 0.93µs, achieving a substantial performance gain.
  • Real production data confirmed the expected CPU savings, leading to an actual utilization of 0.34%.

The Importance of Profiling

  • The key takeaway is that measuring and understanding performance bottlenecks is vital for effective optimization.
  • Utilizing profiling tools can help identify and focus on the most impactful areas for improvement.

Key quotes

  • "Knowing where your code is slow and by how much is more important than how you go about optimizing it."
  • "Optimizing operations that are already measured in microseconds may seem a little silly, but these small improvements add up."
  • "The structure of the trie lends itself to quickly identifying strings that are not contained."
  • "Trie-hard…gets its speed from storing node relationships in the bits of unsigned integers."
  • "A trie…is a type of tree data structure normally used for prefix searches."
This summary contains AI-generated information and may have important inaccuracies or omissions.