How Apple built iCloud to store billions of databases

One-liner

Apple runs iCloud on Cassandra and FoundationDB, using an extreme multi-tenant design to manage billions of databases while scaling out, hiding latency from users, and handling concurrent transactions reliably.

Summary

Architectural Choices and Lessons

Apple built iCloud around asynchronous processing and a stateless architecture, an approach that parallels some techniques used by Meta. The goal is smooth user-facing functionality at very large scale. Logical resource isolation improves reliability, and serving diverse data needs from a single system simplifies operations. Abstractions at several architectural layers improve the developer experience, which underscores the importance of understanding user needs when designing technology.

Use of Cassandra in iCloud

Cassandra is a foundational component of iCloud, and Apple runs one of the world's largest Cassandra deployments. With hundreds of thousands of nodes, petabytes of data, and millions of queries per second, Cassandra carries a large share of iCloud's data management. Even so, scalability limits in Cassandra, such as single-zone operation constraints and partition size restrictions, prompted Apple to adopt FoundationDB.

FoundationDB and the Record Layer

Acquired by Apple in 2015, FoundationDB is an open-source distributed database that underpins iCloud's CloudKit service. The FoundationDB Record Layer adds structured storage on top of it and supports Apple's cross-application, multi-tenant architecture, which houses billions of databases built from thousands of shared schemas. The layer is stateless, which makes it straightforward to scale, and it uses a record store abstraction to manage resource allocation.
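
The record store idea can be illustrated with a toy model. The sketch below is not the FoundationDB or Record Layer API; ToyKeyValueStore, RecordStore, and the (user, application) key prefix are hypothetical names chosen for illustration. It assumes only that the underlying store keeps keys in sorted order, so that each tenant's records occupy one contiguous key range and the layer object itself holds no state beyond its prefix.

```python
from bisect import bisect_left, insort


class ToyKeyValueStore:
    """In-memory stand-in for an ordered key-value store (illustrative only)."""

    def __init__(self):
        self._keys = []   # tuple keys, kept in sorted order
        self._data = {}   # key -> value

    def set(self, key, value):
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        # Keys sharing a prefix are contiguous because the keys are sorted.
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key, self._data[key]


class RecordStore:
    """Hypothetical per-tenant view: one (user, application) key prefix."""

    def __init__(self, kv, user_id, app_id):
        self._kv = kv
        self._prefix = (user_id, app_id)

    def save(self, record_id, record):
        self._kv.set(self._prefix + (record_id,), record)

    def load(self, record_id):
        return self._kv.get(self._prefix + (record_id,))

    def all_records(self):
        return [(key[-1], value) for key, value in self._kv.scan_prefix(self._prefix)]


kv = ToyKeyValueStore()
alice_notes = RecordStore(kv, "alice", "notes")
bob_notes = RecordStore(kv, "bob", "notes")
alice_notes.save("n1", {"title": "groceries"})
bob_notes.save("n1", {"title": "travel"})
```

Both tenants use the record ID "n1", yet neither can see the other's data, because every read and write is scoped to the store's own prefix. Opening a record store is cheap in this model, which is one way to picture how billions of logical databases could share a single cluster.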

CloudKit's Inner Workings

CloudKit represents each application as a logical container and organizes its data into 'zones' for efficient synchronization and access. CloudKit scales dynamically and interacts with the FoundationDB Record Layer's record stores on behalf of the user. By building on FoundationDB, CloudKit addresses high-concurrency operations, query latency, personalized search, and conflicting transactions.
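
One common way to handle conflicting transactions is optimistic concurrency control, which FoundationDB is known to use: a transaction runs without locks, and the commit is rejected if something it read was changed in the meantime. The sketch below is a toy model of that general idea, not FoundationDB's implementation or API; VersionedStore, Transaction, ConflictError, and run_with_retry are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a key this transaction read was overwritten before commit."""


class VersionedStore:
    def __init__(self):
        self._values = {}
        self._versions = {}      # key -> commit number of its last write
        self._commit_count = 0

    def begin(self):
        return Transaction(self, self._commit_count)


class Transaction:
    def __init__(self, store, read_version):
        self._store = store
        self._read_version = read_version
        self._reads = set()
        self._writes = {}

    def get(self, key, default=None):
        if key in self._writes:
            return self._writes[key]
        self._reads.add(key)
        return self._store._values.get(key, default)

    def set(self, key, value):
        self._writes[key] = value    # buffered until commit

    def commit(self):
        store = self._store
        # Abort if any key we read was written after this transaction began.
        for key in self._reads:
            if store._versions.get(key, 0) > self._read_version:
                raise ConflictError(key)
        store._commit_count += 1
        for key, value in self._writes.items():
            store._values[key] = value
            store._versions[key] = store._commit_count


def run_with_retry(store, work, max_attempts=5):
    """Re-run `work` in a fresh transaction until it commits without conflict."""
    for _ in range(max_attempts):
        txn = store.begin()
        result = work(txn)
        try:
            txn.commit()
            return result
        except ConflictError:
            continue
    raise ConflictError("gave up after retries")
```

In this model, two transactions that both read and then increment the same counter cannot both commit: the second one fails and has to be retried against the newer value, so no update is silently lost.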

Key Quotes

  1. "Apple really does store billions of databases in their extreme multi-tenant architecture."

  2. "Both [Apple and Meta] use asynchronous processing smartly in order to make user functionalities smoother."

  3. "[The Record Layer is] engineered to handle multi-tenancy at such a large scale thanks to two fundamental architectural decisions."

  4. "CloudKit converts the defined application schema into a metadata definition within the Record Layer, which is stored in a separate metadata store."

  5. "FoundationDB helps solve personalized full-text search...without any extra overhead."

Make it stick

  1. Asynchronous Is Key: Just as a restaurant takes the next order while earlier meals are still cooking, Apple's asynchronous processing lets CloudKit keep operations flowing and hides latency from the user.
  2. Cassandra's Capacity: Imagine a library vast enough to contain all the world's books. That is a fair picture of Apple's Cassandra deployment, which stores petabytes of data and handles millions of queries every second.
  3. The Flexibility of FoundationDB: Think of FoundationDB as the Swiss Army knife for iCloud's database needs, solving complex problems from high-concurrency handling to personalized search capabilities with finesse.