Of all the NewSQL databases, CockroachDB is currently one of the most impressive and formidable. It is inherently a distributed database system that is highly suitable for cloud deployments, but can also be used on-premise or in containers.
It is possible to bring CockroachDB close to the users, distributed across regions, data centers, and different cloud providers. Data can dynamically move to the location where it is needed. CockroachDB automatically scales, balances, and repairs itself. Manual sharding is not required, and ACID transactions are possible.
How does CockroachDB ensure that consistency?
CockroachDB guarantees the “serializable” isolation level, the highest defined in the SQL standard. It achieves this by combining the Raft consensus algorithm for writes with time-based synchronization algorithms for reads.
Stored data has a version within MVCC, so read actions are limited to the version at the moment the read transaction began. The write actions use the Raft consensus algorithm, which means that a majority of nodes together determine if an update has been successful. Updates (write actions) must reach a majority of the nodes (by default 2 out of 3) before they are considered successful.
To ensure that a write action does not interfere with a read action that starts afterward, CockroachDB uses a “timestamp cache” that remembers when data was last read by ongoing transactions.
This ensures that users (clients) will always experience serializable consistency with respect to other concurrent transactions.
But how is CockroachDB both highly available and strongly consistent?
The CAP theorem states that it is impossible for distributed systems to simultaneously have more than 2 of the following properties:
consistency
availability
partition tolerance
CockroachDB is a CP (consistent and partition tolerant) system. This means that if partitions are present, the system becomes unavailable rather than allowing inconsistent results to occur. For example, write actions require confirmation from a majority of the nodes, and read actions require a lease, which can only come from another node if write actions are possible.
Additionally, CockroachDB is highly available, although “available” in this context means something different from how it is used in the CAP theorem. In the CAP theorem, it is a binary property, but with High Availability, we are talking about a spectrum (such as the term “five nines” for a system that is available 99.999% of the time).
Being CP and having HA as properties means that if a majority of the nodes can communicate with each other, they will continue to function. For example, if you deploy CockroachDB in 3 data centers and the network to one of these data centers becomes unavailable, the other 2 data centers will continue to operate with only a few seconds of interruption. CockroachDB achieves this by quickly and efficiently detecting partitions and failures, then transferring leadership to nodes that can communicate with the majority, and redirecting internal traffic away from the partitioned nodes.
Why is CockroachDB compatible with PostgreSQL?
The creators of CockroachDB wanted to focus on what is important: developing a simple, strongly distributed, consistent database. This meant not spending too much time on things like developing a network protocol.
Initially, CockroachDB considered being compatible with MySQL. However, several factors ultimately led to the choice of PostgreSQL.
From the beginning, it was clear that the documentation for PostgreSQL’s network protocol was much clearer and more comprehensive. Additionally, it was much more supportive of third-party implementations than MySQL.
The PostgreSQL license is compatible with CockroachDB’s own Apache license, which means that (a part of) the source code did not need to be modified. In contrast, MySQL (and MariaDB) fall under the GNU GPL license, which excludes direct use of their source code for CockroachDB.
What does the future hold for PostgreSQL compatibility?
In early 2016, it became clear that client drivers are not the only factor in making a product successful. Development frameworks must also enable the rapid adoption of a new product. This means that not only the network protocol of PostgreSQL must be maintained, but the entire SQL layer as well, or investing in the development of extensions on the existing PostgreSQL framework.
CockroachDB realizes this and is, in fact, investing in both directions.
The priority now is to increase compatibility with PostgreSQL semantics out-of-the-box, such as adding more and more of PostgreSQL’s built-in functions and operators, and providing compatible data via the information_schema and pg_catalog meta-information tables. This will be the focus in upcoming versions of CockroachDB.
There will always be some existing features of CockroachDB that will remain unchanged and never become fully compatible because they are fundamentally different in terms of database architecture. For example, PostgreSQL’s “FOR Locking clauses” cannot be implemented in CockroachDB because the principles of concurrency control in CockroachDB are so different. When existing client frameworks require such PostgreSQL features, CockroachDB will invest in developing a specific version tailored for them.
Want to learn more?
OptimaData BV closely monitors the developments of NewSQL databases to assist its clients in making future-proof choices between database platforms. Feel free to contact us; we’re here to brainstorm with you!