Bitcask

Bitcask is a key-value database designed to be highly performant and efficient, with high read and write throughput. Backing up is also very easy, since it is basically just copying the directory. Crash recovery is also a joy (explained in Booting).

A limitation of Bitcask is that the KeyDir is stored in memory, so the number of keys an instance can hold is bounded by the RAM of the machine running it.

Comparing Bitcask with LevelDB, the main issue I see with Bitcask is this RAM requirement: holding the KeyDir in memory is expensive. Approaches to mitigate this will be discussed ahead.

An instance is a Bitcask directory containing multiple data files. All of them are closed and immutable except one, the active file.

Each entry in a data file has the following fields:

CRC
Time_stamp
Key_size
Value_size
Key
Value
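
To make the layout concrete, here is a minimal sketch of encoding and decoding one entry. The field widths (4 bytes each for CRC, timestamp, key size and value size), the big-endian byte order and the function names are assumptions for illustration; the notes above only fix the order of the fields.

```python
import struct
import time
import zlib
from typing import Optional, Tuple

# Assumed header layout: CRC, timestamp, key size, value size (4 bytes each, big-endian).
HEADER = struct.Struct(">IIII")


def encode_entry(key: bytes, value: bytes, timestamp: Optional[int] = None) -> bytes:
    """Serialize one entry as CRC | timestamp | key_size | value_size | key | value."""
    ts = int(time.time()) if timestamp is None else timestamp
    body = struct.pack(">III", ts, len(key), len(value)) + key + value
    # The CRC covers everything that follows it in the entry.
    return struct.pack(">I", zlib.crc32(body)) + body


def decode_entry(buf: bytes) -> Tuple[int, bytes, bytes]:
    """Parse one entry from the start of buf, verifying its CRC."""
    crc, ts, key_size, value_size = HEADER.unpack_from(buf, 0)
    end = HEADER.size + key_size + value_size
    if zlib.crc32(buf[4:end]) != crc:
        raise ValueError("CRC mismatch: entry is corrupt or truncated")
    key = buf[HEADER.size:HEADER.size + key_size]
    value = buf[HEADER.size + key_size:end]
    return ts, key, value
```

Storing the sizes before the key and value is what lets a reader walk a data file entry by entry without any separator bytes.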

The data file holds the encoded form of each command that modifies the database. It also has a size threshold: once the file exceeds it, the file is closed and becomes immutable, and a new active file is opened.
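
The rotation rule can be sketched roughly as follows. The class name `ActiveFile`, the `<id>.data` naming scheme and the `max_bytes` parameter are hypothetical, and checking the threshold right after each write is just one possible policy.

```python
import os


class ActiveFile:
    """Append-only writer that rolls over to a new file once a size threshold is reached."""

    def __init__(self, directory: str, max_bytes: int):
        self.directory = directory
        self.max_bytes = max_bytes  # assumed threshold, in bytes
        self.file_id = 0
        self.handle = self._open(self.file_id)

    def _open(self, file_id: int):
        # Hypothetical naming scheme: 0.data, 1.data, ...
        return open(os.path.join(self.directory, f"{file_id}.data"), "ab")

    def append(self, record: bytes):
        """Write record and return (file_id, offset) of where it landed."""
        offset = self.handle.tell()
        self.handle.write(record)
        self.handle.flush()
        written_to = self.file_id
        if self.handle.tell() >= self.max_bytes:
            # Threshold reached: close this file for good and start a new active file.
            self.handle.close()
            self.file_id += 1
            self.handle = self._open(self.file_id)
        return written_to, offset
```

A closed file is never reopened for writing here, which is what keeps the older files immutable.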

There’s also the KeyDir, an in-memory store that maps each key to a pointer to the value's location in a data file. This makes data lookup fast. It also holds a few other pieces of metadata.

In my case, the KeyDir also has to be persisted, to avoid having to rebuild it in memory from the data files every time the instance restarts.
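
As a rough sketch, the KeyDir can be modelled as a dictionary from key to a small record. The field names below (`file_id`, `value_pos`, `value_size`, `timestamp`) and the `<id>.data` file naming are assumptions for illustration, not necessarily what this implementation stores.

```python
import os
from typing import Dict, NamedTuple, Optional


class KeyDirEntry(NamedTuple):
    file_id: int      # which data file holds the value
    value_pos: int    # byte offset of the value inside that file
    value_size: int   # number of bytes to read
    timestamp: int    # example of extra metadata kept per key


def get(directory: str, keydir: Dict[bytes, KeyDirEntry], key: bytes) -> Optional[bytes]:
    """Look the key up in memory, then read the value from disk in a single seek."""
    entry = keydir.get(key)
    if entry is None:
        return None
    # Hypothetical naming scheme: data files are called 0.data, 1.data, ...
    with open(os.path.join(directory, f"{entry.file_id}.data"), "rb") as f:
        f.seek(entry.value_pos)
        return f.read(entry.value_size)
```

Because the pointer already carries the offset and size, a read never has to scan a data file.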

Operations

Set

SET is the operation that adds a new entry to the store. The flow is:

Receive the SET command with valid arguments.
Perform the following as an atomic transaction:
1. Append the entry to the data file.
2. Create an entry in the KeyDir.

Note that the data file is opened in append-only mode, which allows high write throughput because every write is a sequential append with no seek. Throughput here is the rate at which commands are issued and their effects applied, and it can vary with the load on the store.
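
The SET flow above can be sketched as follows. This is a simplified illustration rather than the actual implementation: it uses a single data file, an assumed 16-byte header (CRC, timestamp, key size, value size), and a `threading.Lock` to stand in for the atomic transaction. The class name `MiniStore` is hypothetical.

```python
import struct
import threading
import time
import zlib
from typing import Optional


class MiniStore:
    HEADER_SIZE = 16  # assumed: 4 bytes each for CRC, timestamp, key size, value size

    def __init__(self, path: str):
        self.path = path
        self.file = open(path, "ab")   # append-only handle to the active file
        self.keydir = {}               # key -> (value_pos, value_size, timestamp)
        self.lock = threading.Lock()

    def set(self, key: bytes, value: bytes) -> None:
        ts = int(time.time())
        body = struct.pack(">III", ts, len(key), len(value)) + key + value
        record = struct.pack(">I", zlib.crc32(body)) + body
        # Both steps happen under one lock so no reader sees a half-applied SET.
        with self.lock:
            offset = self.file.tell()
            self.file.write(record)    # 1. append the entry to the data file
            self.file.flush()
            value_pos = offset + self.HEADER_SIZE + len(key)
            self.keydir[key] = (value_pos, len(value), ts)  # 2. update the KeyDir

    def get(self, key: bytes) -> Optional[bytes]:
        entry = self.keydir.get(key)
        if entry is None:
            return None
        value_pos, value_size, _ = entry
        with open(self.path, "rb") as f:
            f.seek(value_pos)
            return f.read(value_size)
```

Overwriting a key appends a second record and repoints the KeyDir entry; the older record stays in the file, which is why the file only ever grows.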