
Designing Signet Storage

James Prestwich // 7 min read

In our previous post, we introduced signet-libmdbx, our Rust bindings for libmdbx that encode database invariants in the type system. That work is one piece of a larger puzzle: signet-storage, a modular storage backend for Signet nodes.

Why Build Our Own?

Signet’s storage requirements are similar to other EVM chains. We need hot-path access to current state, as well as cold-path access to historical blocks and transactions. State (account balances, contract storage) is frequently read and written in the hot path of the consensus system. Blocks and transactions are written once and read infrequently, mostly for historical queries to serve RPC requests.

Background: reth’s storage evolution

Up to now, we have used reth-db as a library to provide storage for Signet nodes. We made minimal modifications to remove unneeded tables and add Signet-specific indexes. reth-db is actively maintained and relied on by many Ethereum nodes. It provides a solid foundation for EVM chain storage.

As Signet has grown, we’ve shipped a number of applications that rely on access to chain data, including signet (the Signet full node) and builder (a simple Signet block builder). Many of these applications rely on fast access to chain data to simulate EVM transactions and bundles, provide Signet’s JSON-RPC API, and more.

Overall, we have been quite happy with reth-db as a storage solution. It does what it says on the tin, and has been stable and reliable. However, as Signet has grown, our needs have diverged. We want to use storage components as standalone libraries, while reth has (reasonably) optimized for tight integration as a complete node implementation.

As part of reth’s ongoing optimization efforts, the reth team has been migrating their database from a single MDBX instance to a combination of an MDBX instance and an on-disk flat-file store called NippyJar. Recently, they added a RocksDB instance to their ProviderFactory as part of their evolving storage architecture.

This migration from MDBX to multiple backends aims to improve reth performance by separating hot-path data from cold-path data. It has also helped us identify design requirements for future Signet storage backends.

Design Requirements

1. Abstract Hot/Cold Storage Model

Data should be stored in an explicit hot/cold model. Hot data (the EVM state) should be stored in a fast key-value store (like mdbx), while cold data (blocks, transactions, receipts) should be stored in more cost-effective storage (like NippyJar) optimized for large, infrequent reads.

                  ┌──────────────────────────────────┐
  Hot Path        │ mdbx: State, Accounts, Storage   │
  (sync)          │ Transactional reads/writes       │
                  └──────────────────────────────────┘
                  ┌──────────────────────────────────┐
  Cold Path       │ NippyJar / Postgres: Blocks, Txs │
  (async)         │ Append-only, infrequent reads    │
                  └──────────────────────────────────┘

By separating these two data paths, we can optimize each storage backend for its specific access patterns. A transaction simulator may require ONLY hot-path access to state data, while a block explorer may require ONLY cold-path access to historical blocks and transactions.

These backends should be abstracted behind a unified interface, allowing users to interact with the storage system without needing to know the details of each backend. This abstraction should allow for easy swapping of storage backends in the future, without requiring changes to the rest of the codebase. Nodes should be able to use postgres for cold storage while using mdbx for hot storage, without any changes to the consensus code.

Hot-path data should be accessed synchronously, while cold-path data should be read and written asynchronously. Cold-path data should NEVER block the consensus system. This is a key design principle, and the storage backend should enforce it in the type system.
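As a rough illustration of what this could look like in Rust, here is a minimal sketch. The trait and method names (HotBackend, ColdBackend, Storage, account_balance, block_by_number) are hypothetical, not the actual signet-storage API; the point is that hot reads are plain synchronous calls, cold reads are async, and both backends are generic parameters, so swapping one out never touches the calling code.

```rust
/// Hot-path backend: synchronous, transactional access to current EVM state.
/// (Hypothetical trait, sketched for illustration only.)
pub trait HotBackend {
    type Error;

    /// Read an account balance from current state. Synchronous, so it is
    /// safe to call from the consensus hot path.
    fn account_balance(&self, address: [u8; 20]) -> Result<Option<u128>, Self::Error>;
}

/// Cold-path backend: append-only historical data (blocks, txs, receipts).
pub trait ColdBackend {
    type Error;

    /// Fetch an encoded block body by number. Async (Rust 1.75+), so it
    /// cannot be called directly from synchronous consensus code.
    async fn block_by_number(&self, number: u64) -> Result<Option<Vec<u8>>, Self::Error>;
}

/// Unified handle, generic over both backends. Swapping NippyJar for
/// Postgres in the cold path is a type-parameter change, invisible to callers.
pub struct Storage<H: HotBackend, C: ColdBackend> {
    hot: H,
    cold: C,
}

impl<H: HotBackend, C: ColdBackend> Storage<H, C> {
    /// Hot reads stay synchronous.
    pub fn account_balance(&self, address: [u8; 20]) -> Result<Option<u128>, H::Error> {
        self.hot.account_balance(address)
    }

    /// Cold reads must be awaited, keeping them off the consensus hot path.
    pub async fn block_by_number(&self, number: u64) -> Result<Option<Vec<u8>>, C::Error> {
        self.cold.block_by_number(number).await
    }
}
```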

2. Storage Consistency with Staged Changes

Writes to mdbx are transactional, while writes to NippyJar are filesystem operations. When a write has been staged in a storage provider but not yet committed to disk, mdbx provides transactional read access to the staged data, while NippyJar does not. This can lead to inconsistencies when reading data that has been written but not yet committed.

For example, when appending a block and then reading it back, the state data will be available via the mdbx transaction. However, the transactions and headers will not be available in NippyJar until the write is fully committed to disk. This is especially noticeable when calling state access methods that return the latest block data. The ProviderFactory uses NippyJar for block data, so its view of “latest block” does NOT include the staged-but-uncommitted block.

Reth, being tightly integrated with its storage layer, is able to work around this limitation by ensuring that reads are NOT performed within the same commit-environment as writes. However, when using reth as a library, it is up to the user to ensure that reads are not performed on staged-but-uncommitted data. This seems like a great opportunity to use the Rust type system to enforce correct usage patterns and avoid inconsistencies.

Design requirement: When using the storage backend, it should be impossible for a read to observe an inconsistent view assembled from its underlying data sources. This should be enforced by the type system, so that incorrect usage patterns are caught at compile time.
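One way to get this guarantee is to let the borrow checker eliminate the inconsistent window entirely. The sketch below is illustrative only; Provider, StagedBlock, and the method names are hypothetical, not reth or signet-storage APIs. A staged write exclusively borrows the provider, so no reads can be issued until the staged data has been committed to both backends (or dropped).

```rust
/// A provider whose reads only ever observe fully committed data.
pub struct Provider {
    // handles to the hot (mdbx) and cold (flat-file) stores would live here
}

/// A staged-but-uncommitted block. Holding `&mut Provider` means the borrow
/// checker forbids any reads through the provider while this value exists.
pub struct StagedBlock<'a> {
    provider: &'a mut Provider,
    // staged state diff, headers, transactions, ...
}

impl Provider {
    /// Begin staging a block; exclusively borrows the provider.
    pub fn stage_block(&mut self /*, block: ... */) -> StagedBlock<'_> {
        StagedBlock { provider: self }
    }

    /// Reads take `&self`, which cannot coexist with a live `StagedBlock`.
    pub fn latest_block_number(&self) -> u64 {
        todo!("read from committed storage only")
    }
}

impl StagedBlock<'_> {
    /// Commit to both backends, then release the borrow. Only after this
    /// returns can `latest_block_number` observe the new block.
    pub fn commit(self) -> std::io::Result<()> {
        // write state to mdbx, block data to the cold store, fsync, ...
        let _ = self.provider;
        Ok(())
    }
}
```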

3. Encapsulation of the Storage Backend

Writes to mdbx are atomic, and writes to NippyJar are atomic, but they are not mutually atomic. An interrupted commit may therefore require a recovery process to restore consistency between the two storage backends. Reth runs this recovery process on node bootup. Running it is mandatory before any read operations; failing to do so may lead to reading inconsistent or corrupted data.

This occurs because if the node crashes or is interrupted during a write, it may have written data to one backend but not the other. On startup, reth checks for such inconsistencies and attempts to reconcile them.

The storage backend should be a self-contained component that can be used independently of any specific node implementation. This means that the storage backend should not rely on any external components or services to function correctly. This will make it easier to use the storage backend in different contexts, such as in a standalone application or in a different node implementation.

Design requirement: The storage backend should handle its own consistency checks and recovery processes internally. It should not rely on any external components or managers to ensure data integrity. This should be enforced by the type system, so that incorrect usage patterns are caught at compile time. It should be impossible to perform read operations without first ensuring that the storage backend is in a consistent state.
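A typestate pattern is one way to encode that last requirement. In the hypothetical sketch below (the type and method names are illustrative, not the real API), opening the store yields a handle with no read methods at all; only the recovery step can produce a handle in the Consistent state, so skipping recovery is a compile error rather than a latent corruption bug.

```rust
use std::marker::PhantomData;

/// Typestate markers (hypothetical names).
pub struct Unchecked;
pub struct Consistent;

pub struct SignetStorage<State = Unchecked> {
    // backend handles would live here
    _state: PhantomData<State>,
}

impl SignetStorage<Unchecked> {
    /// Opening storage yields an `Unchecked` handle that exposes no reads.
    pub fn open(/* path, options, ... */) -> Self {
        SignetStorage { _state: PhantomData }
    }

    /// Run consistency checks and recovery (e.g. reconcile a commit that was
    /// interrupted between backends). Only this can produce a readable handle.
    pub fn recover(self) -> std::io::Result<SignetStorage<Consistent>> {
        // reconcile the hot and cold stores here
        Ok(SignetStorage { _state: PhantomData })
    }
}

impl SignetStorage<Consistent> {
    /// Read methods exist only in the `Consistent` state.
    pub fn latest_block_number(&self) -> u64 {
        0 // placeholder
    }
}
```

With this shape, callers must write something like `let storage = SignetStorage::open().recover()?;` before issuing any reads; there is no way to obtain a readable handle that has skipped the consistency check.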

What’s Next: signet-storage

We’re prototyping signet-storage now. The goal: modular backends, strong consistency guarantees in the type system, and flexibility to swap storage implementations without touching consensus code.

signet-storage will provide a hot/cold storage model and a selection of backends for each. It will provide strong consistency guarantees, enforced by the type system, to prevent inconsistent reads and ensure data integrity.

We plan to use signet-storage as the primary storage backend for Signet nodes and the Signet block builder. We also plan to expose it as a library for other Signet applications that require access to chain data, such as signet-rpc.

Our plans include:

| Node type | What it does |
| --- | --- |
| Fast-sync | State-diff based catch-up without processing historical blocks |
| Cold RPC | Serves only historical queries (blocks, transactions) |
| Hot RPC | Serves only transaction simulation (current state) |
| Remote-backed full node | Block and tx stores backed by remote relational databases |

The philosophy remains simple: if the type system can prevent a bug, it should. We’re encoding storage invariants in the types so you don’t have to remember them.
