Optimizing MDBX Access in Rust

James Prestwich //7 min read

We shipped signet-libmdbx, new Rust bindings for libmdbx. We forked reth-libmdbx and redesigned most of the API. It’s faster and harder to misuse. We encoded MDBX’s transaction and cursor invariants in the type system, preventing entire classes of bugs at compile time, and removing costly runtime checks in hot paths.

Performance

But enough talk, let’s see some numbers. We benchmarked common operations in reth-libmdbx and signet-libmdbx, both in synchronized (multi-threaded) and unsynchronized (single-threaded) modes. Here are the results:

Benchmark                    reth-sync  signet-sync  signet-unsync  raw mdbx ptr
---------------------------------------------------------------------------------
put                          10.868 µs  10.916 µs    5.9223 µs      5.7732 µs
cursor gets (100 entries)    908.10 ns  818.91 ns    630.32 ns      613.83 ns
iterator gets (100 entries)  927.12 ns  807.13 ns    666.53 ns      n/a

reth-sync is the reth-libmdbx crate on main in reth. signet-sync is the code in signet-libmdbx providing equivalent behavior. signet-unsync is the new single-threaded transaction type. raw mdbx ptr is direct FFI access without safety checks.

When running in sync mode, signet-libmdbx has comparable performance to reth-libmdbx in put operations, and significant speedups when traversing a database. This is mostly due to improved cursor and iterator implementations.

Where does overhead come from?

Overhead in database operations comes from three main sources:

  1. Synchronization: reth-libmdbx and the signet-sync version use a Mutex to enforce MDBX’s single-threaded transaction access rules at runtime. This adds overhead to every database operation.
  2. Work in the hot path: cursors and iterators are used heavily in database operations. Any extra work done in these hot paths adds up quickly. This work includes runtime checks, error handling, result conversions, and branching. When you’re shaving nanoseconds off of each operation, even a single branch can have double-digit percentage impact on performance.
  3. Memory allocations: frequent allocations and deallocations in hot paths can lead to cache misses and increased latency.

MDBX Invariants

MDBX has several invariants that must be upheld to ensure safe and correct operation. Violating these invariants can lead to undefined behavior, data corruption, crashes, and memory leaks.

  1. Transaction access ordering: All operations on a transaction must be totally ordered and non-concurrent.
  2. Thread affinity: Read-write transactions can only be committed or aborted from the thread that created them.
  3. Cursor lifetime: Stale cursors must be reaped within their parent transaction’s lifetime.
  4. Cursor cleanup: Cursors must be properly closed to avoid resource leaks.
  5. Zero-copy reads: Data read as borrowed references must not outlive the transaction that created them.

Invariant Enforcement

Invariant           Risk            TxSync         TxUnsync
--------------------------------------------------------------
Tx ordering        UB/assert       Arc+Mutex         !Sync
Thread affinity    UB/assert       Arc+Mutex+mgr     !Send
Cursor lifetime    Use-after-free  'tx lifetime      'tx lifetime
Cursor cleanup     Leaks           Drop impl         Drop impl
Zero-copy reads    UB              TableObject<'tx>  TableObject<'tx>

UB = Undefined Behavior

To model this, we split the API into two transaction types: TxSync and TxUnsync. TxSync uses a mutex to enforce transaction access rules at runtime, similar to reth-libmdbx. TxUnsync uses Rust’s type system to enforce access rules at compile time, eliminating synchronization overhead in single-threaded workloads.

Type Hierarchy

                Transaction
                     |
       +-------------+-------------+
       |                           |
    TxSync                    TxUnsync
    Arc + Mutex               Single-threaded
       |                           |
  +----+----+                 +----+----+
  |         |                 |         |
 RO         RW               RO         RW
Shared    +Manager          Send      !Send
                            !Sync     !Sync

Single-Threaded Transactions: TxUnsync

reth-libmdbx enforces these requirements at runtime using a Mutex. Every database operation acquires and releases the lock, ensuring safe access even when the transaction handle is shared across threads. This is correct and safe, but the synchronization overhead adds up in hot paths. 100 reads == 100 locks.
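To make "100 reads == 100 locks" concrete, here is a minimal std-only sketch. `LockedTx`, `PlainTx`, and `sample_table` are hypothetical stand-ins written for this illustration, not types from reth-libmdbx or signet-libmdbx, and the `locks_taken` counter exists only so the lock traffic can be observed.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a mutex-guarded transaction (not the crate's type).
struct LockedTx {
    table: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
    /// Demo-only counter so the number of lock acquisitions is observable.
    locks_taken: AtomicUsize,
}

impl LockedTx {
    fn new(table: BTreeMap<Vec<u8>, Vec<u8>>) -> Self {
        LockedTx {
            table: Mutex::new(table),
            locks_taken: AtomicUsize::new(0),
        }
    }

    /// Every read takes the lock, does the lookup, then releases the lock.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        let guard = self.table.lock().unwrap();
        self.locks_taken.fetch_add(1, Ordering::Relaxed);
        guard.get(key).cloned()
    }
}

/// Hypothetical unsynchronized counterpart: a plain borrow, no lock.
struct PlainTx {
    table: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl PlainTx {
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.table.get(key).map(|v| v.as_slice())
    }
}

/// 100 one-byte keys, each mapped to a two-byte value.
fn sample_table() -> BTreeMap<Vec<u8>, Vec<u8>> {
    (0u8..100).map(|i| (vec![i], vec![i, i])).collect()
}

fn main() {
    let locked = LockedTx::new(sample_table());
    let plain = PlainTx { table: sample_table() };

    // Same results either way; only the locked version pays per read.
    for i in 0u8..100 {
        assert_eq!(locked.get(&[i]).as_deref(), plain.get(&[i]));
    }
    println!(
        "lock acquisitions: {}",
        locked.locks_taken.load(Ordering::Relaxed)
    );
}
```

The sketch only counts acquisitions. It makes no claim about how much each one costs, which is what the benchmark table measures.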

signet-libmdbx introduces TxUnsync, an unsynchronized transaction type that enforces MDBX’s requirements at compile time instead of runtime. Transactions CANNOT be shared or accessed concurrently from multiple threads. This ensures that all operations on a TxUnsync are totally ordered and non-concurrent by construction. If there is no situation where concurrent access is possible, there is no need for synchronization.

For TxUnsync<RW>, the compiler guarantees that only one thread can ever access the transaction. It is !Send and !Sync, so it cannot be shared or moved between threads.

Because TxUnsync relies on !Sync to enforce the access rules, enforcement has zero runtime cost. No locks, no atomic operations. Just plain old function calls. This is where the single-threaded speedups come from: in the benchmark table, a signet-unsync put takes about 5.9 µs against about 10.9 µs for signet-sync.
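One common way to get these auto-trait properties in Rust is a zero-sized marker field. The sketch below assumes that approach; `RoHandle`, `RwHandle`, and `require_send` are hypothetical names, and signet-libmdbx may achieve the same effect differently.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Hypothetical read-only handle: can be moved to another thread, never shared.
struct RoHandle {
    // `Cell<()>` is Send but not Sync, and the marker passes that on.
    _not_sync: PhantomData<Cell<()>>,
}

/// Hypothetical read-write handle: pinned to the thread that created it.
struct RwHandle {
    // Raw pointers are neither Send nor Sync, and the marker passes that on.
    _not_send: PhantomData<*mut ()>,
}

/// Compiles only when `T` is Send.
fn require_send<T: Send>() {}

fn main() {
    require_send::<RoHandle>();

    // Moving the read-only handle into another thread is allowed.
    let ro = RoHandle { _not_sync: PhantomData };
    std::thread::spawn(move || {
        let _moved = ro;
    })
    .join()
    .unwrap();

    let _rw = RwHandle { _not_send: PhantomData };
    // Uncommenting the next line fails to compile, because `*mut ()`
    // cannot be sent between threads safely:
    // require_send::<RwHandle>();

    // The markers are zero-sized, so they add nothing at runtime.
    assert_eq!(std::mem::size_of::<RwHandle>(), 0);
}
```

The check happens entirely during type checking, which is why there is nothing left to pay for at runtime.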

rust
use signet_libmdbx::{
    Environment, DatabaseFlags, WriteFlags, Geometry,
    TxUnsync, RW, RO,
};
use std::path::Path;

// Open environment
let env = Environment::builder()
    .set_geometry(Geometry {
        size: Some(0..(1024 * 1024 * 1024)),
        ..Default::default()
    })
    .open(Path::new("/tmp/my_db"))?;

// Write with TxUnsync<RW>
// The compiler enforces single-threaded access via self
let txn = TxUnsync::<RW>::new(env.clone())?;
let db = txn.create_db(None, DatabaseFlags::empty())?;
txn.put(db, b"hello", b"world", WriteFlags::empty())?;
txn.commit()?;

// Read with TxUnsync<RO>
// Can be moved between threads, but not shared concurrently
let txn = TxUnsync::<RO>::new(env)?;
let db = txn.open_db(None)?;
let value: Option<Vec<u8>> = txn.get(db.dbi(), b"hello")?;

For cases where you need to share a transaction across threads (e.g., serving concurrent RPC requests from a single read snapshot), signet-libmdbx also provides TxSync, which uses the traditional mutex-based approach.

Fearless borrowing via Lifetimes

A common source of bugs in database code is using a cursor or borrowed data after the transaction that created it has been closed. In C, this is a use-after-free. In some Rust bindings, it’s a runtime error or undefined behavior.

signet-libmdbx prevents this class of bugs at compile time. Cursors carry a lifetime parameter 'tx that ties them to their transaction. The compiler rejects any code that would use a cursor after its transaction is dropped:

rust
use std::borrow::Cow;

// This compiles - cursor lifetime is tied to transaction:
let txn = env.begin_ro_txn()?;
let db = txn.open_db(None)?;
let mut cursor = txn.cursor(db)?;

// `first` borrows from `txn` - the compiler ensures it cannot outlive the txn.
let first: Option<(Cow<[u8]>, Cow<[u8]>)> = cursor.first()?;

// This would NOT compile - cursor cannot outlive transaction:
// let cursor = {
//     let txn = env.begin_ro_txn()?;
//     let db = txn.open_db(None)?;
//     txn.cursor(db)?  // Error: `txn` dropped at end of block
// };
// cursor.first()?;  // Would be use-after-free - rejected by compiler
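On the binding side, the same guarantees can be modeled with an ordinary borrow. The following std-only sketch uses hypothetical `Tx` and `Cursor<'tx>` types (not the crate's definitions) to show how a `'tx` parameter ties both the cursor and the data it returns to the transaction, and how a Drop impl handles cleanup.

```rust
use std::cell::Cell;

/// Hypothetical transaction: owns the data and tracks open cursors.
struct Tx {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    open_cursors: Cell<usize>,
}

/// Hypothetical cursor: the `'tx` borrow keeps the transaction alive.
struct Cursor<'tx> {
    tx: &'tx Tx,
}

impl Tx {
    fn new(entries: Vec<(Vec<u8>, Vec<u8>)>) -> Self {
        Tx { entries, open_cursors: Cell::new(0) }
    }

    fn cursor<'tx>(&'tx self) -> Cursor<'tx> {
        self.open_cursors.set(self.open_cursors.get() + 1);
        Cursor { tx: self }
    }
}

impl<'tx> Cursor<'tx> {
    /// Returned slices borrow from the transaction, not from the cursor.
    fn first(&mut self) -> Option<(&'tx [u8], &'tx [u8])> {
        let tx: &'tx Tx = self.tx;
        tx.entries.first().map(|e| (e.0.as_slice(), e.1.as_slice()))
    }
}

impl<'tx> Drop for Cursor<'tx> {
    fn drop(&mut self) {
        // A real binding would close the native cursor handle here.
        self.tx.open_cursors.set(self.tx.open_cursors.get() - 1);
    }
}

fn main() {
    let tx = Tx::new(vec![(b"hello".to_vec(), b"world".to_vec())]);

    let first;
    {
        let mut cursor = tx.cursor();
        assert_eq!(tx.open_cursors.get(), 1);
        first = cursor.first();
    } // cursor dropped here; cleanup runs automatically

    assert_eq!(tx.open_cursors.get(), 0);
    // The data is still valid because `tx` is still alive.
    assert_eq!(first, Some((&b"hello"[..], &b"world"[..])));
}
```

Dropping `tx` while `first` or a cursor is still in use is a borrow-checker error, which is the compile-time rejection described above.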

The same lifetime tracking applies to zero-copy reads. When you read a value as Cow<[u8]>, the borrowed variant points directly into the memory-mapped database pages. The lifetime system ensures this borrowed data cannot escape the transaction scope.

Adding a lifetime parameter to TableObject<'a> ensures that the resulting references cannot outlive the transaction. We can then extend this to support copying deserialization as well via the TableObjectOwned trait, using higher-ranked trait bounds (HRTBs).
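A rough sketch of that pattern, under the assumption that it resembles serde's `Deserialize<'de>` / `DeserializeOwned` split. The `decode` method, the blanket impl, and `read_owned` are illustrative guesses, not the crate's real signatures.

```rust
use std::borrow::Cow;

/// Hypothetical version of the trait: decode from bytes that live for `'a`.
trait TableObject<'a>: Sized {
    fn decode(bytes: &'a [u8]) -> Option<Self>;
}

// Zero-copy: the result borrows the input and is tied to `'a`.
impl<'a> TableObject<'a> for Cow<'a, [u8]> {
    fn decode(bytes: &'a [u8]) -> Option<Self> {
        Some(Cow::Borrowed(bytes))
    }
}

// Copying: the result owns its bytes, so any input lifetime works.
impl<'a> TableObject<'a> for Vec<u8> {
    fn decode(bytes: &'a [u8]) -> Option<Self> {
        Some(bytes.to_vec())
    }
}

/// Hypothetical owned marker: implemented by types that decode from
/// input of every lifetime, expressed with a higher-ranked trait bound.
trait TableObjectOwned: for<'a> TableObject<'a> {}
impl<T> TableObjectOwned for T where T: for<'a> TableObject<'a> {}

/// The result holds no borrow of `page`, so it may outlive it.
fn read_owned<T: TableObjectOwned>(page: &[u8]) -> Option<T> {
    T::decode(page)
}

fn main() {
    let owned: Vec<u8> = {
        let page = vec![1u8, 2, 3]; // stands in for a memory-mapped page
        read_owned(&page[..]).unwrap()
    }; // `page` is gone, the copy survives
    assert_eq!(owned, vec![1u8, 2, 3]);

    let page = vec![4u8, 5];
    let borrowed: Cow<[u8]> = TableObject::decode(&page[..]).unwrap();
    assert!(matches!(borrowed, Cow::Borrowed(_)));
    // `read_owned::<Cow<[u8]>>` would not compile: a `Cow` with one fixed
    // lifetime does not implement `TableObject<'a>` for every `'a`.
}
```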

API Consistency and Ergonomics

We made a few small improvements to API consistency:

  • Iterator behavior: iter() and iter_dup() now have consistent starting behavior. Both check if the cursor is positioned and reposition to the first entry if needed.

  • Custom deserialization: The TableObject trait allows zero-copy deserialization of custom types directly from database pages. The related ReadError type captures both MDBX errors and codec-specific errors, making it easy to distinguish “key not found” from “data was corrupted.”
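As an illustration of the error split described in the custom deserialization bullet, here is a minimal sketch with a two-variant error type. `MdbxError`, `ReadError<E>`, `BadLength`, and `read_u64` are hypothetical shapes chosen for this example; the real `ReadError` in signet-libmdbx may be structured differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage-layer error (stands in for errors reported by MDBX).
#[derive(Debug, PartialEq)]
enum MdbxError {
    NotFound,
}

/// Hypothetical read error: either the storage layer failed, or the bytes
/// were found but could not be decoded.
#[derive(Debug, PartialEq)]
enum ReadError<E> {
    Mdbx(MdbxError),
    Decode(E),
}

/// Hypothetical codec error for a fixed-width integer.
#[derive(Debug, PartialEq)]
struct BadLength(usize);

/// Reads a big-endian u64 from an in-memory table standing in for a database.
fn read_u64(
    table: &BTreeMap<Vec<u8>, Vec<u8>>,
    key: &[u8],
) -> Result<u64, ReadError<BadLength>> {
    let bytes = table
        .get(key)
        .ok_or(ReadError::Mdbx(MdbxError::NotFound))?;
    if bytes.len() != 8 {
        return Err(ReadError::Decode(BadLength(bytes.len())));
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Ok(u64::from_be_bytes(buf))
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert(b"good".to_vec(), 7u64.to_be_bytes().to_vec());
    table.insert(b"bad".to_vec(), vec![1u8, 2, 3]);

    // Callers can branch on which layer failed.
    match read_u64(&table, b"bad") {
        Err(ReadError::Decode(BadLength(n))) => println!("corrupted value: {} bytes", n),
        Err(ReadError::Mdbx(e)) => println!("storage error: {:?}", e),
        Ok(v) => println!("value: {}", v),
    }
}
```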

  • Documentation: We added comprehensive rustdoc with examples throughout.

Why It Matters

The philosophy here is simple: if the type system can prevent a bug, it should. MDBX has a lot of subtle invariants that are easy to violate. We encoded them in the types so you don’t have to remember them.

signet-libmdbx is faster, safer, and more ergonomic. It has better support for MDBX features like zero-copy reads, DUPFIXED, and INTEGERKEY. It helps you debug your application.

Code’s on GitHub. PRs welcome.

What’s Next

signet-libmdbx is the first release of a larger project. Signet’s storage requirements are similar to other EVM chains: hot-path access to current state and cold-path access to historical blocks and transactions. We’ve been working on a modular storage backend that builds on these bindings to provide strong consistency guarantees enforced by the type system. More on that soon.
