Our modular, high-performance Merkle Tree library for Rust

119 pointsposted 11 hours ago
by bibiver

26 Comments

bibiver

11 hours ago

We've just released rs-merkle-tree, a Merkle tree crate designed with performance and modularity in mind. It comes with the following key features:

* Fixed depth: All proofs have a constant size equal to the depth of the tree. The depth can be configured via a const generic.

* Append-only: Leaves are added sequentially starting from index 0. Once added, a leaf cannot be modified.

* Optimized for Merkle proof retrieval: Intermediate nodes are stored so that proofs can be fetched directly from storage without recomputation, resulting in very fast retrieval times.

* Configurable storage and hash functions: Currently supports Keccak and Poseidon hashers, and in-memory, Sled, RocksDB, and SQLite stores.

The Rust ecosystem already offers several Merkle tree implementations, but rs-merkle-tree is built for a specific use case: append-only data structures such as blockchains, distributed ledgers, audit logs, or certificate transparency logs. It’s particularly optimized for proof retrieval, storing intermediate nodes in a configurable and extensible storage backend so they don’t need to be recomputed when requested.

xpe

9 hours ago

For those in the know... Do you have any recommended reading for Merkle-curious people that may have some scars from blockchain-related hype? It would be nice to find a nice introductory resource that covers some interesting use-cases that may not be well-known. Perhaps with nice graphics, interactivity, etc.?

For example, here is a slick web site that shows how certificate transparency uses Merkle trees: https://certificate.transparency.dev/howctworks/

sunshowers

7 hours ago

The joke is there are two kinds of distributed systems: Spanner and Bitcoin.

raphinou

5 hours ago

I'm working on a project where I need to prove that a file in a git repo is append only, ie all changes to the file only added lines. The only way I can think of is looking at the git history of the file, but would there be a faster way using Merkel trees somehow?

jmount

3 hours ago

Is this trying to deal with that git history can be replaced?

6r17

9 hours ago

There are use-cases outside of "crypto" for blockchain - notably for security related use-cases such as historization ; tough I concede that use-case can be handled with more standard technologies it is very fun to get into it and study such use as it may have properties that could be interesting. One has to have some time for this but to be frank it's not really something super hard to pull-off (Bc itself conceptually)

tialaramex

8 hours ago

I think the question, certainly the question I have - is, what are some use cases outside of "crypto" and the CT log ?

trn217

8 hours ago

You can make use of merkle-trees when ever you want to proof the data integrity of large amounts of individually independent data (distributed FS etc). After playing arround with SMTs for a bit, possible use-cases came to mind quite frequently.

vlovich123

7 hours ago

The challenge to me is that the ability to prove the data integrity is quite hard once you store the data elsewhere, particularly on a single disk - I don’t know that people are regularly reading and reevalidating the stored data and likely not on every I/o operation.

xpe

7 hours ago

I'm currently exploring how different APIs compute ETags [1]. I'm inclined to think the overhead of Merkle-trees make them relatively less useful for ETags for paginated APIs where responses are small and flat hashing is fast. One rule of thumb would be: the overhead isn't justified unless you need either (a) efficient updates on very large collections, or (b) collection-level state tracking across pagination boundaries. Sound about right?

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

Also, ETags can use weak validators:

> W/ (case-sensitive) indicates that a weak validator is used. Weak ETags are easy to generate, but are far less useful for comparisons. Strong validators are ideal for comparisons but can be very difficult to generate efficiently. Weak ETag values of two representations of the same resources might be semantically equivalent, but not byte-for-byte identical. This means weak ETags prevent caching when byte range requests are used, but strong ETags mean range requests can still be cached.

NoahZuniga

7 hours ago

How are you supposed to show that two tree heads are consistent?

Retr0id

4 hours ago

Isn't the whole point of a merkle tree that you just... compare them?

NoahZuniga

7 hours ago

Blockchains are an alternative way to make an append only data structure, so it's not clear why you would want to use a merkle tree to create a block chain.

infogulch

7 hours ago

Yes, the term block chain does has a specific technical meaning: a sequence of hashed values where each contains the hash of the previous, similar to the way a git commit includes the hash of its parent commit. But the term blockchain has also taken on a broader colloquial meaning of "log with certain cryptographic properties", which both block chain and merkle tree implementations can satisfy with various advantages and disadvantages. I think it's fair to allow usage of the broader definition.

tromp

6 hours ago

A blockchain can use Merkle trees in several ways. First, the block header usually contains the Merkle root of all transactions. This makes the header, and in turn the Proof-of-Work, commit to all txs, which helps make the chain immutable. Second, the header can also directly commit to the UTXO set, simplifying the design of light clients (sadly, Bitcoin doesn't do this).

dafelst

6 hours ago

Doesn't bitcoin use merkle-trees internally to create its blockchain?

vlovich123

8 hours ago

Nice. Since I/O is typically asynchronous, any ideas on how an async storage backend could be integrated elegantly?

Also do you have any benchmarks in the repo that would stress the I/O layer for latency too?

bibiver

6 hours ago

open to integrate async storage backends. do you have one in mind? more benchmarks coming.

gritzko

7 hours ago

Is it a binary tree? SQL storage may be an overshoot. Using the RFC 7574 layout, plain files would do fine.

geopert

7 hours ago

nice crate thanks. I wasn't expecting sqlite to be faster than rocksdb, which is key-value. The way you store the leaves (level, index) should be super fast for a key-value store.

Retr0id

5 hours ago

This surprised me too. My guess is that while rocks is typically faster for write-heavy workloads, inserting into the merkle tree actually require a fair few reads as part of the process (but I haven't looked closely).

anonymousDan

9 hours ago

Is it concurrent or sequential?

wngr

6 hours ago

What does that mean in the context of a Merkle tree?!

bibiver

6 hours ago

by now sequential, but we are working on it. unsure though which concurrency model to choose tbh.

N_Lens

10 hours ago

Good show! Can imagine a lot of crypto usecases.