Syncing Data Between Bitcoin Nodes Is About to Get Easier
Minisketch is a new solution that’s trying to solve an old problem.
Spearheaded by Blockstream co-founder Pieter Wuille, Bitcoin Core contributor and fellow Blockstream co-founder Gregory Maxwell, and Blockstream software engineer Gleb Naumenko, the open-source initiative is designed to achieve set reconciliation between the mempools of each full node.
“Set reconciliation, in short, is the problem of trying to figure out what the differences are between two sets stored on different computers, while minimizing how much data needs to be exchanged between them. In particular, it’s trying to do so while sending less data than the entire set of data,” Wuille told Bitcoin Magazine
For Bitcoin, this means discerning the differences in transaction data between nodes. Maxwell likened set reconciliation to the process of syncing your phone’s contact list with another person who has many of the same contacts.
“You could send them your whole list but it won’t fit on a postcard and would be pretty wasteful in any case, since they already know most of the contacts … It is possible, in fact, to communicate your whole set of contacts to them by sending only as much information as the size of the difference between your lists even without any idea in advance of what the actual differences are,” Maxwell told Bitcoin Magazine.
In short, set reconciliation would reduce the bandwidth required to run a full node on the Bitcoin network by minimizing how much data each node transmits to each other. This would effectively allow nodes to sync up the data in their mempools more efficiently.
Breaking Down Set Reconciliation
The problem that minisketch wants to amend isn’t blockchain specific.
Set reconciliation is a constant frustration that any distributed system grapples with. Writ large, it simply means that two or more parties on a distributed network hold different sets of data, and to reconcile this, they must figure out which pieces of the data they’re missing — as well as which pieces they have that the other party lacks.
For Bitcoin, these data are transactions. These transactions are relayed from node to node until they are picked up by miners to be included into new blocks.
The problem is that the order of transactions can vary from mempool to mempool (the queues that list incoming transactions to be included into new blocks). This means that there can be (and usually is) a discrepancy in the order of transactions between mempools and newly relayed blocks.
“Bitcoin nodes have a … problem when relaying transactions between each other. Any given node will have mostly the same transactions as any one of its peers — received from other links, but not exactly the same. Today nodes waste a lot of bandwidth just figuring out who needs to be sent what [data],” Maxwell said.
How Minisketch Closes the Gap
An implementation of the PinSketch algorithm, minisketch constructs a library of data to be used for constructing set sketches (i.e., for this use case, sets of transaction data). Nodes and miners can then use these sets for compact set reconciliation.
Simply put, the solution would allow node operators to compare notes on transaction data. It would allow them to sketch (create) sets (lists) of transactions, and the program would cross-check these sets to see which data occurs in one but not both sets. Instead of spending time and energy revealing all of this data to each other, though, with minisketch, nodes only need to know the difference between their transaction sets to sketch a full set.
As Wuille explained, it would look like this in practice:
“If we simplify it to just a single difference, it’s easy to see how this may work:
- Say I have the set {3,5,7,11}, and you have the set {3,5,7,9,11}, so the difference is {9}.
- We both compute the sum of our elements, so I obtain 3+5+7+11=26, and you obtain 3+5+7+9+11=35.
- I send my sum 26 to you, and you subtract it from your sum; the difference is 9.
“This works, but it is restricted to finding a single difference. Minisketch generalizes this by sending various types of ‘sums’ of the data. The result is that with N different sums, you can find N differences … As long as the number of differences between the sets is not more than the number of ‘sums’ sent, minisketch will always succeed in finding all the differences.”
If successfully implemented, this set reconciliation could make transaction relay between nodes more efficient. Along with other in-the-works improvements to Bitcoin’s infrastructure, this could reduce each node’s broadcasting burden by a significant margin, Maxwell said.
“I made a measurement a while back and found that transaction relay was about 87 percent of a node’s bandwidth usage. This was before compact blocks, so the number is potentially greater now. Our simulation results indicate that we might be able to cut relay overheads by the order of 40x through a combination of improvements that include minisketch.”
According to Wuille, the minisketch solution could also create “a more robust network.” Maxwell expanded on this line of thinking by indicating that with the protocol, nodes could use their saved bandwidth to connect with 16 to 24 other nodes instead of the standard 8, “which would make some theoretical attacks even harder to pull off without using more bandwidth,” he claimed.
Maxwell is also hopeful that one day, minisketch could be used to revamp block propagation as well. “There have been other protocols for block propagation using IBLT (Invertible Bloom Lookup Table) that minisketch would make a lot better,” he said, though he admitted that such a solution is “not urgent” because “[compact] blocks already make block transfer itself use only about 4MB per node per day, so even if improvements reduced that number to nothing it wouldn’t make that big of a difference to users or the network.” Instead, he envisions this solution being more useful for “very low transmission like Blockstream’s satellite.”
Wuille indicated that minisketch will reduce node bandwidth requirements and have a higher success probability than IBLT, but he conceded that for larger data sets, IBLT would be faster. Maxwell added that “IBLT is very inefficient for less than a few hundred differences,” while minisketch is more efficient for smaller data sets.
Minisketch is still very much in the preliminary stages, however; Wuille stated that an actual BIP is a while away and that adoption is subject to a number of factors.
Reiterating his observations to a point, Maxwell noted that the protocol is also not part of Bitcoin’s network consensus. If any node operators felt comfortable enough and wanted to try to improve their transaction relays, they could choose to use minisketch even without its ubiquitous adoption on the network.
“One useful point is that relay mechanisms are not a part of the Bitcoin consensus. You and I could start using an improved protocol between us regardless of what other people choose to do. This means that improvements to relay are only delayed by ordinary protocol/software engineering considerations — we have to build it, verify it, integrate it, etc. — but unlike a consensus change, it doesn’t depend on anyone agreeing to it except the people who choose to use it.”
This article originally appeared on Bitcoin Magazine.