DePIN Data Verification: A Challenge With No Silver Bullets
At times, all it takes to defeat a high-tech solution is something as low-tech as a piece of bubble gum. You see, certain goods — mostly food products such as fish or dairy — have to be transported at a specific temperature. Companies set up sensors in delivery trucks to keep track of that, but drivers, who can get fined for mishandled shipments, aren’t fans of those. So some of them just put chewing gum on the sensors, which makes sensor readings invalid.
In a data-dependent decentralized environment like DePIN, malicious actors can mess with the system as easily as the drivers in this anecdote, which comes from an industry partner of ours. And the consequences can be a lot more dire than a case of food poisoning, potentially bringing entire projects down. So how should builders handle DePIN data verification? That’s a question we took to heart when working on peaq, the layer-1 for DePIN.
Data makes DePIN go round
First, let’s take a moment to fully grasp the issue at hand. In essence, it comes down to two things: data being crucial for DePINs to function — and their incentive model inviting potential abuse.
On the data side, there is an entire segment of DePINs that are fully focused on crowd-sourcing data. Silencio Network collects data on noise pollution, MapMetrics collects navigation data, WeatherXM, as shocking as that is, collects weather data, and so on and so forth. For all of these projects, data is the product; it is what they offer to the demand side: enterprises, researchers, and everyone in between. As such, quality data is crucial for their ability to consistently reward the users who share it, which is a key part of the DePIN flywheel that enables projects to scale.
That said, DePIN data verification matters even if data isn’t the core product. Let’s take a DePIN for electric vehicle charging, for example. The charging session is obviously an off-chain event, and the DePIN needs data to charge the buyer for the session and reward the provider for it. More specifically, it needs such variables as the vehicle and chargepoint IDs, the duration of the session, the amount of electricity consumed, et cetera.
In both cases, data and rewards ultimately go hand in hand: If you provide the right data, you get the tokens. Most people will do exactly that, playing by the rules. However, a small number of bad actors might decide to spice things up and spoil the meal for everyone.
Fake it till you break it
Here is the problem: Data can be spoofed, or forged, in non-techie terms. The chewing gum story very much applies, but here’s another entertaining example: Remember Pokémon Go? Well, there’s an entire subreddit on spoofing your location in the game, enabling you to catch a Pikachu or two from the comfort of your sofa. In a scenario where a DePIN offers increased rewards for data from a specific location, you could use the same principles to spoof the location of your sensors and earn more tokens for providing, well, garbage data. By the same token, a chargepoint that’s not really there will probably entertain the prankster who put it on the map, but not the driver whose electric vehicle was just about to run out of juice.
From the DePIN’s perspective, such spoofing is pretty abysmal. If the DePIN’s whole product is data, spoofing is poisoning its well, making its datasets less valuable. In the short term, this means fewer happy customers; in the long term, it could kill off the demand side and bring the project down. With non-data-centric DePINs, fake activity cuts into the reward pools for honest participants and can enable real-world crime.
The catch is, though, that Web3’s ethos demands that DePINs be open for anyone to join, with no centralized device checkups. At the same time, DePINs must operate trustlessly, featuring baked-in mechanisms for filtering out malicious actors at every step.
So what is the ultimate DePIN data verification solution? Well… There’s none.
The simple truth is, there are no one-size-fits-all solutions here. The reason is that DePINs are very diverse: They dive into hundreds of industries and leverage thousands of device types. This means that any sort of uniform device-level controls can only be enforced if a DePIN works exclusively on proprietary hardware. That strategy is not illegitimate, but it comes at the expense of openness, while also introducing a centralized point of failure into the project: the device manufacturer.
The good news is that we don’t need a one-size-fits-all solution. We can rather mix and match various approaches and build decentralized networks with multi-level safeguards. Using the devices’ private keys to sign all the data they broadcast at the source is a good device-agnostic first step that allows us to track the data to its source and thus weed out malicious actors post-factum. However, the device needs to be able to sign transactions for that to work, and that also takes specialized software. And, of course, the dutifully and correctly signed data might still be wrong if there’s chewing gum involved.
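To make the sign-at-source idea a bit more concrete, here is a minimal Python sketch. It rests on loud assumptions: To stay within the standard library, it uses a per-device shared secret and HMAC-SHA256 as a stand-in for a true private-key signature, whereas a real deployment would more likely use an asymmetric scheme such as Ed25519, so that verifiers never hold the secret. The function names, the payload fields, and the key are all hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(reading: dict) -> bytes:
    # Serialize deterministically so signer and verifier hash identical bytes.
    return json.dumps(reading, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_reading(device_key: bytes, reading: dict) -> str:
    # Stand-in for a private-key signature: an HMAC-SHA256 tag over the reading.
    return hmac.new(device_key, canonical_bytes(reading), hashlib.sha256).hexdigest()


def verify_reading(device_key: bytes, reading: dict, tag: str) -> bool:
    # Recompute the tag and compare in constant time.
    expected = sign_reading(device_key, reading)
    return hmac.compare_digest(expected, tag)


# Hypothetical device secret and payload, for illustration only.
device_key = b"hypothetical-device-secret"
reading = {"device_id": "sensor-42", "temperature_c": 3.5, "timestamp": 1700000000}
tag = sign_reading(device_key, reading)

# A reading altered after signing no longer matches its tag.
tampered = dict(reading, temperature_c=2.0)
```

Here `verify_reading(device_key, reading, tag)` succeeds, while the tampered copy fails the check. Note that a valid tag only shows where the data came from and that it was not altered in transit. It says nothing about whether the sensor itself was covered in chewing gum.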
Machine learning is another handy data verification tool, seeking out abnormalities in the data coming in from the devices. This approach benefits from the very network effect that makes a DePIN: The more data sources you have, the more on-point your estimates of the underlying pattern will be, and the easier it is to flag anything that doesn’t fit into it. Granted, an anomaly is not always the result of malicious activity, which is why DePINs would also benefit from using data oracles. These oracles would provide another benchmark for the DePIN to compare the data against, further boosting its ability to weed out malicious actors.
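As a rough illustration of how peer data and an external benchmark can work together, here is a short Python sketch. It is not a machine learning model: It swaps in a simple robust statistic (a modified z-score based on the median and the median absolute deviation) to flag outliers among devices reporting the same quantity, then cross-checks flagged values against an oracle reading. The function names, device IDs, sample values, and tolerance are all hypothetical.

```python
from statistics import median


def flag_anomalies(readings: dict[str, float], threshold: float = 3.5) -> set[str]:
    # Flag devices whose reading sits far from the network-wide median.
    values = list(readings.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # Every typical device agrees exactly; anything different stands out.
        return {dev for dev, v in readings.items() if v != med}
    # Modified z-score; 0.6745 rescales the MAD to be comparable to a std. dev.
    return {
        dev for dev, v in readings.items()
        if 0.6745 * abs(v - med) / mad > threshold
    }


def disagrees_with_oracle(value: float, oracle_value: float, tolerance: float) -> bool:
    # Second benchmark: is the reading also far from an independent source?
    return abs(value - oracle_value) > tolerance


# Hypothetical temperature readings (in Celsius) from five nearby devices.
readings = {"a": 21.0, "b": 21.4, "c": 20.8, "d": 21.1, "e": 35.0}
oracle_value = 21.0  # hypothetical reading from an external data oracle

flagged = flag_anomalies(readings)
confirmed = {
    dev for dev in flagged
    if disagrees_with_oracle(readings[dev], oracle_value, tolerance=2.0)
}
```

With these sample values, only device `e` is flagged by its peers, and the oracle check confirms it. A reading that is flagged by peers but matches the oracle would be a candidate for a genuine local anomaly rather than spoofing.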
When working in tandem, these solutions provide a toolbox that allows DePIN builders to create a suitable data verification solution for their specific case, making the DePIN data trustworthy. That, in turn, is a crucial step toward wider DePIN adoption across the board, from enterprises in need of quality data and services to everyday people.
So, while DePIN data verification will always take some customization, the value of getting it right will be as huge as the downside of getting it wrong.
Note: The views expressed in this column are those of the author and do not necessarily reflect those of CoinDesk, Inc. or its owners and affiliates.
Edited by Benjamin Schiller.