The way we design and build software is continually evolving. Just as we now think of security as something we build into software from the start, we are also increasingly looking for new ways to minimize trust in that software. One of the ways we can do that is by designing software so that you can get cryptographic certainty of what the software has done.
In this post, we’ll introduce the concept of verifiable data structures that help us get this cryptographic certainty. We’ll describe some existing and new applications of verifiable data structures, and provide some additional resources we have created to help you use them in your own applications.
A verifiable data structure is a class of data structure that lets people efficiently agree, with cryptographic certainty, that the data contained within it is correct.
Merkle Trees are the most famous of these and have been used for decades because they can enable efficient verification that a particular piece of data is included among many records – as a result they also form the basis of most blockchains.
Although these verifiable data structures are not new, we now have a new generation of developers who have discovered them and the designs they enable — further accelerating their adoption.
These verifiable data structures enable building a new class of software that have elements of verifiability and transparency built into the way they operate. This gives us new ways to defend against coercion, introduce accountability to existing and new ecosystems, and make it easier to demonstrate compliance to regulators, customers and partners.
Certificate Transparency is a great example of a non-blockchain use of these verifiable data structures at scale to secure core internet infrastructure. By using these patterns, we have been able to introduce transparency and accountability to an existing system used by everyone without breaking the web.
Unfortunately, despite the capabilities of verifiable data structures and the associated patterns, there are not many resources developers can use to design, build, and deploy scalable and production-quality systems based on them.
To address this gap we have generalized the platform we used to build Certificate Transparency so it can be applied to other classes of problems as well. Since this infrastructure has been used for years as part of this ecosystem it is well understood and can be deployed confidently in production systems.
This is why we have seen solutions in areas of healthcare, financial services, and supply chain leverage this platform. Beyond that, we have also applied these patterns to bring these transparency and accountability properties to other problems within our own products and services.
To this end, in 2019, we used this platform to bring supply chain integrity to the Go language ecosystem via the Go Checksum Database. This system allows developers to have confidence that the package management systems supporting the Go ecosystem can’t intentionally, arbitrarily, or accidentally start giving out the wrong code without getting caught. The reproducibility of Go builds makes this particularly powerful as it enables the developer to ensure what is in the source repository matches what is in the package management system. This solution delivers a verifiable chaiin all the way from the source repositories to the final compiled artifacts.
Another example of using these patterns is our recently announced partnership with the Linux Foundation on Sigstore. This project is a response to the ever-increasing influx of supply chain attacks on the Open Source ecosystem.
Supply chain attacks have been possible because there are weaknesses at every link in the chain. Components like build systems, source code management tools, and artifact repositories all need to be treated as critical production environments, because they are. To address this, we first need to make it possible to verify provenance along the entire chain and the goal of the Sigstore effort is to enable just that.
We are now working on using these patterns and tools to enable hardware-enforced supply chain integrity for device firmware, which we hope will discourage supply chain attacks on the devices, like smartphones, that we rely on every day by bringing transparency and accountability to their firmware supply chain.
In all of the above examples, we are using these verifiable data structures to ensure the integrity of artifacts in the supply chain. This enables customers, auditors, and internal security teams to be confident that each actor in the supply chain has lived up to their responsibilities. This helps earn the trust of those that rely on the supply chain, discourages insiders from using their position as it increases the chance they will get caught, introduces accountability, and enables proving the associated systems continually meet their compliance obligations.
When using these patterns the most important task is defining what data should be logged. This is why we put together a taxonomy and modeling framework which we have found to be helpful in designing verifiability into the systems we discussed above, and which we hope you will find valuable too.
Please take a look at the transparency.dev website to learn about these verifiable data structures, and the tools and guidance we have put together to help use them in your own applications.