Announcing a unified vulnerability schema for open source (Google Online Security Blog)

In recent months, Google has launched several efforts to strengthen open-source security on multiple fronts. One important focus is improving how we identify and respond to known security vulnerabilities without doing extensive manual work. It is essential to have a precise common data format to triage and remediate security vulnerabilities, particularly when communicating about risks to affected dependencies—it enables easier automation and empowers consumers of open-source software to know when they are impacted and make security fixes as soon as possible.

We released the Open Source Vulnerabilities (OSV) database in February with the goal of automating and improving vulnerability triage for developers and users of open source software. This initial effort was bootstrapped with a dataset of a few thousand vulnerabilities from the OSS-Fuzz project. Implementing OSV to communicate precise vulnerability data for hundreds of critical open-source projects proved the success and utility of the format, and garnered feedback to help us improve the project; for example, we dropped the Cloud API key requirement, making the database even easier to access by more users. The community response also showed that there was broad interest in extending the effort further.

Today, we’re excited to announce a new milestone in expanding OSV to several key open-source ecosystems: Go, Rust, Python, and DWF. This expansion unites and aggregates four important vulnerability databases, giving software developers a better way to track and remediate the security issues that affect them. Our effort also aligns with the recent US Executive Order on Improving the Nation’s Cybersecurity, which emphasized the need to remove barriers to sharing threat information in order to strengthen national infrastructure. This expanded shared vulnerability database marks an important step toward creating a more secure open-source environment for all users.
A simple, unified schema for describing vulnerabilities precisely

As with open source development, vulnerability databases in open source follow a distributed model, with many ecosystems and organizations creating their own database. Since each uses their own format to describe vulnerabilities, a client tracking vulnerabilities across multiple databases must handle each completely separately. Sharing of vulnerabilities between databases is also difficult.

The Google Open Source Security team, Go team, and the broader open-source community have been developing a simple vulnerability interchange schema for describing vulnerabilities that’s designed from the beginning for open-source ecosystems. After starting work on the schema a few months ago, we requested public feedback and received hundreds of comments. We have incorporated the input from readers to arrive at the current schema:

{

        « id »: string,

        « modified »: string,

        « published »: string,

        « withdrawn »: string,

        « aliases »: [ string ],

        « related »: [ string ],

        « package »: {

                « ecosystem »: string,

                « name »: string,

                « purl »: string,

        },

        « summary »: string,

        « details »: string,

        « affects »: [ {

                « ranges »: [ {

                        « type »: string,

                        « repo »: string,

                        « introduced »: string,

                        « fixed »: string

                } ],

                « versions »: [ string ]

        } ],

        « references »: [ {

                « type »: string,

                « url »: string

        } ],

        « ecosystem_specific »: { see spec },

        « database_specific »: { see spec },

}

This new vulnerability schema aims to address some key problems with managing vulnerabilities in open source. We found that there was no existing standard format which:

Enforces version specification that precisely matches naming and versioning schemes used in actual open source package ecosystems. For instance, matching a vulnerability such as a CVE to a package name and set of versions in a package manager is difficult to do in an automated way using existing mechanisms such as CPEs.Can be used to describe vulnerabilities in any open source ecosystem, while not requiring ecosystem-dependent logic to process them.Is easy to use by both automated systems and humans.
With this schema we hope to define a format that all vulnerability databases can export. A unified format means that vulnerability databases, open source users, and security researchers can easily share tooling and consume vulnerabilities across all of open source. This means a more complete view of vulnerabilities in open source for everyone, as well as faster detection and remediation times resulting from easier automation.

The current state


The vulnerability schema spec has gone through several iterations, and we are inviting further feedback as it gets closer to finalized. A number of public vulnerability databases today are already exporting this format, with more in the pipeline:
Go vulnerability database for Go packagesRust advisory database for Cargo packagesPython advisory database for PyPI packagesDWF database for vulnerabilities in the Linux kernel and other popular softwareOSS-Fuzz database for vulnerabilities in C/C++ software found by OSS-FuzzThe OSV service has also aggregated all of these vulnerability databases, which are viewable at our web UI. They can also be queried with a single command via the same existing APIs:

  curl X POST d

      ‘{« commit »: « a46c08c533cfdf10260e74e2c03fa84a13b6c456 »}’

      « https://api.osv.dev/v1/query »

    

  curl X POST d

      ‘{« version »: « 2.4.1 », « package »: {« name »: « jinja2 », « ecosystem »: « PyPI »}}’

      « https://api.osv.dev/v1/query »


Automating vulnerability database maintenance


Producing quality vulnerability data is also difficult. In addition to OSV’s existing automation, we built more automation tools for vulnerability database maintenance, and used these tools to bootstrap the community Python advisory database. This automation takes existing feeds, accurately matches them to packages, and generates entries containing precise, validated version ranges with minimal human intervention. We plan to extend this tooling to other ecosystems for which there is no existing vulnerability database, or little support for ongoing database maintenance.


Get involved


Thank you to all the open source developers who have provided feedback and adopted this format. We’re continuing to work with open source communities to develop this further and earn more widespread adoption in all ecosystems. If you are interested in adopting this format, we’d appreciate any feedback on our public spec.