A key aspect of open data is easy access to it. Data journalists and application developers can use data faster and more effectively if it is discoverable in central portals. Centralized data management across administrative and domain boundaries is hardly feasible for various reasons (heterogeneous data, distributed responsibilities, conflicts of interest, etc.), nor is it necessarily useful. Therefore, distributed data storage combined with a central metadata portal is generally a good idea. In a prominent place, such as daten.berlin.de, information about and links to the data providers' datasets are collected and presented. In Berlin, these providers include the various Senate administrations as well as, for example, the city cleaning and public transport agencies.
But what is recorded in the metadata of open datasets in addition to the name, description and author? This question arises both when capturing metadata and when exchanging metadata records automatically, a process known as harvesting. A central portal that unites various data offerings and the contents of existing data catalogs, in this case for Germany, can only be realized if the structure and meaning of the metadata are sufficiently uniform or self-explanatory.
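To illustrate why uniform structure matters for harvesting, here is a minimal Python sketch. The two source catalogs (`catalog_a`, `catalog_b`), their differing field names and the mapping table are hypothetical; a real harvester would fetch records through the APIs of the actual catalogs. The sketch translates each record into a common set of fields and collects the results in one central index keyed by dataset name.

```python
# Hypothetical field mappings: each source catalog names the same concepts
# differently, so the harvester translates them to common keys.
FIELD_MAPS = {
    "catalog_a": {"name": "name", "titel": "title",
                  "beschreibung": "description", "lizenz": "license"},
    "catalog_b": {"name": "name", "title": "title",
                  "notes": "description", "license_id": "license"},
}


def normalize(source: str, record: dict) -> dict:
    """Translate one record from a source catalog into the common structure."""
    mapping = FIELD_MAPS[source]
    # Keep only fields whose meaning is known; unmapped fields are dropped.
    return {common: record[src] for src, common in mapping.items() if src in record}


def harvest(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Collect normalized records from all sources into one index keyed by name."""
    index: dict[str, dict] = {}
    for source, records in sources.items():
        for record in records:
            entry = normalize(source, record)
            if "name" not in entry:
                continue  # a record without a name cannot be indexed
            entry["source"] = source  # remember which catalog delivered it
            # A later record with the same name replaces an earlier one;
            # a real portal would need an explicit policy for such collisions.
            index[entry["name"]] = entry
    return index


sources = {
    "catalog_a": [{"name": "strassenbaeume", "titel": "Straßenbäume", "lizenz": "cc-by"}],
    "catalog_b": [{"name": "haltestellen", "title": "Haltestellen",
                   "notes": "Public transport stops", "license_id": "cc-zero"}],
}
index = harvest(sources)
print(index["strassenbaeume"]["title"])  # Straßenbäume
```

The mapping table is exactly the part that a shared metadata structure makes unnecessary: if every catalog used the same field names and meanings, `normalize` would reduce to copying the record.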
Many domains, such as environmental or bibliographic data, address consistent metadata with different approaches and priorities (see the OpenGov study, section on metadata). For open data, using the metadata structures of CKAN (Comprehensive Knowledge Archive Network) from the OKFN (Open Knowledge Foundation) has become best practice in Europe and America. CKAN is the de facto standard data catalog software for open data.
CKAN exchanges metadata in JSON format. The only required field is the name, which should be both human-readable and URL-friendly; all other fields are optional. The core fields are title, description, resources (i.e. data files or services), license and contact person. Further details can be stored in a JSON dictionary, i.e. as key-value pairs nested in the record. This focus on the essentials, combined with great flexibility, is likely the reason this metadata model has spread so widely.
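A minimal sketch of such a record in Python, assuming CKAN-style field names: `notes` for the description, `license_id` for the license, `maintainer` and `maintainer_email` for the contact person, and `extras` for the additional key-value pairs. The dataset values are invented for illustration. Following the text above, `extras` is shown as a dictionary; newer CKAN API versions represent it as a list of `{"key": ..., "value": ...}` objects instead. The name check assumes CKAN's convention of 2 to 100 lowercase alphanumeric characters, hyphens or underscores; treat the exact rule as an assumption and check the documentation of your CKAN version.

```python
import json
import re

# Assumed naming rule modeled on CKAN: lowercase letters, digits, "-" and "_",
# 2 to 100 characters, so the name can appear directly in a URL.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def is_valid_name(name: str) -> bool:
    """Return True if the name is URL-friendly under the assumed rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def check_record(record: dict) -> list[str]:
    """Return a list of problems; only 'name' is required, all other fields are optional."""
    problems = []
    name = record.get("name")
    if name is None:
        problems.append("missing required field: name")
    elif not is_valid_name(name):
        problems.append(f"name is not URL-friendly: {name!r}")
    return problems


# Illustrative record: invented values, CKAN-style field names.
record = {
    "name": "haltestellen-berlin",
    "title": "Haltestellen in Berlin",
    "notes": "Locations of public transport stops.",
    "resources": [{"url": "http://example.org/haltestellen.csv", "format": "CSV"}],
    "license_id": "cc-by",
    "maintainer": "Example Agency",
    "maintainer_email": "opendata@example.org",
    # Free-form additional details as key-value pairs.
    "extras": {"temporal_granularity": "year", "geographical_coverage": "Berlin"},
}

print(check_record(record))  # []
print(check_record({"name": "Haltestellen Berlin"}))
print(json.dumps(record, indent=2, ensure_ascii=False))
```

A record containing nothing but a valid `name` also passes the check, which mirrors how little CKAN itself requires; this minimalism is exactly why providers asked for more precise guidance.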
Throughout the development of open data, especially in Berlin and Germany, a desire for more structure became apparent: many data providers and developers wanted precise instructions on what information should be stored and in what form. To preserve the minimal, flexible character of CKAN and JSON on the one hand, and to clearly define what the metadata for OGPD should look like on the other, we are developing the JSON schema for Open Government Data (OGD).
The OGD metadata structure is maintained on github.com. It is intended less as a tool for validating metadata than as a communication tool for interested parties such as public decision-makers, data providers, developers and other open data initiatives in the German-speaking area. Publishing it at an early beta stage and developing it transparently in public on github.com serve the same purpose.
On github.com you will find, alongside the schema, a tabular HTML representation and lists of the categories and licenses to be used. We look forward to your comments, suggestions and questions.
This work and content is licensed under a Creative Commons Attribution 3.0 Unported License.