11 June 2013 · 2 comments · Categories: Open Data

CKAN is free data catalog software used by many open data portals worldwide. In Germany, too, many developers use, operate, develop, interface with and extend CKAN. We are looking forward to hosting a CKAN user meetup in Berlin: a half-day get-together of CKAN practitioners to share insights and ideas about CKAN.

Date: Tuesday, June 25th, 2013 (the day after Berlin Open Data Day)

Venue: Fraunhofer FOKUS, Germany (Berlin)

Details: http://s.fhg.de/ckan-meetup-june-13

Target audience: CKAN practitioners, i.e. developers, users, operators, etc. Unsure whether this is for you? Quiz question: what's wrong here?

$ paster –plugin=ckan search index rebuild –config=std.ini

If this is the kind of question you have spent time with, you should come!

We need your input to put together an agenda. CKAN developers Sean and Dominik, as well as FOKUS, have already offered some sessions. However, this is meant to be an interactive workshop by users for users, so we are interested in your insights and questions.

The idea is to have:

  • 1-hour hands-on tutorials, where presenters take attendees step by step through a specific task such as 'install from source' or 'harvest a CSW'
  • 1-hour open discussions, where presenters share their situation and problems and everyone can chip in

If you want to attend, please drop me a line and add your name to this Google Doc: http://s.fhg.de/ckan-meetup-june-13

Please take one minute to tell us which topics you're interested in. You can offer or request a session, or just indicate that you like a topic. I would like to draw up a preliminary agenda from your input, so please RSVP by June 7th.

One of the main goals of the GovData.de prototype is to unite as many open datasets from Germany as possible in a single catalogue. The biggest part is imported automatically by so-called harvesters. In this article we give you an overview of the tools we used and how useful they have proven.

Everything is ready! We are looking forward to a great 1st International Open Data Dialog at Fraunhofer FOKUS in Berlin on Dec. 5-6, 2012.

Having built the first German open data portal, and currently building the German Open Government Platform with a strong emphasis on open government data, we now invite all our friends, partners, colleagues and anyone interested to discuss innovative business models based on open data. We would like to discuss how to promote the potential of new value-added services built on open data; how to design, develop and deploy infrastructure and tools for providing and processing open data, also in combination with closed commercial and private data; and how to foster transparency in society, business, politics and administration through open data.

The Open Government Platform for Germany (OGPD) is an access portal for electronic resources of the public administration: primarily data, but also documents and applications. It bundles locally maintained resources in one convenient interface and provides a central access point for citizens in general and for developers, data journalists, governments and businesses in particular. It also gives users a feedback channel to the data providers within the authorities.

To fulfill this purpose, the platform consists of two main components: a content management system (CMS) and a data catalog. The CMS manages editorial content such as information pages, links, news, and user comments and reviews, and it provides an integrated view of the data catalog. The catalog holds the metadata describing the data, documents and applications; these metadata in turn refer to the distributed data offerings (files or services available online).

This architectural pattern occurs frequently in similar portals. Differences arise mainly in the choice of software products for the components and in the way they interact. For the reasons behind choosing Liferay as the CMS and CKAN as the data catalog, we refer to the OGPD study. Here, we only want to explain how the two fit together and how the platform's actors (for example, users or editors) can use them.

At the core is the Liferay CMS, which provides most of the functionality in a web interface in the form of so-called portlets. Editorial content such as articles and blog posts is created right here. The contents of the data catalog are displayed using search fields and result lists. Data providers can register new datasets or update existing ones via a web form.

In addition to queries and edits via the CMS, the data catalog can be accessed directly via a REST interface. This allows data providers to automate the publication of their data in OGPD.
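Automated publication through the REST interface could look roughly like the following minimal Python sketch. It assumes the CKAN 2.x Action API (`/api/3/action/package_create`) with the API key passed in the `Authorization` header; the catalog URL, API key and dataset values are hypothetical placeholders, and the OGPD catalog may have exposed a different CKAN API version at the time. The sketch only builds the request, so it can be inspected without a live catalog.

```python
import json
import urllib.request

# Hypothetical placeholders, not the real OGPD endpoint or credentials.
CATALOG_URL = "https://catalog.example.org"
API_KEY = "your-api-key"


def build_package_create_request(dataset: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to CKAN's package_create action.

    Assumes the CKAN 2.x Action API; older CKAN versions used other API paths.
    """
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        url=f"{CATALOG_URL}/api/3/action/package_create",
        data=body,
        headers={
            "Content-Type": "application/json",
            # CKAN 2.x accepts the API key in the Authorization header.
            "Authorization": API_KEY,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_package_create_request({
        "name": "haushalt-2012",
        "title": "Haushalt 2012",
        "resources": [{"url": "https://data.example.org/haushalt-2012.csv", "format": "CSV"}],
    })
    print(req.get_method(), req.full_url)
    # To actually publish, a data provider would send it, e.g.:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["success"])
```

Keeping request construction separate from sending makes it easy to log or review what a scheduled export job would publish before it touches the catalog.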

For data providers that maintain their own metadata catalogs, the harvesting component comes into play. It "harvests" existing catalogs, i.e. it imports their content while filtering it and transforming it into the OGPD metadata structures. For OGPD, the catalogs of the spatial data infrastructure, PortalU, Destatis, Berlin, Bremen and Hamburg are currently read via INSPIRE CSW or the CKAN API, respectively. In line with the open data criteria, only datasets that have an electronic resource, a description and a well-defined license are considered.
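The open data criteria applied during harvesting can be expressed as a simple filter. The following Python sketch assumes CKAN-style field names (`resources`, `notes`, `license_id`) and uses a hypothetical license whitelist to stand in for "well-defined license"; the actual OGPD license list and field mapping may differ.

```python
# Hypothetical whitelist standing in for the OGPD list of well-defined licenses.
OPEN_LICENSE_IDS = {"cc-by", "cc-by-sa", "cc-zero", "odc-by", "odc-odbl"}


def meets_open_data_criteria(record: dict) -> bool:
    """Return True if a harvested record has an electronic resource,
    a non-empty description and a whitelisted license."""
    has_resource = any(r.get("url") for r in record.get("resources", []))
    has_description = bool((record.get("notes") or "").strip())
    has_license = record.get("license_id") in OPEN_LICENSE_IDS
    return has_resource and has_description and has_license


def filter_harvested(records: list[dict]) -> list[dict]:
    """Keep only the records that satisfy all three criteria."""
    return [r for r in records if meets_open_data_criteria(r)]
```

A whitelist is the conservative choice here: a record with an unknown or free-text license is rejected rather than guessed to be open.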

For users, the web interface is the main entry point to OGPD. Here they can search editorial, informational and community content. Through OGPD, users get direct access to the data offerings available online, and they can comment on and review them.


This work and content is licensed under a Creative Commons Attribution 3.0 Unported License.

A key aspect of open data is easy access. Data journalists and application developers can tap into data faster and better if the data are discoverable in central portals. Centralized data management is hardly feasible across administrative and domain boundaries for various reasons (heterogeneous data, distributed responsibilities, conflicts of interest, etc.), and it is not necessarily useful. Therefore, distributed data storage with a central metadata portal is generally a good idea. In a prominent place such as daten.berlin.de, information about and links to the data providers' data are collected and presented; in Berlin, for example, from the various Senate administrations or from the city cleaning and transport agencies.

But what is recorded in the metadata of open datasets besides name, description and author? This question arises both when capturing metadata and when exchanging metadata records automatically, which is known as harvesting. Only if structure and meaning are sufficiently uniform or self-explanatory can a central portal (in this case for Germany) be built that unites various data offerings and the contents of existing data catalogs.

Consistent metadata is addressed in many domains with different approaches and priorities, for example for environmental or bibliographic data (see OpenGov study, section on metadata). For open data, it has become best practice in Europe and America to use the metadata structures of OKFN's CKAN (Comprehensive Knowledge Archive Network). In the open data field, CKAN is the de facto standard for data catalog software.

CKAN exchanges metadata in JSON format. The only required field is the name, which should be both human-readable and URL-friendly; all other fields are optional. The core fields are title, description, resources (i.e. data files or services), license and contact person. Further details can be stored in a JSON dictionary, i.e. as nested key-value pairs. This focus on the essentials, combined with great flexibility, is likely the reason this metadata model has spread so widely.
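Since the name must be both readable and URL-friendly, publishers typically derive it from the title. The following Python sketch shows one way to do that, assuming CKAN's naming rule of lowercase ASCII letters, digits, `-` and `_` with at most 100 characters; transliterating German umlauts instead of dropping them is a design choice of this sketch, not something CKAN prescribes. The helper names `to_ckan_name` and `minimal_record` are hypothetical.

```python
import re

# Transliterate German special characters instead of silently dropping them.
_TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def to_ckan_name(title: str) -> str:
    """Derive a URL-friendly CKAN name from a human-readable title.

    Assumption: valid names use lowercase a-z, 0-9, '-' and '_' and are
    at most 100 characters long.
    """
    name = title.lower()
    for char, repl in _TRANSLIT.items():
        name = name.replace(char, repl)
    # Collapse every run of disallowed characters into a single hyphen.
    name = re.sub(r"[^a-z0-9_-]+", "-", name).strip("-")
    return name[:100]


def minimal_record(title: str) -> dict:
    """Only "name" is required; "title" is one of the optional core fields."""
    return {"name": to_ckan_name(title), "title": title}
```

For example, `to_ckan_name("Straßenbäume Berlin 2012")` yields `strassenbaeume-berlin-2012`.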

In the course of open data developments, especially in Berlin and Germany, a desire for more structure became apparent: many data providers and developers wanted precise guidance on what information should be stored and in what form. To preserve the minimal, flexible character of CKAN and JSON on the one hand, and to clearly define what metadata for OGPD should look like on the other, we are developing a JSON schema for Open Government Data (OGD).

The OGD metadata structure is maintained on github.com. It is intended less as a tool for validating metadata than as a communication tool for interested parties such as public decision-makers, data providers, developers and other open data initiatives in the German-speaking area. Publishing it at an early beta stage and developing it transparently in public on github.com also serves this purpose.

The metadata structure supports the description of datasets (including data services), documents and applications. It is set up as follows: the most important properties are stored at the top level. These include title, identifier, description, responsible party and terms of use. Furthermore, the list of resources, i.e. the actual data, documents or applications, is essential. The most important property of each resource is in turn its URL. In addition, a description and a format can be recorded for each resource. This setup allows, for example, related files to be captured as one record, possibly covering different periods or available in different languages or formats. All other information is stored in the "extras". This mainly includes temporal and spatial coverage and details about the origin of imported items.
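To make this layout concrete, here is an illustrative record in Python: top-level properties, a list of resources for several years and formats, and the remaining details in `extras`. All field names and values are assumptions loosely modelled on CKAN; the authoritative field names are those in the OGD schema on github.com. The hypothetical helper `urls_by_format` shows how one record bundles related files.

```python
from collections import defaultdict

# Illustrative only: field names are assumptions, not the official OGD schema.
record = {
    "title": "Einwohnerzahlen Berlin",
    "identifier": "berlin-einwohner",
    "description": "Population figures by district.",
    "responsible": "Example Senate Department (hypothetical)",
    "terms_of_use": "cc-by",
    "resources": [
        {"url": "https://data.example.org/einwohner-2011.csv", "format": "CSV", "description": "2011"},
        {"url": "https://data.example.org/einwohner-2012.csv", "format": "CSV", "description": "2012"},
        {"url": "https://data.example.org/einwohner-2012.xls", "format": "XLS", "description": "2012"},
    ],
    "extras": {
        "temporal_coverage_from": "2011-01-01",
        "temporal_coverage_to": "2012-12-31",
        "spatial": "Berlin",
        "harvest_source": "daten.berlin.de",
    },
}


def urls_by_format(rec: dict) -> dict[str, list[str]]:
    """Group the resource URLs of one record by their declared format."""
    groups: dict[str, list[str]] = defaultdict(list)
    for res in rec["resources"]:
        groups[res.get("format", "unknown")].append(res["url"])
    return dict(groups)
```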

On github.com you can find, in addition to the schema, a tabular HTML representation and lists of the categories and licenses to be used. We look forward to comments, suggestions and questions.


On 25 September 2012, representatives from Bavaria, Berlin, Bremen, Hamburg and Baden-Wuerttemberg, as well as from PortalU and GDI-DE, met at our institute Fraunhofer FOKUS in Berlin to discuss the metadata structure for OGPD. They also discussed how existing data offerings can be transferred into OGPD.

Harvesting refers to merging metadata from different catalogs. As part of OGPD, the metadata of the workshop participants, as well as of Destatis, are harvested insofar as they meet the minimum criteria for open data: only records, documents and applications that have a freely available electronic resource, a description and a well-defined license are accepted.

For this purpose, we discussed the proposed metadata structure. It was readjusted in particular with regard to unique identifiers (to trace the origin of records and detect duplicates), the handling of contact details, the detection of open licenses, and geographical coverage. In addition, the main categories for classifying datasets, documents and applications were discussed and condensed into the following 14 main categories:

  • Economics and Labour
  • Transport and Traffic
  • Environment and Climate
  • Geography, Geology and Geodata
  • Health
  • Consumer Protection
  • Infrastructure, Construction and Housing
  • Education and Science
  • Public administration, Budget and Tax
  • Law and Justice
  • Social
  • Culture, Leisure, Sport and Tourism
  • Population
  • Politics and Elections

These main categories form the basic classification and are supplemented by more specific (for example, subject-specific) subcategories. For harvesting, existing categorizations, such as those in INSPIRE or EVAS, are mapped to these 14 categories.
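A category mapping for harvesting can be as simple as a lookup table. The following Python sketch lists the 14 main categories above and maps a few INSPIRE themes onto them; the mapping entries are an illustrative excerpt chosen for this sketch, not the actual OGPD mapping tables (EVAS is not covered), and `map_categories` is a hypothetical helper.

```python
MAIN_CATEGORIES = [
    "Economics and Labour",
    "Transport and Traffic",
    "Environment and Climate",
    "Geography, Geology and Geodata",
    "Health",
    "Consumer Protection",
    "Infrastructure, Construction and Housing",
    "Education and Science",
    "Public administration, Budget and Tax",
    "Law and Justice",
    "Social",
    "Culture, Leisure, Sport and Tourism",
    "Population",
    "Politics and Elections",
]

# Illustrative excerpt: a few INSPIRE themes mapped to main categories.
INSPIRE_TO_MAIN = {
    "Transport networks": "Transport and Traffic",
    "Geology": "Geography, Geology and Geodata",
    "Atmospheric conditions": "Environment and Climate",
    "Human health and safety": "Health",
    "Buildings": "Infrastructure, Construction and Housing",
}

# Every mapping target must be one of the 14 main categories.
assert set(INSPIRE_TO_MAIN.values()) <= set(MAIN_CATEGORIES)


def map_categories(source_terms: list[str], mapping: dict[str, str]) -> list[str]:
    """Map source categories to main categories, skipping unknown terms
    and duplicates while preserving the original order."""
    result: list[str] = []
    for term in source_terms:
        main = mapping.get(term)
        if main is not None and main not in result:
            result.append(main)
    return result
```

Unknown source terms are skipped rather than rejected, so a record still gets whatever main categories can be derived; the subject-specific subcategories would be kept separately.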

After clarifying the metadata structure as the target format for data, documents and applications, the participants discussed various ways of providing datasets to OGPD. As a result, four different ways will be implemented and offered:

  • Passive provision via CSW, used for example for the spatial data catalog and PortalU
  • Passive provision via CKAN / JSON, used for example by Berlin, Hamburg and Bremen
  • Active provision via the CKAN API, which will be used for example by Bavaria
  • Manual entry via a web form, used for example by the Ministry of Finance for fiscal data
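Passive provision via CKAN / JSON means that OGPD fetches the metadata from the provider's catalog. A minimal Python sketch of such a harvest run follows, under these assumptions: it uses the CKAN 2.x Action API endpoints `package_list` and `package_show`, the HTTP fetch function is injected so the sketch runs without network access, and records are keyed by a hypothetical composite identifier `<source_id>:<name>`. That key is one simple way to keep the origin traceable and to make re-harvesting overwrite instead of duplicate a record. CSW harvesting and the transformation into the OGPD structure are not shown.

```python
from typing import Callable
from urllib.parse import quote

# fetch(url) -> parsed JSON response; injected so the sketch runs offline.
Fetch = Callable[[str], dict]


def harvest_ckan(source_id: str, base_url: str, fetch: Fetch) -> dict[str, dict]:
    """Passively harvest a remote CKAN catalog (hypothetical helper).

    Assumes CKAN 2.x Action API endpoints package_list and package_show.
    Keys have the form "<source_id>:<name>", so duplicate names from the
    same source collapse into a single record.
    """
    names = fetch(f"{base_url}/api/3/action/package_list")["result"]
    harvested: dict[str, dict] = {}
    for name in names:
        pkg = fetch(f"{base_url}/api/3/action/package_show?id={quote(name)}")["result"]
        harvested[f"{source_id}:{pkg['name']}"] = pkg
    return harvested
```

In a real deployment, `fetch` could wrap `urllib.request.urlopen` and `json.load`; keeping it injectable also makes the harvester easy to test against recorded responses.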

The main result of our harvesting workshop is certainly the revised metadata structure, which is now available on github.com.

Managing the metadata structure for OGPD on GitHub allows transparent, collaborative development with version control. Change requests can be made publicly, the history of the metadata structure is documented, and the current status is always visible.

Florian Marienfeld and Thomas Scheel have just added an HTML representation of the JSON schema of the OGPD metadata structure, which makes the structure more readable and easier to understand.

We look forward to your comments and suggestions on the metadata structure and/or on harvesting, preferably directly on GitHub, but we are just as happy to receive them here.
