11. June 2013 · 2 comments · Categories: Open Data

CKAN is a free data catalog software that is used in many open data portals worldwide. In Germany, too, many developers use, operate, extend and interface with CKAN. We are looking forward to hosting a CKAN user meetup in Berlin: a half-day get-together of CKAN practitioners to share insights and ideas about CKAN.

Date: Tuesday, June 25th, 2013 (Day after Berlin Open Data Day)

Venue: Fraunhofer FOKUS, Germany (Berlin)

This is the preliminary agenda. The idea is to start with three tutorials. After lunch we are going to dive into detailed discussions about topics that participants find interesting. If you want to follow the tutorials on your own machine, please bring a running CKAN “from source” installation.

  • 10:00: Towards writing CKAN extensions with an application to authorization and custom forms by Seanh (OKF)
  • 11:00: opendataregistry-client – A middleware or rich CKAN client in Java by Simon (FhG FOKUS)
  • 11:30: It’s about data – Data manipulation with CKAN datastore by Dominik (OKF)
  • 12:00: Lunch break
  • 12:00–15:00: On-demand discussions covering at least:
    • Making your CKAN upgradable – DOs, DONTs, Issues
    • CKAN documentation road map and questions
    • Managing license variety
    • CKAN for research data
    • CKAN and semantics / linked data


Open Cities has just launched a new call-out to take part in the Open Data Tourism Hack at Home, a project focused on encouraging the creation of mobile apps to help cities better manage the challenges and benefits of tourism.

The challenge, focused on the use of Open Data and Open Sensor Networks, will allow Amsterdam, Barcelona, Berlin, Helsinki, Paris, Rome and Bologna, as well as any other European city, to benefit from the talents of app creators in finding solutions for managing tourism in the urban space. One of the main goals of the challenge is to develop apps that can minimize the impact of mass tourism on the city and its inhabitants. For example, the Open Data Tourism Hack at Home is looking for mobile technology solutions that will allow residents to connect with visitors, improve and personalize tourist itineraries, optimize visit times and improve tourists’ opportunities to move around the city during their stay.

The winning app will receive a €3,000 prize and a nomination for the Mobile Premier Awards 2014, the most prestigious awards in the app industry. The app that best uses the Open Data and Sensor Network platforms will also be recognized, alongside the app most voted for by the public.

By participating in the Open Data Tourism Hack at Home challenge, developers will have the opportunity to turn their ideas into working apps through the Hack at Home platform, which allows participants to present their ideas, form teams with like-minded designers, developers and coders and get the help of expert mentors to build apps for the participating cities.

From today, anyone interested in taking part in the challenge can pre-register at the Open Data Tourism Hack at Home challenge website. From May 13, participants can send their ideas for apps, start building teams and receive feedback from the mentors.

At the same time, Open Cities has launched a parallel call for larger solutions, services or technologies addressing the same tourism issues: the Urban Lab Challenge. This challenge focuses on using city resources to experiment with innovative software or other projects in a real urban environment. The winner of this challenge will receive a €3,000 prize and the support and guidance of the city of Barcelona in implementing their proposal.


We are looking forward to hosting a German CKAN user meetup in Berlin!

Date: Tuesday, June 25th, 2013

(Day after Berlin Open Data Day)

Venue: Fraunhofer FOKUS, Germany (Berlin)

Details: http://s.fhg.de/ckan-meetup-june-13

Target Audience: CKAN practitioners, i.e. developers, users, operators etc. Unsure whether this is for you? Quiz question: what’s wrong here?

$ paster --plugin=ckan search index rebuild --config=std.ini

If this is the kind of question you have spent time with, you should come!

We need your input to figure out an agenda. There are already some offers from CKAN developers Sean and Dominik and some from FOKUS. Yet, this is supposed to be an interactive workshop by users for users, so we are interested in your insights and questions.

The idea is to have:

  • 1h hands-on tutorials, where presenters take attendees step by step through a specific task like ‘install from source’ or ‘harvest a CSW’
  • 1h open discussions, where presenters share their situation and problems and everyone can chip in.

If you want to attend please drop me a line and leave your name in this gdoc: http://s.fhg.de/ckan-meetup-june-13

Please take one minute to say what topics you’re interested in. You can offer or request a session, or just indicate that you like a topic. I would like to generate a preliminary agenda from your input so please RSVP by June 7th.


One of the main goals of the GovData.de prototype is to unite as many open data sets from Germany as possible in a single catalogue. The biggest part is imported automatically by so-called harvesters. In this article we give an overview of which tools have been used and how useful they have proven.

In earlier articles we showed that a CKAN-based metadata structure is used. We also reported on a workshop with the operators of the catalogues that are to be harvested. In this context, four different import techniques were presented: JSON import, CKAN-to-CKAN harvesting, CSW/ISO 19115 harvesting and the CKAN REST API. In practice, primarily the first three approaches have proven useful.

[Photo of a combine harvester: “Harvesting at large” (photo: JSmith Photo, license: Creative Commons No Derivatives)]

With the JSON import, the operators of the remote catalogues just name an HTTP URL under which we can retrieve a daily-updated JSON file containing all of the data sets. This procedure has been used in Bremen, Bayern and Moers. With a few feedback loops, the providers were able to optimize their individual JSON export tools to the extent that a smooth integration of the metadata is possible. The metadata was initially downloaded with the Unix tool “wget”, adjusted where necessary with a Python script, and finally uploaded to the GovData.de CKAN with the Python library ckanclient. We are currently integrating these steps into our own CKAN harvesting plug-in, which makes regular harvesting easier.
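
The Python adjustment step described above can be sketched as follows. The source field names (titel, beschreibung, schlagworte, herausgeber) are illustrative assumptions, not the actual GovData.de mapping, and the commented-out upload call shows where the ckanclient library would come in:

```python
def adapt_dataset(raw):
    """Map one record of a provider's daily JSON export onto a CKAN
    package dict. Field names on the input side are hypothetical."""
    return {
        "name": raw["id"].lower().replace(" ", "-"),  # locally unique slug
        "title": raw["titel"],
        "notes": raw.get("beschreibung", ""),
        "author": raw.get("herausgeber", ""),
        "tags": [t.lower() for t in raw.get("schlagworte", [])],
    }

# The adapted packages would then be uploaded with ckanclient, e.g.:
#   import ckanclient
#   ckan = ckanclient.CkanClient(base_location="https://example.org/api",
#                                api_key="<api key>")
#   ckan.package_register_post(adapt_dataset(raw))
```

The point of keeping the adaptation in a pure function is that it can be tested against a provider's export without touching the catalogue.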

The CKAN-to-CKAN harvesting is used for the data portals of Hamburg, Berlin, Rostock and Rhineland-Palatinate. In theory, the CKAN harvesting extension ckanext-harvest can be used for this task without further development or configuration, since the providers follow the suggested metadata structure. In practice, however, several details have to be taken into account: the adoption of categories (CKAN “groups”) only works with minor tricks; the mapping CKAN.author ↔ “publishing authority” is not applied consistently everywhere; the use of the only locally unique CKAN.name and .id has to be considered carefully; and capital letters and special characters in the tags (keywords) are not transferred properly. In addition, keywords and titles sometimes have to be supplemented, since the Hamburg metadata catalogue, for example, naturally does not tag all of its data sets with the word “Hamburg”. Technically, however, this approach is very elegant, since, among other things, each update only transfers those data sets that have changed in the intervening period.

Importing geo metadata coded according to the ISO 19115 standard is somewhat more complicated (see the Working Group metadata site). In my opinion, this is because geo data are distributed and (should be) consumed very differently from the usual open data approach. In this context, data are called ‘products’, frequently CDs or paper maps, which are gathered and found on the basis of the metadata; but then usually a contract is signed and the data is handed over directly by the provider to the contractual partner. Thus the details ‘online resource’ and ‘licence’, which are of key importance for Open Data, have only limited relevance both in the standard and in the providers’ use of it. On top of that, the very detailed (meta)data model is used with profiles that differ from federal state to federal state, which makes it difficult, for instance, to identify the publishing authority across all the data sets of Geoportal.de, which covers the whole of Germany.

For this reason, the import of Geoportal.de and PortalU.de has been put on hold. It has been possible, however, to partially import destatis, the Regional Database and the Open Data offering of the Environment Office of Lower Saxony. Here the standard was implemented very consistently and the question of licences partially clarified (DL-DE-BY and/or UDL). For the harvesting we have developed a branch of the CKAN extension ckanext-spatial. It adapts the standard CSW client (Catalogue Service for the Web) for destatis and the regional statistics: here, instead of CSW, zipped XML files are distributed via HTTP. At the Environment Office of Lower Saxony, the relevant data sets are found through a CSW query for ‘opendataident’. Hamburg also uses the CSW harvester to transfer metadata from the Hamburg metadata catalogue to a Hamburg CKAN.
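
For the destatis case, where zipped ISO-coded XML files are fetched over HTTP instead of queried via CSW, the harvester essentially has to unzip the files and parse ISO 19139 records (the XML encoding of ISO 19115). A minimal sketch of pulling a dataset title out of such a record with the standard library; the XPath follows the standard ISO 19139 layout, and error handling is omitted:

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def extract_title(iso_xml):
    """Extract the dataset title from an ISO 19139 metadata record
    (one XML file from the unzipped HTTP download)."""
    root = ET.fromstring(iso_xml)
    el = root.find(
        "gmd:identificationInfo/gmd:MD_DataIdentification/"
        "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
        NS,
    )
    return el.text if el is not None else None
```

The real ckanext-spatial branch of course maps many more fields (online resource, licence, publishing authority), but each mapping is this same pattern of namespace-qualified XPath lookups.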

[Diagram: harvesting architecture]

The first two importers are based solely on the ckanext-harvest extension and have therefore been installed directly in the production CKAN of GovData.de. The ISO harvester, however, is based on the quite comprehensive ckanext-spatial extension and therefore runs on a separate machine. In a second step, the data sets are then transferred to the actual data catalogue.

In the further development of these and new harvesters, we think that it is necessary for the following problems to be addressed:

  • Differing semantics: What exactly are data, documents, apps? How are services aligned? What is the meaning of time stamps when metadata is harvested from several different catalogues?
  • The standardisation of key words: How do we resolve the problem of ambiguous and differing designations for identical meanings (homonyms and synonyms)? How do we summarize similar tags (tag curation)?
  • Recognizing duplicates: Until now, duplicates at GovData.de have been the exception, for organizational reasons. However, the more catalogues are networked with each other, the more important it is to ensure that duplicates are reliably recognized using the fields metadata_original_portal and metadata_original_id.
  • Synchronization instead of harvesting: Nowadays it is usually clear from where to where the harvesting is taking place, yet challenges are also foreseeable here: Berlin, for example, is interested in the datasets of destatis that contain the keyword Berlin; the university library centre of the federal state of North Rhine-Westphalia (hbz) has registered its Open Data exports at thedatahub.org, yet they also belong to GovData.de.
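
The duplicate check with the two fields mentioned above boils down to a comparison of origin keys. A sketch under the assumption that both fields live directly on the dataset dict (the real GovData.de check may store them as CKAN extras and differ in detail):

```python
def is_duplicate(candidate, existing_datasets):
    """Decide whether a harvested dataset is already in the catalogue
    by comparing (metadata_original_portal, metadata_original_id)."""
    key = (candidate.get("metadata_original_portal"),
           candidate.get("metadata_original_id"))
    if None in key:
        return False  # without a stable origin key we cannot decide
    return any(
        (d.get("metadata_original_portal"),
         d.get("metadata_original_id")) == key
        for d in existing_datasets
    )
```

This only works, of course, if every harvester along a chain of catalogues preserves the original portal and id instead of substituting its own, which is exactly why the networking of catalogues makes the problem harder.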

That final point is already clearly evident today: those who harvest are also harvested. The GovData.de metadata can be found at offenedaten.de. There may also be harvesting towards the EU and into further special catalogues. We hope to support these processes by exposing our CKAN API and maintaining the metadata structure.

In conclusion, harvesting accounts for a key part of the work at GovData.de and clearly offers corresponding added value. To keep improving in this area, a lot of detail work is necessary. Ideally, the cooperation between providers and catalogue operators will lead to a subsequent standardization of the metadata structure and the catalogue interfaces.

This work and its content is subject to a Creative Commons Attribution 3.0 Unported License.


Currently there is much discussion about the licensing terms for public-sector data sets in Germany. Among other things, this discussion urges the adoption of internationally established standards, namely Creative Commons. This made me wonder to what extent the Creative Commons licenses are in fact established as a standard.

Applying the method that Paul Miller used on 11 July 2012 to analyse http://datahub.io/, I ran the same analysis on:

  • http://publicdata.eu

The analysis yielded the following:
