A standard for the metadata and distribution of open data.

Recently I have been obsessed with finding a tool that lets me manage data more efficiently. Until now I haven’t used any specific tool to organize the CSV and JSON files that I get from somewhere or make myself. It’s especially painful to add metadata to the data, such as the source URL, the time, and a description of how I merged it with tools.

I also want to make a website where users can modify existing data in real time, with branches and a change record, something like GitHub or Wikipedia. For that, some standard form of metadata is necessary.

 

The standard metadata file that the Data Package specification proposes is “datapackage.json”. The file should sit at the top level of the folder so that it covers all the data underneath it. As in the example below, the license, last update, resources, description, and so on can be placed in the folder that contains datasets such as CSV or JSON files. This would help with file management and with user modifications. The format is also extensible, which is good for private use and for user customization.

<pre><code>{
  "name": "a-unique-human-readable-and-url-usable-identifier",
  "datapackage_version": "1.0-beta",
  "title": "A nice title",
  "description": "...",
  "version": "2.0",
  "keywords": ["name", "My new keyword"],
  "licenses": [{
    "url": "http://opendatacommons.org/licenses/pddl/",
    "name": "Open Data Commons Public Domain",
    "version": "1.0",
    "id": "odc-pddl"
  }],
  "sources": [{
    "name": "World Bank and OECD",
    "web": "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"
  }],
  "contributors":[ {
    "name": "Joe Bloggs",
    "email": "joe@bloggs.com",
    "web": "http://www.bloggs.com"
  }],
  "maintainers": [{
    # like contributors
  }],
  "publishers": [{
    # like contributors
  }],
  "dependencies": {
    "data-package-name": "&gt;=1.0"
  },
  "resources": [
    {
      ... see below ...
    }
  ],
  # this is an attribute that is not part of the data package spec
  # you can add your own attributes to a datapackage.json
  "my-own-attribute": "data-packages-are-awesome"
}</code></pre>
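
To make that concrete, here is a minimal Python sketch that writes a datapackage.json at the top level of a data folder using only the standard library. The helper name `write_datapackage`, the `path` key inside each resource, and the `retrieved_at` attribute are my own assumptions for illustration (the last one is a custom attribute, not part of the spec), so treat this as a sketch rather than a reference implementation.

```python
import json
import os


def write_datapackage(folder, name, title, description,
                      source_name, source_url, data_files,
                      retrieved_at=None):
    """Write a minimal datapackage.json at the top level of `folder`.

    Hypothetical helper. Field names follow the example above, except
    "path" (assumed resource key) and "retrieved_at" (custom attribute).
    """
    descriptor = {
        "name": name,
        "title": title,
        "description": description,
        "sources": [{"name": source_name, "web": source_url}],
        # One entry per data file, relative to the folder (assumed "path" key).
        "resources": [{"path": filename} for filename in data_files],
    }
    if retrieved_at is not None:
        # Custom attribute: the spec allows adding your own keys.
        descriptor["retrieved_at"] = retrieved_at

    target = os.path.join(folder, "datapackage.json")
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(descriptor, handle, indent=2, ensure_ascii=False)
    return target
```

Calling it with the folder that holds the CSV files records where the data came from and when, which is exactly the metadata I kept losing.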

There are tools to create, validate, parse, and fetch datapackage.json files. I wanted a GUI tool for creating them, but I couldn’t find one. The only fetcher I found was the Python library “datapackage.py”. It is useful in a local environment, but I want a Node.js module for my web app.
Anyway, I tried it and it worked.


<pre><code>import datapackage

datapkg = datapackage.DataPackage('http://data.okfn.org/data/country-list/')

print datapkg.title
&gt;&gt;List of all countries with their 2 digit codes (ISO 3166-2)

print datapkg.description
&gt;&gt;ISO 3166-1-alpha-2 English country names and code elements. This list states
&gt;&gt;the country names (official short names in English) in alphabetical order as
&gt;&gt;given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.

for i in datapkg.data:
    print i
&gt;&gt;{u'Code': u'AF', u'Name': u'Afghanistan'}
&gt;&gt;{u'Code': u'AX', u'Name': u'\xc5land Islands'}
&gt;&gt;{u'Code': u'AL', u'Name': u'Albania'}
&gt;&gt;{u'Code': u'DZ', u'Name': u'Algeria'}</code></pre>

Ah, but does the standard have no place for column names? If so, that is a problem for me, because I want to see all the column names at a glance without loading the actual data. Or did I overlook it?
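
In case it helps with the same question, here is a hedged Python sketch of how column names could be listed without loading the data. It assumes that a resource may carry a `schema` object with a `fields` list of `{"name": ...}` entries. When that is missing, it falls back to reading only the header row of the CSV file. The function name `column_names` and the `path` key are also assumptions of mine.

```python
import csv
import os


def column_names(folder, resource):
    """Return the column names for one resource entry of a datapackage.json.

    Prefers an (assumed) "schema" -> "fields" list in the metadata and
    otherwise reads just the first line of the CSV at the assumed "path".
    """
    schema = resource.get("schema") or {}
    fields = schema.get("fields") or []
    if fields:
        return [field["name"] for field in fields]

    csv_path = os.path.join(folder, resource["path"])
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Only the header row is consumed; the rest of the file is not read.
        return next(csv.reader(handle), [])
```

Either way, only the metadata or a single line of the file is touched, so it stays cheap even for large datasets.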
