Our ultimate goal is to bootstrap the transformation of government data into RDF and, better still, into Linked Data. While many governments are going open, we want to help them go open and linked :-).
We present here a couple of contributions:
- dcat is an RDF vocabulary for the exchange of data catalogs. Its primary purpose is the expression of government data catalogs, such as data.gov or data.gov.uk, in RDF.
As a feasibility study, we converted four existing catalogs (namely data.gov, data.london.gov.uk, data.australia.gov.au and datasf.org) to RDF using dcat. We provide a Linked Data interface and a SPARQL endpoint to access this data.
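To make the vocabulary concrete, here is a minimal Python sketch that serializes one catalog with one dataset as Turtle, using dcat terms (dcat:Catalog, dcat:dataset, dcat:Dataset, dcat:keyword, dcat:distribution, dcat:downloadURL) plus Dublin Core titles and formats. It assumes the namespace that W3C later standardized, http://www.w3.org/ns/dcat#; the 2010 dcat draft used for these catalogs may use a different namespace URI. All example.org URIs and titles are hypothetical, and this is not the converter we used for the four catalogs.

```python
# Minimal sketch: describe a catalog and its datasets with dcat, emitted as Turtle.
# Assumption: the W3C DCAT namespace; example.org URIs below are hypothetical.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a plain string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def block(subject, pairs):
    """Render one subject with its (predicate, object) pairs as a Turtle block."""
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"{subject} {body} ."


def catalog_to_turtle(catalog_uri, title, datasets):
    """datasets: dicts with 'uri', 'title', 'keywords', 'download_url', 'format'."""
    out = [f"@prefix dcat: <{DCAT}> .", f"@prefix dct: <{DCT}> .", ""]
    cat_pairs = [("a", "dcat:Catalog"), ("dct:title", literal(title))]
    cat_pairs += [("dcat:dataset", f"<{d['uri']}>") for d in datasets]
    out.append(block(f"<{catalog_uri}>", cat_pairs))
    for d in datasets:
        pairs = [("a", "dcat:Dataset"), ("dct:title", literal(d["title"]))]
        pairs += [("dcat:keyword", literal(k)) for k in d.get("keywords", [])]
        # The distribution is written inline as a blank node.
        dist = (f"[ a dcat:Distribution ; dcat:downloadURL <{d['download_url']}> ; "
                f"dct:format {literal(d['format'])} ]")
        pairs.append(("dcat:distribution", dist))
        out.append("")
        out.append(block(f"<{d['uri']}>", pairs))
    return "\n".join(out) + "\n"


ttl = catalog_to_turtle(
    "http://example.org/catalog",
    "Example government catalog",
    [{"uri": "http://example.org/dataset/air-quality",
      "title": "Air quality readings",
      "keywords": ["environment", "air"],
      "download_url": "http://example.org/files/air.csv",
      "format": "CSV"}],
)
print(ttl)
```

Keeping the distribution separate from the dataset mirrors the catalog situation, where one dataset is often offered in several download formats.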
- If you are not aware of Freebase Gridworks (yet?)... watch one of the great screencasts on their website now!
Freebase Gridworks is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.
But it does not provide a direct way to export RDF!
We enabled navigating a catalog represented in RDF using dcat through Freebase Gridworks (the catalog can be provided as an RDF dump file or through a SPARQL endpoint). We also added RDF export functionality to Gridworks.
In conclusion, we provide a way to represent government catalogs, which are hubs for very valuable government data, in RDF. We then provide an easy way to navigate through this data and to open it in a very powerful tool, Freebase Gridworks, where the data can be cleaned, linked and enhanced. Finally, we enable exporting this data as RDF. We believe that this two-step approach to RDFizing government data is necessary to manage the variety of datasets that governments provide, i.e., it enables tackling domain-specific datasets on a case-by-case basis.
Download (updated: 13/08/2010)
Download, unzip, navigate to the folder and run
java -jar gw.jar
Currently, the application runs only on Java 1.6. We will provide a distribution that runs on Java 1.5 soon.
The image below shows the starting screen. Arrow 1 points to the new section added to Freebase Gridworks, which enables browsing a government catalog represented in RDF according to the dcat vocabulary and recommendations. Note that you can browse either a SPARQL endpoint or an RDF dump file (arrow 2).
The example shows the result of querying our experimental SPARQL endpoint, which contains dcat representations of four catalogs: data.gov, data.australia.gov.au, data.london.gov.uk and datasf.org.
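Outside the application, such an endpoint can also be queried over the standard SPARQL protocol, which passes the query in a `query` parameter of an HTTP GET request. The Python sketch below lists datasets with their titles. It assumes the data uses the W3C dcat namespace and dct:title (both may need adjusting to what the endpoint actually holds), and it only sends a request when run_query is called.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://lab.linkeddata.deri.ie/govcat/sparql"
DCAT_NS = "http://www.w3.org/ns/dcat#"  # assumption: use the namespace the data uses


def dataset_query(dcat_ns=DCAT_NS, limit=10):
    """SPARQL query listing datasets and their titles."""
    return (
        f"PREFIX dcat: <{dcat_ns}>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title .\n"
        f"}} LIMIT {int(limit)}"
    )


def query_url(endpoint, query):
    """SPARQL protocol via GET: the query goes in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=60):
    """Send the query and return (dataset URI, title) pairs from JSON results."""
    request = urllib.request.Request(
        query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return [(b["dataset"]["value"], b["title"]["value"])
            for b in data["results"]["bindings"]]
```

The generous default timeout reflects the limited resources behind the experimental endpoint.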
When using the application, you can use this SPARQL endpoint URL: http://lab.linkeddata.deri.ie/govcat/sparql (please be patient, as the endpoint is running on limited resources)
or use one of these as dump files:
- data.australia.gov.au: http://lab.linkeddata.deri.ie/2010/dcat/files/data_australia.rdf
- data.london.gov.uk: http://lab.linkeddata.deri.ie/2010/dcat/files/data_london.rdf
- datasf.org: http://lab.linkeddata.deri.ie/2010/dcat/files/data_sf.rdf
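The .rdf dump files are presumably RDF/XML. As a rough, dependency-free sketch, the Python function below lists the URIs of resources typed as dcat:Dataset, whether written as typed node elements or with an explicit rdf:type. It assumes the W3C dcat namespace and ignores RDF/XML features such as rdf:ID and xml:base, so a full RDF parser such as rdflib is the safer choice for real dumps.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"  # assumption: adjust to the namespace in the dump


def dataset_uris(rdfxml, dcat_ns=DCAT):
    """Return URIs of resources typed as dcat:Dataset, in document order."""
    root = ET.fromstring(rdfxml)
    about, resource = f"{{{RDF}}}about", f"{{{RDF}}}resource"
    dataset_class = dcat_ns + "Dataset"
    found = []
    for node in root.iter():
        uri = node.get(about)
        if uri is None:
            continue  # only named resources (rdf:about) are considered
        typed_node = node.tag == f"{{{dcat_ns}}}Dataset"
        explicit_type = any(
            child.tag == f"{{{RDF}}}type" and child.get(resource) == dataset_class
            for child in node
        )
        if (typed_node or explicit_type) and uri not in found:
            found.append(uri)
    return found


def load_dump(url):
    """Download a dump file, e.g. one of the data_*.rdf URLs listed above."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

For example, dataset_uris(load_dump("http://lab.linkeddata.deri.ie/2010/dcat/files/data_sf.rdf")) would list the datasets in the datasf.org dump, provided it uses the assumed namespace.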
After loading the RDF data, you can browse the available datasets. Full-text search (arrow 1) and category and data format facets (arrow 2) can be used to search the catalog.
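The search behaviour can be sketched as a full-text match followed by facet filters, with facet counts computed over the current results. This is a hypothetical, in-memory Python illustration with made-up catalog entries, not how the extension implements search.

```python
from collections import Counter

# Hypothetical catalog entries, flattened from dcat descriptions.
CATALOG = [
    {"title": "Crime statistics", "description": "Monthly reported crime",
     "categories": ["Crime"], "formats": ["CSV", "XLS"]},
    {"title": "Air quality", "description": "Hourly pollution readings",
     "categories": ["Environment"], "formats": ["CSV"]},
    {"title": "Park locations", "description": "Public parks",
     "categories": ["Environment"], "formats": ["KML"]},
]


def search(datasets, text="", category=None, fmt=None):
    """Keep datasets whose title/description contain every query word and that
    match the selected category and format facets (None means no filter)."""
    words = text.lower().split()
    hits = []
    for d in datasets:
        haystack = (d["title"] + " " + d.get("description", "")).lower()
        if not all(w in haystack for w in words):
            continue
        if category is not None and category not in d.get("categories", []):
            continue
        if fmt is not None and fmt not in d.get("formats", []):
            continue
        hits.append(d)
    return hits


def facet_counts(datasets, key):
    """Count facet values (e.g. 'formats') over a result set."""
    return Counter(v for d in datasets for v in d.get(key, []))
```

Recomputing the counts over the filtered results is what lets each facet show how many datasets remain for each value.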
Any dataset can be downloaded, and tabular ones can be opened in Freebase Gridworks (arrow 3), where you have all its goodness and power to clean and polish the data.
Inside Freebase Gridworks you can now find the "Edit RDF Schema" option under the Schemas menu.
The dialog lets you shape the RDF the way you want. You can set a base URI (arrow 2), against which all relative URIs are resolved. You can add rdf:type to resources (arrow 1). You can define your own property if the autocomplete popup does not help (arrow 4): entering a relative URI coins a new property within the namespace determined by the base URI you entered. At any point, you can preview the resulting RDF (arrow 3), which shows (up to) the first 20 rows represented in Turtle. The Vocabulary Manager (arrow 5) lets you manage the vocabularies/ontologies in use.
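The mapping from rows to RDF can be sketched as follows: each row becomes a subject, columns map to properties, relative URIs are resolved against the base URI, and the preview is capped at 20 rows. This Python sketch uses standard RFC 3986 resolution via urljoin; whether the extension resolves URIs exactly this way is an assumption, and the rows, base URI and column mapping are hypothetical.

```python
from urllib.parse import urljoin

PREVIEW_LIMIT = 20  # the preview shows (up to) the first 20 rows


def literal(text):
    """Quote a cell value as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_turtle(rows, base_uri, subject_column, rdf_type, column_properties,
                   limit=PREVIEW_LIMIT):
    """Render rows as Turtle; relative URIs are resolved against base_uri.

    column_properties maps a column name to a property URI, absolute or relative.
    """
    blocks = []
    for row in rows[:limit]:
        subject = urljoin(base_uri, row[subject_column])
        statements = [f"a <{urljoin(base_uri, rdf_type)}>"]
        for column, prop in column_properties.items():
            value = row.get(column)
            if value:  # skip empty cells instead of emitting empty literals
                statements.append(f"<{urljoin(base_uri, prop)}> {literal(value)}")
        blocks.append(f"<{subject}> " + " ;\n    ".join(statements) + " .")
    return "\n\n".join(blocks)


rows = [
    {"id": "1", "name": "Alice", "email": ""},
    {"id": "2", "name": "Bob", "email": "bob@example.org"},
]
print(rows_to_turtle(
    rows,
    base_uri="http://example.org/data/",  # hypothetical base URI
    subject_column="id",
    rdf_type="http://xmlns.com/foaf/0.1/Person",
    column_properties={"name": "http://xmlns.com/foaf/0.1/name", "email": "email"},
))
```

Here the relative property "email" is coined as http://example.org/data/email. Under RFC 3986 resolution a base URI ending in "#" (such as http://example.org/vocab#) would resolve "name" to http://example.org/name, so a base ending in "/" behaves more predictably as a namespace.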
Vocabulary manager. A handful of popular vocabularies are predefined for convenience.
Autocomplete options. Terms are based on the vocabularies defined in the vocabulary manager (prefix.cc).
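Prefix handling of this kind can be sketched as a prefix-to-namespace map plus a term list per vocabulary: autocompletion matches either the full prefixed name or the local name, and expansion turns a prefixed name into a full URI. The vocabularies and terms below are an assumed subset for illustration, not the extension's predefined list; prefix.cc is a public service that maps such prefixes to namespace URIs.

```python
# Assumed subset of vocabularies, as a vocabulary manager might hold them.
VOCABULARIES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
TERMS = {
    "foaf": ["name", "mbox", "homepage", "knows", "Person", "Organization"],
    "dcterms": ["title", "description", "created", "creator"],
    "rdfs": ["label", "comment", "seeAlso"],
}


def expand(curie):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in VOCABULARIES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return VOCABULARIES[prefix] + local


def complete(typed, limit=10):
    """Suggest prefixed terms whose prefixed or local name starts with the input."""
    typed = typed.lower()
    matches = []
    for prefix, terms in TERMS.items():
        for term in terms:
            curie = f"{prefix}:{term}"
            if curie.lower().startswith(typed) or term.lower().startswith(typed):
                matches.append(curie)
    return sorted(matches)[:limit]
```

Matching on the local name as well means a user typing "ti" is offered dcterms:title without having to remember which vocabulary defines it.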
Clicking on a node shows a dialog where you can specify all the details of the intended RDF resources.
To define your custom URIs, you have the full power of the Gridworks Expression Language (GEL). We also added a urlify function to it.
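The exact behaviour of urlify is not described here. Assuming it turns an arbitrary cell value into a URI-safe path segment, one plausible behaviour is sketched below in Python: accents are stripped, runs of other characters become hyphens, and the result is lowercased. This urlify is hypothetical; percent-encoding (urllib.parse.quote) would be an alternative that preserves the original value.

```python
import re
import unicodedata


def urlify(value):
    """Hypothetical slug function: make a cell value safe for use in a URI."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()
```

A row URI could then be built as the base URI plus urlify(cell value), so "Dublin City Council" becomes dublin-city-council. The design trades reversibility for readable URIs: distinct values such as "A&B" and "A B" collapse to the same slug.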
While this is work in progress and still has some bugs and missing features, we want to highlight a further issue here. Catalogs (especially data.gov) might declare the format of a dataset as CSV but actually provide the data in a different format (usually exe), and then things will just not work as expected. So at least some unexpected behavior is not our fault :-)
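A cheap guard against such mislabelled downloads is to look at the first bytes before treating a file as CSV. The Python heuristic below checks a few well-known file signatures (for example, Windows executables start with the bytes MZ) and otherwise asks csv.Sniffer whether the text looks delimited. It is a sketch, not part of the extension, and it can misclassify unusual files.

```python
import csv

# Well-known leading byte signatures.
SIGNATURES = [
    (b"MZ", "Windows executable"),
    (b"PK\x03\x04", "ZIP archive (also used by XLSX)"),
    (b"%PDF", "PDF document"),
    (b"\xd0\xcf\x11\xe0", "OLE2 file (e.g. legacy XLS)"),
]


def guess_kind(head):
    """Classify a file from its first bytes; 'head' should be a few KB."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    stripped = head.lstrip().lower()
    if stripped.startswith(b"<!doctype html") or stripped.startswith(b"<html"):
        return "HTML page"
    if b"\x00" in head:
        return "unknown binary"  # NUL bytes rarely occur in text files
    sample = head.decode("utf-8", errors="replace")
    try:
        csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        return "text, not recognisably delimited"
    return "delimited text (CSV-like)"


def read_head(path, size=4096):
    """Read the first bytes of a downloaded file."""
    with open(path, "rb") as f:
        return f.read(size)
```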
Extending Gridworks
The additional functionality, namely exporting RDF and browsing government catalogs described in dcat, was developed by Fadi Maali and Richard Cyganiak.
This site is © Copyright , Linked Data Research Centre (LiDRC), DERI 2010, All Rights Reserved