Publishing and consuming geo-spatial and government data on the semantic web

Atemezing, Ghislain Auguste

Over the past few years, Open Data has received increasing attention from public administrations. Potential benefits for citizens include more transparent decision making, better governance and the virtuous development of a digital ecosystem in which applications add value by processing and analyzing open data. However, opening up and publishing data is not enough to create this data value chain, because of challenges such as the heterogeneity of formats (XML, CSV, Excel, PDF, Shapefile), the variety of access methods (API, database, dump) and the lack of common nomenclatures that would make datasets easier to reuse and interconnect. In this thesis, we explore how semantic web technologies can be used to tackle the research problems related to the integration and consumption of geo-spatial data.

This thesis applies the Linked Data principles to geographic information, a key domain for open government. In particular, we address three challenging problems in the workflow for publishing and consuming geospatial open data, with real-world use cases from the French national mapping agency (IGN): (1) How can geospatial data be efficiently represented and stored on the Web to ensure interoperable applications? (2) What are the best options for a user to interact with semantic content using visualizations? (3) What mechanisms support preserving high-quality structured data on the Web?

Our contributions thus break down into three parts, with applications in the geographical domain. We propose and model three vocabularies for representing coordinate reference systems (CRS), topographic entities and their geometries. These ontologies extend existing vocabularies and add two advantages: an explicit use of CRS identified by URIs for geometry, and the ability to describe structured geometries in RDF. We have contributed to the development of the Datalift platform, which aims to support lay users in the process of "lifting" raw data into RDF. We have published the French authoritative database GEOFLA using this tool, and we provide a systematic evaluation of the performance of the most widely used endpoints when dealing with spatial queries.
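As an illustration of what "CRS identified by URIs" and "structured geometries in RDF" can mean in practice, the following minimal sketch serializes one point as N-Triples. The namespaces, property names and the CRS code are hypothetical placeholders chosen for this example; they are not the vocabularies defined in the thesis.

```python
# Hypothetical namespaces, for illustration only; not the thesis's vocabularies.
EX = "http://example.org/def/geometry#"
CRS = "http://example.org/crs/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def point_triples(point_uri, x, y, crs_code):
    """Return N-Triples lines describing a point whose CRS is named by a URI."""
    subject = f"<{point_uri}>"
    return [
        f"{subject} <{RDF_TYPE}> <{EX}Point> .",
        # The CRS is a resource with its own URI, not a code buried in a literal.
        f"{subject} <{EX}crs> <{CRS}{crs_code}> .",
        # Coordinates are separate typed values, so they can be queried directly.
        f'{subject} <{EX}coordX> "{x}"^^<{XSD_DOUBLE}> .',
        f'{subject} <{EX}coordY> "{y}"^^<{XSD_DOUBLE}> .',
    ]


# Placeholder identifier, coordinates and CRS code, invented for this sketch.
triples = point_triples(
    "http://example.org/id/point/1", 652381.0, 6862047.0, "RGF93LAMB93"
)
print("\n".join(triples))
```

Keeping the CRS and the coordinates as separate statements, rather than packing them into a single text literal, is what lets a client look up the reference system or filter on a coordinate without parsing strings.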

Regarding the consumption of linked data, after reviewing different categories of visualization (generic and Linked Data specific), we propose a vocabulary for Describing VIsualization Applications (DVIA). We formalize and implement a novel workflow for visualizing datasets with the LDVizWiz tool: a Linked Data Visualization Wizard.

The last part of the thesis describes contributions to the Linked Open Vocabularies (LOV) catalogue: it shows how LOV can be used with an ontology modeling methodology (e.g. the NeOn methodology) to improve the reuse of vocabularies. We propose a heuristic to align vocabularies and a ranking of vocabularies based on information content (IC) metrics. Finally, the thesis provides answers on how to check license compatibility between vocabularies and datasets in the publication workflow. Through this thesis, we demonstrate the benefits of using semantic technologies and W3C standards to improve the discovery, interlinking and visualization of geo-spatial government data for their publication on the Web.
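The abstract does not spell out the IC metrics used for ranking. As a rough sketch of the general idea only, the example below assumes the standard information-theoretic definition, IC(t) = -log2 p(t), where p(t) is the relative usage frequency of a term, and scores each vocabulary by the sum of its terms' IC. The summation, the function names and the toy counts are all assumptions made for illustration.

```python
import math


def information_content(term_counts):
    """Map each term to -log2(p), where p is its relative usage frequency."""
    total = sum(term_counts.values())
    return {t: -math.log2(c / total) for t, c in term_counts.items() if c > 0}


def rank_vocabularies(vocab_terms, term_counts):
    """Rank vocabularies by the summed IC of their terms, highest first."""
    ic = information_content(term_counts)
    scores = {
        vocab: sum(ic.get(term, 0.0) for term in terms)
        for vocab, terms in vocab_terms.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy usage counts and vocabularies, invented for this sketch.
counts = {"a:Place": 4, "a:name": 2, "b:Point": 1, "b:lat": 1}
vocabs = {"a": ["a:Place", "a:name"], "b": ["b:Point", "b:lat"]}
ranking = rank_vocabularies(vocabs, counts)
print(ranking)
```

Under this definition, rarely used terms carry more information than ubiquitous ones, so vocabulary "b" ranks above "a" in the toy data. Whether rarity should raise or lower a vocabulary's rank is a design choice that the abstract leaves open.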

Data Science
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in and is available at :