NeoGeo Vocabulary: Defining a shared RDF representation for GeoData

This Version
http://geovocab.org/doc/survey.html
Latest Version
http://geovocab.org/doc/neogeo.html
Last Modified
$Id: survey.html 18 2011-04-08 09:00:11Z non88sense@gmail.com $
Status
Public draft
Authors
Juan Martín Salas (FRLP, Universidad Tecnológica Nacional)
Andreas Harth (AIFB, Karlsruher Institut für Technologie)

Abstract

Although many datasets now describe GeoData as Linked Data, each dataset mostly does so with its own vocabulary. Moreover, these vocabularies vary significantly, each having its advantages and disadvantages. The purpose of this document is to emphasize the need for a common vocabulary for the description of GeoData, to analyze the design options currently in use and, ultimately, to propose a possible vocabulary for the description of georeferenced geometric shapes and spatial relations, in order to encourage discussion on this matter.


1. Introduction

A geographic region is generally described in three basic ways, which seem to be shared across datasets: first, through its name (e.g. Argentina); second, through its geographic location (i.e. a set of geographic coordinates defining its location and shape); and third, through its location in relation to other regions (e.g. the country which shares borders with Chile and Uruguay).

In the context of the Semantic Web, the most obvious way to identify a region is through its Uniform Resource Identifier (URI), which is a well-known standard for unambiguously identifying resources. The geographic location of a region may be described by a georeferenced geometric shape, and spatial relations may be defined qualitatively by a spatial logic such as the Region Connection Calculus (RCC).

However, no consensus has yet been reached on an RDF vocabulary with enough descriptive power to describe GeoData. This document intends to draw attention to the importance of this matter and to encourage its discussion within the community.


2. Survey

In order to develop a useful vocabulary while keeping the effort required to integrate it with existing representations to a minimum, a number of popular datasets were analyzed.

2.1 List of datasets

2.1.1 UN FAO Geopolitical Ontology

The "Food and Agriculture Organisation of the United Nations" (FAO) is a specialised agency of the United Nations that leads international efforts to defeat hunger. The UN FAO Geopolitical Ontology was developed in order to provide the FAO and its associated partners with a master reference for geopolitical information.

Example: http://aims.fao.org/aos/geopolitical.owl#Ireland (Ireland)

2.1.2 UK Ordnance Survey

Ordnance Survey, an executive agency and non-ministerial department of the United Kingdom government, is the national mapping agency for Great Britain. Ordnance Survey has recently released a number of its products as Linked Data, under the name OS OpenData.

Example: http://data.ordnancesurvey.co.uk/id/7000000000025156 (Southampton, Itchen)

2.1.3 GeoLinkedData.es

GeoLinked Data (.es) is an open initiative that aims to provide geospatial information about Spain's national territory. Information is gathered from various sources such as the "Instituto Geográfico Nacional de España" (IGN) and the "Instituto Nacional de Estadística" (INE).

Example 1: http://geo.linkeddata.es/resource/Provincia/Madrid (Comunidad de Madrid)

Example 2: http://geo.linkeddata.es/resource/R%C3%ADo/R%C3%ADo%20Adaja%20 (Río Adaja)

2.1.4 LinkedGeoData.org

LinkedGeoData is a project in which data from the OpenStreetMap project is gathered and made available as Linked Data. The LinkedGeoData project was developed by the University of Leipzig.

Example 1: http://linkedgeodata.org/triplify/way27743320 (Alte Mensa, Dresden)

Example 2: http://linkedgeodata.org/triplify/node264695865 (Liebigstraße, Dresden)

2.1.5 GeoNames

GeoNames is a geographical database which covers all countries. The GeoNames database is accessible under a Creative Commons attribution license.

Example: http://sws.geonames.org/2964180/ (Galway, Ireland)

2.1.6 Uberblic.org

Uberblic is an integration service for data. Some of Uberblic's sources of information are: GeoNames, Wikipedia, MusicBrainz, Freebase, Last.fm and Foursquare.

Example: http://uberblic.org/resource/0ede6ccd-c805-444f-9e3f-b67669b7fef0 (Nantes, France)

2.1.7 EU NUTS

The Nomenclature of Units for Territorial Statistics (NUTS) is a geocode standard for referencing the subdivisions of countries for statistical purposes. The standard was developed by the European Union to divide its economic territory into regions with comparable populations, so that comparable regional statistics can be obtained.

Example: http://rdfdata.eionet.europa.eu/ramon/nuts2008/DE111 (Stuttgart, Germany)

2.1.8 DBpedia

DBpedia is a community effort to extract structured information from Wikipedia and make it available as Linked Data. The project was developed by the Free University of Berlin and the University of Leipzig, in collaboration with OpenLink Software.

Example: http://dbpedia.org/resource/Berlin (Berlin, Germany)


3. Geometry

Current approaches to the representation of geometric shapes vary widely across the analyzed datasets, from GML code in an RDF Literal to a collection of nodes in an RDF container. The following section will provide examples of these representation methods.

3.1 Current representation methods

3.1.1 Point

In the simplest case, the location of an object is represented by a single geographic point. The most common vocabulary for this is W3C Geo. This information is sometimes complemented with a GeoRSS representation, as in the case of the UK Ordnance Survey, although GeoRSS is not a proper RDF vocabulary but an XML Schema. In some cases, neither W3C Geo nor GeoRSS is used, but a custom vocabulary. This is the case of the Uberblic Ontology, which uses its own "latitude", "longitude" and "altitude" predicates.

Example from GeoNames:

@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://sws.geonames.org/2964180/> wgs84_pos:lat 53.27194 .
<http://sws.geonames.org/2964180/> wgs84_pos:long -9.04889 .

3.1.2 Bounding box

The location is represented by a georeferenced rectangle (on a Mercator projection) defined by two corner points. This is the case of the FAO Geopolitical Ontology, which uses four predicates (hasMinLongitude, hasMinLatitude, hasMaxLongitude, hasMaxLatitude) to represent the rectangle. Note that the rectangle is thus given by its four sides, each of which should touch the region at some point.

Example from the FAO Geopolitical Ontology:

@prefix fao: <http://aims.fao.org/aos/geopolitical.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

fao:Ireland fao:hasMinLatitude "51.42"^^xsd:float .
fao:Ireland fao:hasMaxLatitude "55.38"^^xsd:float .
fao:Ireland fao:hasMinLongitude "-10.58"^^xsd:float .
fao:Ireland fao:hasMaxLongitude "-5.99"^^xsd:float .
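
As a rough illustration of how such a bounding box can be derived and used, the following Python sketch computes the four extreme values from a set of boundary points and tests whether a point falls inside the box. The class and function names are hypothetical, the boundary points passed to bounding_box are made up to reproduce the FAO values, and the sketch assumes a region that does not cross the 180° meridian.

```python
from typing import Iterable, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    min_lat: float
    max_lat: float
    min_long: float
    max_long: float

    def contains(self, lat: float, long: float) -> bool:
        """True if the point lies inside or on the edge of the box."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_long <= long <= self.max_long)


def bounding_box(points: Iterable[Tuple[float, float]]) -> BoundingBox:
    """Smallest latitude/longitude-aligned box enclosing (lat, long) points."""
    lats, longs = zip(*points)
    return BoundingBox(min(lats), max(lats), min(longs), max(longs))


# Values taken from the FAO example for Ireland.
ireland = BoundingBox(min_lat=51.42, max_lat=55.38,
                      min_long=-10.58, max_long=-5.99)

# Galway, using the GeoNames coordinates from section 3.1.1.
galway_in_box = ireland.contains(53.27194, -9.04889)

# Hypothetical boundary points, chosen so that each side of the box
# touches the region at one of them.
derived = bounding_box([(51.42, -9.5), (55.38, -7.3),
                        (53.0, -10.58), (54.0, -5.99)])
```

A bounding box only gives a coarse answer: a point inside the box is not necessarily inside the region itself.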

3.1.3 Points in lists

In this case, a region's geometric shape is represented by a collection of points which are defined in either an RDF Collection or an RDF Container. LinkedGeoData.org uses this approach to represent geometric shapes, by using a "hasNodes" predicate, which links to an rdf:Seq Container. This container describes the shape's nodes, which are represented by using the W3C Geo Vocabulary.

However, some reasoners (e.g. Pellet) seem to discard the triples that link the RDF Container to its elements. Moreover, there is no intuitive way of manipulating the order of the nodes in a SPARQL query.

Example from LinkedGeoData.org:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lgdo:    <http://linkedgeodata.org/ontology/> .
@prefix lgd:     <http://linkedgeodata.org/triplify/> .

lgd:way23289876 lgdo:hasNodes <http://linkedgeodata.org/triplify/way23289876/nodes> .
	
<http://linkedgeodata.org/triplify/way23289876/nodes>
      a       rdf:Seq ;
      rdf:_1  lgd:node252104902 ;
      rdf:_2  lgd:node252104889 ;
      rdf:_3  lgd:node252104890 ;
      rdf:_4  lgd:node252104891 ;
      rdf:_5  lgd:node304644309 ;
      rdf:_6  lgd:node252104900 ;
      rdf:_7  lgd:node252104901 ;
      rdf:_8  lgd:node252104902 .

Note that the first and last nodes must be the same in order to form a polygon.
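
The ordering issue mentioned above can be illustrated outside of SPARQL. The following Python sketch, which assumes the triples have already been loaded as plain (subject, predicate, object) string tuples, recovers the node order from the rdf:_n membership properties and checks that the ring is closed. The helper names are hypothetical; the node identifiers are those of the example.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBER = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)$")


def seq_members(triples, seq):
    """Return the members of an rdf:Seq in numeric order of rdf:_n."""
    indexed = []
    for s, p, o in triples:
        match = MEMBER.match(p)
        if s == seq and match:
            indexed.append((int(match.group(1)), o))
    # Sort on the integer index: compared as strings, "_10" precedes "_2".
    return [o for _, o in sorted(indexed)]


def is_closed(nodes):
    """A ring needs three distinct nodes plus a repeat of the first one."""
    return len(nodes) >= 4 and nodes[0] == nodes[-1]


SEQ = "http://linkedgeodata.org/triplify/way23289876/nodes"
NODE = "http://linkedgeodata.org/triplify/node"
node_ids = ["252104902", "252104889", "252104890", "252104891",
            "304644309", "252104900", "252104901", "252104902"]

# Triples are listed in reverse to show that storage order is irrelevant.
triples = [(SEQ, f"{RDF_NS}_{i}", NODE + n)
           for i, n in reversed(list(enumerate(node_ids, start=1)))]

ordered = seq_members(triples, SEQ)
closed = is_closed(ordered)
```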

A node is represented as follows:

@prefix lgdo:    <http://linkedgeodata.org/ontology/> .
@prefix lgd:     <http://linkedgeodata.org/triplify/> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix georss:  <http://www.georss.org/georss/> .

lgd:node252104891
      lgdo:memberOfWay lgd:way23289876 , lgd:way27743320 ;
      georss:point "51.0269888 13.726175" ;
      geo:lat 51.0269888 ;
      geo:long 13.726175 .

Notice in this case the "lgdo:memberOfWay" predicate, which relates the node to the geometric shapes it belongs to.
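
The node description above carries the same position twice, once as a georss:point literal and once as separate geo:lat and geo:long values. A consumer may want to check that both agree. The sketch below is one possible way to do so in Python; the function name is hypothetical, and it assumes the literal uses the "latitude longitude" order shown in the example.

```python
def parse_georss_point(value: str):
    """Split a GeoRSS point literal of the form "lat long" into two floats."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'lat long', got {value!r}")
    lat, long = float(parts[0]), float(parts[1])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= long <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, long


# Values copied from the description of lgd:node252104891.
georss_point = "51.0269888 13.726175"
geo_lat, geo_long = 51.0269888, 13.726175

consistent = parse_georss_point(georss_point) == (geo_lat, geo_long)
```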

3.1.4 Nodes defined by a single predicate

In the GeoLinkedData.es ontology, rivers are currently represented by a group of "Curva" RDF resources (similar to a GML LineString). Each "Curva" resource uses a single predicate, "formadoPor", to link to each of its nodes. Each node in turn carries its WGS-84 coordinates (represented with the W3C Geo Ontology) and an "orden" (order) predicate, which defines the position of the node within the geometric shape.

Example from GeoLinkedData.es:

@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://geo.linkeddata.es/resource/R%C3%ADo/R%C3%ADo%20Adaja%20> geo:geometry 
			<http://geo.linkeddata.es/resource/70800fa69dac72ecb3ab0b8199208d57e87f2ec1> ,
			<http://geo.linkeddata.es/resource/478602f7bcf834d4ceea76ba4b109a09bd47849e> ,
			<http://geo.linkeddata.es/resource/dc6a2a768769a6f840953c7ffca50d6e6f2e9543> ,
			<http://geo.linkeddata.es/resource/b9771b76ad5db7241d349979d01c98045584772f> ,
			<http://geo.linkeddata.es/resource/859c0c85961f24a5e7a23361e93bfcad62d244b6> .

The above code shows how some of the rivers in GeoLinkedData.es are represented by a set of "Curva" (Curve) resources. Notice that the "geo:geometry" predicate is not part of the W3C Geo vocabulary, but an extension introduced by Virtuoso Servers. The "Curva" resource is then defined by a set of nodes, as shown in the following code:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix geoes:   <http://geo.linkeddata.es/ontology/> .
	
<http://geo.linkeddata.es/resource/70800fa69dac72ecb3ab0b8199208d57e87f2ec1>
      rdf:type geoes:Curva ;
      geoes:formadoPor 
			<http://geo.linkeddata.es/resource/wgs84/41.168441591773714_-4.697274497787241> ,
			<http://geo.linkeddata.es/resource/wgs84/41.16826151181368_-4.6972698477885535> , 
			<http://geo.linkeddata.es/resource/wgs84/41.16797321187768_-4.697274327787851> ,
			<http://geo.linkeddata.es/resource/wgs84/41.16806343185765_-4.697264737789995> .

A node is defined as follows:

@prefix geoes:   <http://geo.linkeddata.es/ontology/> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

<http://geo.linkeddata.es/resource/wgs84/41.168441591773714_-4.697274497787241>
      rdf:type geo:Point ;
      geoes:orden "3"^^xsd:int ;
      geo:lat "41.16844159177371"^^xsd:double ;
      geo:long "-4.697274497787241"^^xsd:double .

Note the inclusion of a "geoes:orden" (order) predicate, which defines the node's position within a geometric shape.

3.1.5 Literals

Both the GeoLinkedData.es and the UK Ordnance Survey ontologies include a predicate that attaches a GML representation of the resource, encoded in RDF as a literal.

Example from UK Ordnance Survey:

@prefix geometry: <http://data.ordnancesurvey.co.uk/ontology/geometry/> .
	
<http://data.ordnancesurvey.co.uk/id/7000000000025156> geometry:extent <http://data.ordnancesurvey.co.uk/id/geometry/122138> .
	

The "geometry:extent" property links a feature to its geometric representation. As the following code shows, this representation consists of GML code contained in an RDF literal:

@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix geometry: <http://data.ordnancesurvey.co.uk/ontology/geometry/> .

<http://data.ordnancesurvey.co.uk/id/geometry/122138>
      geometry:asGML '''<gml:Polygon xmlns:gml="http://www.opengis.net/gml" srsName="os:BNG"><gml:exterior><gml:LinearRing><gml:posList srsDimension="2">
				441358.5 110740.4 441387.3 110714.4 441458.3 110660.8 441506.2 110624.0 441358.5 110740.4 
		      </gml:posList></gml:LinearRing></gml:exterior></gml:Polygon>'''^^rdf:XMLLiteral .
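
Because the geometry is opaque to RDF tools, a client has to parse the literal itself. The following Python sketch shows one way of extracting the coordinate pairs from the GML fragment above using only the standard library. The function name is hypothetical, and the sketch only handles a single gml:posList; the coordinates are British National Grid values, as indicated by the srsName attribute.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def pos_list_coordinates(gml: str):
    """Return the coordinate tuples of the first gml:posList in a fragment."""
    root = ET.fromstring(gml)
    pos_list = root.find(f".//{{{GML_NS}}}posList")
    if pos_list is None or not pos_list.text:
        raise ValueError("no gml:posList found")
    # srsDimension states how many values make up one position.
    dim = int(pos_list.get("srsDimension", "2"))
    values = [float(v) for v in pos_list.text.split()]
    if len(values) % dim:
        raise ValueError("posList length is not a multiple of srsDimension")
    return [tuple(values[i:i + dim]) for i in range(0, len(values), dim)]


# The literal from the Ordnance Survey example.
gml_literal = (
    '<gml:Polygon xmlns:gml="http://www.opengis.net/gml" srsName="os:BNG">'
    '<gml:exterior><gml:LinearRing><gml:posList srsDimension="2">'
    '441358.5 110740.4 441387.3 110714.4 441458.3 110660.8 '
    '441506.2 110624.0 441358.5 110740.4'
    '</gml:posList></gml:LinearRing></gml:exterior></gml:Polygon>'
)

ring = pos_list_coordinates(gml_literal)
```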
	

3.2 Vocabularies used across the analyzed datasets

Not all of the analyzed datasets describe geometric shapes, and those that do each provide their own vocabulary, as the following table shows:

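Dataset            | Point            | Bounding box | Points in lists | Single predicate | GML as a literal
-------------------+------------------+--------------+-----------------+------------------+-----------------
UN FAO             | -                | Own          | -               | -                | -
UK Ordnance Survey | W3C Geo / GeoRSS | -            | -               | -                | Own / GML
GeoLinkedData.es   | W3C Geo          | -            | -               | Own              | Own / GML
LinkedGeoData.org  | W3C Geo          | -            | Own             | -                | -
GeoNames           | W3C Geo          | -            | -               | -                | -
Uberblic           | Own              | -            | -               | -                | -
EU NUTS            | -                | -            | -               | -                | -
DBpedia            | W3C Geo          | -            | -               | -                | -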

As the table above shows, geometric shapes are not only described with different vocabularies; these vocabularies are also based on different structures, which makes it harder to work with GeoData across datasets.


4. Spatial Relations

A spatial relation states the location of an object in relation to another. For example, if a region is said to contain another region, an "inclusion" spatial relation holds between the two regions.

4.1 Region Connection Calculus (RCC)

In 1992, David A. Randell, Zhan Cui and Anthony G. Cohn presented a logic in their article "A Spatial Logic based on Regions and Connection" that allows qualitative spatial representation and consistent reasoning. This logic became known as the "Region Connection Calculus" (RCC).

RCC is based on a single dyadic relation C(x,y), which is symmetric and reflexive and is read as "x is connected to y". Every other relation can be derived from C(x,y), forming a hierarchy as shown in the following figure:

Hierarchy of the RCC relations

Two regions are either connected (C) or disconnected (DC). If they are connected but do not overlap (O), they are said to be externally connected (EC); regions that do not overlap, whether externally connected or disconnected, are called discrete (DR). If they do overlap, there are three possibilities: they partially overlap (PO), one region is part of the other (P), or the inverse holds (Pi). When the last two relations hold at the same time, the two regions are said to be the same region (EQ). A region is a proper part of another (PP) if the other region is not a part of the first one. In this case, it may be a tangential proper part (TPP), if the two regions share at least one common point on their borders.
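
For reference, the relations described above can be written down as definitions over C(x,y). The following LaTeX block is a sketch of the definitions as they are commonly given in the RCC literature; NTPP, the non-tangential counterpart of TPP, is included for completeness, and the exact notation is an assumption rather than a quotation from the original article.

```latex
\begin{align*}
DC(x,y)   &\equiv \lnot C(x,y) \\
P(x,y)    &\equiv \forall z\,[\,C(z,x) \rightarrow C(z,y)\,] \\
Pi(x,y)   &\equiv P(y,x) \\
EQ(x,y)   &\equiv P(x,y) \land P(y,x) \\
PP(x,y)   &\equiv P(x,y) \land \lnot P(y,x) \\
O(x,y)    &\equiv \exists z\,[\,P(z,x) \land P(z,y)\,] \\
DR(x,y)   &\equiv \lnot O(x,y) \\
EC(x,y)   &\equiv C(x,y) \land \lnot O(x,y) \\
PO(x,y)   &\equiv O(x,y) \land \lnot P(x,y) \land \lnot P(y,x) \\
TPP(x,y)  &\equiv PP(x,y) \land \exists z\,[\,EC(z,x) \land EC(z,y)\,] \\
NTPP(x,y) &\equiv PP(x,y) \land \lnot \exists z\,[\,EC(z,x) \land EC(z,y)\,]
\end{align*}
```

Note that TPP is defined through a third region z that is externally connected to both x and y, which is the form of deduction used in the remainder of this document.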

4.2 Spatial relations in the analyzed datasets

The following table shows which predicates are used in each dataset to describe spatial relations:

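Dataset            | Disjoint     | Externally Connected             | Partially Overlaps    | Is Proper Part of       | Properly Contains | Equals     | Proximity
-------------------+--------------+----------------------------------+-----------------------+-------------------------+-------------------+------------+------------------------
UN FAO             | -            | hasBorderWith                    | -                     | isInGroup [1]           | -                 | -          | -
UK Ordnance Survey | disjoint [4] | touches                          | partiallyOverlaps [4] | within                  | contains          | equals [4] | -
GeoLinkedData.es   | -            | -                                | -                     | formaParteDe            | formadoPor [3]    | -          | -
LinkedGeoData.org  | -            | -                                | -                     | -                       | -                 | -          | -
GeoNames           | -            | neighbour / neighbouringFeatures | -                     | parentFeature [2]       | childrenFeatures  | -          | nearby / nearbyFeatures
Uberblic           | -            | adjoining_location               | -                     | containing_location [4] | -                 | -          | -
EU NUTS            | -            | -                                | -                     | partOf                  | -                 | -          | -
DBpedia            | -            | -                                | -                     | locatedInArea           | -                 | -          | -

  [1] Only when the range resource is a "geographical_region".
  [2] And its sub-properties: parentADM1, parentADM2, parentADM3, parentADM4 and parentCountry.
  [3] This property is also used to link nodes to a geometrical shape.
  [4] Currently no instance references this property.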

Notice that in order to deduce a "tangential proper part" relation, a region A must be a proper part of another region B, and A and B must both be externally connected to another region C. For this deduction to be possible, external connections must be stated not only between regions of the same administrative level, but also between regions of different administrative levels. Right now, only the UK Ordnance Survey uses these predicates in this manner.


5. Proposed Ontologies

5.1 Geometry

5.1.1 Overview

The following diagram shows a model for a possible ontology for representing georeferenced geometrical shapes:

A proposed model for geometrical shapes

This model uses a single-predicate approach to link a geometrical shape to its nodes, since it enables the description of geometrical shapes while retaining the ability to query the order of the nodes in SPARQL, and is also the most compatible with reasoners.

Points are differentiated from nodes in order to separate a geographical point from its inclusion in a particular geometrical shape, allowing multiple nodes to reference the same point (i.e. making shared borders more explicit).

This approach also allows the description of polygons with "holes". Notice that a polygon must have at least one exterior boundary, and may or may not have inner boundaries. This model uses a vocabulary similar to those used by GML and KML (except that in KML "interior" and "exterior" are "innerBoundaryIs" and "outerBoundaryIs" respectively), in order to remain as compatible as possible with these already widely used XML Schemas.

5.1.2 Example

Figures: Map of South Africa; example polygon of South Africa

South Africa is the southernmost country on the African continent. The country of Lesotho is fully contained within the borders of South Africa. An example representation of South Africa's borders (simplified in this case) is shown below:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .
@prefix geometry: <http://linkeddata.com.ar/geometry#> .

# Point IRIs are written in full, since a Turtle local name cannot start with "-".
ex:South_Africa geometry:hasBorder _:polygon .

_:polygon rdf:type geometry:Polygon ;
          geometry:exterior [
                       rdf:type geometry:LinearRing ;
                       geometry:hasNode [
                                geometry:hasLocation <http://example.org/wgs_84/-29_16> ;
                                geometry:order 1
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-22_31> ;
                                geometry:order 2
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-28_33> ;
                                geometry:order 3
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-34_27> ;
                                geometry:order 4
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-35_19> ;
                                geometry:order 5
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-29_16> ;
                                geometry:order 6
                       ] ] ;
          geometry:interior [
                       rdf:type geometry:LinearRing ;
                       geometry:hasNode [
                                geometry:hasLocation <http://example.org/wgs_84/-29.5_27> ;
                                geometry:order 1
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-28.5_28.5> ;
                                geometry:order 2
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-29.5_29.5> ;
                                geometry:order 3
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-31_28> ;
                                geometry:order 4
                       ], [
                                geometry:hasLocation <http://example.org/wgs_84/-29.5_27> ;
                                geometry:order 5
                       ] ] .

A geographical point is represented by using the W3C Geo vocabulary as follows:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://example.org/wgs_84/-29.5_27> rdf:type geo:Point ;
                   geo:lat -29.5 ;
                   geo:long 27 .
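
To show how a client might consume such a polygon, the following Python sketch rebuilds both rings from unordered (order, latitude, longitude) nodes and tests whether a location lies inside the polygon but outside its hole. It is only an illustration: the function names are hypothetical, the test uses the common even-odd ray casting rule, and latitude and longitude are treated as planar coordinates, which is a simplification.

```python
def ordered_ring(nodes):
    """Sort (order, lat, long) nodes by order; return (long, lat) pairs."""
    return [(long, lat) for _, lat, long in sorted(nodes)]


def in_ring(x, y, ring):
    """Even-odd ray casting test against a closed ring of (x, y) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_polygon(lat, long, exterior, interiors=()):
    """Inside the exterior ring and outside every interior ring."""
    return (in_ring(long, lat, exterior)
            and not any(in_ring(long, lat, hole) for hole in interiors))


# Nodes from the South Africa example, deliberately shuffled to mimic
# the unordered result of a query; the order value restores the sequence.
exterior_nodes = [(3, -28, 33), (1, -29, 16), (6, -29, 16),
                  (5, -35, 19), (2, -22, 31), (4, -34, 27)]
interior_nodes = [(2, -28.5, 28.5), (5, -29.5, 27), (1, -29.5, 27),
                  (4, -31, 28), (3, -29.5, 29.5)]

exterior = ordered_ring(exterior_nodes)
interiors = [ordered_ring(interior_nodes)]

in_south_africa = in_polygon(-30, 22, exterior, interiors)
in_hole = in_polygon(-29.6, 28.2, exterior, interiors)
```

The second test location falls inside the interior ring, so it is reported as lying outside the polygon.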

5.2 Spatial Relations

5.2.1 Overview

The following diagram shows the hierarchy of properties in a possible ontology for representing spatial relations based on RCC8:

A proposed model for spatial relations

The properties labeled with blue circles are a possible vocabulary to describe spatial relations between objects. It seems more intuitive to use "proper part" and "externally connected with" relations to deduce a "tangential proper part" relation, than to directly define the latter. This approach is already used by the Ordnance Survey to define spatial relations.

Since an explicit definition of disconnected and discrete regions would imply a quadratic growth in the number of triples (one statement per pair of regions), a closed world assumption could be adopted to describe these relations. In this case, regions which are not stated to overlap would be discrete regions. Moreover, if these regions are not stated to be externally connected either, they would be disconnected regions.
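
The closed world reading can be sketched in a few lines of Python. In the sketch below, only overlap and external connection are stated explicitly, and every remaining pair of regions is classified as disconnected. The function names and the small set of regions are illustrative assumptions, not part of the proposed vocabulary.

```python
from itertools import combinations


def unordered(pairs):
    """Store symmetric relations as unordered pairs."""
    return {frozenset(pair) for pair in pairs}


def classify(a, b, overlaps, externally_connected):
    """Closed world: anything not stated to overlap or touch is "DC"."""
    pair = frozenset((a, b))
    if pair in overlaps:
        return "O"
    if pair in externally_connected:
        return "EC"
    return "DC"


regions = ["Argentina", "Chile", "Uruguay", "South_America"]

# Explicitly stated relations; no "disconnected" statement is stored.
overlaps = unordered([("Argentina", "South_America"),
                      ("Chile", "South_America"),
                      ("Uruguay", "South_America")])
externally_connected = unordered([("Argentina", "Chile"),
                                  ("Argentina", "Uruguay")])

relations = {pair: classify(*pair, overlaps, externally_connected)
             for pair in combinations(regions, 2)}
```

With n regions there are n(n-1)/2 pairs to classify, which is why stating every disconnected pair explicitly does not scale.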

5.2.2 Example

Figures: Map of Argentina; map of South America

Argentina is a country in South America. Buenos Aires City is the capital of Argentina, but it is also an independent administrative region within the borders of Buenos Aires Province. The code below shows an example representation of these facts using the proposed vocabulary:

@prefix ex: <http://example.org/> .
@prefix spatial: <http://linkeddata.com.ar/spatial#> .

ex:Buenos_Aires_City spatial:isProperPartOf ex:Buenos_Aires_Province ;
                     spatial:externallyConnectedWith ex:Uruguay .

ex:Buenos_Aires_Province spatial:isProperPartOf ex:Argentina ;
                         spatial:externallyConnectedWith ex:Entre_Rios, ex:Santa_Fe, ex:Cordoba, ex:La_Pampa,
                                                         ex:Rio_Negro, ex:Uruguay .

ex:Argentina spatial:isProperPartOf ex:South_America ;
             spatial:externallyConnectedWith ex:Uruguay, ex:Brazil, ex:Paraguay, ex:Bolivia, ex:Chile .

Since Buenos Aires City and Buenos Aires Province are both externally connected with Uruguay, it is implied that Buenos Aires City is a tangential proper part of Buenos Aires Province. The same holds between Buenos Aires Province and Argentina. Argentina is a non-tangential proper part of South America.
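
The deduction above can be sketched as a small Python function. It labels a proper part as tangential when the part and the whole share at least one externally connected region, and as non-tangential otherwise; the latter relies on a closed world reading of the stated relations. The function name is hypothetical, and the data mirrors the example.

```python
def classify_proper_parts(proper_part_of, externally_connected):
    """Label each (part, whole) pair as "TPP" or "NTPP"."""
    neighbours = {}
    for a, b in externally_connected:
        # External connection is symmetric, so index both directions.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    labels = {}
    for part, whole in proper_part_of:
        shared = neighbours.get(part, set()) & neighbours.get(whole, set())
        labels[(part, whole)] = "TPP" if shared else "NTPP"
    return labels


proper_part_of = [
    ("Buenos_Aires_City", "Buenos_Aires_Province"),
    ("Buenos_Aires_Province", "Argentina"),
    ("Argentina", "South_America"),
]

externally_connected = (
    [("Buenos_Aires_City", "Uruguay")]
    + [("Buenos_Aires_Province", r) for r in
       ("Entre_Rios", "Santa_Fe", "Cordoba", "La_Pampa", "Rio_Negro", "Uruguay")]
    + [("Argentina", r) for r in
       ("Uruguay", "Brazil", "Paraguay", "Bolivia", "Chile")]
)

labels = classify_proper_parts(proper_part_of, externally_connected)
```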


6. Conclusion

Summing up, we conclude that, given the current variety of vocabularies, a discussion is needed to create a shared vocabulary that addresses the requirements of current and future implementations. Moreover, given the growing amount of GeoData, establishing a shared vocabulary is likely to become more difficult in the future.

A vocabulary that seems to satisfy current requirements was proposed, based on an analysis of current providers of GeoData. However, the authors wish to engage in a wider discussion with the community, in order to arrive at a vocabulary that suits most use cases.


Acknowledgements

We would like to thank the following people for their contributions and many helpful suggestions:

Barry Norton (AIFB, Universität Karlsruhe)
Jens Lehmann (AKSW/MOLE, Universität Leipzig)
