
Notes re. Ordnance Survey Use Case.

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Fri, 21 Nov 2008 20:54:06 +0600
To: public-xg-rdb2rdf@w3.org
Message-Id: <1227279246.23392.122.camel@master.iv.dev.null>

Hello all,

I've looked at the use case and got the impression that the complexity
of the case lies not in mapping data to triples but in mapping values to
values.

The most obvious issue is the handling of compound data types. Any
problem we may encounter when mapping primary keys to URIs and dependent
columns to triple objects is minor compared with the headache of
combining spatial data in OSGB_1936 with statistics built on icosahedron
faces (or, even worse, combining data from two similar but not identical
projections).

The next issue is what I call "precision in reproducing errors". Say a
field is LandUse:ArableField if LANDUSEFIELDS.FIELD_USE LIKE 'Arable',
LandUse:PastureField if LANDUSEFIELDS.FIELD_USE LIKE 'Pasture', and
LandUse:SetAsideField if LANDUSEFIELDS.FIELD_USE LIKE 'Set Aside'.
One may write a function that takes FIELD_USE as an argument and returns
one of the three IRIs. That will be more efficient than a union of three
independent rules, and it will give identical results _on valid data_.
Nevertheless, the results will differ if a field is labeled 'Pasture
Arable'.
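A minimal Python sketch of that divergence, assuming the LIKE patterns
behave as substring matches (the table column and the three IRIs come
from the text above; the function names and test values are invented for
illustration):

```python
# Two strategies for mapping LANDUSEFIELDS.FIELD_USE to an IRI.
# Assumption: the LIKE patterns act as substring matches, and the
# LandUse: prefix is abbreviated as a plain string here.

PATTERNS = {
    "Arable": "LandUse:ArableField",
    "Pasture": "LandUse:PastureField",
    "Set Aside": "LandUse:SetAsideField",
}

def iri_function(field_use):
    """Single mapping function: returns exactly one IRI (or None)."""
    for pattern, iri in PATTERNS.items():
        if pattern in field_use:
            return iri
    return None

def iri_union(field_use):
    """Union of three independent rules: may yield several IRIs."""
    return [iri for pattern, iri in PATTERNS.items() if pattern in field_use]

# On valid data the two strategies agree:
print(iri_function("Pasture"))       # one IRI
print(iri_union("Pasture"))          # a one-element list, same IRI

# On the invalid value 'Pasture Arable' they diverge:
print(iri_function("Pasture Arable"))  # the first matching IRI only
print(iri_union("Pasture Arable"))     # one IRI per matching rule
```

The function is faster and never emits two classes for one field, but it
silently hides the dirty value instead of reproducing the error the way
the union of rules would.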

Another subtle issue is "volatile" data, where one must find a
compromise between exporting data that are "not valid enough" and data
that are "not complete enough". Say Topo_Area.dat contains data of
different versions and dates. If I export all areas of all versions (or
even all latest versions), then I get an integrity violation in every
case where a new road is built and placed on the map years after a field
was mapped. If I restrict the extraction to some time window, then I get
blank areas on the "tablet". Quite possibly one will need a configurable
parameter to set a task-specific threshold, but at present we have no
way of expressing that in a mapping language.
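A small sketch of the trade-off, with an invented record layout and
dates (only the file name Topo_Area.dat comes from the text; the
extract() helper and its window_start parameter are hypothetical):

```python
from datetime import date

# Hypothetical records from Topo_Area.dat: (area_id, version, valid_from).
AREAS = [
    ("area-1", 1, date(1995, 3, 1)),
    ("area-1", 2, date(2007, 6, 15)),  # re-surveyed after a new road
    ("area-2", 1, date(1996, 9, 20)),
]

def extract(areas, window_start=None):
    """Keep only the latest version of each area; optionally drop
    versions older than a task-specific threshold (window_start)."""
    latest = {}
    for area_id, version, valid_from in areas:
        if area_id not in latest or version > latest[area_id][1]:
            latest[area_id] = (area_id, version, valid_from)
    rows = latest.values()
    if window_start is not None:
        rows = [r for r in rows if r[2] >= window_start]
    return sorted(rows)

# All latest versions: a 2007 road sits next to a 1996 field.
print(extract(AREAS))
# A narrow window drops area-2 entirely: a blank spot on the "tablet".
print(extract(AREAS, window_start=date(2000, 1, 1)))
```

Either call is defensible for some task, which is exactly why the
threshold wants to be a per-task parameter of the mapping rather than a
constant baked into it.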

What remains after solving these and similar problems is a corpus of
"pretty rectangular" data that is ready for trivial mapping to RDF. We
have had applications that demonstrated non-trivial problems in the
mapping itself (i.e., values were mapped unchanged and unfiltered, yet
writing the mapping was still interesting), but those applications used
tens of tables with tens of columns, resulting in hundreds of mapping
rules.

I'm sure the Ordnance Survey Use Case will demonstrate mapping-specific
problems when it is attempted as a real industrial application, but it
is free of them when it is truncated to a tutorial case.

Best Regards,

Ivan Mikhailov,
OpenLink Software
http://virtuoso.openlinksw.com
Received on Friday, 21 November 2008 14:57:31 UTC
