- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Fri, 21 Nov 2008 20:54:06 +0600
- To: public-xg-rdb2rdf@w3.org
Hello all,

I've looked at the use case, and I've got the impression that the complexity of the case is not in mapping data to triples but in mapping values to values.

The most obvious issue is the handling of compound data types. Any problem we may get with mapping primary keys to URIs and dependent columns to objects of triples is really minor compared with the headache of combining spatial data in OSGB_1936 with statistics built on icosahedron faces (or, even worse, combining data from two similar but not identical projections). (A sketch of the kind of value-level rework this implies is appended at the end of this message.)

The next issue is what I call "precision in reproducing errors". Say a field is LandUse:ArableField if LANDUSEFIELDS.FIELD_USE LIKE 'Arable', LandUse:PastureField if LANDUSEFIELDS.FIELD_USE LIKE 'Pasture', and LandUse:SetAsideField if LANDUSEFIELDS.FIELD_USE LIKE 'Set Aside'. One may write a function that takes FIELD_USE as an argument and returns one of the three IRIs. That will be more efficient than a union of three independent rules, and it will give an identical result _on valid data_. Nevertheless, there will be a difference if a field is labeled 'Pasture Arable'. (Both variants are sketched below.)

Another subtle issue is "volatile" data, where one has to find a compromise between exporting data that are "not valid enough" and data that are "not complete enough". Say Topo_Area.dat contains data of different versions and dates. If I export all areas of all versions (or even all latest versions), then I have an integrity violation in every case where a new road is built and placed on the map years after a field is mapped. If I restrict the extraction to some time window, then I get blank areas on the "tablet". It is quite possible that one will need a configurable parameter to set a task-specific threshold, but at present we have no way of expressing that in the mapping language. (A sketch of such a parameter follows below.)

What remains after solving these and similar problems is a corpus of "pretty rectangular" data that is ready for trivial mapping to RDF. We have had applications that demonstrated non-trivial problems in the mapping itself, i.e. values were mapped unchanged and not filtered, yet writing the mapping was interesting anyway; but those applications used tens of tables with tens of columns each, resulting in hundreds of mapping rules. I'm sure that the Ordnance Survey Use Case will demonstrate mapping-specific problems once it is tried as a real industrial application, but it is free from them when it is truncated to a tutorial case.

Best Regards,

Ivan Mikhailov, OpenLink Software
http://virtuoso.openlinksw.com
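
Sketch for the compound/spatial point: a minimal example of the value-level rework involved, written in PostGIS-flavoured SQL. The OS_AREAS table and GEOM column are invented for illustration; EPSG:27700 is the code for OSGB 1936 / British National Grid.

    -- Reproject OSGB 1936 / British National Grid (EPSG:27700) geometries to
    -- WGS 84 (EPSG:4326) before trying to line them up with data gridded on a
    -- different projection. Table and column names are hypothetical.
    SELECT toid,
           ST_Transform(ST_SetSRID(geom, 27700), 4326) AS geom_wgs84
      FROM OS_AREAS;

No triple-level mapping rule captures this step; it is plain value-to-value computation that has to happen before or during extraction.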
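
Sketch for the "precision in reproducing errors" point, assuming a hypothetical TOID key column and wildcarded patterns (without wildcards the two variants agree and both skip 'Pasture Arable'):

    -- Variant 1: union of three independent rules, one SELECT per class.
    -- A FIELD_USE value matching two patterns yields two class triples.
    SELECT toid, 'LandUse:ArableField'   AS class
      FROM LANDUSEFIELDS WHERE field_use LIKE '%Arable%'
    UNION ALL
    SELECT toid, 'LandUse:PastureField'  AS class
      FROM LANDUSEFIELDS WHERE field_use LIKE '%Pasture%'
    UNION ALL
    SELECT toid, 'LandUse:SetAsideField' AS class
      FROM LANDUSEFIELDS WHERE field_use LIKE '%Set Aside%';

    -- Variant 2: one function-like CASE, one scan, at most one class per row.
    -- 'Pasture Arable' gets LandUse:ArableField here, but two classes above.
    SELECT toid,
           CASE
             WHEN field_use LIKE '%Arable%'    THEN 'LandUse:ArableField'
             WHEN field_use LIKE '%Pasture%'   THEN 'LandUse:PastureField'
             WHEN field_use LIKE '%Set Aside%' THEN 'LandUse:SetAsideField'
           END AS class
      FROM LANDUSEFIELDS;

On valid data the two are interchangeable; the choice only shows up on rows that were already wrong.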
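
Sketch for the "volatile data" point: the task-specific threshold would amount to little more than a date parameter pushed into the extraction query. The :cutoff parameter, the VERSION_DATE column, and the TOPO_AREA table are hypothetical; the problem is that current mapping languages give the mapping no way to declare such a parameter.

    -- Extract only areas whose version falls inside the task-specific window.
    -- Widening the window reintroduces integrity violations (old fields vs.
    -- new roads); narrowing it reintroduces blank areas on the "tablet".
    SELECT toid, geom, version_date
      FROM TOPO_AREA
     WHERE version_date >= :cutoff
       AND version_date <= CURRENT_DATE;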
Received on Friday, 21 November 2008 14:57:31 UTC