Re: Some implementation experience on ISSUE-72 (translation tables)

Thank you for the detailed explanation. I will get up to speed for today's telcon.

Juan Sequeda
www.juansequeda.com

On Dec 13, 2011, at 7:15 AM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 13 Dec 2011, at 05:34, Juan Sequeda wrote:
>>> In D2RQ, they are 1:1 mappings from literals to either URIs or literals.
>>> 
>>> The 1:1 restriction in both products has turned out to be a painful limitation, so I'm keen on relaxing it.
>> 
>> Forgive me if you have stated this before ... what are the limitations of the 1:1 mappings? What types of mappings are desired? Could you give me examples of different (desired) mappings?
> 
> For example, let's say we're integrating data about spatial planning from multiple local authorities. For some of them, we can download recent planning applications as CSV files from their website; with others, we have to crawl their website; and some give us SQL dumps.
> 
> One of the SQL dumps may have a column PLANNING_APPLICATION.STATUS. Here's the documentation for the meaning of the codes in that column:
> 
> +----+-------------------------+
> | id | application status      |
> +----+-------------------------+
> |  0 | INCOMPLETED APPLICATION |
> |  1 | NEW APPLICATION         |
> |  2 | FURTHER INFORMATION     |
> |  3 | DECISION MADE           |
> |  4 | LEAVE TO APPEAL         |
> |  5 | APPEALED                |
> |  8 | WITHDRAWN               |
> |  9 | APPLICATION FINALISED   |
> | 10 | PRE-VALIDATION          |
> | 11 | DEEMED WITHDRAWN        |
> | 12 | APPEALED FINANCIAL      |
> | 13 | PENDING DECISION        |
> | 14 | UNKNOWN                 |
> +----+-------------------------+
> 
> Other local authorities use different codes to represent the same statuses. But worse, they might have a different workflow, so they may have different statuses altogether. They may not have a “PRE-VALIDATION” step, or make no distinction between “WITHDRAWN” and “DEEMED WITHDRAWN”. Thus, in the integrated RDF version we might want to map to a simpler scheme that only contains four values:
> 
> planning:appstatus-incomplete
> planning:appstatus-inprogress
> planning:appstatus-appealed
> planning:appstatus-final
> 
> This means we need to map several DB values to the same RDF value, so it's not 1:1. For example, 0 and 2 would both map to planning:appstatus-incomplete; 1, 10 and 13 would all map to planning:appstatus-inprogress, et cetera.
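
A minimal sketch of such a many-to-one translation table, in Python. Only the code assignments named in the paragraph above are filled in; the message does not say where the remaining codes go, so they are left out. The names STATUS_TRANSLATION and translate_status are hypothetical and do not come from D2RQ or from any R2RML draft.

```python
# Hypothetical many-to-one translation table for PLANNING_APPLICATION.STATUS.
# Only the assignments stated in the message are listed; the other codes are
# not specified there and are intentionally omitted.
STATUS_TRANSLATION = {
    0: "planning:appstatus-incomplete",
    2: "planning:appstatus-incomplete",
    1: "planning:appstatus-inprogress",
    10: "planning:appstatus-inprogress",
    13: "planning:appstatus-inprogress",
}


def translate_status(code):
    """Return the RDF term for a DB status code, or None if it is unmapped."""
    return STATUS_TRANSLATION.get(code)


# Several DB values share one RDF value, so the table is not 1:1
# and cannot be inverted without losing information.
assert translate_status(0) == translate_status(2)
assert translate_status(1) == translate_status(10) == translate_status(13)
```

Because distinct keys share a value, a lookup in the reverse direction (RDF term back to DB code) would be ambiguous, which is exactly what the 1:1 restriction rules out.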
> 
> Another example was the “cuisine” example we used earlier when discussing various designs. The DB may distinguish between “thai” and “indian”, but in the output taxonomy of culinary styles those may be lumped together under “asian”. See the two examples here:
> 
>   http://www.w3.org/2001/sw/rdb2rdf/drafts/translation-tables-DERI2.html
> 
> Another example is this mapping from BibTeX types to BIBO classes. Several database values are mapped to the same BIBO class; for example, both "mastersthesis" and "phdthesis" are mapped to bibo:Thesis, as BIBO doesn't distinguish those two subclasses.
> 
>   article       => bibo:Article
>   book          => bibo:Book
>   booklet       => bibo:Book
>   conference    => bibo:Article
>   inbook        => bibo:Chapter
>   incollection  => bibo:DocumentPart
>   inproceedings => bibo:Article
>   manual        => bibo:Manual
>   mastersthesis => bibo:Thesis
>   misc          => bibo:Document
>   phdthesis     => bibo:Thesis
>   proceedings   => bibo:Proceedings
>   techreport    => bibo:Report
>   unpublished   => bibo:Document
> 
> In summary, if you want to use a shared standard classification, then you may find that the database model divides the universe into slightly different “chunks”, and it's not always possible to do a direct 1:1 translation.
> 
>>> Neither D2R Map nor D2RQ can map *from* URIs to something else. The use case that Souri put forward – generating URIs from the DB using a URI template, and then overriding only *some* of these URIs with a translation table – isn't supported in D2RQ. I've encountered no evidence that anyone needs this, either in interactions with D2RQ users or in reviewing actual published RDF data from other sources.
>> 
>> Souri, does this mean that you would like to generate a custom URI, and then map this custom URI to another existing URI? 
> 
> I hope I represent Souri's intentions correctly by saying “yes”. He gave an extended example here:
> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Dec/0015.html
> 
> The key is that New York would end up with IRI http://www.nyc.gov because it's in the translation table, and Austin would end up with an IRI simply generated from the IRI template, so it would be http://www.city.austin.tx.us .
> 
> This could work, but see my comment below on what I think should be the behaviour for values not defined in the translation table; and see my response here:
> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Dec/0016.html
> 
>>> Also, I think that this can't be the default behaviour of translation tables. If there is no mapping for a given database value, then the database value should be treated like a NULL (or one could perhaps argue that it should be treated as an error). Treating this by using the database value as-is in the RDF output would likely produce hard-to-detect errors when database values are forgotten or misspelled in the table, or added later on in the DB. I think I'd -1 this as the default behaviour. It might be ok as an optional alternate behaviour that can be chosen through some flag, but then again given that I don't see the use case, and that there are additional implementation complexities with this approach, we'd not be very likely to implement this behaviour in D2RQ.
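
The three behaviours discussed above for a database value with no entry in the translation table (treat as NULL, raise an error, or pass the value through as-is) could be sketched in Python as follows. The OnMissing flag and the translate function are hypothetical names for illustration, not part of D2RQ or of any R2RML draft; the table is a two-row excerpt of the BibTeX mapping quoted earlier.

```python
from enum import Enum


class OnMissing(Enum):
    """Hypothetical flag: what to do when a value has no table entry."""
    NULL = "null"                # treat like a database NULL (no output)
    ERROR = "error"              # fail loudly
    PASSTHROUGH = "passthrough"  # use the database value as-is


def translate(value, table, on_missing=OnMissing.NULL):
    """Look up value in table, applying the chosen policy if it is absent."""
    if value in table:
        return table[value]
    if on_missing is OnMissing.NULL:
        return None
    if on_missing is OnMissing.ERROR:
        raise KeyError(f"no translation for {value!r}")
    return value  # PASSTHROUGH


# Excerpt of the BibTeX-to-BIBO mapping from the message.
BIBTEX_TO_BIBO = {
    "mastersthesis": "bibo:Thesis",
    "phdthesis": "bibo:Thesis",
}

# A misspelled database value: with NULL it yields nothing, with
# PASSTHROUGH the raw string leaks into the output unnoticed.
assert translate("phdthesys", BIBTEX_TO_BIBO) is None
assert translate("phdthesys", BIBTEX_TO_BIBO, OnMissing.PASSTHROUGH) == "phdthesys"
```

The last two lines illustrate the concern raised above: under pass-through, a forgotten or misspelled value does not fail and does not disappear, so it is hard to detect in the generated RDF.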
> 
> 
> Best,
> Richard

Received on Tuesday, 13 December 2011 20:10:45 UTC