- From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
- Date: Wed, 28 Apr 2010 11:36:56 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Juan Sequeda <juanfederico@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, public-rdb2rdf-wg@w3.org
Hi, Richard --

On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:

> I'm trying to understand your position.

The effort is appreciated. I'm really not being obstructionist for its own sake.

> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
>> Bringing RDB into RDF requires only that the schema of that RDB
>> be mapped to a "direct" or "putative" ontology -- which *is* the
>> correct term.
>
> Well, the charter is unfortunately not very explicit about what it
> means to map a database to RDF.

That is indeed unfortunate. One of the hazards of saying "we'll do *this* in *that* timeframe" is that the task doesn't always cooperate, and it seems that the task of defining this WG's job was one such.

> Just to be explicit: Is your position that mapping to domain
> ontologies such as FOAF, GoodRelations etc is out of scope of the
> charter? Or is your position that it's merely not required to meet
> the success criteria set out in the charter?

My position is that *forcing* RDB schemas to map to domain ontologies is not necessary and will be counter-productive in the long run.

My further position is that "blessing" any given domain ontologies, and further still, defining how a given transformation tool should determine whether a given RDB.table maps to RDF:Class, is not only well beyond the charter, but also, as Souri said, an *enormous* task. (This is also, I think, where most if not all of the expressivity concerns come in.)

>> This ontology serves only to unambiguously identify a single
>> cell (table, column, row) within that schema.
>>
>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
>> SNOMED) is a separate concern -- which may be addressed in a couple
>> of ways --
>>
>> 1. replication with transformation
>> 2. mapping ontologies
>>
>>
>> The first means that you decide *once* how SNOMED corresponds to
>> a given RDB schema -- and if that correspondence changes, you have
>> to somehow discard all the triples that resulted from the original
>> conversion and then re-convert the RDB data.
>>
>> The second means that you decide how you think SNOMED corresponds
>> to your putative ontology, and create a "mapping" ontology --
>> which does little more than declare broaderClass, narrowerClass,
>> equivalentClass, sameAs, and such. If you realize later that one
>> of your mappings is wrong, you change this ontology -- everything
>> else remains as it is.
>>
>> Note that #2 does not mandate either forward- or backward-chaining.
>> You *can* work from #2 and replicate & transform, if you find that
>> works better for your deployment scenario. You *can* use reasoning
>> engines to work entirely dynamically, if that works better for you.
>>
>> Note that #1 *does* mandate forward-chaining. You *cannot* use
>> a reasoning engine to revise the putative-to-domain mapping once
>> replication & transformation has been done.
>
> I think I sort of agree with everything you said up to here.

I think that's a good sign.

>> For this simple reason, I strongly advise that we *not* combine
>> putative-to-domain ontology mapping into the rdb2rdf scenario --
>> because it makes a decision which we haven't been chartered for,
>
> Here is where I lost you. Can you please say explicitly what that
> decision is?

I expressed myself poorly there. Re-expressed a bit below...

One decision would be "what domain ontology/ies do we bless?"

Another is implied -- "if you don't map your RDB data to domain ontologies, you aren't really exposing it as RDF."

Do we want to declare a *method*, a *language*, a *syntax* for such local-ontology-to-domain-ontology mapping? That's fine, but I think what we deliver should be focused on "how do I define the mapping?" (perhaps something similar to GRDDL?) and not get into "how do I determine what the mapping should be?"

But -- this local-ontology-to-domain-ontology mapping is a *second step*, which comes *after* the RDB schema is mapped to a putative/local ontology, and which, I believe, SHOULD generally come after the RDB data is transformed to RDF with that same local ontology (if indeed the RDB data is being replicated/transformed at all) -- and yes, I believe this is and should be OPTIONAL.

So, re-expressed -- I strongly advise that we not *conflate* putative-ontology-to-domain-ontology mapping with RDB-schema-to-putative-ontology mapping. The putative ontology is a vital element of RDB2RDF.

Is this an implementation detail? In a way. I think the choice to conflate these two steps in any given tool which implements this standard is an implementation detail, which will prove my point in short order once users can choose between two tools -- one which forces all RDB schemas to map to domain ontologies, and one which maps the RDB schema to a local ontology with the option to further declare

    Local:x owl:equivalentClass Domain:y

But I think the two steps *must not* be conflated in the standard, because that makes the local-ontology-to-domain-ontology mapping *mandatory*, and that is not acceptable to me, nor do I think it is workable in the context of this (or any) WG, even one with a delivery timeframe measured in decades.

>> and which I believe we are perilously close to deciding in the
>> worst possible way.
>
> What would be this worst possible option for the decision?

That all RDB (and really, all) data must be mapped to a domain ontology to be considered (worthwhile as) RDF.

Consider Juan's Scenario #4, for instance... I have a bunch of data in an RDB, and I'm pretty sure it would benefit *someone*'s analysis if it were available in RDF, so I want to make it available as such. *I'm* not doing the analysis, so do I know what domain ontology/ies it should be mapped to? Not a clue.
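For readers who want the two-step separation argued above in concrete form, here is a minimal sketch under stated assumptions: plain Python tuples stand in for an RDF store; the namespaces http://db.example.com/ontology# and http://db.example.com/data/ and the helpers direct_map, materialize_types and instances_of are hypothetical; foreign keys are not mapped; and only rdfs:subClassOf is interpreted (an owl:equivalentClass statement could be approximated as a pair of subclass links). Step 1 turns rows into triples in a local ontology. Step 2 is a separate mapping graph that can be applied either by forward chaining into a derived set or at query time, so revising a mapping never touches the step-1 triples.

```python
"""Sketch only: RDB rows -> local-ontology triples (step 1), kept apart from a
swappable local-to-domain mapping graph (step 2). All IRIs/names are hypothetical."""
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical local ("putative") ontology and instance namespaces.
LOCAL = "http://db.example.com/ontology#"
DATA = "http://db.example.com/data/"


def direct_map(table: str, key: str, rows: Iterable[Dict[str, str]]) -> Set[Triple]:
    """Step 1: table -> class, column -> property, row -> resource, cell -> value.
    Foreign keys are deliberately not handled in this sketch."""
    triples: Set[Triple] = set()
    for row in rows:
        subject = f"{DATA}{table}/{row[key]}"
        triples.add((subject, RDF_TYPE, LOCAL + table))
        for column, value in row.items():
            triples.add((subject, f"{LOCAL}{table}.{column}", value))
    return triples


def _reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Every node reachable from `start` via `edges`, including `start` (cycle-safe)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def _subclass_edges(mapping: Set[Triple], upward: bool) -> Dict[str, Set[str]]:
    """Index rdfs:subClassOf statements child->parent (upward) or parent->child."""
    edges: Dict[str, Set[str]] = {}
    for s, p, o in mapping:
        if p == RDFS_SUBCLASS_OF:
            a, b = (s, o) if upward else (o, s)
            edges.setdefault(a, set()).add(b)
    return edges


def materialize_types(base: Set[Triple], mapping: Set[Triple]) -> Set[Triple]:
    """Step 2, forward chaining: derive extra rdf:type triples into a separate set.
    `base` is never modified, so a revised mapping only means re-deriving this set."""
    up = _subclass_edges(mapping, upward=True)
    derived: Set[Triple] = set()
    for s, p, o in base:
        if p == RDF_TYPE:
            for cls in _reachable(o, up) - {o}:
                derived.add((s, RDF_TYPE, cls))
    return derived


def instances_of(cls: str, base: Set[Triple], mapping: Set[Triple]) -> Set[str]:
    """Step 2, query time (backward-chaining style): walk the mapping graph down
    from the requested class to local classes; nothing is materialized."""
    wanted = _reachable(cls, _subclass_edges(mapping, upward=False))
    return {s for s, p, o in base if p == RDF_TYPE and o in wanted}


if __name__ == "__main__":
    base = direct_map("Contact", "id", [{"id": "1", "name": "Alice"}])
    base |= direct_map("Customer", "id", [{"id": "7", "name": "Acme Corp"}])
    mapping = {
        (LOCAL + "Contact", RDFS_SUBCLASS_OF, FOAF_PERSON),
        (LOCAL + "Customer", RDFS_SUBCLASS_OF, FOAF_PERSON),
    }
    print(sorted(instances_of(FOAF_PERSON, base, mapping)))
    # Suppose Customer turns out to hold company names: edit only the mapping graph.
    mapping.discard((LOCAL + "Customer", RDFS_SUBCLASS_OF, FOAF_PERSON))
    print(sorted(instances_of(FOAF_PERSON, base, mapping)))
    print(sorted(materialize_types(base, mapping)))
```

The design point is that derived rdf:type triples are kept apart from the base triples, much as a mapping ontology would sit in its own named graph: throwing them away and re-deriving is cheap, whereas triples written directly into domain classes would have to be re-converted from the RDB.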
Should the RDB2RDF standard *specify* ontology/ies to which all RDB data should be mapped? Loudly and repeatedly, I say no.

Rather, my publication should use a full "local" ontology -- which simply maps table to class, column to attribute, cell content to value, and primary/foreign key relationships to class relationships.

Others may look at my data and see clear domain ontology mappings which work for them, and which may work for others, and they should be able to publish these mappings.

Still others may look at my data and see *different* but *no less clear* domain ontology mappings which they want (and should be easily able) to use.

If the RDB2RDF publication path forces the RDB data into domain ontologies -- how can these last people *remap* the data, with their new and different ontology correspondence?

I am not saying that such immediate mapping is always inappropriate, nor that a publisher cannot choose to say "this table in my schema is and always will be foaf:Person" -- but I am saying that the RDB2RDF standard should not *force* the publisher to do so.

Differently and possibly more explicitly put...

I have a bunch of cartographic data in RDB. I discover DBpedia, and think that ontology is the one I should map to. So I do.

Sophisticated cartographic workers familiar with RDF will know that there are other ontologies -- Freebase, Geonames, OpenCYC, etc. -- which do a much better job in many ways.

If my original data were mapped to a local ontology (say, http://mymapdata.example.com/ontology/#), it would be very easy for the sophisticated user to ignore my local-to-DBpedia mapping (which is of course in its own named graph) and substitute their own local-to-Geonames+CYC+Freebase+theirCartOntology.

If my original data is transformed directly into DBpedia classes -- there is no easy way to substitute the more sophisticated mapping.

Is all of this such a giant problem if everyone is using backward-chaining all the way to the RDB?
No -- *if* the sophisticated user can reach me and convince me to substitute their mapping for mine. But that's not the most common pattern in play, and it's not likely to become such soon, as much as I wish it would -- but I don't think it should be forced on people either!

The big problem comes when the RDB2RDF transform is materialized, i.e., forward-chained, as is the most common pattern today, when people want to get the RDF dump (or crawl the SPARQL endpoint) and load it all in their local store, instead of issuing relevant queries against the existing SPARQL endpoint.

Consider an example from Juan's Scenario #1, joining RDB to RDB. If my RDB2RDF mapping says --

    mydb1.Contact  foaf:Person
    mydb2.Customer foaf:Person

-- and I've replicated all my RDB data as RDF, and later discover that Customer.name is actually filled with company names, while Contact.name holds people's names ... how do I fix that, short of dropping and re-replicating with the new map?

On the other hand, if my RDB2RDF mapping says --

    mydb1.Contact  ontology1:contact
    mydb2.Customer ontology2:customer

-- it's easy for me to have statements that say --

    { ontology1:contact owl:equivalentClass ontology2:customer . }
    { ontology1:contact rdfs:subClassOf foaf:Person . }
    { ontology2:customer rdfs:subClassOf foaf:Person . }

There's also no *need* to say foaf:Person anywhere. There's no need to know that FOAF exists at all.

And if I make the same discovery -- I drop (or change) the mapping triples, and I'm done.

In both of these, consider the possibility of columns which do not obviously or easily map to FOAF or any other known domain ontology. With the first option, those columns are apparently discarded or ignored. With the second, they are present, but known only by their local identity, e.g., ontology:contact#widget, until someone comes up with a new domainOntology:widget -- and hey, presto! --

    { ontology:contact#widget owl:equivalentProperty domainOntology:widget .
    }

And once again -- there will be times and instances where it *is* appropriate to *choose* to make the RDB schema absolutely map to domain ontologies. The *ability to choose* is key.

I hope this has made my concerns clearer?

Be seeing you,

Ted

--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.        //    voice +1-781-273-0900 x32
Evangelism & Support      //    mailto:tthibodeau@openlinksw.com
                          //    http://twitter.com/TallTed
OpenLink Software, Inc.   //    http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
                                http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs                  http://www.openlinksw.com/weblogs/virtuoso/
                                http://www.openlinksw.com/blog/~kidehen/
Universal Data Access and Virtual Database Technology Providers
Received on Wednesday, 28 April 2010 15:37:30 UTC