- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Mon, 3 May 2010 19:56:29 -0500
- To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
- Cc: Richard Cyganiak <richard@cyganiak.de>, "Eric Prud'hommeaux" <eric@w3.org>, public-rdb2rdf-wg@w3.org
- Message-ID: <n2sf914914c1005031756i44871400x1e9b65b75074bc44@mail.gmail.com>
Nobody has commented on Ted's post in the last 6 days.

What is going to happen on tomorrow's call? We do need to get this UC
document out. How are we going to proceed?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Apr 28, 2010 at 10:36 AM, Ted Thibodeau Jr <tthibodeau@openlinksw.com> wrote:
> Hi, Richard --
>
>
> On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> > I'm trying to understand your position.
>
> The effort is appreciated.
>
> I'm really not being obstructionist for its own sake.
>
>
> > On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
> >> Bringing RDB into RDF requires only that the schema of that RDB
> >> be mapped to a "direct" or "putative" ontology -- which *is* the
> >> correct term.
> >
> > Well, the charter is unfortunately not very explicit about what it
> > means to map a database to RDF.
>
> That is indeed unfortunate. One of the hazards of saying "we'll do
> *this* in *that* timeframe" is that the task doesn't always cooperate,
> and it seems that the task of defining this WG's job was one such.
>
>
> > Just to be explicit: Is your position that mapping to domain
> > ontologies such as FOAF, GoodRelations etc is out of scope of the
> > charter? Or is your position that it's merely not required to meet
> > the success criteria set out in the charter?
>
> My position is that *forcing* RDB schemas to map to domain ontologies
> is not necessary and will be counter-productive in the long run.
>
> My further position is that "blessing" any given domain ontologies,
> and further still, defining how a given transformation tool should
> determine whether a given RDB.table maps to RDF:Class is not only
> well beyond the charter, but also, as Souri said, an *enormous* task.
>
> (This is also, I think, where most if not all of the expressivity
> concerns come in.)
>
>
> >> This ontology serves only to unambiguously identify a single
> >> cell (table, column, row) within that schema.
> >>
> >> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
> >> SNOMED) is a separate concern -- which may be addressed in a couple
> >> of ways --
> >>
> >> 1. replication with transformation
> >> 2. mapping ontologies
> >>
> >>
> >> The first means that you decide *once* how SNOMED corresponds to
> >> a given RDB schema -- and if that correspondence changes, you have
> >> to somehow discard all the triples that resulted from the original
> >> conversion and then re-convert the RDB data.
> >>
> >> The second means that you decide how you think SNOMED corresponds
> >> to your putative ontology, and create a "mapping" ontology --
> >> which does little more than declare broaderClass, narrowerClass,
> >> equivalentClass, sameAs, and such. If you realize later that one
> >> of your mappings is wrong, you change this ontology -- everything
> >> else remains as it is.
> >>
> >> Note that #2 does not mandate either forward- or backward-chaining.
> >> You *can* work from #2 and replicate & transform, if you find that
> >> works better for your deployment scenario. You *can* use reasoning
> >> engines to work entirely dynamically, if that works better for you.
> >>
> >> Note that #1 *does* mandate forward-chaining. You *cannot* use
> >> a reasoning engine to revise the putative-to-domain mapping once
> >> replication & transformation has been done.
> >
> > I think I sort of agree with everything you said up to here.
>
> I think that's a good sign.
>
>
> >> For this simple reason, I strongly advise that we *not* combine
> >> putative-to-domain ontology mapping into the rdb2rdf scenario --
> >> because it makes a decision which we haven't been chartered for,
> >
> > Here is where I lost you. Can you please say explicitly what that
> > decision is?
>
> I expressed myself poorly there. Re-expression a bit below...
>
> One decision would be "what domain ontology/ies do we bless?"
>
> Another is implied -- "if you don't map your RDB data to Domain
> ontologies, you aren't really exposing it as RDF."
>
> Do we want to declare a *method*, a *language*, a *syntax* for
> such local-ontology-to-domain-ontology mapping? That's fine,
> but I think what we deliver should be focused on "how do I define
> the mapping?" (perhaps something similar to GRDDL?) and not get
> into "how do I determine what the mapping should be?"
>
> But -- this local-ontology-to-domain-ontology mapping is a
> *second step*, which comes *after* the RDB schema is mapped to
> a putative/local ontology, and which, I believe, SHOULD generally
> come after the RDB data is transformed to RDF with that same local
> ontology (if indeed the RDB data is being replicated/transformed
> at all) -- and yes, I believe this is and should be OPTIONAL.
>
> So, re-expression --
>
> I strongly advise that we not *conflate* putative-ontology-to-
> domain-ontology mapping with RDB-schema-to-putative-ontology
> mapping. The putative ontology is a vital element of RDB2RDF.
>
> Is this an implementation detail? In a way.
>
> I think the choice to conflate these two steps in any given tool
> which implements this standard is an implementation detail, which
> will prove my point in short order once users can choose between
> two tools -- one which forces all RDB schemas to map to Domain
> ontologies; and one which maps the RDB schema to a Local ontology
> with the option to further declare Local:x owl:sameAs Domain:y
>
> But I think the two steps *must not* be conflated in the standard,
> because that makes the local-ontology-to-domain-ontology mapping
> *mandatory*, and that is not acceptable to me, nor do I think it
> is workable in the context of this (or any) WG, even one with a
> delivery timeframe measured in decades.
>
>
> >> and which I believe we are perilously close to deciding in the
> >> worst possible way.
> >
> > What would be this worst possible option for the decision?
>
> That all RDB (and really, all) data must be mapped to a domain
> ontology to be considered (worthwhile as) RDF.
>
> Consider Juan's Scenario #4, for instance...
>
> I have a bunch of data in an RDB, and I'm pretty sure it would
> benefit *someone*'s analysis if it were available in RDF, so
> I want to make it available as such.
>
> *I'm* not doing the analysis, so do I know what domain ontology/ies
> it should be mapped to? Not a clue.
>
> Should the RDB2RDF standard *specify* ontology/ies to which all
> RDB data should be mapped? Loudly and repeatedly, I say no.
>
> Rather, my publication should use a full "local" ontology -- which
> simply maps table to class, column to attribute, cell content to
> value, and primary/foreign key relationships to class relationships.
>
> Others may look at my data and see clear domain ontology mappings
> which work for them, and which may work for others, and they should
> be able to publish these mappings. Still others may look at my data
> and see *different* but *no less clear* domain ontology mappings
> which they want (and should be easily able) to use.
>
> If the RDB2RDF publication path forces the RDB data into domain
> ontologies -- how can these last people *remap* the data, with their
> new and different ontology correspondence?
>
> I am not saying that such immediate mapping is always inappropriate,
> that a publisher cannot choose to say "this table in my schema is
> and always will be foaf:Person" -- but I am saying that the RDB2RDF
> standard should not *force* the publisher to do so.
>
> Differently and possibly more explicitly put...
>
> I have a bunch of cartographic data in RDB. I discover DBpedia,
> and think that ontology is the one I should map to. So I do.
>
> Sophisticated cartographic workers familiar with RDF will know
> that there are other ontologies -- Freebase, Geonames, OpenCYC,
> etc. -- which do a much better job in many ways.
>
> If my original data were mapped to a local ontology (say,
> http://mymapdata.example.com/ontology/#), it would be very easy
> for the sophisticated user to ignore my local-to-DBpedia mapping
> (which is of course in its own named graph) and substitute their
> own local-to-Geonames+CYC+Freebase+theirCartOntology.
>
> If my original data is transformed directly into DBpedia classes --
> there is no easy way to substitute the more sophisticated mapping.
>
> Is all of this such a giant problem if everyone is using backward-
> chaining all the way to the RDB? No -- *if* the sophisticated user
> can reach me and convince me to substitute their mapping for mine.
> But that's not the most common pattern in play, and it's not likely
> to become such soon, as much as I wish it would -- but I don't think
> it should be forced on people either!
>
> The big problem comes when the RDB2RDF transform is materialized,
> i.e., forward-chained, as is the most common pattern today, when
> people want to get the RDF dump (or crawl the SPARQL endpoint) and
> load it all in their local store, instead of issuing relevant queries
> against the existing SPARQL endpoint.
>
>
> Consider an example from Juan's Scenario #1, joining RDB to RDB.
>
> If my RDB2RDF mapping says --
>
>     mydb1.Contact    foaf:Person
>     mydb2.Customer   foaf:Person
>
> -- and I've replicated all my RDB data as RDF, and later discover
> that Customer.name is actually filled with company names, while
> Contact.name is people ... how do I fix that, short of dropping
> and re-replicating with the new map?
>
> On the other hand, if my RDB2RDF mapping says --
>
>     mydb1.Contact    ontology1:contact
>     mydb2.Customer   ontology2:customer
>
> -- it's easy for me to have statements that say --
>
>     { ontology1:contact   owl:sameAs        ontology2:customer . }
>     { ontology1:contact   rdfs:subClassOf   foaf:Person . }
>     { ontology2:customer  rdfs:subClassOf   foaf:Person . }
>
> There's also no *need* to say foaf:Person anywhere.
> There's no need to know that FOAF exists at all.
>
> And if I make the same discovery -- I drop (or change) the
> mapping triples, and I'm done.
>
> In both of these, consider the possibility of columns which do
> not obviously or easily map to FOAF or any other known domain
> ontology. With the first option, those columns are apparently
> discarded or ignored. With the second, they are present, but
> known only by their local identity, e.g., ontology:contact#widget,
> until someone comes up with a new domainOntology:widget -- and
> hey, presto! --
>
>     { ontology:contact#widget owl:sameAs domainOntology:widget . }
>
>
> And once again -- there will be times and instances where it
> *is* appropriate to *choose* to make the RDB schema absolutely
> map to domain ontologies.
>
> The *ability to choose* is key.
>
>
> I hope this has made my concerns clearer?
>
> Be seeing you,
>
> Ted
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
> Evangelism & Support // mailto:tthibodeau@openlinksw.com
> // http://twitter.com/TallTed
> OpenLink Software, Inc. // http://www.openlinksw.com/
> 10 Burlington Mall Road, Suite 265, Burlington MA 01803
> http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs http://www.openlinksw.com/weblogs/virtuoso/
> http://www.openlinksw.com/blog/~kidehen/
> Universal Data Access and Virtual Database Technology Providers
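A minimal Python sketch of the two-step approach Ted argues for above, using his Contact/Customer example: rows are first mapped to a local ("putative") ontology, and any correspondence to a domain ontology such as FOAF lives in a separate set of mapping triples. Everything here is an assumption for illustration. The example.com namespaces, the IRI layout (table/primary-key for rows, table#column for properties), and the helpers `direct_map` and `infer_types` are hypothetical and not taken from any RDB2RDF specification. Rows are plain dicts, triples are string tuples with no IRI/literal distinction, and the only inference applied is rdf:type propagation along rdfs:subClassOf (the owl:sameAs statements in the message are not modelled).

```python
from typing import Dict, List, Optional, Set, Tuple

# A triple is (subject, predicate, object). IRIs and literals are both plain
# strings here, a simplification made only for this sketch.
Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def direct_map(base: str, table: str, pk: str, rows: List[Dict[str, str]],
               fks: Optional[Dict[str, str]] = None) -> Set[Triple]:
    """Hypothetical local-ontology mapping: table -> class, row -> instance,
    column -> property, cell -> value, foreign-key column -> link to the
    referenced row. The IRI layout is made up for illustration."""
    fks = fks or {}
    cls = base + table
    triples: Set[Triple] = set()
    for row in rows:
        subject = f"{base}{table}/{row[pk]}"
        triples.add((subject, RDF_TYPE, cls))
        for col, value in row.items():
            prop = f"{base}{table}#{col}"
            if col in fks:
                # Foreign key: point at the referenced row, not a literal.
                triples.add((subject, prop, f"{base}{fks[col]}/{value}"))
            else:
                triples.add((subject, prop, value))
    return triples


def infer_types(data: Set[Triple], mapping: Set[Triple]) -> Set[Triple]:
    """Return only the new rdf:type triples obtained by following
    rdfs:subClassOf links in the separate mapping graph until fixpoint."""
    subclass_of = {(s, o) for s, p, o in mapping if p == RDFS_SUBCLASSOF}
    types = {(s, o) for s, p, o in data if p == RDF_TYPE}
    changed = True
    while changed:
        changed = False
        for instance, cls in list(types):
            for sub, sup in subclass_of:
                if sub == cls and (instance, sup) not in types:
                    types.add((instance, sup))
                    changed = True
    return {(i, RDF_TYPE, c) for i, c in types} - data


# Hypothetical namespaces for the two databases in the example.
DB1 = "http://db1.example.com/ontology/"
DB2 = "http://db2.example.com/ontology/"

# Step 1, done once: local-ontology triples, no FOAF anywhere.
data = (direct_map(DB1, "Contact", "id", [{"id": "1", "name": "Alice"}])
        | direct_map(DB2, "Customer", "id", [{"id": "7", "name": "Acme Corp"}]))

# Step 2: first guess, both tables hold people.
mapping_v1 = {(DB1 + "Contact", RDFS_SUBCLASSOF, FOAF_PERSON),
              (DB2 + "Customer", RDFS_SUBCLASSOF, FOAF_PERSON)}
# Later discovery: Customer.name holds company names, so drop that triple.
mapping_v2 = {(DB1 + "Contact", RDFS_SUBCLASSOF, FOAF_PERSON)}

for label, mapping in (("v1", mapping_v1), ("v2", mapping_v2)):
    people = sorted(s for s, _, o in infer_types(data, mapping) if o == FOAF_PERSON)
    print(label, people)
```

Swapping `mapping_v1` for `mapping_v2` changes which rows are inferred to be foaf:Person, while `data` (the local-ontology triples) is built once and never regenerated. That is the property Ted's option #2 relies on; under option #1 the FOAF types would have been written directly into the materialized triples and would have to be dropped and re-replicated.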
Received on Tuesday, 4 May 2010 00:57:04 UTC