- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Tue, 4 May 2010 00:07:10 -0700
- To: Harry Halpin <hhalpin@w3.org>
- Cc: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, ivan@w3.org, public-rdb2rdf-wg@w3.org
* Harry Halpin <hhalpin@w3.org> [2010-05-04 02:00+0100]
> Apologies for the late reply; EricP and I have been recovering from
> the WWW2010 conference. While I only caught the last half of last
> week's telecon due to the conflicting talk, I did notice that this
> point seemed to derail our attempt at consensus on the use case and
> requirements document. Let me explain why I think that, while Ted's
> point makes sense, it's possible we are just communicating at
> cross-purposes.

i'm happy to do a direct mapping. (Sandro sold me on this term for the
graph dictated by the relational structure.)

> > Hi, Richard --
> >
> > On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> >> I'm trying to understand your position.
> >
> > The effort is appreciated.
> >
> > I'm really not being obstructionist for its own sake.
> >
> >> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
> >>> Bringing RDB into RDF requires only that the schema of that RDB
> >>> be mapped to a "direct" or "putative" ontology -- which *is* the
> >>> correct term.
> >>
> >> Well, the charter is unfortunately not very explicit about what it
> >> means to map a database to RDF.
> >
> > That is indeed unfortunate. One of the hazards of saying "we'll do
> > *this* in *that* timeframe" is that the task doesn't always cooperate,
> > and it seems that the task of defining this WG's job was one such.
>
> Charters are left vague on purpose, and the authors of this charter were
> Ashok and Ivan. We can ask them to clarify directly. However, I was under
> the impression from our review of existing approaches that some kind of
> mapping language that did more than a direct mapping (i.e. dumping rows
> and columns into RDF with no transformation) would be necessary. As the
> straw poll from the last telecon shows, all attendees except Ted (and
> possibly Souri, who was not sure about the phrasing of the question)
> agreed that such functionality was needed.
>
> >> Just to be explicit: Is your position that mapping to domain
> >> ontologies such as FOAF, GoodRelations etc is out of scope of the
> >> charter? Or is your position that it's merely not required to meet
> >> the success criteria set out in the charter?
> >
> > My position is that *forcing* RDB schemas to map to domain ontologies
> > is not necessary and will be counter-productive in the long run.
> >
> > My further position is that "blessing" any given domain ontologies,
> > and further still, defining how a given transformation tool should
> > determine whether a given RDB.table maps to RDF:Class is not only
> > well beyond the charter, but also, as Souri said, an *enormous* task.
>
> I think this is a vocabulary mismatch. In particular, EricP's point was
> that the data could be directly transformed (which we all agree on) and
> then that further transforms could be necessary (i.e. done either using
> the SQL view approach or via a series of SPARQL constructs). Juan
> rephrased this direct transformation as "putative ontology" and the
> "further transforms" as a transform to a "domain ontology".
>
> However, this just gives authors of the R2ML file the ability to transform
> their graph; it does not bless any particular domain ontology (such as
> FOAF). The precise domain ontologies that the author wishes to map to can
> be specified in any way by the author of the R2ML file.
>
> My question (to database people in particular, such as Ashok, Ahmed, and
> Daniel) is: does the database community understand "domain/putative
> ontology" and use that terminology, or should we revert to the "direct
> transforms and further transforms" wording?

I understand the derivation of "putative ontology" for what Sandro calls
the "direct graph"; some relational schemas can be considered "widely
accepted", at least within their domain. However, I think "putative
ontology" leads to lots of opportunities for misunderstanding.
First, the direct graph is a graph and the putative ontology is a
description of a graph, so we would have to say "graph described by the
putative ontology" to avoid a category error. Many RDF folks think of
"ontologies" as documents or conceptual models with lots of owl:
assertions, contrasted with "schemas" for the same pair of things with
rdfs: assertions. RDB folks have talked about ontologies since 1993
[Gruber93] (predating RDF by 5 years). The term has been used both as a
constituent of and an alternative to metadata repositories like ISO
11179. Ultimately, I like to follow the advice of Guus Schreiber and
avoid "the 'o' word".

[Gruber93] http://tomgruber.org/writing/onto-design.htm

> > (This is also, I think, where most if not all of the expressivity
> > concerns come in.)
> >
> >>> This ontology serves only to unambiguously identify a single
> >>> cell (table, column, row) within that schema.
> >>>
> >>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
> >>> SNOMED) is a separate concern -- which may be addressed in a couple
> >>> of ways --
> >>>
> >>> 1. replication with transformation
> >>> 2. mapping ontologies
> >>>
> >>> The first means that you decide *once* how SNOMED corresponds to
> >>> a given RDB schema -- and if that correspondence changes, you have
> >>> to somehow discard all the triples that resulted from the original
> >>> conversion and then re-convert the RDB data.
> >>>
> >>> The second means that you decide how you think SNOMED corresponds
> >>> to your putative ontology, and create a "mapping" ontology --
> >>> which does little more than declare broaderClass, narrowerClass,
> >>> equivalentClass, sameAs, and such. If you realize later that one
> >>> of your mappings is wrong, you change this ontology -- everything
> >>> else remains as it is.
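Ted's two options can be contrasted in a few lines of Python. This is a minimal, hypothetical sketch and not anything the WG proposed. Triples are plain tuples, and every prefixed name (`local:`, `domain:`, `row:`) is a made-up placeholder rather than a real SNOMED term. The "mapping ontology" is also simplified to a dict from local class to domain class. Option #1 rewrites classes at copy time, so a wrong correspondence is baked into the replica. Option #2, shown here in its query-time form, consults the mapping when answering, so fixing the mapping is enough.

```python
# Minimal sketch: triples are (subject, predicate, object) tuples and every
# prefixed name (local:, domain:, row:) is a hypothetical placeholder.
RDF_TYPE = "rdf:type"

# Output of the RDB-schema-to-putative-ontology step; never rewritten.
local_graph = {
    ("row:problem/1", RDF_TYPE, "local:Problem"),
    ("row:symptom/4", RDF_TYPE, "local:Symptom"),
}


def replicate_with_transformation(graph, mapping):
    """Option #1: rewrite local classes into domain classes once, at copy time."""
    return {
        (s, p, mapping.get(o, o)) if p == RDF_TYPE else (s, p, o)
        for (s, p, o) in graph
    }


def instances_via_mapping(graph, mapping, domain_class):
    """Option #2 (query-time variant): consult the mapping when answering."""
    return {
        s
        for (s, p, o) in graph
        if p == RDF_TYPE and mapping.get(o, o) == domain_class
    }


mapping = {"local:Problem": "domain:Disorder", "local:Symptom": "domain:Disorder"}
replica = replicate_with_transformation(local_graph, mapping)

# Later, the Symptom correspondence turns out to be wrong; fix the mapping only.
mapping["local:Symptom"] = "domain:Finding"

# The replica still holds the old answer until it is dropped and rebuilt ...
stale = {s for (s, p, o) in replica if p == RDF_TYPE and o == "domain:Disorder"}
# ... while the query-time answer already reflects the corrected mapping.
current = instances_via_mapping(local_graph, mapping, "domain:Disorder")
```

As Ted notes, #2 could equally be materialized. The query-time form is used here only because it makes the difference visible.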
>
> I think this approach could be done with OWL as a separate step from
> R2ML, but if someone wanted to include some of these mappings in R2ML I
> would not forbid them. However, I think no-one has brought that approach
> up per se. The question is whether or not that mapping should/can be done
> in SQL or via SPARQL constructs or (...) in R2ML. I think that
> expressivity requirement should remain in the use-cases, although we
> should be agnostic about what exact approach may be used. We should
> probably only require that some approach be allowed, e.g. a portable set
> of SQL for view-based transformation, or a set of SPARQL constructs.
> Anything beyond that (i.e. reasoning) should probably be a separate step,
> applied *after* R2ML has been deployed to get the direct mapping/putative
> ontology.
>
> >>>
> >>> Note that #2 does not mandate either forward- or backward-chaining.
> >>> You *can* work from #2 and replicate & transform, if you find that
> >>> works better for your deployment scenario. You *can* use reasoning
> >>> engines to work entirely dynamically, if that works better for you.
> >>>
> >>> Note that #1 *does* mandate forward-chaining. You *cannot* use
> >>> a reasoning engine to revise the putative-to-domain mapping once
> >>> replication & transformation has been done.
> >>
> >> I think I sort of agree with everything you said up to here.
> >
> > I think that's a good sign.
> >
> >>> For this simple reason, I strongly advise that we *not* combine
> >>> putative-to-domain ontology mapping into the rdb2rdf scenario --
> >>> because it makes a decision which we haven't been chartered for,
> >>
> >> Here is where I lost you. Can you please say explicitly what that
> >> decision is?
> >
> > I expressed myself poorly there. Re-expression a bit below...
> >
> > One decision would be "what domain ontology/ies do we bless?"
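The "portable set of SQL for view-based transformation" mentioned above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names. The `contact` table, its columns, the `person_view` view, the `direct_map` helper and the `http://example.com/db/` IRI scheme are all hypothetical, and no R2ML syntax had been agreed. The view does the reshaping in SQL. A generic direct-mapping loop then turns each view row into triples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    INSERT INTO contact VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');

    -- The "further transform", expressed as an ordinary SQL view.
    CREATE VIEW person_view AS
      SELECT id, fname || ' ' || lname AS name FROM contact;
    """
)


def direct_map(conn, relation, key, base="http://example.com/db/"):
    """Emit one triple per non-key, non-NULL cell of a table or view.

    The IRI scheme is hypothetical. Names are interpolated into SQL,
    so this must only be used with trusted table and column names.
    """
    cur = conn.execute(f"SELECT * FROM {relation} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{base}{relation}/{record[key]}>"
        for col, value in record.items():
            if col != key and value is not None:
                triples.append((subject, f"<{base}{relation}#{col}>", value))
    return triples


for triple in direct_map(conn, "person_view", "id"):
    print(triple)
```

Because the transform lives in the database as a view, any engine that can direct-map a view gets the reshaped data without a reasoner. That is the portability argument Harry makes.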
>
> Again, blessing any local ontology is out of scope, although what is in
> scope is a way of making identifiers re-usable, which I imagine will
> easily be some kind of API call that, given a string identifier, returns
> a possible linked data URI.
>
> >
> > Another is implied -- "if you don't map your RDB data to Domain
> > ontologies, you aren't really exposing it as RDF."
> >
> > Do we want to declare a *method*, a *language*, a *syntax* for
> > such local-ontology-to-domain-ontology mapping? That's fine,
> > but I think what we deliver should be focused on "how do I define
> > the mapping?" (perhaps something similar to GRDDL?) and not get
> > into "how do I determine what the mapping should be?"
>
> If such a mapping is done in SQL or SPARQL, which I imagine R2ML will
> require database vendors to support, I see no harm done and a lot to be
> gained. Again, reasoning (i.e. even very simple use of OWL) may be too
> much.
>
> >
> > But -- this local-ontology-to-domain-ontology mapping is a
> > *second step*, which comes *after* the RDB schema is mapped to
> > a putative/local ontology, and which, I believe, SHOULD generally
> > come after the RDB data is transformed to RDF with that same local
> > ontology (if indeed the RDB data is being replicated/transformed
> > at all) -- and yes, I believe this is and should be OPTIONAL.
>
> Of course it should be OPTIONAL, but people in the Working Group have made
> strong cases for allowing the R2ML file to contain either SQL or SPARQL
> constructs. Using those statements can of course be OPTIONAL, but the
> spec should probably make support for some simple transforms REQUIRED.
>
> > So, re-expression --
> >
> > I strongly advise that we not *conflate*
> > putative-ontology-to-domain-ontology mapping with
> > RDB-schema-to-putative-ontology mapping. The putative ontology
> > is a vital element of RDB2RDF.
> >
> > Is this an implementation detail? In a way.
> >
> > I think the choice to conflate these two steps in any given tool
> > which implements this standard is an implementation detail, which
> > will prove my point in short order once users can choose between
> > two tools -- one which forces all RDB schemas to map to Domain
> > ontologies; and one which maps the RDB schema to a Local ontology
> > with the option to further declare Local:x owl:sameAs Domain:y
>
> But remember that many R2ML clients may not support OWL and/or any
> mappings outside SPARQL/SQL. We should give them enough power to do a
> reasonable mapping job without using external reasoners.
>
> > But I think the two steps *must not* be conflated in the standard,
> > because that makes the local-ontology-to-domain-ontology mapping
> > *mandatory*, and that is not acceptable to me, nor do I think it
> > is workable in the context of this (or any) WG, even one with a
> > delivery timeframe measured in decades.
>
> We can keep these two steps *separate* in the standard, but I think it
> would be a good idea to include elementary data transformation
> capabilities using SPARQL and SQL in R2ML.
>
> Again, Ted, I think we're not requiring specific mappings, but a baseline
> way to make more complex mappings if needed. I imagine everyone
> implementing R2ML will also implement SQL and SPARQL, so those seem
> reasonable to me.
>
> That's a basic portability requirement, I think, and it is exactly why a
> standard is needed in this area: so people can make a real-world mapping
> between relational and RDF data, and then move databases as needed
> without vendor lock-in.
>
> >
> >>> and which I believe we are perilously close to deciding in the
> >>> worst possible way.
> >>
> >> What would be this worst possible option for the decision?
> >
> > That all RDB (and really, all) data must be mapped to a domain
> > ontology to be considered (worthwhile as) RDF.
>
> Of course this is true and such RDF would be worthwhile.
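To make "elementary data transformation capabilities using SPARQL" concrete without assuming any particular engine, here is a tiny CONSTRUCT-style rewrite in plain Python. A basic graph pattern with `?variables` is matched against the direct graph, and each solution fills in a template. It is a hypothetical illustration, far short of SPARQL: there is no OPTIONAL, no FILTER and no blank nodes. The `local:` and `domain:` names are made up.

```python
def _unify(term, value, binding):
    """Bind a ?variable (or check its existing binding); constants must match."""
    if isinstance(term, str) and term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all pattern triples are in graph."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)  # fresh copy, so failed partial matches are discarded
        if all(_unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, graph, candidate)


def construct(template, pattern, graph):
    """Instantiate the template once per solution, as SPARQL CONSTRUCT does."""
    result = set()
    for binding in match(pattern, graph):
        for triple in template:
            result.add(tuple(binding.get(t, t) for t in triple))
    return result


direct = {
    ("row:contact/1", "rdf:type", "local:Contact"),
    ("row:contact/1", "local:fname", "Ada"),
    ("row:contact/2", "rdf:type", "local:Contact"),
    ("row:contact/2", "local:fname", "Alan"),
}

# Roughly: CONSTRUCT { ?c a domain:Person ; domain:givenName ?n }
#          WHERE     { ?c a local:Contact ; local:fname ?n }
rewritten = construct(
    template=[("?c", "rdf:type", "domain:Person"), ("?c", "domain:givenName", "?n")],
    pattern=[("?c", "rdf:type", "local:Contact"), ("?c", "local:fname", "?n")],
    graph=direct,
)
```

The rewrite produces new triples and leaves the direct graph untouched. That keeps the two steps separate in the way Ted asks for.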
>
> > Consider Juan's Scenario #4, for instance...
> >
> > I have a bunch of data in an RDB, and I'm pretty sure it would
> > benefit *someone*'s analysis if it were available in RDF, so
> > I want to make it available as such.
> >
> > *I'm* not doing the analysis, so do I know what domain ontology/ies
> > it should be mapped to? Not a clue.
>
> No-one is requiring that it be mapped to particular ontologies, but the
> database vendor may want it exposed using a particular ontology, which
> they should have the freedom to specify in their R2ML file.
>
> > Should the RDB2RDF standard *specify* ontology/ies to which all
> > RDB data should be mapped? Loudly and repeatedly, I say no.
> >
> > Rather, my publication should use a full "local" ontology -- which
> > simply maps table to class, column to attribute, cell content to
> > value, and primary/foreign key relationships to class relationships.
>
> This is one way to do it, but others may view that data as a mess in
> RDF, as relational data dumped raw into RDF often is.
>
> > Others may look at my data and see clear domain ontology mappings
> > which work for them, and which may work for others, and they should
> > be able to publish these mappings. Still others may look at my data
> > and see *different* but *no less clear* domain ontology mappings
> > which they want (and should be easily able) to use.
>
> Again, saying that further expressivity could be necessary does not
> prevent further mappings from being made.
>
> > If the RDB2RDF publication path forces the RDB data into domain
> > ontologies -- how can these last people *remap* the data, with their
> > new and different ontology correspondence?
>
> They can remap from the domain ontologies obviously, using whatever
> technique they want.
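Ted's "local" ontology maps table to class, column to attribute, cell content to value and foreign keys to relationships. The sketch below is a simplified, hypothetical rendering of that idea, similar in spirit to what W3C later published as the Direct Mapping of relational data to RDF, but not that specification. The `Table` structure, the `BASE` IRI, the `#ref-` naming for foreign-key links and the use of `urllib.parse.quote` for escaping are all assumptions. The hypothetical `row_iri` helper is also one guess at the "API call that, given a string identifier, returns a possible linked data URI" that Harry imagines.

```python
from dataclasses import dataclass, field
from urllib.parse import quote

BASE = "http://example.com/base/"  # hypothetical base IRI


@dataclass
class Table:
    name: str
    key: str  # primary-key column
    rows: list  # each row is a dict of column -> value
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table


def row_iri(table, key_value):
    """Mint a row IRI from a string identifier (hypothetical scheme)."""
    return f"{BASE}{quote(table)}/{quote(str(key_value), safe='')}"


def direct_mapping(tables):
    triples = []
    for t in tables:
        cls = f"{BASE}{quote(t.name)}"
        for row in t.rows:
            subject = row_iri(t.name, row[t.key])
            triples.append((subject, "rdf:type", cls))  # table -> class
            for col, value in row.items():
                if value is None:
                    continue  # a NULL cell yields no triple
                if col in t.foreign_keys:
                    # foreign key -> link to the referenced row's IRI
                    target = row_iri(t.foreign_keys[col], value)
                    triples.append((subject, f"{cls}#ref-{quote(col)}", target))
                else:
                    # column -> property, cell content -> literal value
                    triples.append((subject, f"{cls}#{quote(col)}", value))
    return triples


addresses = Table("Address", "id", [{"id": 18, "city": "Cambridge"}])
people = Table("People", "id", [{"id": 7, "name": "Bob", "addr": 18}], {"addr": "Address"})
g = direct_mapping([people, addresses])
```

Nothing here needs to know that FOAF or any other domain ontology exists. Domain mappings can be layered on afterwards.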
> >
> > I am not saying that such immediate mapping is always inappropriate,
> > that a publisher cannot choose to say "this table in my schema is
> > and always will be foaf:person" -- but I am saying that the RDB2RDF
> > standard should not *force* the publisher to do so.
>
> It would not force them to, but give them the capability to make such a
> mapping if they so chose. They could also just do a direct dump. Either
> is fine with me, but I'm in favor of giving the users of R2ML a bit more
> power rather than restricting them to direct dumps to RDF.
>
> >
> > Differently and possibly more explicitly put...
> >
> > I have a bunch of cartographic data in RDB. I discover DBpedia,
> > and think that ontology is the one I should map to. So I do.
> >
> > Sophisticated cartographic workers familiar with RDF will know
> > that there are other ontologies -- Freebase, Geonames, OpenCYC,
> > etc. -- which do a much better job in many ways.
> >
> > If my original data were mapped to a local ontology (say,
> > http://mymapdata.example.com/ontology/#), it would be very easy
> > for the sophisticated user to ignore my local-to-DBpedia mapping
> > (which is of course in its own named graph) and substitute their
> > own local-to-Geonames+CYC+Freebase+theirCartOntology.
> >
> > If my original data is transformed directly into DBpedia classes --
> > there is no easy way to substitute the more sophisticated mapping.
> >
> > Is all of this such a giant problem if everyone is using
> > backward-chaining all the way to the RDB? No -- *if* the sophisticated
> > user can reach me and convince me to substitute their mapping for mine.
> > But that's not the most common pattern in play, and it's not likely
> > to become such soon, as much as I wish it would -- but I don't think
> > it should be forced on people either!
> >
> > The big problem comes when the RDB2RDF transform is materialized,
> > i.e., forward-chained, as is the most common pattern today, when
> > people want to get the RDF dump (or crawl the SPARQL endpoint) and
> > load it all in their local store, instead of issuing relevant queries
> > against the existing SPARQL endpoint.
> >
> > Consider an example from Juan's Scenario #1, joining RDB to RDB.
> >
> > If my RDB2RDF mapping says --
> >
> >    mydb1.Contact   foaf:person
> >    mydb2.Customer  foaf:person
> >
> > -- and I've replicated all my RDB data as RDF, and later discover
> > that Customer.name is actually filled with company names, while
> > Contact.name is people ... how do I fix that, short of dropping
> > and re-replicating with the new map?
> >
> > On the other hand, if my RDB2RDF mapping says --
> >
> >    mydb1.Contact   ontology1:contact
> >    mydb2.Customer  ontology2:customer
> >
> > -- it's easy for me to have statements that say --
> >
> >    { ontology1:contact owl:sameAs ontology2:customer . }
> >    { ontology1:contact owl:subClass foaf:person . }
> >    { ontology2:customer owl:subClass foaf:person . }
> >
> > There's also no *need* to say foaf:person anywhere. There's no
> > need to know that FOAF exists at all.
> >
> > And if I make the same discovery -- I drop (or change) the
> > mapping triples, and I'm done.
> >
> > In both of these, consider the possibility of columns which do
> > not obviously or easily map to FOAF or any other known domain
> > ontology. With the first option, those columns are apparently
> > discarded or ignored. With the second, they are present, but
> > known only by their local identity, e.g., ontology:contact#widget,
> > until someone comes up with a new domainOntology:widget -- and
> > hey, presto! --
> >
> >    { ontology:contact#widget owl:sameAs domainOntology:widget . }
> >
> > And once again -- there will be times and instances where it
> > *is* appropriate to *choose* to make the RDB schema absolutely
> > map to domain ontologies.
> >
> > The *ability to choose* is key.
>
> Yes, that's fine. But not forcing the users of the data to do
> unnecessary work, when a simple and preferred mapping is known by the
> database owner, is actually a good thing, I think.
>
> >
> > I hope this has made my concerns clearer?
> >
> > Be seeing you,
> >
> > Ted
> >
> > --
> > A: Yes.                      http://www.guckes.net/faq/attribution.html
> > | Q: Are you sure?
> > | | A: Because it reverses the logical flow of conversation.
> > | | | Q: Why is top posting frowned upon?
> >
> > Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> > Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
> >                              //              http://twitter.com/TallTed
> > OpenLink Software, Inc.      //              http://www.openlinksw.com/
> >          10 Burlington Mall Road, Suite 265, Burlington MA 01803
> >                                      http://www.openlinksw.com/weblogs/uda/
> > OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
> >                                    http://www.openlinksw.com/blog/~kidehen/
> >     Universal Data Access and Virtual Database Technology Providers

--
-ericP
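Ted's Contact/Customer example can be checked with a small query-time reasoner. A few hedges on vocabulary apply. The standard subclass property is `rdfs:subClassOf`; there is no `owl:subClass`. Equivalence between classes is normally stated with `owl:equivalentClass`, because `owl:sameAs` is meant for individuals. FOAF's class is spelled `foaf:Person`. The sketch uses those terms. It is a minimal, hypothetical illustration with plain tuples and no OWL library. It computes `foaf:Person` instances from untouched local data plus a separate set of mapping triples. It then "fixes" the mapping by dropping triples. Note that the equivalence between the two local classes must go too, or customers would still be inferred to be people via `ontology1:contact`.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIV = "owl:equivalentClass"

# Local data: produced once from the RDB and never rewritten.
data = {
    ("mydb1:contact/1", RDF_TYPE, "ontology1:contact"),
    ("mydb2:customer/9", RDF_TYPE, "ontology2:customer"),
}

# Mapping triples, kept separately from the data.
mapping = {
    ("ontology1:contact", EQUIV, "ontology2:customer"),
    ("ontology1:contact", SUBCLASS, "foaf:Person"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}


def superclasses(cls, mapping):
    """All classes reachable from cls via subClassOf or equivalentClass (incl. cls)."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in mapping:
            nxt = None
            if p in (SUBCLASS, EQUIV) and s == current:
                nxt = o
            elif p == EQUIV and o == current:
                nxt = s  # equivalence is symmetric
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def instances(target, data, mapping):
    """Subjects typed with a class that the mapping places under target."""
    return {s for s, p, o in data
            if p == RDF_TYPE and target in superclasses(o, mapping)}


before = instances("foaf:Person", data, mapping)

# Customer.name turns out to hold company names: drop the mapping triples only.
mapping -= {
    ("ontology1:contact", EQUIV, "ontology2:customer"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}
after = instances("foaf:Person", data, mapping)
```

The data set is never touched. Only the mapping changes, which is the property Ted argues for.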
Received on Tuesday, 4 May 2010 07:07:46 UTC