Re: Role of the Ontology and Expressivity - to discuss on telcon from Eric Prud'hommeaux on 2010-05-04 (public-rdb2rdf-wg@w3.org from May 2010)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 4 May 2010 00:07:10 -0700
To: Harry Halpin <hhalpin@w3.org>
Cc: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, ivan@w3.org, public-rdb2rdf-wg@w3.org
Message-ID: <20100504070709.GB4970@w3.org>
* Harry Halpin <hhalpin@w3.org> [2010-05-04 02:00+0100]
> Apologies for the late reply, myself and EricP have been recovering from
> the WWW2010 conference. While I only caught the last half of last week's
> telecon due to the conflicting talk, I did notice that this point seemed
> to derail our attempt at consensus on the use case and requirement
> document. Let me explain why I think that, while Ted's point makes sense,
> it's possible we are just communicating at cross-purposes.

I'm happy to do a direct mapping. (Sandro sold me on this term for the graph dictated by the relational structure.)

> > Hi, Richard --
> >
> >
> > On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> >> I'm trying to understand your position.
> >
> > The effort is appreciated.
> >
> > I'm really not being obstructionist for its own sake.
> >
> >
> >> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
> >>> Bringing RDB into RDF requires only that the schema of that RDB
> >>> be mapped to a "direct" or "putative" ontology -- which *is* the
> >>> correct term.
> >>
> >> Well, the charter is unfortunately not very explicit about what it
> >> means to map a database to RDF.
> >
> > That is indeed unfortunate.  One of the hazards of saying "we'll do
> > *this* in *that* timeframe" is that the task doesn't always cooperate,
> > and it seems that the task of defining this WG's job was one such.
> 
> Charters are left vague on purpose, and the authors of this charter were
> Ashok and Ivan. We can ask them to clarify directly. However, I was under
> the impression from our review of existing approaches that some kind of
> mapping language that did more than a direct mapping (i.e. dump rows and
> columns into RDF with no transformation) would be necessary. As the straw
> poll from the last telecon shows, all attendees except Ted (and possibly
> Souri, who was not sure about the phrasing of the question) agreed that
> such functionality was needed.
> 
> >
> >
> >> Just to be explicit: Is your position that mapping to domain
> >> ontologies such as FOAF, GoodRelations etc is out of scope of the
> >> charter? Or is your position that it's merely not required to meet
> >> the success criteria set out in the charter?
> >
> > My position is that *forcing* RDB schemas to map to domain ontologies
> > is not necessary and will be counter-productive in the long run.
> >
> > My further position is that "blessing" any given domain ontologies,
> > and further still, defining how a given transformation tool should
> > determine whether a given RDB.table maps to RDF:Class is not only
> > well beyond the charter, but also, as Souri said, an *enormous* task.
> 
> I think this is a vocabulary mismatch. In particular, EricP's point was
> that the data could be directly transformed (which we all agree on) and
> then that further transforms could be necessary (i.e. done either using
> the SQL view approach or via a series of SPARQL constructs). Juan
> rephrased this direct transformation as "putative ontology" and then the
> "further transforms" as a transform to a "domain ontology".
> 
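The two-step picture above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the prefixed
names (db:, foaf:), the sample triples, and the helper
further_transform are all hypothetical, and plain term rewriting
stands in for what would really be a SQL view or a SPARQL CONSTRUCT.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical output of the direct transform for one row of a Contact table.
direct_graph: List[Triple] = [
    ("db:Contact/1", "rdf:type", "db:Contact"),
    ("db:Contact/1", "db:Contact#name", '"Alice"'),
    ("db:Contact/1", "db:Contact#shoe_size", '"42"'),
]

# Chosen by the mapping author; nothing here is fixed by a standard.
author_rewrite: Dict[str, str] = {
    "db:Contact": "foaf:Person",
    "db:Contact#name": "foaf:name",
}


def further_transform(graph: List[Triple], rewrite: Dict[str, str]) -> List[Triple]:
    """Rewrite mapped terms; terms without an entry pass through unchanged."""
    return [
        (rewrite.get(s, s), rewrite.get(p, p), rewrite.get(o, o))
        for s, p, o in graph
    ]


domain_graph = further_transform(direct_graph, author_rewrite)
```

With an empty rewrite table the result is just the direct graph, so
the further transform stays optional, and the shoe_size column, which
has no obvious domain term, is carried along under its local name.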
> However, this just gives authors of the R2ML file the ability to transform
> their graph, it does not bless any particular domain ontology (such as
> FOAF). The precise domain ontologies that the author wishes to map can be
> specified in any way by the author of the R2ML file.
> 
> My question (to database people in particular, such as Ashok, Ahmed, and
> Daniel) is that does the database community understand "domain/putative
> ontology" and use that terminology, or should we revert to "direct
> transforms and further transforms" word choice?

I understand the derivation of "putative ontology" for what Sandro
calls the "direct graph", although some relational schemas can be
considered "widely accepted", at least within their domain. However,
I think "putative ontology" leads to lots of opportunities for
misunderstanding. To begin with, the direct graph is a graph and the
putative ontology is a description of a graph, so we would have to say
"graph described by the putative ontology" to avoid a category error.

Many RDF folks think of "ontologies" as documents or conceptual models
with lots of owl: assertions, contrasted with "schemas" for the same
pair of things with rdfs: assertions.

RDB folks have talked about ontologies since 1993 [Gruber93]
(predating RDF by 5 years). This term has been used both as a
constituent of and an alternative to metadata repositories like ISO
11179.

Ultimately, I like to follow the advice of Guus Schreiber and avoid
"the 'o' word".

[Gruber93] http://tomgruber.org/writing/onto-design.htm


> > (This is also, I think, where most if not all of the expressivity
> > concerns come in.)
> >
> >
> >>> This ontology serves only to unambiguously identify a single
> >>> cell (table, column, row) within that schema.
> >>>
> >>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
> >>> SNOMED) is a separate concern -- which may be addressed in a couple
> >>> of ways --
> >>>
> >>> 1. replication with transformation
> >>> 2. mapping ontologies
> >>>
> >>>
> >>> The first means that you decide *once* how SNOMED corresponds to
> >>> a given RDB schema -- and if that correspondence changes, you have
> >>> to somehow discard all the triples that resulted from the original
> >>> conversion and then re-convert the RDB data.
> >>>
> >>> The second means that you decide how you think SNOMED corresponds
> >>> to your putative ontology, and create a "mapping" ontology --
> >>> which does little more than declare broaderClass, narrowerClass,
> >>> equivalentClass, sameAs, and such.  If you realize later that one
> >>> of your mappings is wrong, you change this ontology -- everything
> >>> else remains as it is.
> 
> I think this approach could be done with OWL as a separate step from R2ML,
> but if someone wanted to include some of these mappings into R2ML I would
> not forbid
> them. However, I think no-one has brought that approach up per se. The
> question is whether or not that mapping should/can be done in SQL or via
> SPARQL constructs or (...) in R2ML. I think that expressivity requirement
> should remain in the use-cases, although we should be agnostic about what
> exact approach may be used. We should probably only require that some
> approach be allowed, i.e. a portable set of SQL for view-based
> transformation, or a set of SPARQL constructs. Anything beyond that (i.e.
> reasoning) we should probably allow to be a separate step *after*
> R2ML deployment to get the direct mapping/putative ontology.
> 
> 
> >>>
> >>> Note that #2 does not mandate either forward- or backward-chaining.
> >>> You *can* work from #2 and replicate & transform, if you find that
> >>> works better for your deployment scenario.  You *can* use reasoning
> >>> engines to work entirely dynamically, if that works better for you.
> >>>
> >>> Note that #1 *does* mandate forward-chaining.  You *cannot* use
> >>> a reasoning engine to revise the putative-to-domain mapping once
> >>> replication & transformation has been done.
> >>
> >> I think I sort of agree with everything you said up to here.
> >
> > I think that's a good sign.
> >
> >
> >>> For this simple reason, I strongly advise that we *not* combine
> >>> putative-to-domain ontology mapping into the rdb2rdf scenario --
> >>> because it makes a decision which we haven't been chartered for,
> >>
> >> Here is where I lost you. Can you please say explicitly what that
> >> decision is?
> >
> > I expressed myself poorly there.  Re-expression a bit below...
> >
> > One decision would be "what domain ontology/ies do we bless?"
> 
> Again, blessing any local ontology is out-of-scope, although what is in
> scope is a way of making identifiers re-usable, which I imagine will
> easily be some kind of API call that given a string identifier returns a
> possible linked data URI.
> 
> >
> > Another is implied -- "if you don't map your RDB data to Domain
> > ontologies, you aren't really exposing it as RDF."
> >
> > Do we want to declare a *method*, a *language*, a *syntax* for
> > such local-ontology-to-domain-ontology mapping?  That's fine,
> > but I think what we deliver should be focused on "how do I define
> > the mapping?" (perhaps something similar to GRDDL?) and not get
> > into "how do I determine what the mapping should be?"
> 
> If such a mapping is done in SQL or SPARQL, which I imagine R2ML will
> require database vendors to support, I see no harm done and a lot to be
> gained. Again, reasoning, i.e. even very simple use of OWL, may be too
> much.
> 
> >
> > But -- this local-ontology-to-domain-ontology mapping is a
> > *second step*, which comes *after* the RDB schema is mapped to
> > a putative/local ontology, and which, I believe, SHOULD generally
> > come after the RDB data is transformed to RDF with that same local
> > ontology (if indeed the RDB data is being replicated/transformed
> > at all) -- and yes, I believe this is and should be OPTIONAL.
> >
> 
> Of course it should be OPTIONAL, but people in the Working Group have made
> strong cases for allowing the R2ML file to contain either SQL or SPARQL
> constructs. Of course, using those statements can be OPTIONAL, but the
> spec should probably make support for some simple transforms REQUIRED.
> 
> > So, re-expression --
> >
> > I strongly advise that we not *conflate* putative-ontology-to-
> > domain-ontology mapping with RDB-schema-to-putative-ontology
> > mapping.  The putative ontology is a vital element of RDB2RDF.
> >
> > Is this an implementation detail?  In a way.
> >
> > I think the choice to conflate these two steps in any given tool
> > which implements this standard is an implementation detail, which
> > will prove my point in short order once users can choose between
> > two tools -- one which forces all RDB schemas to map to Domain
> > ontologies; and one which maps the RDB schema to a Local ontology
> > with the option to further declare Local:x owl:sameAs Domain:y
> >
> 
> But remember many R2ML clients may not support OWL and/or any mappings
> outside SPARQL/SQL. We should give them enough power to do a reasonable
> mapping job without using external reasoners.
> 
> > But I think the two steps *must not* be conflated in the standard,
> > because that makes the local-ontology-to-domain-ontology mapping
> > *mandatory*, and that is not acceptable to me, nor do I think it
> > is workable in the context of this (or any) WG, even one with a
> > delivery timeframe measured in decades.
> 
> We can keep these two steps *separate* in the standard, but I think it
> would be a good idea to include elementary data transformation
> capabilities using SPARQL and SQL in R2ML.
> 
> Again, Ted, I think we're not requiring specific mappings, but a baseline
> way to make more complex mappings if needed. I imagine everyone
> implementing R2ML will also implement SQL and SPARQL obviously, so those
> seem reasonable to me.
> 
> That's a basic portability requirement I think, which is exactly why a
> standard is needed in this area, so people can make a real-world mapping
> between relational and RDF data, and then move databases as needed without
> having vendor lock-in.
> 
> 
> >
> >
> >>> and which I believe we are perilously close to deciding in the
> >>> worst possible way.
> >>
> >> What would be this worst possible option for the decision?
> >
> > That all RDB (and really, all) data must be mapped to a domain
> > ontology to be considered (worthwhile as) RDF.
> >
> 
> Of course this is true and such RDF would be worthwhile.
> 
> > Consider Juan's Scenario #4, for instance...
> >
> > I have a bunch of data in an RDB, and I'm pretty sure it would
> > benefit *someone*'s analysis if it were available in RDF, so
> > I want to make it available as such.
> >
> > *I'm* not doing the analysis, so do I know what domain ontology/ies
> > it should be mapped to?  Not a clue.
> >
> 
> No-one is requiring it be mapped to particular ontologies, but the
> database vendor may want it exposed using a particular ontology, which
> they should have the freedom to specify in their R2ML file.
> 
> > Should the RDB2RDF standard *specify* ontology/ies to which all
> > RDB data should be mapped?  Loudly and repeatedly, I say no.
> >
> > Rather, my publication should use a full "local" ontology -- which
> > simply maps table to class, column to attribute, cell content to
> > value, and primary/foreign key relationships to class relationships.
> >
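A minimal sketch of such a "local" mapping, to make the four rules
concrete. Everything named here is an assumption for illustration:
the base IRI, the IRI layout (Table/pk=value for rows, Table#column
for properties), the sample Contact row, and the helpers row_iri and
direct_map. None of it is a layout the WG has agreed on.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.com/db/"  # hypothetical base IRI, not from the thread
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_iri(table: str, pk: str, key: object) -> str:
    """Mint an IRI that identifies one row by table and primary key."""
    return f"{BASE}{table}/{pk}={key}"


def direct_map(
    table: str,
    pk: str,
    rows: List[Dict[str, object]],
    fks: Optional[Dict[str, Tuple[str, str]]] = None,
) -> List[Triple]:
    """Table -> class, column -> property, cell -> value, foreign key -> link."""
    fks = fks or {}
    triples: List[Triple] = []
    for row in rows:
        subject = row_iri(table, pk, row[pk])
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))
        for column, value in row.items():
            if value is None:
                continue  # a NULL cell yields no triple
            predicate = f"{BASE}{table}#{column}"
            if column in fks:
                ref_table, ref_pk = fks[column]
                obj = row_iri(ref_table, ref_pk, value)  # link to the other row
            else:
                obj = f'"{value}"'  # quoted to mark it as a literal
            triples.append((subject, predicate, obj))
    return triples


contacts = [{"id": 7, "name": "Bob", "employer": 3, "fax": None}]
graph = direct_map("Contact", "id", contacts, fks={"employer": ("Company", "id")})
```

Note that the function needs only the schema (table name, key, foreign
keys) and no knowledge of FOAF or any other domain vocabulary.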
> 
> This is one way to do it, but others may view that data as a mess in RDF,
> as a raw dump of relational data into RDF often is.
> 
> > Others may look at my data and see clear domain ontology mappings
> > which work for them, and which may work for others, and they should
> > be able to publish these mappings.  Still others may look at my data
> > and see *different* but *no less clear* domain ontology mappings
> > which they want (and should be easily able) to use.
> >
> 
> Again, saying that further expressivity could be necessary does not
> prevent further mappings from being made.
> 
> > If the RDB2RDF publication path forces the RDB data into domain
> > ontologies -- how can these last people *remap* the data, with their
> > new and different ontology correspondence?
> 
> They can remap from the domain ontologies obviously, using whatever
> technique they want.
> 
> >
> > I am not saying that such immediate mapping is always inappropriate,
> > that a publisher cannot choose to say "this table in my schema is
> > and always will be foaf:person" -- but I am saying that the RDB2RDF
> > standard should not *force* the publisher to do so.
> 
> It would not force them to, but give them the capability to make such a
> mapping if they so choose. They could also just do a direct dump. Either
> is fine with me, but I'm pro-giving the users of R2ML a bit more power
> than restricting them to direct dumps to RDF.
> 
> >
> > Differently and possibly more explicitly put...
> >
> > I have a bunch of cartographic data in RDB.  I discover DBpedia,
> > and think that ontology is the one I should map to.  So I do.
> >
> > Sophisticated cartographic workers familiar with RDF will know
> > that there are other ontologies -- Freebase, Geonames, OpenCYC,
> > etc. -- which do a much better job in many ways.
> >
> > If my original data were mapped to a local ontology (say,
> > http://mymapdata.example.com/ontology/#), it would be very easy
> > for the sophisticated user to ignore my local-to-DBpedia mapping
> > (which is of course in its own named graph) and substitute their
> > own local-to-Geonames+CYC+Freebase+theirCartOntology.
> >
> > If my original data is transformed directly into DBpedia classes --
> > there is no easy way to substitute the more sophisticated mapping.
> >
> > Is all of this such a giant problem if everyone is using backward-
> > chaining all the way to the RDB?  No -- *if* the sophisticated user
> > can reach me and convince me to substitute their mapping for mine.
> > But that's not the most common pattern in play, and it's not likely
> > to become such soon, as much as I wish it would -- but I don't think
> > it should be forced on people either!
> >
> > The big problem comes when the RDB2RDF transform is materialized,
> > i.e., forward-chained, as is the most common pattern today, when
> > people want to get the RDF dump (or crawl the SPARQL endpoint) and
> > load it all in their local store, instead of issuing relevant queries
> > against the existing SPARQL endpoint.
> >
> >
> > Consider an example from Juan's Scenario #1, joining RDB to RDB.
> >
> > If my RDB2RDF mapping says --
> >
> >    mydb1.Contact     foaf:person
> >    mydb2.Customer    foaf:person
> >
> > -- and I've replicated all my RDB data as RDF, and later discover
> > that Customer.name is actually filled with company names, while
> > Contact.name is people ... how do I fix that, short of dropping
> > and re-replicating with the new map?
> >
> > On the other hand, if my RDB2RDF mapping says --
> >
> >    mydb1.Contact    ontology1:contact
> >    mydb2.Customer   ontology2:customer
> >
> > -- it's easy for me to have statements that say --
> >
> >    { ontology1:contact   owl:sameAs       ontology2:customer  . }
> >    { ontology1:contact   rdfs:subClassOf  foaf:person         . }
> >    { ontology2:customer  rdfs:subClassOf  foaf:person         . }
> >
> > There's also no *need* to say foaf:person anywhere.  There's no
> > need to know that FOAF exists at all.
> >
> > And if I make the same discovery -- I drop (or change) the
> > mapping triples, and I'm done.
> >
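The "drop (or change) the mapping triples" step can be sketched as
follows. This assumes the mapping lives in its own list, separate from
the replicated data, and is only consulted at query time. The row
identifiers, the helper types_of, and the choice of foaf:Organization
as the corrected target are assumptions for illustration; the sketch
uses rdfs:subClassOf, the RDFS property for the subclass relation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Replicated RDB data, typed only with local classes. Never rewritten below.
data: List[Triple] = [
    ("db1:Contact/1", RDF_TYPE, "ontology1:contact"),
    ("db2:Customer/9", RDF_TYPE, "ontology2:customer"),
]

# First attempt at a local-to-domain mapping.
mapping_v1: List[Triple] = [
    ("ontology1:contact", SUBCLASS_OF, "foaf:Person"),
    ("ontology2:customer", SUBCLASS_OF, "foaf:Person"),
]

# After discovering that customers are companies (hypothetical correction).
mapping_v2: List[Triple] = [
    ("ontology1:contact", SUBCLASS_OF, "foaf:Person"),
    ("ontology2:customer", SUBCLASS_OF, "foaf:Organization"),
]


def types_of(resource: str, data: List[Triple], mapping: List[Triple]) -> Set[str]:
    """Classes of a resource, following subclass links at query time."""
    found: Set[str] = set()
    todo = [o for s, p, o in data if s == resource and p == RDF_TYPE]
    while todo:
        cls = todo.pop()
        if cls in found:
            continue
        found.add(cls)
        todo.extend(o for s, p, o in mapping if s == cls and p == SUBCLASS_OF)
    return found


before = types_of("db2:Customer/9", data, mapping_v1)
after = types_of("db2:Customer/9", data, mapping_v2)
```

Swapping mapping_v1 for mapping_v2 changes the answer without touching
data, which is the point of keeping the two apart. Had the first
mapping been materialized into data, the fix would mean re-converting.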
> > In both of these, consider the possibility of columns which do
> > not obviously or easily map to FOAF or any other known domain
> > ontology.  With the first option, those columns are apparently
> > discarded or ignored.  With the second, they are present, but
> > known only by their local identity, e.g., ontology:contact#widget,
> > until someone comes up with a new domainOntology:widget -- and
> > hey, presto! --
> >
> >    { ontology:contact#widget  owl:sameAs  domainOntology:widget . }
> >
> >
> > And once again -- there will be times and instances where it
> > *is* appropriate to *choose* to make the RDB schema absolutely
> > map to domain ontologies.
> >
> > The *ability to choose* is key.
> 
> Yes, that's fine. But sparing the users of the data unnecessary work
> when a simple and preferred mapping is known by the database owner is
> not a bad thing, but actually a good thing I think.
> 
> >
> >
> > I hope this has made my concerns clearer?
> >
> > Be seeing you,
> >
> > Ted
> >
> >
> >
> > --
> > A: Yes.                      http://www.guckes.net/faq/attribution.html
> > | Q: Are you sure?
> > | | A: Because it reverses the logical flow of conversation.
> > | | | Q: Why is top posting frowned upon?
> >
> > Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> > Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
> >                              //              http://twitter.com/TallTed
> > OpenLink Software, Inc.      //              http://www.openlinksw.com/
> >         10 Burlington Mall Road, Suite 265, Burlington MA 01803
> >                                  http://www.openlinksw.com/weblogs/uda/
> > OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
> >                                http://www.openlinksw.com/blog/~kidehen/
> >     Universal Data Access and Virtual Database Technology Providers
> >

-- 
-ericP
Received on Tuesday, 4 May 2010 07:07:46 UTC