Re: Role of the Ontology and Expressivity - to discuss on telcon from Juan Sequeda on 2010-05-04 (public-rdb2rdf-wg@w3.org from May 2010)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Mon, 3 May 2010 19:56:29 -0500
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Cc: Richard Cyganiak <richard@cyganiak.de>, "Eric Prud'hommeaux" <eric@w3.org>, public-rdb2rdf-wg@w3.org
Message-ID: <n2sf914914c1005031756i44871400x1e9b65b75074bc44@mail.gmail.com>

Nobody has commented on Ted's post in the last 6 days.

What is going to happen on tomorrow's call? We do need to get this UC
document out. How are we going to proceed?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Apr 28, 2010 at 10:36 AM, Ted Thibodeau Jr <tthibodeau@openlinksw.com> wrote:

> Hi, Richard --
>
>
> On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> > I'm trying to understand your position.
>
> The effort is appreciated.
>
> I'm really not being obstructionist for its own sake.
>
>
> > On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
> >> Bringing RDB into RDF requires only that the schema of that RDB
> >> be mapped to a "direct" or "putative" ontology -- which *is* the
> >> correct term.
> >
> > Well, the charter is unfortunately not very explicit about what it
> > means to map a database to RDF.
>
> That is indeed unfortunate.  One of the hazards of saying "we'll do
> *this* in *that* timeframe" is that the task doesn't always cooperate,
> and it seems that the task of defining this WG's job was one such.
>
>
> > Just to be explicit: Is your position that mapping to domain
> > ontologies such as FOAF, GoodRelations etc is out of scope of the
> > charter? Or is your position that it's merely not required to meet
> > the success criteria set out in the charter?
>
> My position is that *forcing* RDB schemas to map to domain ontologies
> is not necessary and will be counter-productive in the long run.
>
> My further position is that "blessing" any given domain ontologies,
> and further still, defining how a given transformation tool should
> determine whether a given RDB.table maps to rdfs:Class is not only
> well beyond the charter, but also, as Souri said, an *enormous* task.
>
> (This is also, I think, where most if not all of the expressivity
> concerns come in.)
>
>
> >> This ontology serves only to unambiguously identify a single
> >> cell (table, column, row) within that schema.
> >>
> >> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
> >> SNOMED) is a separate concern -- which may be addressed in a couple
> >> of ways --
> >>
> >> 1. replication with transformation
> >> 2. mapping ontologies
> >>
> >>
> >> The first means that you decide *once* how SNOMED corresponds to
> >> a given RDB schema -- and if that correspondence changes, you have
> >> to somehow discard all the triples that resulted from the original
> >> conversion and then re-convert the RDB data.
> >>
> >> The second means that you decide how you think SNOMED corresponds
> >> to your putative ontology, and create a "mapping" ontology --
> >> which does little more than declare broaderClass, narrowerClass,
> >> equivalentClass, sameAs, and such.  If you realize later that one
> >> of your mappings is wrong, you change this ontology -- everything
> >> else remains as it is.
> >>
> >> Note that #2 does not mandate either forward- or backward-chaining.
> >> You *can* work from #2 and replicate & transform, if you find that
> >> works better for your deployment scenario.  You *can* use reasoning
> >> engines to work entirely dynamically, if that works better for you.
> >>
> >> Note that #1 *does* mandate forward-chaining.  You *cannot* use
> >> a reasoning engine to revise the putative-to-domain mapping once
> >> replication & transformation has been done.
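The difference between the two evaluation styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, the class names (`local:Patient`, `domain:Person`) and the row IRI are hypothetical, and the "backward-chaining" function is a simple query-time lookup rather than a real reasoner. The point is that one and the same mapping ontology can feed either style.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Data published against the local (putative) ontology (hypothetical names).
data = {("row:1", TYPE, "local:Patient")}
# Separate mapping ontology: local class -> domain class (hypothetical names).
mapping = {("local:Patient", SUBCLASS, "domain:Person")}

def superclasses(cls, mapping):
    """All classes reachable from cls via rdfs:subClassOf, including cls."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in mapping:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

def materialize(data, mapping):
    """Forward-chaining: compute and store every implied type triple up front."""
    out = set(data)
    for s, p, o in data:
        if p == TYPE:
            out |= {(s, TYPE, c) for c in superclasses(o, mapping)}
    return out

def instances_of(cls, data, mapping):
    """Query-time evaluation: consult the mapping only when a question is asked."""
    return {s for s, p, o in data if p == TYPE and cls in superclasses(o, mapping)}

stored = materialize(data, mapping)
print(("row:1", TYPE, "domain:Person") in stored)
print(instances_of("domain:Person", data, mapping))
```

If only the materialized output is kept and the local triples are thrown away, a later change to `mapping` has nothing left to act on, which is the limitation described for option #1.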
> >
> > I think I sort of agree with everything you said up to here.
>
> I think that's a good sign.
>
>
> >> For this simple reason, I strongly advise that we *not* combine
> >> putative-to-domain ontology mapping into the rdb2rdf scenario --
> >> because it makes a decision which we haven't been chartered for,
> >
> > Here is where I lost you. Can you please say explicitly what that
> > decision is?
>
> I expressed myself poorly there.  Re-expression a bit below...
>
> One decision would be "what domain ontology/ies do we bless?"
>
> Another is implied -- "if you don't map your RDB data to Domain
> ontologies, you aren't really exposing it as RDF."
>
> Do we want to declare a *method*, a *language*, a *syntax* for
> such local-ontology-to-domain-ontology mapping?  That's fine,
> but I think what we deliver should be focused on "how do I define
> the mapping?" (perhaps something similar to GRDDL?) and not get
> into "how do I determine what the mapping should be?"
>
> But -- this local-ontology-to-domain-ontology mapping is a
> *second step*, which comes *after* the RDB schema is mapped to
> a putative/local ontology, and which, I believe, SHOULD generally
> come after the RDB data is transformed to RDF with that same local
> ontology (if indeed the RDB data is being replicated/transformed
> at all) -- and yes, I believe this is and should be OPTIONAL.
>
> So, re-expression --
>
> I strongly advise that we not *conflate* putative-ontology-to-
> domain-ontology mapping with RDB-schema-to-putative-ontology
> mapping.  The putative ontology is a vital element of RDB2RDF.
>
> Is this an implementation detail?  In a way.
>
> I think the choice to conflate these two steps in any given tool
> which implements this standard is an implementation detail, which
> will prove my point in short order once users can choose between
> two tools -- one which forces all RDB schemas to map to Domain
> ontologies; and one which maps the RDB schema to a Local ontology
> with the option to further declare Local:x owl:sameAs Domain:y.
>
> But I think the two steps *must not* be conflated in the standard,
> because that makes the local-ontology-to-domain-ontology mapping
> *mandatory*, and that is not acceptable to me, nor do I think it
> is workable in the context of this (or any) WG, even one with a
> delivery timeframe measured in decades.
>
>
> >> and which I believe we are perilously close to deciding in the
> >> worst possible way.
> >
> > What would be this worst possible option for the decision?
>
> That all RDB (and really, all) data must be mapped to a domain
> ontology to be considered (worthwhile as) RDF.
>
> Consider Juan's Scenario #4, for instance...
>
> I have a bunch of data in an RDB, and I'm pretty sure it would
> benefit *someone*'s analysis if it were available in RDF, so
> I want to make it available as such.
>
> *I'm* not doing the analysis, so do I know what domain ontology/ies
> it should be mapped to?  Not a clue.
>
> Should the RDB2RDF standard *specify* ontology/ies to which all
> RDB data should be mapped?  Loudly and repeatedly, I say no.
>
> Rather, my publication should use a full "local" ontology -- which
> simply maps table to class, column to attribute, cell content to
> value, and primary/foreign key relationships to class relationships.
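A "local" ontology of this kind can be generated mechanically. The Python sketch below is one possible reading, not a proposal for the standard: the base IRI, the IRI patterns, and the function name `direct_map` are all assumptions made for illustration, and triples are plain tuples rather than objects from an RDF library.

```python
# Hypothetical base IRI and naming scheme; the thread does not prescribe either.
BASE = "http://mydb.example.com/ontology/"
RDF_TYPE = "rdf:type"

def direct_map(table, pk, rows, foreign_keys=None):
    """Yield (subject, predicate, object) triples for one table.

    table        -- table name, becomes the local class
    pk           -- name of the primary key column
    rows         -- list of dicts, one per row
    foreign_keys -- {column: referenced_table}; these columns become links
    """
    foreign_keys = foreign_keys or {}
    cls = f"{BASE}{table}"
    for row in rows:
        subject = f"{BASE}{table}/{row[pk]}"
        yield (subject, RDF_TYPE, cls)
        for column, value in row.items():
            if column == pk or value is None:
                continue
            predicate = f"{BASE}{table}#{column}"
            if column in foreign_keys:
                # Foreign key: point at the referenced row's IRI.
                yield (subject, predicate, f"{BASE}{foreign_keys[column]}/{value}")
            else:
                # Plain column: keep the cell content as a literal value.
                yield (subject, predicate, value)

contacts = [{"id": 1, "name": "Alice", "employer": 7}]
triples = list(direct_map("Contact", "id", contacts, {"employer": "Company"}))
for t in triples:
    print(t)
```

Nothing here requires knowing any domain ontology; every identifier is derived from the schema itself.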
>
> Others may look at my data and see clear domain ontology mappings
> which work for them, and which may work for others, and they should
> be able to publish these mappings.  Still others may look at my data
> and see *different* but *no less clear* domain ontology mappings
> which they want (and should be easily able) to use.
>
> If the RDB2RDF publication path forces the RDB data into domain
> ontologies -- how can these last people *remap* the data, with their
> new and different ontology correspondence?
>
> I am not saying that such immediate mapping is always inappropriate,
> that a publisher cannot choose to say "this table in my schema is
> and always will be foaf:Person" -- but I am saying that the RDB2RDF
> standard should not *force* the publisher to do so.
>
> Differently and possibly more explicitly put...
>
> I have a bunch of cartographic data in RDB.  I discover DBpedia,
> and think that ontology is the one I should map to.  So I do.
>
> Sophisticated cartographic workers familiar with RDF will know
> that there are other ontologies -- Freebase, GeoNames, OpenCyc,
> etc. -- which do a much better job in many ways.
>
> If my original data were mapped to a local ontology (say,
> http://mymapdata.example.com/ontology/#), it would be very easy
> for the sophisticated user to ignore my local-to-DBpedia mapping
> (which is of course in its own named graph) and substitute their
> own local-to-GeoNames+Cyc+Freebase+theirCartOntology mapping.
>
> If my original data is transformed directly into DBpedia classes --
> there is no easy way to substitute the more sophisticated mapping.
>
> Is all of this such a giant problem if everyone is using backward-
> chaining all the way to the RDB?  No -- *if* the sophisticated user
> can reach me and convince me to substitute their mapping for mine.
> But that's not the most common pattern in play, and it's not likely
> to become such soon, as much as I wish it would -- but I don't think
> it should be forced on people either!
>
> The big problem comes when the RDB2RDF transform is materialized,
> i.e., forward-chained, as is the most common pattern today, when
> people want to get the RDF dump (or crawl the SPARQL endpoint) and
> load it all in their local store, instead of issuing relevant queries
> against the existing SPARQL endpoint.
>
>
> Consider an example from Juan's Scenario #1, joining RDB to RDB.
>
> If my RDB2RDF mapping says --
>
>   mydb1.Contact     foaf:Person
>   mydb2.Customer    foaf:Person
>
> -- and I've replicated all my RDB data as RDF, and later discover
> that Customer.name is actually filled with company names, while
> Contact.name is people ... how do I fix that, short of dropping
> and re-replicating with the new map?
>
> On the other hand, if my RDB2RDF mapping says --
>
>   mydb1.Contact    ontology1:contact
>   mydb2.Customer   ontology2:customer
>
> -- it's easy for me to have statements that say --
>
>   { ontology1:contact   owl:sameAs       ontology2:customer  . }
>   { ontology1:contact   rdfs:subClassOf  foaf:Person         . }
>   { ontology2:customer  rdfs:subClassOf  foaf:Person         . }
>
> There's also no *need* to say foaf:Person anywhere.  There's no
> need to know that FOAF exists at all.
>
> And if I make the same discovery -- I drop (or change) the
> mapping triples, and I'm done.
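The Contact/Customer fix can be sketched as follows. This is a minimal illustration, assuming triples are plain Python tuples, the row IRIs are made up, and `foaf:Organization` is the class chosen once the Customer rows turn out to be companies (the thread itself does not name a replacement class). The helper `domain_types` is hypothetical and follows only one level of `rdfs:subClassOf`.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Replicated data, typed only with local classes (row IRIs are hypothetical).
data = frozenset({
    ("mydb1:Contact/1",  TYPE, "ontology1:contact"),
    ("mydb2:Customer/9", TYPE, "ontology2:customer"),
})

# Mapping triples kept apart from the data.
mapping = {
    ("ontology1:contact",  SUBCLASS, "foaf:Person"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}

def domain_types(subject, data, mapping):
    """Domain classes implied for subject by one level of rdfs:subClassOf."""
    local = {o for s, p, o in data if s == subject and p == TYPE}
    return {o for s, p, o in mapping if p == SUBCLASS and s in local}

before = domain_types("mydb2:Customer/9", data, mapping)

# Discovery: Customer.name holds company names. Only the mapping changes.
mapping.discard(("ontology2:customer", SUBCLASS, "foaf:Person"))
mapping.add(("ontology2:customer", SUBCLASS, "foaf:Organization"))

after = domain_types("mydb2:Customer/9", data, mapping)
print(before, after)
```

The replicated `data` set is immutable here on purpose: the correction touches two mapping triples and no re-replication from the RDB is needed.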
>
> In both of these, consider the possibility of columns which do
> not obviously or easily map to FOAF or any other known domain
> ontology.  With the first option, those columns are apparently
> discarded or ignored.  With the second, they are present, but
> known only by their local identity, e.g., ontology:contact#widget,
> until someone comes up with a new domainOntology:widget -- and
> hey, presto! --
>
>   { ontology:contact#widget  owl:sameAs  domainOntology:widget . }
>
>
> And once again -- there will be times and instances where it
> *is* appropriate to *choose* to make the RDB schema absolutely
> map to domain ontologies.
>
> The *ability to choose* is key.
>
>
> I hope this has made my concerns clearer?
>
> Be seeing you,
>
> Ted
>
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
>                             //              http://twitter.com/TallTed
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>        10 Burlington Mall Road, Suite 265, Burlington MA 01803
>                                 http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                               http://www.openlinksw.com/blog/~kidehen/
>    Universal Data Access and Virtual Database Technology Providers
Received on Tuesday, 4 May 2010 00:57:04 UTC