Re: Role of the Ontology and Expressivity - to discuss on telcon from Ted Thibodeau Jr on 2010-04-28 (public-rdb2rdf-wg@w3.org from April 2010)

From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Date: Wed, 28 Apr 2010 11:36:56 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, public-rdb2rdf-wg@w3.org
Message-Id: <357C11AC-58B0-4D2E-9EB6-A8C35B3EA585@openlinksw.com>
Hi, Richard --


On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> I'm trying to understand your position.

The effort is appreciated.  

I'm really not being obstructionist for its own sake.


> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
>> Bringing RDB into RDF requires only that the schema of that RDB
>> be mapped to a "direct" or "putative" ontology -- which *is* the
>> correct term.
> 
> Well, the charter is unfortunately not very explicit about what it
> means to map a database to RDF.

That is indeed unfortunate.  One of the hazards of saying "we'll do
*this* in *that* timeframe" is that the task doesn't always cooperate,
and defining this WG's job seems to have been one such uncooperative task.


> Just to be explicit: Is your position that mapping to domain
> ontologies such as FOAF, GoodRelations etc is out of scope of the 
> charter? Or is your position that it's merely not required to meet
> the success criteria set out in the charter?

My position is that *forcing* RDB schemas to map to domain ontologies 
is not necessary and will be counter-productive in the long run.

My further position is that "blessing" any particular domain ontologies,
and further still, defining how a given transformation tool should 
determine which RDF class a given RDB.table maps to, is not only 
well beyond the charter, but also, as Souri said, an *enormous* task.

(This is also, I think, where most if not all of the expressivity 
concerns come in.)


>> This ontology serves only to unambiguously identify a single
>> cell (table, column, row) within that schema.
>> 
>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
>> SNOMED) is a separate concern -- which may be addressed in a couple
>> of ways --
>> 
>> 1. replication with transformation
>> 2. mapping ontologies
>> 
>> 
>> The first means that you decide *once* how SNOMED corresponds to
>> a given RDB schema -- and if that correspondence changes, you have
>> to somehow discard all the triples that resulted from the original
>> conversion and then re-convert the RDB data.
>> 
>> The second means that you decide how you think SNOMED corresponds
>> to your putative ontology, and create a "mapping" ontology --
>> which does little more than declare broaderClass, narrowerClass,
>> equivalentClass, sameAs, and such.  If you realize later that one
>> of your mappings is wrong, you change this ontology -- everything
>> else remains as it is.
>> 
>> Note that #2 does not mandate either forward- or backward-chaining.
>> You *can* work from #2 and replicate & transform, if you find that
>> works better for your deployment scenario.  You *can* use reasoning
>> engines to work entirely dynamically, if that works better for you.
>> 
>> Note that #1 *does* mandate forward-chaining.  You *cannot* use
>> a reasoning engine to revise the putative-to-domain mapping once
>> replication & transformation has been done.
> 
> I think I sort of agree with everything you said up to here.

I think that's a good sign.


>> For this simple reason, I strongly advise that we *not* combine
>> putative-to-domain ontology mapping into the rdb2rdf scenario --
>> because it makes a decision which we haven't been chartered for,
> 
> Here is where I lost you. Can you please say explicitly what that
> decision is?

I expressed myself poorly there.  I'll try to re-express it below...

One decision would be "what domain ontology/ies do we bless?"

Another is implied -- "if you don't map your RDB data to Domain 
ontologies, you aren't really exposing it as RDF."

Do we want to declare a *method*, a *language*, a *syntax* for 
such local-ontology-to-domain-ontology mapping?  That's fine, 
but I think what we deliver should be focused on "how do I define 
the mapping?" (perhaps something similar to GRDDL?) and not get 
into "how do I determine what the mapping should be?"

But -- this local-ontology-to-domain-ontology mapping is a 
*second step*, which comes *after* the RDB schema is mapped to 
a putative/local ontology, and which, I believe, SHOULD generally 
come after the RDB data is transformed to RDF with that same local 
ontology (if indeed the RDB data is being replicated/transformed 
at all) -- and yes, I believe this is and should be OPTIONAL.

So, re-expression --

I strongly advise that we not *conflate* putative-ontology-to-
domain-ontology mapping with RDB-schema-to-putative-ontology
mapping.  The putative ontology is a vital element of RDB2RDF.

Is this an implementation detail?  In a way.  

I think the choice to conflate these two steps in any given tool 
which implements this standard is an implementation detail, and I 
expect users to prove my point in short order once they can choose 
between two tools -- one which forces all RDB schemas to map to Domain
ontologies, and one which maps the RDB schema to a Local ontology
with the option to further declare Local:x owl:sameAs Domain:y.

But I think the two steps *must not* be conflated in the standard,
because that makes the local-ontology-to-domain-ontology mapping 
*mandatory*, and that is not acceptable to me, nor do I think it
is workable in the context of this (or any) WG, even one with a
delivery timeframe measured in decades.


>> and which I believe we are perilously close to deciding in the
>> worst possible way.
> 
> What would be this worst possible option for the decision?

That all RDB (and really, all) data must be mapped to a domain 
ontology to be considered (worthwhile as) RDF.

Consider Juan's Scenario #4, for instance...  

I have a bunch of data in an RDB, and I'm pretty sure it would 
benefit *someone*'s analysis if it were available in RDF, so 
I want to make it available as such.

*I'm* not doing the analysis, so do I know what domain ontology/ies 
it should be mapped to?  Not a clue.

Should the RDB2RDF standard *specify* ontology/ies to which all 
RDB data should be mapped?  Loudly and repeatedly, I say no.

Rather, my publication should use a full "local" ontology -- which 
simply maps table to class, column to attribute, cell content to 
value, and primary/foreign key relationships to class relationships.
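A minimal sketch of such a "local" mapping may make it more concrete. Everything in it (the `Table` structure, the `direct_map` and `row_iri` helpers, the `mymapdata.example.com` namespaces, the sample rows) is hypothetical illustration, not anything this WG has specified. It assumes each table has a single-column primary key, that each foreign key references the target table's primary key, and it keeps literals as plain Python values rather than typed RDF literals.

```python
from dataclasses import dataclass, field

# Hypothetical namespaces; nothing here is mandated by any RDB2RDF spec.
ONT = "http://mymapdata.example.com/ontology/"
DATA = "http://mymapdata.example.com/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Table:
    name: str
    primary_key: str                                  # assumed single-column key
    rows: list                                        # each row: dict of column -> value
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table


def row_iri(table, key):
    """Mint one IRI per row from its table name and primary-key value."""
    return f"{DATA}{table}/{key}"


def direct_map(tables):
    """Table -> class, column -> property, cell -> value, FK -> link."""
    triples = set()
    for t in tables:
        for row in t.rows:
            subject = row_iri(t.name, row[t.primary_key])
            triples.add((subject, RDF_TYPE, f"{ONT}{t.name}"))
            for column, value in row.items():
                if value is None:
                    continue  # SQL NULL: no triple is emitted
                predicate = f"{ONT}{t.name}#{column}"
                if column in t.foreign_keys:
                    # Assumes the FK points at the referenced table's primary key,
                    # so the object is that row's IRI rather than a literal.
                    target = t.foreign_keys[column]
                    triples.add((subject, predicate, row_iri(target, value)))
                else:
                    triples.add((subject, predicate, value))
    return triples


# Hypothetical sample data.
contact = Table("contact", "id",
                [{"id": 1, "name": "Alice", "customer_id": 7}],
                foreign_keys={"customer_id": "customer"})
customer = Table("customer", "id", [{"id": 7, "name": "Acme Corp"}])

triples = direct_map([contact, customer])
for triple in sorted(triples, key=str):
    print(triple)
```

Note that no domain ontology appears anywhere: every class and property is minted from the schema itself, which is what leaves room for others to layer their own mappings on top later.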

Others may look at my data and see clear domain ontology mappings 
which work for them, and which may work for others, and they should 
be able to publish these mappings.  Still others may look at my data 
and see *different* but *no less clear* domain ontology mappings 
which they want (and should be easily able) to use.

If the RDB2RDF publication path forces the RDB data into domain 
ontologies -- how can these last people *remap* the data, with their
new and different ontology correspondence?

I am not saying that such immediate mapping is always inappropriate,
or that a publisher cannot choose to say "this table in my schema is 
and always will be foaf:Person" -- but I am saying that the RDB2RDF 
standard should not *force* the publisher to do so.

Differently and possibly more explicitly put...

I have a bunch of cartographic data in RDB.  I discover DBpedia, 
and think that ontology is the one I should map to.  So I do.

Sophisticated cartographic workers familiar with RDF will know 
that there are other ontologies -- Freebase, Geonames, OpenCYC, 
etc. -- which do a much better job in many ways.

If my original data were mapped to a local ontology (say, 
http://mymapdata.example.com/ontology/#), it would be very easy 
for the sophisticated user to ignore my local-to-DBpedia mapping 
(which is of course in its own named graph) and substitute their 
own local-to-Geonames+CYC+Freebase+theirCartOntology.

If my original data is transformed directly into DBpedia classes -- 
there is no easy way to substitute the more sophisticated mapping.

Is all of this such a giant problem if everyone is using backward-chaining
all the way to the RDB?  No -- *if* the sophisticated user can reach me
and convince me to substitute their mapping for mine.  But that's not
the most common pattern in play, and as much as I wish it were, it's
not likely to become so soon -- though I don't think it should be
forced on people either!

The big problem comes when the RDB2RDF transform is materialized
(i.e., forward-chained), which is the most common pattern today:
people want to get the RDF dump (or crawl the SPARQL endpoint) and 
load it all into their local store, instead of issuing relevant 
queries against the existing SPARQL endpoint.
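The difference can be sketched in a few lines of Python. The sample triples, the prefixed-name strings, and the `transform`, `instances_of`, and `virtual_query` helpers are all hypothetical; the sketch only shows that a materialized copy keeps whatever mapping was in force when it was loaded (and has lost the local class names along the way), while a mapping applied at query time picks up a correction immediately.

```python
FOAF_PERSON = "foaf:Person"

# Hypothetical output of the RDB-schema-to-local-ontology step.
local_data = {
    ("data:contact/1", "rdf:type", "ontology1:contact"),
    ("data:customer/7", "rdf:type", "ontology2:customer"),
}


def transform(triples, class_map):
    """Replace local classes with domain classes (replication with transformation)."""
    return {(s, p, class_map.get(o, o) if p == "rdf:type" else o)
            for s, p, o in triples}


def instances_of(triples, cls):
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def virtual_query(cls, class_map):
    # Nothing is stored: the current mapping is applied on every query.
    return instances_of(transform(local_data, class_map), cls)


first_map = {"ontology1:contact": FOAF_PERSON, "ontology2:customer": FOAF_PERSON}

# Forward-chained: a consumer loads the transformed dump into their own store.
materialized = transform(local_data, first_map)

# Later it turns out customers are companies, so the mapping is corrected.
fixed_map = {"ontology1:contact": FOAF_PERSON}

print(instances_of(materialized, FOAF_PERSON))  # still includes data:customer/7
print(virtual_query(FOAF_PERSON, fixed_map))    # only data:contact/1

# The materialized copy no longer records the local class at all, so the
# only fix is to re-run the conversion against the source and reload.
print(("data:customer/7", "rdf:type", "ontology2:customer") in materialized)  # False
```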


Consider an example from Juan's Scenario #1, joining RDB to RDB.

If my RDB2RDF mapping says --

   mydb1.Contact     foaf:Person
   mydb2.Customer    foaf:Person

-- and I've replicated all my RDB data as RDF, and later discover 
that Customer.name actually holds company names, while Contact.name 
holds people's names ... how do I fix that, short of dropping and 
re-replicating with the new map?

On the other hand, if my RDB2RDF mapping says --

   mydb1.Contact    ontology1:contact
   mydb2.Customer   ontology2:customer

-- it's easy for me to have statements that say --

   { ontology1:contact   owl:equivalentClass  ontology2:customer  . }
   { ontology1:contact   rdfs:subClassOf      foaf:Person         . }
   { ontology2:customer  rdfs:subClassOf      foaf:Person         . }

There's also no *need* to say foaf:Person anywhere.  There's no 
need to know that FOAF exists at all.  

And if I make the same discovery -- I drop (or change) the
mapping triples, and I'm done.
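Here is a rough Python sketch of that second option. It assumes the local data and the mapping triples live in separate (hypothetical) graphs, and that a consumer answers "who is a foaf:Person?" at query time by following rdfs:subClassOf and owl:equivalentClass links, roughly the subclass entailment an RDFS/OWL reasoner would apply. The graph contents and helper names are illustrative only.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIV = "owl:equivalentClass"

# Data stays in the local vocabulary and is never rewritten.
data_graph = {
    ("data:contact/1", RDF_TYPE, "ontology1:contact"),
    ("data:customer/7", RDF_TYPE, "ontology2:customer"),
}

# Mapping triples kept in their own (hypothetical) graph.
mapping_graph = {
    ("ontology1:contact", EQUIV, "ontology2:customer"),
    ("ontology1:contact", SUBCLASS, "foaf:Person"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}


def subclasses(cls, mapping):
    """cls plus every class whose instances are also instances of cls."""
    narrower = {}
    for s, p, o in mapping:
        if p == SUBCLASS:
            narrower.setdefault(o, set()).add(s)
        elif p == EQUIV:  # equivalence = subclass in both directions
            narrower.setdefault(o, set()).add(s)
            narrower.setdefault(s, set()).add(o)
    found, stack = {cls}, [cls]
    while stack:
        for sub in narrower.get(stack.pop(), ()):
            if sub not in found:
                found.add(sub)
                stack.append(sub)
    return found


def instances_of(cls, data, mapping):
    classes = subclasses(cls, mapping)
    return {s for s, p, o in data if p == RDF_TYPE and o in classes}


before = instances_of("foaf:Person", data_graph, mapping_graph)

# Discovery: customers are companies. Drop the offending mapping triples;
# data_graph is left exactly as it was.
mapping_graph -= {
    ("ontology1:contact", EQUIV, "ontology2:customer"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}
after = instances_of("foaf:Person", data_graph, mapping_graph)

print(sorted(before))  # ['data:contact/1', 'data:customer/7']
print(sorted(after))   # ['data:contact/1']
```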

In both of these, consider the possibility of columns which do
not obviously or easily map to FOAF or any other known domain 
ontology.  With the first option, those columns are apparently
discarded or ignored.  With the second, they are present, but
known only by their local identity, e.g., ontology:contact#widget,
until someone comes up with a new domainOntology:widget -- and
hey, presto! --

   { ontology:contact#widget  owl:equivalentProperty  domainOntology:widget . }
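A similar sketch for a column, assuming the link is stated with owl:equivalentProperty (OWL's property-level counterpart of owl:equivalentClass) and applied at query time. The data value `W-42` and the helper names are hypothetical; the point is that the column's values were reachable by their local name all along, and become reachable by the domain name once a single mapping triple is added.

```python
EQUIV_PROP = "owl:equivalentProperty"

# A column with no known domain counterpart keeps its local identity.
data = {("data:contact/1", "ontology:contact#widget", "W-42")}


def equivalent_properties(prop, mapping):
    """prop plus any property directly linked to it by owl:equivalentProperty
    (one hop only; a reasoner would also follow chains of equivalences)."""
    found = {prop}
    for s, p, o in mapping:
        if p == EQUIV_PROP and prop in (s, o):
            found.update((s, o))
    return found


def values(subject, prop, graph, mapping):
    props = equivalent_properties(prop, mapping)
    return {o for s, p, o in graph if s == subject and p in props}


mapping = set()
print(values("data:contact/1", "ontology:contact#widget", data, mapping))  # {'W-42'}
print(values("data:contact/1", "domainOntology:widget", data, mapping))    # set()

# Someone later publishes domainOntology:widget; one triple links the two.
mapping.add(("ontology:contact#widget", EQUIV_PROP, "domainOntology:widget"))
print(values("data:contact/1", "domainOntology:widget", data, mapping))    # {'W-42'}
```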


And once again -- there will be times and instances where it 
*is* appropriate to *choose* to make the RDB schema absolutely 
map to domain ontologies.  

The *ability to choose* is key.


I hope this has made my concerns clearer?

Be seeing you,

Ted



--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
                             //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
        10 Burlington Mall Road, Suite 265, Burlington MA 01803
                                 http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
                               http://www.openlinksw.com/blog/~kidehen/
    Universal Data Access and Virtual Database Technology Providers
Received on Wednesday, 28 April 2010 15:37:30 UTC