- From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
- Date: Wed, 28 Apr 2010 11:36:56 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Juan Sequeda <juanfederico@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, public-rdb2rdf-wg@w3.org
Hi, Richard --
On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
> I'm trying to understand your position.
The effort is appreciated.
I'm really not being obstructionist for its own sake.
> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
>> Bringing RDB into RDF requires only that the schema of that RDB
>> be mapped to a "direct" or "putative" ontology -- which *is* the
>> correct term.
>
> Well, the charter is unfortunately not very explicit about what it
> means to map a database to RDF.
That is indeed unfortunate. One of the hazards of saying "we'll do
*this* in *that* timeframe" is that the task doesn't always cooperate,
and it seems that defining this WG's job was one such task.
> Just to be explicit: Is your position that mapping to domain
> ontologies such as FOAF, GoodRelations etc is out of scope of the
> charter? Or is your position that it's merely not required to meet
> the success criteria set out in the charter?
My position is that *forcing* RDB schemas to map to domain ontologies
is not necessary and will be counter-productive in the long run.
My further position is that "blessing" any given domain ontologies,
and further still, defining how a given transformation tool should
determine whether a given RDB.table maps to RDF:Class is not only
well beyond the charter, but also, as Souri said, an *enormous* task.
(This is also, I think, where most if not all of the expressivity
concerns come in.)
>> This ontology serves only to unambiguously identify a single
>> cell (table, column, row) within that schema.
>>
>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
>> SNOMED) is a separate concern -- which may be addressed in a couple
>> of ways --
>>
>> 1. replication with transformation
>> 2. mapping ontologies
>>
>>
>> The first means that you decide *once* how SNOMED corresponds to
>> a given RDB schema -- and if that correspondence changes, you have
>> to somehow discard all the triples that resulted from the original
>> conversion and then re-convert the RDB data.
>>
>> The second means that you decide how you think SNOMED corresponds
>> to your putative ontology, and create a "mapping" ontology --
>> which does little more than declare broaderClass, narrowerClass,
>> equivalentClass, sameAs, and such. If you realize later that one
>> of your mappings is wrong, you change this ontology -- everything
>> else remains as it is.
>>
>> Note that #2 does not mandate either forward- or backward-chaining.
>> You *can* work from #2 and replicate & transform, if you find that
>> works better for your deployment scenario. You *can* use reasoning
>> engines to work entirely dynamically, if that works better for you.
>>
>> Note that #1 *does* mandate forward-chaining. You *cannot* use
>> a reasoning engine to revise the putative-to-domain mapping once
>> replication & transformation has been done.
>
> I think I sort of agree with everything you said up to here.
I think that's a good sign.
>> For this simple reason, I strongly advise that we *not* combine
>> putative-to-domain ontology mapping into the rdb2rdf scenario --
>> because it makes a decision which we haven't been chartered for,
>
> Here is where I lost you. Can you please say explicitly what that
> decision is?
I expressed myself poorly there; I re-express it a bit further below...
One decision would be "what domain ontology/ies do we bless?"
Another is implied -- "if you don't map your RDB data to Domain
ontologies, you aren't really exposing it as RDF."
Do we want to declare a *method*, a *language*, a *syntax* for
such local-ontology-to-domain-ontology mapping? That's fine,
but I think what we deliver should be focused on "how do I define
the mapping?" (perhaps something similar to GRDDL?) and not get
into "how do I determine what the mapping should be?"
But -- this local-ontology-to-domain-ontology mapping is a
*second step*, which comes *after* the RDB schema is mapped to
a putative/local ontology, and which, I believe, SHOULD generally
come after the RDB data is transformed to RDF with that same local
ontology (if indeed the RDB data is being replicated/transformed
at all) -- and yes, I believe this is and should be OPTIONAL.
So, to re-express --
I strongly advise that we not *conflate* putative-ontology-to-
domain-ontology mapping with RDB-schema-to-putative-ontology
mapping. The putative ontology is a vital element of RDB2RDF.
Is this an implementation detail? In a way.
I think the choice to conflate these two steps in any given tool
which implements this standard is an implementation detail, which
will prove my point in short order once users can choose between
two tools -- one which forces all RDB schemas to map to Domain
ontologies; and one which maps the RDB schema to a Local ontology
with the option to further declare Local:x owl:sameAs Domain:y.
But I think the two steps *must not* be conflated in the standard,
because that makes the local-ontology-to-domain-ontology mapping
*mandatory*, and that is not acceptable to me, nor do I think it
is workable in the context of this (or any) WG, even one with a
delivery timeframe measured in decades.
>> and which I believe we are perilously close to deciding in the
>> worst possible way.
>
> What would be this worst possible option for the decision?
That all RDB (and really, all) data must be mapped to a domain
ontology to be considered (worthwhile as) RDF.
Consider Juan's Scenario #4, for instance...
I have a bunch of data in an RDB, and I'm pretty sure it would
benefit *someone*'s analysis if it were available in RDF, so
I want to make it available as such.
*I'm* not doing the analysis, so do I know what domain ontology/ies
it should be mapped to? Not a clue.
Should the RDB2RDF standard *specify* ontology/ies to which all
RDB data should be mapped? Loudly and repeatedly, I say no.
Rather, my publication should use a full "local" ontology -- which
simply maps table to class, column to attribute, cell content to
value, and primary/foreign key relationships to class relationships.
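The "local" ontology described above is mechanical enough to sketch. The Python below is a minimal illustration under stated assumptions, not the RDB2RDF mapping itself. Triples are plain tuples. The base IRI `http://example.com/db/`, the IRI patterns, and the helpers `row_iri` and `direct_map` are hypothetical. Each table is assumed to have a single-column primary key, and each foreign key is assumed to reference the primary key of its target table.

```python
# Minimal sketch of a "local" (direct) mapping; triples are plain tuples.
# BASE and the IRI patterns are hypothetical, chosen only for illustration.
BASE = "http://example.com/db/"
RDF_TYPE = "rdf:type"

def row_iri(table, pk_value):
    # One resource per row, named by table and (single-column) primary key.
    return f"<{BASE}{table}/{pk_value}>"

def direct_map(table, pk, rows, foreign_keys=None):
    """Map one table: table -> class, column -> property,
    cell -> literal value, foreign key -> link to the referenced row."""
    foreign_keys = foreign_keys or {}  # column -> referenced table (assumed to hit its PK)
    triples = set()
    for row in rows:
        subject = row_iri(table, row[pk])
        triples.add((subject, RDF_TYPE, f"<{BASE}{table}>"))
        for column, value in row.items():
            if value is None:
                continue  # NULL cells produce no triple
            prop = f"<{BASE}{table}#{column}>"
            if column in foreign_keys:
                # Foreign key: point at the referenced row's resource.
                triples.add((subject, prop, row_iri(foreign_keys[column], value)))
            else:
                # Ordinary cell: keep the raw value as the literal object.
                triples.add((subject, prop, value))
    return triples
```

For example, `direct_map("Contact", "id", [{"id": 1, "name": "Alice", "company_id": 10}], {"company_id": "Company"})` yields four triples: an `rdf:type` triple, one triple each for `id` and `name`, and a link to `<http://example.com/db/Company/10>`. No domain ontology is consulted at any point.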
Others may look at my data and see clear domain ontology mappings
which work for them, and which may work for others, and they should
be able to publish these mappings. Still others may look at my data
and see *different* but *no less clear* domain ontology mappings
which they want (and should be easily able) to use.
If the RDB2RDF publication path forces the RDB data into domain
ontologies -- how can these last people *remap* the data, with their
new and different ontology correspondence?
I am not saying that such immediate mapping is always inappropriate,
that a publisher cannot choose to say "this table in my schema is
and always will be foaf:Person" -- but I am saying that the RDB2RDF
standard should not *force* the publisher to do so.
Differently and possibly more explicitly put...
I have a bunch of cartographic data in RDB. I discover DBpedia,
and think that ontology is the one I should map to. So I do.
Sophisticated cartographic workers familiar with RDF will know
that there are other ontologies -- Freebase, Geonames, OpenCYC,
etc. -- which do a much better job in many ways.
If my original data were mapped to a local ontology (say,
http://mymapdata.example.com/ontology/#), it would be very easy
for the sophisticated user to ignore my local-to-DBpedia mapping
(which is of course in its own named graph) and substitute their
own local-to-Geonames+CYC+Freebase+theirCartOntology.
If my original data is transformed directly into DBpedia classes --
there is no easy way to substitute the more sophisticated mapping.
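The named-graph substitution can be sketched the same way. In this hypothetical Python example, `mymap:` stands for `http://mymapdata.example.com/ontology/#`, the graph names and the class `mymap:Feature` are invented, and `dbo:Place` and `gn:Feature` merely stand in for whatever DBpedia and Geonames terms a real mapping would use. The data graph never changes. The consumer simply chooses which mapping graph to combine with it.

```python
# Hypothetical dataset: local-ontology data plus two competing mapping graphs.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

dataset = {
    "urn:graph:data": {
        ("mymap:feature/1", RDF_TYPE, "mymap:Feature"),
    },
    "urn:graph:publisher-dbpedia": {  # the publisher's local-to-DBpedia mapping
        ("mymap:Feature", SUBCLASS, "dbo:Place"),
    },
    "urn:graph:consumer-geonames": {  # a consumer's substitute mapping
        ("mymap:Feature", SUBCLASS, "gn:Feature"),
    },
}

def types_of(resource, dataset, graphs):
    """Classes of `resource`, following rdfs:subClassOf transitively
    over the union of the selected named graphs only."""
    triples = set().union(*(dataset[g] for g in graphs))
    found = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    frontier = set(found)
    while frontier:
        # Superclasses of the classes found in the previous round.
        nxt = {o for s, p, o in triples if p == SUBCLASS and s in frontier} - found
        found |= nxt
        frontier = nxt
    return found
```

With the publisher's graph, `types_of("mymap:feature/1", dataset, ["urn:graph:data", "urn:graph:publisher-dbpedia"])` returns `{"mymap:Feature", "dbo:Place"}`. Swapping in `"urn:graph:consumer-geonames"` returns `{"mymap:Feature", "gn:Feature"}` without touching the data.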
Is all of this such a giant problem if everyone is using backward-
chaining all the way to the RDB? No -- *if* the sophisticated user
can reach me and convince me to substitute their mapping for mine.
But that's not the most common pattern in play, and it's not likely
to become such soon, as much as I wish it would -- but I don't think
it should be forced on people either!
The big problem comes when the RDB2RDF transform is materialized,
i.e., forward-chained, as is the most common pattern today, when
people want to get the RDF dump (or crawl the SPARQL endpoint) and
load it all in their local store, instead of issuing relevant queries
against the existing SPARQL endpoint.
Consider an example from Juan's Scenario #1, joining RDB to RDB.
If my RDB2RDF mapping says --
mydb1.Contact foaf:Person
mydb2.Customer foaf:Person
-- and I've replicated all my RDB data as RDF, and later discover
that Customer.name is actually filled with company names, while
Contact.name is people ... how do I fix that, short of dropping
and re-replicating with the new map?
On the other hand, if my RDB2RDF mapping says --
mydb1.Contact ontology1:contact
mydb2.Customer ontology2:customer
-- it's easy for me to have statements that say --
{ ontology1:contact owl:equivalentClass ontology2:customer . }
{ ontology1:contact rdfs:subClassOf foaf:Person . }
{ ontology2:customer rdfs:subClassOf foaf:Person . }
There's also no *need* to say foaf:Person anywhere. There's no
need to know that FOAF exists at all.
And if I make the same discovery -- I drop (or change) the
mapping triples, and I'm done.
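Even when the transform is materialized, keeping the mapping triples separate means a fix only touches inferred triples. The following is a minimal Python sketch under assumptions: the row identifiers such as `mydb1:Contact/1` are hypothetical, and the subclass statements are expressed with `rdfs:subClassOf`. It forward-chains `rdf:type` along subclass links (the RDFS entailment rule rdfs9) and returns only the inferred triples.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Base data: converted once, using only the local ontologies (hypothetical IDs).
data = {
    ("mydb1:Contact/1", RDF_TYPE, "ontology1:contact"),
    ("mydb2:Customer/7", RDF_TYPE, "ontology2:customer"),
}

# Mapping triples: stored apart from the data so they can be edited freely.
mapping = {
    ("ontology1:contact", SUBCLASS, "foaf:Person"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
}

def materialize(data, mapping):
    """Forward-chain rdf:type along rdfs:subClassOf until nothing new
    appears; return only the inferred triples, never modifying `data`."""
    inferred = set()
    while True:
        current = data | inferred
        new = {(s, RDF_TYPE, sup)
               for s, p, c in current if p == RDF_TYPE
               for sub, q, sup in mapping if q == SUBCLASS and sub == c} - current
        if not new:
            return inferred
        inferred |= new
```

`materialize(data, mapping)` types both rows as `foaf:Person`. After discovering that `Customer.name` holds company names, dropping `("ontology2:customer", SUBCLASS, "foaf:Person")` from `mapping` and re-running `materialize` types only `mydb1:Contact/1` as `foaf:Person`; the converted `data` set is never rebuilt.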
In both of these, consider the possibility of columns which do
not obviously or easily map to FOAF or any other known domain
ontology. With the first option, those columns are apparently
discarded or ignored. With the second, they are present, but
known only by their local identity, e.g., ontology:contact#widget,
until someone comes up with a new domainOntology:widget -- and
hey, presto! --
{ ontology:contact#widget owl:equivalentProperty domainOntology:widget . }
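A column with no known domain counterpart still survives under its local name. This hypothetical Python sketch uses `owl:equivalentProperty`, the OWL term for declaring two properties equivalent. The identifiers and the `values` helper are illustrative only, and the equivalence lookup is deliberately one step deep (no chains are followed).

```python
EQUIV_PROP = "owl:equivalentProperty"

# The "widget" column is kept under its local identity (hypothetical IDs).
data = {("mydb1:Contact/1", "ontology:contact#widget", "blue")}
mapping = set()  # nothing is known about "widget" yet

def values(prop, data, mapping):
    """(subject, value) pairs reachable via `prop` or any property declared
    equivalent to it; equivalence is treated as symmetric, one step deep."""
    equivalents = {prop}
    for a, p, b in mapping:
        if p == EQUIV_PROP and prop in (a, b):
            equivalents |= {a, b}
    return {(s, o) for s, p, o in data if p in equivalents}
```

Before any mapping exists, querying `ontology:contact#widget` returns the stored value while `domainOntology:widget` returns nothing. Adding the single equivalence triple to `mapping` makes the same value reachable through `domainOntology:widget`.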
And once again -- there will be times and instances where it
*is* appropriate to *choose* to make the RDB schema absolutely
map to domain ontologies.
The *ability to choose* is key.
I hope this has made my concerns clearer?
Be seeing you,
Ted
--
A: Yes. http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?
Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
Evangelism & Support // mailto:tthibodeau@openlinksw.com
// http://twitter.com/TallTed
OpenLink Software, Inc. // http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
OpenLink Blogs  http://www.openlinksw.com/weblogs/uda/
                http://www.openlinksw.com/weblogs/virtuoso/
                http://www.openlinksw.com/blog/~kidehen/
Universal Data Access and Virtual Database Technology Providers
Received on Wednesday, 28 April 2010 15:37:30 UTC