RE: Role of the Ontology and Expressivity - to discuss on telcon

>> My question (to database people in particular, such as Ashok, Ahmed, and
>> Daniel) is: does the database community understand "domain/putative
>> ontology" and use that terminology, or should we revert to the "direct
>> transforms and further transforms" wording?

I believe database people understand both Domain and Local Ontology terms.

I tend to agree with Harry's answers. My additional comments: I see the value of RDF as a means to an end, not a goal in itself.  The real goal is data integration, which traditional database approaches (MDM) do not handle very well.  My perspective is that I will map RDB to RDF in order to use it in an application that most probably involves data integration. We would do the community a service if our mapping language allowed W3C SPARQL to support federation (federated queries) among multiple data sources on the fly.  In that context, I am not sure what the value of local ontology mapping is!

I am not following the implication that our mapping language needs to adopt any specific Domain ontology. Rather, it needs to reference a user-supplied definition of the target domain ontology.  How the mapping is done should not be an issue!

Let us discuss tomorrow an option in the mapping language to generate RDF based on the RDBMS schema only (local ontology approach).  In other words, the default is to map to a Domain Ontology supplied by the application (without specifying how), and an option is to generate the RDF using the local ontology approach.

Regards,

Ahmed


-----Original Message-----
From: public-rdb2rdf-wg-request@w3.org [mailto:public-rdb2rdf-wg-request@w3.org] On Behalf Of Harry Halpin
Sent: Monday, May 03, 2010 6:01 PM
To: Ted Thibodeau Jr
Cc: ivan@w3.org; public-rdb2rdf-wg@w3.org
Subject: Re: Role of the Ontology and Expressivity - to discuss on telcon

Apologies for the late reply; EricP and I have been recovering from
the WWW2010 conference. While I only caught the last half of last week's
telecon due to a conflicting talk, I did notice that this point seemed
to derail our attempt at consensus on the use case and requirements
document. Let me explain why I think that, while Ted's point makes sense,
it's possible we are just communicating at cross-purposes.

> Hi, Richard --
>
>
> On Apr 27, 2010, at 04:49 PM, Richard Cyganiak wrote:
>> I'm trying to understand your position.
>
> The effort is appreciated.
>
> I'm really not being obstructionist for its own sake.
>
>
>> On 27 Apr 2010, at 10:31, Ted Thibodeau Jr wrote:
>>> Bringing RDB into RDF requires only that the schema of that RDB
>>> be mapped to a "direct" or "putative" ontology -- which *is* the
>>> correct term.
>>
>> Well, the charter is unfortunately not very explicit about what it
>> means to map a database to RDF.
>
> That is indeed unfortunate.  One of the hazards of saying "we'll do
> *this* in *that* timeframe" is that the task doesn't always cooperate,
> and it seems that the task of defining this WG's job was one such.

Charters are left vague on purpose, and the authors of this charter were
Ashok and Ivan. We can ask them to clarify directly. However, I was under
the impression from our review of existing approaches that some kind of
mapping language that did more than a direct mapping (i.e. dump rows and
columns into RDF with no transformation) would be necessary. As the straw
poll from the last telecon shows, all attendees except Ted (and possibly
Souri, who was not sure about the phrasing of the question) agreed that
such functionality was needed.

>
>
>> Just to be explicit: Is your position that mapping to domain
>> ontologies such as FOAF, GoodRelations etc is out of scope of the
>> charter? Or is your position that it's merely not required to meet
>> the success criteria set out in the charter?
>
> My position is that *forcing* RDB schemas to map to domain ontologies
> is not necessary and will be counter-productive in the long run.
>
> My further position is that "blessing" any given domain ontologies,
> and further still, defining how a given transformation tool should
> determine whether a given RDB.table maps to RDF:Class is not only
> well beyond the charter, but also, as Souri said, an *enormous* task.

I think this is a vocabulary mismatch. In particular, EricP's point was
that the data could be directly transformed (which we all agree on) and
then that further transforms could be necessary (i.e. done either using
the SQL view approach or via a series of SPARQL constructs). Juan
rephrased this direct transformation as "putative ontology" and then the
"further transforms" as a transform to a "domain ontology".

However, this just gives authors of the R2ML file the ability to transform
their graph, it does not bless any particular domain ontology (such as
FOAF). The precise domain ontologies that the author wishes to map can be
specified in any way by the author of the R2ML file.
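
As a rough illustration of the two steps (a sketch only: the base URI, the
helper names direct_transform and further_transform, and the term map are
made up for this example and are not part of R2ML), the direct transform
can be done mechanically, and the further transform only applies whatever
terms the author of the mapping supplies:

```python
# Hypothetical base URI for the putative (local) ontology.
BASE = "http://db.example.com/"


def direct_transform(table, key, rows):
    """Direct transform: table -> class, column -> property, row -> subject."""
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        triples.append((subject, "rdf:type", f"{BASE}{table}"))
        for column, value in row.items():
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


def further_transform(triples, term_map):
    """Further transform: swap putative terms for author-supplied ones.

    Terms missing from term_map are left untouched, so nothing is forced.
    """
    out = []
    for s, p, o in triples:
        if p == "rdf:type":
            o = term_map.get(o, o)   # rename the class
        else:
            p = term_map.get(p, p)   # rename the property
        out.append((s, p, o))
    return out


rows = [{"id": 1, "name": "Alice"}]
putative = direct_transform("Contact", "id", rows)

# The mapping author picks the domain terms; no ontology is built in.
term_map = {
    BASE + "Contact": "foaf:Person",
    BASE + "Contact#name": "foaf:name",
}
domain = further_transform(putative, term_map)
```

With an empty term_map the output is just the direct dump, which is the
sense in which the second step stays optional.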

My question (to database people in particular, such as Ashok, Ahmed, and
Daniel) is: does the database community understand "domain/putative
ontology" and use that terminology, or should we revert to the "direct
transforms and further transforms" wording?

>
> (This is also, I think, where most if not all of the expressivity
> concerns come in.)
>
>
>>> This ontology serves only to unambiguously identify a single
>>> cell (table, column, row) within that schema.
>>>
>>> I suggest that then mapping that RDF into a "domain" ontology (e.g.,
>>> SNOMED) is a separate concern -- which may be addressed in a couple
>>> of ways --
>>>
>>> 1. replication with transformation
>>> 2. mapping ontologies
>>>
>>>
>>> The first means that you decide *once* how SNOMED corresponds to
>>> a given RDB schema -- and if that correspondence changes, you have
>>> to somehow discard all the triples that resulted from the original
>>> conversion and then re-convert the RDB data.
>>>
>>> The second means that you decide how you think SNOMED corresponds
>>> to your putative ontology, and create a "mapping" ontology --
>>> which does little more than declare broaderClass, narrowerClass,
>>> equivalentClass, sameAs, and such.  If you realize later that one
>>> of your mappings is wrong, you change this ontology -- everything
>>> else remains as it is.

I think this approach could be done with OWL as a separate step from R2ML,
but if someone wanted to include some of these mappings in R2ML I would
not forbid them. However, I think no-one has brought that approach up per
se. The question is whether or not that mapping should/can be done in SQL
or via SPARQL constructs or (...) in R2ML. I think that expressivity
requirement should remain in the use-cases, although we should be agnostic
about what exact approach may be used. We should probably only require that
some approach be allowed, i.e. a portable set of SQL for view-based
transformation, or a set of SPARQL constructs. Anything beyond that (i.e.
reasoning) should probably be a separate step *after* R2ML deployment to
get the direct mapping/putative ontology.
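
To make the view-based option concrete, here is a minimal sketch using
SQLite from Python. Everything in it is assumed for illustration (the
Customer table, the Person view, the dump_as_triples helper, and the
shortened identifiers); it is not R2ML syntax. The view does the
transformation in plain SQL, and the dump that follows is still just a
direct mapping of whatever relation it is pointed at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    INSERT INTO Customer VALUES (1, 'Ada', 'Lovelace');
    -- The view reshapes the data in SQL, before any RDF exists.
    CREATE VIEW Person AS
        SELECT id, first || ' ' || last AS name FROM Customer;
""")


def dump_as_triples(conn, relation, key):
    """Direct dump of a table or view: one triple per non-key cell.

    The relation name is interpolated into the query, so it must come
    from a trusted mapping file, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {relation}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{relation}/{record[key]}"
        for column, value in record.items():
            if column != key:
                triples.append((subject, f"{relation}#{column}", value))
    return triples


triples = dump_as_triples(conn, "Person", "id")
```

Pointing the same helper at "Customer" instead of "Person" gives the
untransformed dump, so the view is purely additive.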


>>>
>>> Note that #2 does not mandate either forward- or backward-chaining.
>>> You *can* work from #2 and replicate & transform, if you find that
>>> works better for your deployment scenario.  You *can* use reasoning
>>> engines to work entirely dynamically, if that works better for you.
>>>
>>> Note that #1 *does* mandate forward-chaining.  You *cannot* use
>>> a reasoning engine to revise the putative-to-domain mapping once
>>> replication & transformation has been done.
>>
>> I think I sort of agree with everything you said up to here.
>
> I think that's a good sign.
>
>
>>> For this simple reason, I strongly advise that we *not* combine
>>> putative-to-domain ontology mapping into the rdb2rdf scenario --
>>> because it makes a decision which we haven't been chartered for,
>>
>> Here is where I lost you. Can you please say explicitly what that
>> decision is?
>
> I expressed myself poorly there.  Re-expression a bit below...
>
> One decision would be "what domain ontology/ies do we bless?"

Again, blessing any local ontology is out of scope, although what is in
scope is a way of making identifiers re-usable, which I imagine could
easily be some kind of API call that, given a string identifier, returns a
possible linked data URI.

>
> Another is implied -- "if you don't map your RDB data to Domain
> ontologies, you aren't really exposing it as RDF."
>
> Do we want to declare a *method*, a *language*, a *syntax* for
> such local-ontology-to-domain-ontology mapping?  That's fine,
> but I think what we deliver should be focused on "how do I define
> the mapping?" (perhaps something similar to GRDDL?) and not get
> into "how do I determine what the mapping should be?"

If such a mapping is done in SQL or SPARQL, which I imagine R2ML will
require database vendors to support, I see no harm done and a lot to be
gained. Again, reasoning, i.e. even very simple use of OWL, may be too
much.

>
> But -- this local-ontology-to-domain-ontology mapping is a
> *second step*, which comes *after* the RDB schema is mapped to
> a putative/local ontology, and which, I believe, SHOULD generally
> come after the RDB data is transformed to RDF with that same local
> ontology (if indeed the RDB data is being replicated/transformed
> at all) -- and yes, I believe this is and should be OPTIONAL.
>

Of course it should be OPTIONAL, but people in the Working Group have made
strong cases for allowing the R2ML file to contain either SQL or SPARQL
constructs. Of course, using those statements can be OPTIONAL, but the
spec should probably mandate that support for some simple transforms be
REQUIRED.

> So, re-expression --
>
> I strongly advise that we not *conflate* putative-ontology-to-
> domain-ontology mapping with RDB-schema-to-putative-ontology
> mapping.  The putative ontology is a vital element of RDB2RDF.
>
> Is this an implementation detail?  In a way.
>
> I think the choice to conflate these two steps in any given tool
> which implements this standard is an implementation detail, which
> will prove my point in short order once users can choose between
> two tools -- one which forces all RDB schemas to map to Domain
> ontologies; and one which maps the RDB schema to a Local ontology
> with the option to further declare Local:x owl:sameAs Domain:y
>

But remember many R2ML clients may not support OWL and/or any mappings
outside SPARQL/SQL. We should give them enough power to do a reasonable
mapping job without using external reasoners.

> But I think the two steps *must not* be conflated in the standard,
> because that makes the local-ontology-to-domain-ontology mapping
> *mandatory*, and that is not acceptable to me, nor do I think it
> is workable in the context of this (or any) WG, even one with a
> delivery timeframe measured in decades.

We can keep these two steps *separate* in the standard, but I think it
would be a good idea to include elementary data transformation
capabilities using SPARQL and SQL in R2ML.

Again, Ted, I think we're not requiring specific mappings, but a baseline
way to make more complex mappings if needed. I imagine everyone
implementing R2ML will also implement SQL and SPARQL, so those
seem reasonable to me.

That's a basic portability requirement I think, which is exactly why a
standard is needed in this area, so people can make a real-world mapping
between relational and RDF data, and then move databases as needed without
having vendor lock-in.


>
>
>>> and which I believe we are perilously close to deciding in the
>>> worst possible way.
>>
>> What would be this worst possible option for the decision?
>
> That all RDB (and really, all) data must be mapped to a domain
> ontology to be considered (worthwhile as) RDF.
>

Of course this is true and such RDF would be worthwhile.

> Consider Juan's Scenario #4, for instance...
>
> I have a bunch of data in an RDB, and I'm pretty sure it would
> benefit *someone*'s analysis if it were available in RDF, so
> I want to make it available as such.
>
> *I'm* not doing the analysis, so do I know what domain ontology/ies
> it should be mapped to?  Not a clue.
>

No-one is requiring it be mapped to particular ontologies, but the
database vendor may want it exposed using a particular ontology, which
they should have the freedom to specify in their R2ML file.

> Should the RDB2RDF standard *specify* ontology/ies to which all
> RDB data should be mapped?  Loudly and repeatedly, I say no.
>
> Rather, my publication should use a full "local" ontology -- which
> simply maps table to class, column to attribute, cell content to
> value, and primary/foreign key relationships to class relationships.
>

This is one way to do it, but others may view that data as a mess in RDF,
as relational data dumped raw into RDF often is.

> Others may look at my data and see clear domain ontology mappings
> which work for them, and which may work for others, and they should
> be able to publish these mappings.  Still others may look at my data
> and see *different* but *no less clear* domain ontology mappings
> which they want (and should be easily able) to use.
>

Again, saying that further expressivity could be necessary does not
require that further mappings not be made.

> If the RDB2RDF publication path forces the RDB data into domain
> ontologies -- how can these last people *remap* the data, with their
> new and different ontology correspondence?

They can remap from the domain ontologies obviously, using whatever
technique they want.

>
> I am not saying that such immediate mapping is always inappropriate,
> that a publisher cannot choose to say "this table in my schema is
> and always will be foaf:person" -- but I am saying that the RDB2RDF
> standard should not *force* the publisher to do so.

It would not force them to, but give them the capability to make such a
mapping if they so choose. They could also just do a direct dump. Either
is fine with me, but I'm pro-giving the users of R2ML a bit more power
than restricting them to direct dumps to RDF.

>
> Differently and possibly more explicitly put...
>
> I have a bunch of cartographic data in RDB.  I discover DBpedia,
> and think that ontology is the one I should map to.  So I do.
>
> Sophisticated cartographic workers familiar with RDF will know
> that there are other ontologies -- Freebase, Geonames, OpenCYC,
> etc. -- which do a much better job in many ways.
>
> If my original data were mapped to a local ontology (say,
> http://mymapdata.example.com/ontology/#), it would be very easy
> for the sophisticated user to ignore my local-to-DBpedia mapping
> (which is of course in its own named graph) and substitute their
> own local-to-Geonames+CYC+Freebase+theirCartOntology.
>
> If my original data is transformed directly into DBpedia classes --
> there is no easy way to substitute the more sophisticated mapping.
>
> Is all of this such a giant problem if everyone is using backward-
> chaining all the way to the RDB?  No -- *if* the sophisticated user
> can reach me and convince me to substitute their mapping for mine.
> But that's not the most common pattern in play, and it's not likely
> to become such soon, as much as I wish it would -- but I don't think
> it should be forced on people either!
>
> The big problem comes when the RDB2RDF transform is materialized,
> i.e., forward-chained, as is the most common pattern today, when
> people want to get the RDF dump (or crawl the SPARQL endpoint) and
> load it all in their local store, instead of issuing relevant queries
> against the existing SPARQL endpoint.
>
>
> Consider an example from Juan's Scenario #1, joining RDB to RDB.
>
> If my RDB2RDF mapping says --
>
>    mydb1.Contact     foaf:person
>    mydb2.Customer    foaf:person
>
> -- and I've replicated all my RDB data as RDF, and later discover
> that Customer.name is actually filled with company names, while
> Contact.name is people ... how do I fix that, short of dropping
> and re-replicating with the new map?
>
> On the other hand, if my RDB2RDF mapping says --
>
>    mydb1.Contact    ontology1:contact
>    mydb2.Customer   ontology2:customer
>
> -- it's easy for me to have statements that say --
>
>    { ontology1:contact   owl:sameAs       ontology2:customer  . }
>    { ontology1:contact   rdfs:subClassOf  foaf:person         . }
>    { ontology2:customer  rdfs:subClassOf  foaf:person         . }
>
> There's also no *need* to say foaf:person anywhere.  There's no
> need to know that FOAF exists at all.
>
> And if I make the same discovery -- I drop (or change) the
> mapping triples, and I'm done.
>
> In both of these, consider the possibility of columns which do
> not obviously or easily map to FOAF or any other known domain
> ontology.  With the first option, those columns are apparently
> discarded or ignored.  With the second, they are present, but
> known only by their local identity, e.g., ontology:contact#widget,
> until someone comes up with a new domainOntology:widget -- and
> hey, presto! --
>
>    { ontology:contact#widget  owl:sameAs  domainOntology:widget . }
>
>
> And once again -- there will be times and instances where it
> *is* appropriate to *choose* to make the RDB schema absolutely
> map to domain ontologies.
>
> The *ability to choose* is key.

Yes, that's fine. But sparing the users of the data unnecessary work when
a simple and preferred mapping is known by the database owner is not a bad
thing; I think it is actually a good thing.
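
For reference, the Contact/Customer scenario can be sketched in a few
lines. This is only an illustration under assumptions: infer_types is a
made-up helper standing in for a reasoner, the row identifiers are
invented, and the mapping uses rdfs:subClassOf (the standard RDFS
property) and foaf:Person. The point it shows is that when the mapping
triples are kept apart from the data, correcting a wrong mapping touches
only the mapping:

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(data, mapping):
    """Derive extra rdf:type triples by following rdfs:subClassOf links."""
    superclasses = {}
    for s, p, o in mapping:
        if p == SUBCLASS:
            superclasses.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in data:
        if p != TYPE:
            continue
        pending = list(superclasses.get(o, ()))
        while pending:
            cls = pending.pop()
            if (s, TYPE, cls) not in inferred:  # also guards against cycles
                inferred.add((s, TYPE, cls))
                pending.extend(superclasses.get(cls, ()))
    return inferred


# Data in the local (putative) ontologies; never rewritten below.
data = [
    ("mydb1:Contact/1", TYPE, "ontology1:contact"),
    ("mydb2:Customer/7", TYPE, "ontology2:customer"),
]

# Mapping triples, kept separately from the data.
mapping = [
    ("ontology1:contact", SUBCLASS, "foaf:Person"),
    ("ontology2:customer", SUBCLASS, "foaf:Person"),
]
before = infer_types(data, mapping)

# Later discovery: Customer rows are companies. Drop one mapping triple.
mapping.remove(("ontology2:customer", SUBCLASS, "foaf:Person"))
after = infer_types(data, mapping)
```

If the foaf:Person types had instead been written straight into the data
at conversion time, the same correction would mean re-converting the
Customer rows.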

>
>
> I hope this has made my concerns clearer?
>
> Be seeing you,
>
> Ted
>
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
>                              //              http://twitter.com/TallTed
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>         10 Burlington Mall Road, Suite 265, Burlington MA 01803
>                                  http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                                http://www.openlinksw.com/blog/~kidehen/
>     Universal Data Access and Virtual Database Technology Providers

Received on Tuesday, 4 May 2010 04:43:08 UTC