Re: feedback on the current Use case & Requirements for RDB2RDF...

On May 9, 2010, at 01:55 PM, Eric Prud'hommeaux wrote:

> Hi Ahmed, thanks for the input. I'm trying to map these to specific textual suggestions, which are the most likely to capture the details needed for a decision by the working group.
> 
> * Ezzat, Ahmed <Ahmed.Ezzat@hp.com> [2010-05-09 05:26+0000]
>> *       I thought a simple diagram like option-2 and option-3 in the diagram Juan sent out earlier seems reasonable to add.  P.S. option-1 is a special case of option-3.
> 
> LeeF had asked some questions teasing out the meaning of the diagrams. I was interested in the outcome of that conversation.

I think that not only is Option #1 (as shown in [1], with the "Refined 
Local Ontology") a special case of Option #3, but so is Option #2: 
it appears to me to be a compressed summary of Option #3 that 
conceals the direct/local mapping which (I think we've agreed) 
takes place under the covers, even if the user is unaware of it 
and/or never sees it.

[1] http://userweb.cs.utexas.edu/~jsequeda/rdb2rdf/


>> *       In the relevant community people use local and domain ontology; why are we not using these terms to make it easier for readers?
> 
> #SHAPE refers to "shared" and "popular" ontologies, which I believe encompass domain ontologies as well as ontologies not focused on a specific domain (e.g., the use of FOAF to represent people in an employees table).

In my experience, "domain ontology" has been generically used to 
mean a specialized vocabulary -- which may have been created on 
an ad-hoc basis, may be used by no one except the current user, 
and may only be used for the current effort; or may be the fruit
of many human-years' work (e.g., FOAF, YAGO, CYC). 

It does *not* generally mean that the vocabulary is focused on any 
specific area of knowledge/data (e.g., geology, physics, medicine), 
nor that it has been vetted and/or adopted by any group -- though
it does include such vocabularies!

By this meaning (which I have found to be common in the ontological 
world), Juan's "Refined Local Ontology" is a "Domain Ontology."

(If we really press the point, even the "Local Ontology" is a 
Domain Ontology, which specializes on "this RDB schema on this 
date" or some such.)

That said ... 

If we consistently use "Local Ontology" to refer to the "Direct Map 
of RDB Schema to EAV+CR (i.e., RDF)" and "Domain Ontology" for all 
others, I think things will be understood -- especially if we say 
explicitly that this is how we're using these terms within the 
document(s).
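
To make "Direct Map of RDB Schema" a bit more concrete, here is a 
rough sketch of one way a single row might be turned into triples. 
The function name, the URI layout, and the example.com base are all 
hypothetical choices for illustration only; nothing here is something 
the group has agreed on.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk_cols, row):
    """Return (subject, predicate, object) triples for one table row.

    Hypothetical layout: the table becomes a class URI, each column
    becomes a property URI under that class, and the subject URI is
    built from the primary-key columns.
    """
    table_uri = base + quote(table, safe="")
    key = ";".join(f"{c}={quote(str(row[c]), safe='')}" for c in pk_cols)
    subject = f"{table_uri}/{key}"

    triples = [(subject, RDF_TYPE, table_uri)]
    for col, val in row.items():
        if val is None:
            # Assumption: a SQL NULL simply produces no triple.
            continue
        triples.append((subject, f"{table_uri}#{quote(col, safe='')}", val))
    return triples
```

Under this sketch, the set of table and column URIs it produces would 
be the "Local Ontology"; anything mapped onto FOAF or similar would 
fall under "Domain Ontology."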


>> *       Why are we using the terms graphs and labels rather than RDF tuples and identifiers?
> 
> What's an RDF tuple?
> Per identifier vs. label, #LABELGEN has a description in terms of identifiers:
> [[
> LABELGEN - Label Generation
> RDF identifiers for objects in the conceptual model can, in some cases, be generated from a transformation of the schema and data in a tuple representing that conceptual model.
> ]]
> 
> I tried "Identifier Generation" instead of "Label Generation", but I wouldn't know how to defend against the argument "you're not generating identifiers; you're using identifiers to generate labels." I thought that using the graph theory term was safer.

I would suggest that an RDF identifier is an HTTP URI, primarily 
meant for machine use.

I would also suggest that an SQL identifier is something like 
instance info + catalog.owner.table + primary-key/rowID.

I would further suggest that a label is a string, primarily 
meant for human use, which may often serve as a shortcut to 
an identifier, but being just a string, it is possible (even 
likely) to have ambiguities/conflicts between them.

In some cases, identifiers and labels will bear some (even strong) 
resemblance to each other.  This will not be true in all cases, 
as URIs may be (and must be treated as if they are) opaque.
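
A small sketch of the distinction, under assumptions of my own: the 
URI pattern, the function names, and the sample values below are 
hypothetical. The first function mints an identifier from the 
components of an SQL identifier; the second shows why labels, being 
plain strings, can collide where identifiers do not.

```python
from collections import defaultdict
from urllib.parse import quote


def mint_uri(base, catalog, owner, table, key):
    """Build an HTTP URI from SQL identifier parts (hypothetical pattern).

    Consumers should still treat the result as opaque and not parse it.
    """
    parts = [catalog, owner, table, str(key)]
    return base + "/".join(quote(p, safe="") for p in parts)


def ambiguous_labels(labelled):
    """Given (uri, label) pairs, return labels attached to more than one URI."""
    by_label = defaultdict(set)
    for uri, label in labelled:
        by_label[label].add(uri)
    return {label: uris for label, uris in by_label.items() if len(uris) > 1}
```

For example, two employee rows with keys 7 and 8 get distinct URIs 
from mint_uri, yet both may carry the label "J. Smith", which 
ambiguous_labels would then report.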


>> *       In Section 3.1.4, I am not clear what we are trying to say regarding database connections.  An RDBMS has its own notion, and I suspect SPARQL has a well-defined notion of an endpoint.  Is the mapping language involved in mapping RDBMS connections?
> 
> D2R, for example, has connection information like:
> [[
> map:MyDatabase a d2rq:Database;
> d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
> d2rq:jdbcDriver "com.mysql.jdbc.Driver";
> d2rq:username "user";
> d2rq:password "password".
> ]]
> as does FeDeRate:
>  http://swobjects.svn.sourceforge.net/viewvc/swobjects/trunk/tests/7tm_receptors/flat/receptors.map
> 
> One could argue that this is a combination of a map and connection information (that we should define only the mapping language), but there is also a user benefit to being able to swap implementations and use the same configuration information. Perhaps cygri would refer to this as "standardizing httpd.conf". I'm ambivalent.

I feel strongly that the connection information as described 
here is an implementation detail.  For one thing, ODBC-based 
implementations (such as Virtuoso) cannot directly use the JDBC 
connection info (though a bridge driver does resolve that).

The two tools mentioned do both use JDBC for the connections,
but there are plenty of other paths to take, and I do not think
we should even appear to pick a horse in this race.
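
As a rough illustration of keeping the two concerns apart (all names 
here are hypothetical, and sqlite3 is only a stand-in for whatever 
driver an implementation actually uses), the mapping below knows 
nothing about how the connection was made; it only needs an object 
that follows Python's DB-API.

```python
import sqlite3

# The mapping itself: no driver, DSN, username, or password in sight.
MAPPING = {"table": "employees", "key": "id", "base": "http://example.com/db/"}


def rows_to_subjects(conn, mapping):
    """Produce one subject URI per row, using any DB-API connection.

    The table and key names come from a trusted mapping, so they are
    interpolated directly into the query text.
    """
    cur = conn.cursor()
    cur.execute(
        f'SELECT {mapping["key"]} FROM {mapping["table"]} ORDER BY {mapping["key"]}'
    )
    return [f'{mapping["base"]}{mapping["table"]}/{row[0]}' for row in cur.fetchall()]


def open_connection():
    """Implementation detail: swap this for ODBC, JDBC via a bridge, etc."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    return conn
```

Only open_connection would change from one implementation to the 
next; MAPPING and rows_to_subjects stay the same.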

Even SPARQL endpoints, though typically accessed via HTTP, are not 
required to be so [2], and should not be assumed to be so.

[2] <http://www.w3.org/TR/rdf-sparql-protocol/#query-bindings-http>


Regards,

Ted



--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
                             //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
        10 Burlington Mall Road, Suite 265, Burlington MA 01803
                                 http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
                               http://www.openlinksw.com/blog/~kidehen/
    Universal Data Access and Virtual Database Technology Providers

Received on Monday, 10 May 2010 15:36:23 UTC