Re: Additional info on proposal for cwm module to load RDBMS data from naudts guido on 2004-11-03 (public-cwm-talk@w3.org from October to December 2004)

From: naudts guido <naudts_vannoten@yahoo.com>
Date: Wed, 3 Nov 2004 08:28:29 -0800 (PST)
To: "Jones, David H" <david.h.jones@boeing.com>, public-cwm-talk@w3.org
Cc: naudts guido <naudts_vannoten@yahoo.com>
Message-ID: <20041103162829.34856.qmail@web40514.mail.yahoo.com>
Hello,
I've put some remarks after your lines.
I actually made two proposals: one for existing db's
and one for specific triplestores. Triplestores
receive triples and respond to queries by sending
triples. In a triplestore, schemas, tables and
columns are not important: all resources are
identified uniquely by the URI that is stored in
the db. 
Guido.
--- "Jones, David H" <david.h.jones@boeing.com> wrote:

> This email gives additional background and
> information on a proposal for an interface between
> cwm and RDBMSs.  In addition I will compare this
> proposal with ideas contributed by Guido Naudts,
> which I believe suggest a tighter integration.
> 
> Motivation:
> 
> The motivation for the proposal for a cwm module to
> load RDBMS data is an interoperability scenario
> where there are many heterogeneous data sources
> storing related information.  This information is
> used in a variety of business processes that need to
> combine data from different sources, perform some
> reasoning/calculations, make some decision, and
> possibly update one or more of the data sources.
> 
> The goal of the RDB proposal is to support the
> loading of rdb records into the cwm triple store. 
> Once loaded into the store, various things could be
> done:
> 	-	Save as n3/rdf for publishing purposes
> 	-	Translate portions of the store to conform to one
> or more external ontologies.
> 	-	Do general reasoning to support semi-automated
> task execution
> 	-	Explicitly update the database by doing sql
> insert/update operations.
> 
> The data loaded in cwm is essentially a snapshot of
> the data source, and there is no effort to
> synchronize data between loads.
> 
> There are obvious limitations in the size of the
> data that could be loaded into an in-memory triple
> store.  It is assumed that it is the user's
> responsibility to load data within the constraints
> of their computer.
> 
> In the next section I try to contrast differences
> between what Guido and I are envisaging:
> 
> 	-	I am assuming that a person using this builtin
> would want to see the rdb data as instances of one
> or more classes. The user provides the class name to
> handle cases where the query has a join.
I'm not sure in what form you expect the data to be
returned. Do you expect to get back a list of
resources or a list of triples? If you speak of
classes, I expect that you want to receive a list of
resources?
> 
> 	-	My proposal creates property names by
> concatenating class name and column name. This
> handles collisions where two tables may have the
> same column name (a rather common occurrence). It is
> also possible to have identical table name/column
> name in different schemas of the same database. 
> This could be handled in 2 ways:
> 		-	Prepend the schema name to the class and column
> name
> 		-    Create a different connection with a
> different base uri.
> 		Since this duplication in different schemas would
> be the exception rather than the rule, I would
> suggest the 2nd choice.
When handling existing db's I propose a different URI
for each table. The retrieved resources (if not
constants) are prefixed with that URI (uri:something).
> 
> 	-	My proposal is intended to support loading of the
> current triple store from rdb sources and (possibly)
> explicitly updating rdb sources from the triple
> store. I believe Guido is suggesting having an
> alternative rdb implementation of the RDFStore,
> similar to Jena.  
> 	-	In my proposal a URI is generated for each
> instance, based on the PK for the query.  This
> approach is somewhat restrictive, but produces
> stable URIs which can be used for graph
> superposition and classEquivalence statements. I
> believe Guido is suggesting creating anonymous
> triples when triples are loaded from the rdb.  
> 
> 	-    I am proposing using rdf/rdfs constructs to
> make results processible by a wider range of tools,
> and because owl constructs don't seem to be
> required.  Guido is proposing to use owl constructs.
> 
This is a good argument. On the other hand, using only
rdf(s), how will you check e.g. whether something is
really a datatype property? I mean, control over the
input is limited. Of course, different engines can
(and probably will) exist for different purposes.
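
To make the naming discussion above concrete, here is a minimal
sketch of how one rdb record could be turned into triples with
property names built from class name plus column name and a subject
URI built from the primary key. It is not taken from the attached
rdb.n3 or from cwm itself: the function name, the "_" separator, the
"class/pk" URI layout and the example.org base URI are all
assumptions made for illustration.

```python
# Hypothetical sketch; none of these names come from cwm or the attachments.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(base_uri, class_name, pk_column, row):
    """Map one record (dict of column -> value) to (s, p, o) triples."""
    # Stable subject URI derived from the primary key (layout is assumed).
    subject = f"{base_uri}{class_name}/{row[pk_column]}"
    triples = [(subject, RDF_TYPE, f"{base_uri}{class_name}")]
    for column, value in row.items():
        # Property name = class name + column name, so that equal column
        # names in different tables do not collide ("_" is an assumption).
        predicate = f"{base_uri}{class_name}_{column}"
        triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    base = "http://example.org/db/"  # assumed base URI of one connection
    for t in row_to_triples(base, "Employee", "id", {"id": 7, "name": "Ann"}):
        print(t)
```

Using a different base_uri per connection (or per table) would
cover the case of identical table and column names in different
schemas.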

> I actually am not sure if database update is a
> reasonable goal.  Ideally this could be done with
> transaction management, so consistency could be
> guaranteed when updating multiple databases.  This
> seems like an unnecessarily complex feature in an
> experimental tool like cwm.  As an alternative, we
> could consider an update with no transaction
> management, or simply defer implementation of any
> update until a more compelling case is made for it.
>
I certainly want the possibility of writing my triples
from memory to a db. Updating remote existing db's is
maybe too much for CWM; however, for a full-blown
semantic web it will be an absolute necessity. Think
of the example where an automatic payment is made by
your semantic web agent: you want to be sure that the
payment was really registered by the involved banks.
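
As an illustration of writing triples from memory to a db, here is a
minimal sketch using Python's standard sqlite3 module. The table
layout (one "triples" table with columns s, p, o) and the function
name are assumptions, not part of cwm. It only shows a transaction
on a single database; it does not address consistency across
several databases.

```python
import sqlite3


def save_triples(conn, triples):
    """Store (s, p, o) tuples in one transaction: all rows or none."""
    # Assumed layout: a single three-column table holding every triple.
    conn.execute("CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_triples(conn, [("ex:payment1", "ex:amount", "100"),
                        ("ex:payment1", "ex:payee", "ex:bank2")])
    print(conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0])
```

If any insert in the batch fails, the rollback leaves the table as
it was before the call.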
> In summary, my proposal has a limited scope with
> rather specific -- and limited -- use cases.  I am
> assuming that no changes would be necessary to the
> internals of cwm.
I agree that you can achieve your proposal using only
builtins, which is not possible with mine.  
> The proposal of Guido would
> implement a rdb triple store and support reasoning
> across triple stores. This would be a fairly tight
> integration of cwm and RDBMS.   It is unclear to me
> if his proposal includes dynamic queries to a
> separate database.
>
I did not speak of dynamic queries but I see no
problem with them. 
>
-----------------------------------------------------------------------------
> Example (with slight modification from previous
> email):
> Command line:
> cwm rdb.n3 rdb-test.n3 --think > rdb-results.n3
> 
> 
>  <<rdb.n3>>  <<rdb-test.n3>>  <<rdb-results.n3>> 
> 
> Regards,
> 
> David H. Jones
> Boeing Phantom Works, 
> Mathematics & Computing Technology
> 425-865-6924 
> 425-865-2964 (FAX)
> david.h.jones@boeing.com
> 
> 
> 

> ATTACHMENT part 2 application/octet-stream
name=rdb.n3


> ATTACHMENT part 3 application/octet-stream
name=rdb-test.n3


> ATTACHMENT part 4 application/octet-stream
name=rdb-results.n3




		
Received on Wednesday, 3 November 2004 16:29:00 UTC