Re: RDB2RDF WG agenda for 2010-10-19 meeting 1600 UTC

* Alexandre Bertails <bertails@w3.org> [2010-10-17 23:45-0400]
> Hi Paul,
> 
> my comments are inline.
> 
> On Sun, 2010-10-17 at 16:36 -0500, Paul Tyson wrote:
> > On Sun, 2010-10-17 at 14:24 -0400, Alexandre Bertails wrote:
> > > On Fri, 2010-10-15 at 15:57 -0700, ashok malhotra wrote:
> > > > I will not be able to make the call on Tuesday.
> > > > Can we agree that Eric's description is accurate and correct?
> > > 
> > > Eric, I like your document. Giving a concrete syntax for the input is a
> > > nice thing to do to make it explicit so that any RDBMS user can
> > > understand what your document is about. By the way, may I suggest that
> > > you specify that you need only two subsets of SQL:
> > > * the Data Definition Language to describe your database
> > > * the Data Manipulation Language that gives you the INSERT
> > > It's important as you don't need any query at that point.
> > > 
> > > As nobody defines mapping in terms of concrete syntax, I guess we will
> > > need some sort of abstraction (ie. AST) to reason about, one for the
> > > input (SQL Data Definition Language + Data Manipulation Language /
> > > INSERT) and one for the output (RDF). The latter is easy as we already
> > > have [1] as a W3C Recommendation. Maybe you already have something in
> > > mind for a SQL AST but here is an example of what one can expect from
> > > your first example:
> > > 
> > > [[
> > > CREATE TABLE Addresses (ID INT, city CHAR(10), state CHAR(2), PRIMARY
> > > KEY(ID));
> > > CREATE TABLE People (ID INT, fname CHAR(10), addr INT, PRIMARY KEY(ID),
> > > FOREIGN KEY(addr) REFERENCES Addresses(ID));
> > > INSERT INTO Addresses (ID, city, state) VALUES (18, 'Cambridge', 'MA');
> > > INSERT INTO People (ID, fname, addr) VALUES (7, 'Bob', 18);
> > > INSERT INTO People (ID, fname, addr) VALUES (8, 'Sue', NULL);
> > > ]]
> > > 
> > > would give something like that: (for better indentation/understanding,
> > > please see the attached file)
> > > 
> > > [[
> > > database(relation(name("Addresses"),
> > >                   header(attribute("ID") → type(int)×PrimaryKey,
> > >                          attribute("city") → type(char),
> > >                          attribute("state") → type(char)),
> > >                   data(tuple("ID" → 18, "city" → "Cambridge",
> > >                              "state" → "MA"))),
> > >          relation(name("People"),
> > >                   header(attribute("ID") → type(int)×PrimaryKey,
> > >                          attribute("fname") → type(char),
> > >                          attribute("addr") →
> > >                            type(int)×ForeignKey("Addresses", "ID")),
> > >                   data(tuple("ID" → 7, "fname" → "Bob", "addr" → 18),
> > >                        tuple("ID" → 8, "fname" → "Sue", "addr" → null))))
> > > ]]
> > 
> > Or why not use an RDF schema of the relational model?  See attached
> > rdb-schema.ttl for one concept of the schema, and rdb-ex.ttl for the
> > database examples.  The schema is based on the simplest form of
> > relational model, which would probably support most use cases.  It could
> > as well be put into the terminology of SQL-2008 using SQL-schema,
> > SQL-table, -column, -row, etc.
> 
> I can understand your proposal to use RDF to describe the database in
> two different ways:
> 1. your RDF *is already* the result of the direct mapping. Even if it's
> only a raw description, it's already some RDF taken from the
> description of the database and the data inside. Then, you want to use
> some other Semantic Web technologies to modify this RDF. And I agree, it
> does the job.
> 2. this is only a description in RDF for the input. In that case, it's
> only a serialization among others, and it's not suitable to reason
> about. Think about the mapping itself: by definition [1], it's a
> function from RDB to RDF. To define such a function, you need some
> abstraction.

<rdb-ex.ttl> is quite cool. Is there code to play with which generates
such data?

Per Alexandre's point, I see some RDF Schema assertions and some
data. If the goal is to use the RDFS to document the process for
generating the data, I agree we need more; basically, we need to
capture the logic of the program which generates such schema+data.

Juan has dug into how much RDFS/OWL you can derive from a relational
schema; I expect he'll have lots of interesting contributions for you.

As to the data expression, I contrast (eliding some [data]type arcs)
two representations of the first example in
  http://www.w3.org/2010/10/12-Direct-Tests
:
[[
<http://www.w3.org/2001/sw/rdb2rdf/r2rml/DBex/People#body> 
  rdfs:member
    [rdfs:member [ rdb:value 7 ; a ex:People.ID ] ,
                 [ rdb:value "Bob" ; a ex:People.fname ] ,
                 [ rdb:value 18 ; a ex:Addresses.ID ] ],
    [rdfs:member [ rdb:value 8 ; rdb:type ex:People.ID ],
                 [ rdb:value "Sue" ; a ex:People.fname ] ] .

<http://www.w3.org/2001/sw/rdb2rdf/r2rml/DBex/Addresses#body> 
  rdfs:member 
    [rdfs:member [ rdb:value 18 ; a ex:Addresses.ID ] ,
                 [ rdb:value "Cambridge" ; a ex:Addresses.city ] ,
                 [ rdb:value "MA" ; a ex:Addresses.state ] ] .
]] vs.
[[
@prefix People: <http://foo.example/DB/People#> .
@prefix Addresses: <http://foo.example/DB/Addresses#> .
@prefix P7: <http://foo.example/DB/People/ID.7#> .
@prefix P8: <http://foo.example/DB/People/ID.8#> .
@prefix A18: <http://foo.example/DB/Addresses/ID.18#> .

P7:_ People:ID 7 ; People:fname "Bob" ; People:addr A18:_ .
P8:_ People:ID 8 ; People:fname "Sue" .
A18:_ Addresses:ID 18 ; Addresses:city "Cambridge" ; Addresses:state "MA" .
]]

There are some minor differences in the formulaic creation of nodes
which affect the number of namespaces.

The first is much larger (27 triples vs. 8), mostly because it is of the form
  ?tupleID rdfs:member [ rdb:value ?val ; a ?attr ]
instead of
  ?tupleID ?attr ?val

The first re-uses standard properties from RDFS, good SemWeb practice.

IMO, the latter is more intuitive (though this is an age-old argument
about generic properties).
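
To make the size difference concrete, here is a rough Python sketch
that emits both shapes from the same rows and counts the triples. The
dictionary layout, node labels and function names are my own
assumptions for illustration, not taken from the test document or from
Paul's files, and foreign keys are left as plain values rather than
node references.

```python
# Hypothetical in-memory copy of the example database: table -> rows.
# None stands in for SQL NULL.
DB = {
    "People": [
        {"ID": 7, "fname": "Bob", "addr": 18},
        {"ID": 8, "fname": "Sue", "addr": None},
    ],
    "Addresses": [
        {"ID": 18, "city": "Cambridge", "state": "MA"},
    ],
}


def reified_triples(db):
    """Shape 1: ?tupleID rdfs:member [ rdb:value ?val ; a ?attr ]."""
    triples = []
    counter = 0
    for table, rows in db.items():
        body = f"<{table}#body>"
        for row in rows:
            counter += 1
            tuple_node = f"_:t{counter}"
            triples.append((body, "rdfs:member", tuple_node))
            for attr, val in row.items():
                if val is None:
                    continue  # NULLs produce no cell node
                counter += 1
                cell = f"_:c{counter}"
                triples.append((tuple_node, "rdfs:member", cell))
                triples.append((cell, "rdb:value", val))
                triples.append((cell, "a", f"ex:{table}.{attr}"))
    return triples


def direct_triples(db):
    """Shape 2: ?tupleID ?attr ?val."""
    triples = []
    for table, rows in db.items():
        for row in rows:
            subject = f"<{table}/ID.{row['ID']}#_>"
            for attr, val in row.items():
                if val is not None:  # NULLs produce no triple
                    triples.append((subject, f"{table}:{attr}", val))
    return triples


print(len(reified_triples(DB)), len(direct_triples(DB)))
```

For the three rows above this prints 27 and 8, the same counts as the
two hand-written graphs.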


> We have to use the right tools for the right job. So the question this
> WG has to answer first is: are we using the right tools right now?
> Again, let's think about what a mapping actually is.
> 
> The RDB2RDF mapping is a *function* from RDB (the input) to RDF (the
> output).
> 
> A good candidate for an RDB concrete syntax is the common SQL language
> (description + data manipulation) as it's already out there and widely
> known. Maybe there is already an AST for that, *widely accepted* by the
> community, the researchers or written in a document as a standard. I
> don't know. If it's not the case, we have to decide which features we
> want and write our own AST that we *can actually manipulate* in the
> mapping/function.
> 
> We are lucky as the RDF abstract syntax is *already* defined in [2]. The
> AST is explicitly given in plain English.
> 
> > I hesitate to include constraints in this schema.  They would be better
> > expressed in RIF.
> 
> My feeling is I also want to see these constraints in the output. It
> would be a shame to throw them away.

I expect this area could grow a lot. Richard Cyganiak has use cases
which require identifiers for the tables and attributes. A minimal
relational schema description like:
[[
<http://www.w3.org/2001/sw/rdb2rdf/r2rml/DBex/People#header> 
    rdfs:member ex:People.ID .
]] or [[
    <http://foo.example/DB/People#_> rdfs:member People:ID .
]]
should let the community hang whatever assertions they'd need on these
identifiers. Richard, would that meet your needs?
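
As a sketch of how mechanically such header triples could be produced
(second form above), assuming a plain table-to-columns dictionary: the
`SCHEMA` layout and the `header_triples` name are hypothetical, and the
base IRI is just the foo.example one used earlier.

```python
# Hypothetical schema description: table name -> ordered column names.
SCHEMA = {
    "People": ["ID", "fname", "addr"],
    "Addresses": ["ID", "city", "state"],
}


def header_triples(schema, base="http://foo.example/DB/"):
    """Return one (header, rdfs:member, attribute) triple per column."""
    triples = []
    for table, columns in schema.items():
        header = f"<{base}{table}#_>"
        for column in columns:
            attribute = f"<{base}{table}#{column}>"
            triples.append((header, "rdfs:member", attribute))
    return triples


for s, p, o in header_triples(SCHEMA):
    print(s, p, o, ".")
```

Anything further (datatypes, keys, labels) would then be extra triples
with those attribute IRIs as subject.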


> > > Actually, this little example raises some questions. See below.
> > > 
> > > Does R2RML want to map the Data Definition Language to some generated
> > > ontology?
> > 
> > I share Alexandre's confusion on this point.  The example doesn't define
> > a good target ontology.
> > 
> > I have only recently started following the group again.  At the end of
> > the XG I thought the direction was to use RIF to handle the mapping into
> > a well-defined domain ontology.  I couldn't find in the archives any
> > discussion about why RIF was not suitable.  Does someone have a pointer
> > to the relevant discussions?
> > 
> > > 
> > > If we take SQL types into account, do we want to know the constraints?
> > > That means: is type(char) enough or do I need type(char(10))?
> > 
> > It would be good to have a standard mechanism to refer to all the
> > standard SQL datatypes and datatype families (e.g., char(n)), perhaps
> > using OWL datatype definition facilities.
> 
> +1
> 
> > > Also, one can assume that the built AST comes from valid SQL CREATE
> > > TABLE / INSERT. But I think it would be safer to make some assumptions
> > > explicit, like for example: "for any tuple, for any (attribute → value)
> > > within this tuple, the declared type in the header and the actual type
> > > for the value MUST be the same". If it's not the case, well, the mapping
> > > does not make any sense :-) One can implement that as a light type
> > > system on top of the AST, or we can also decide to make it part of the
> > > mapping.
> > 
> > Again I would ask why not use RIF for expressing these constraints.
> 
> +1, but only for the output.
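
On the typing assumption quoted above (the declared type in the header
and the actual type of each value must agree), a light check over the
AST could look roughly like the Python below. This is only a sketch:
the `HEADER` dictionary, the mapping of SQL types onto Python types and
the rule that NULL is compatible with every declared type are all my
assumptions, not something the WG has agreed.

```python
# Hypothetical header for People: attribute name -> declared type,
# with SQL INT mapped to int and CHAR(n) mapped to str.
HEADER = {"ID": int, "fname": str, "addr": int}


def check_tuple(header, row):
    """Return a list of messages; an empty list means the tuple is well typed."""
    errors = []
    for attr, value in row.items():
        if attr not in header:
            errors.append(f"{attr}: not declared in header")
        elif value is None:
            continue  # assumption: NULL fits any declared type
        elif type(value) is not header[attr]:
            errors.append(
                f"{attr}: expected {header[attr].__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


print(check_tuple(HEADER, {"ID": 7, "fname": "Bob", "addr": 18}))
print(check_tuple(HEADER, {"ID": "7", "fname": "Bob", "addr": 18}))
```

The first call reports no problems; the second flags ID because a
string was supplied where the header declares an integer.
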
> 
> Alexandre.
> 
> [1] http://en.wikipedia.org/wiki/Map_%28mathematics%29
> [2]
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-syntax
> 
> 
> > 
> > Regards,
> > --Paul
> > 
> > > 
> > > > The semantics can then be expressed in Datalog.
> > > 
> > > I was wondering what people call "semantics" for such a mapping and what
> > > kind of statements they expect in the case of R2RML?
> > > 
> > > I was taught that datalog was just a subset of Prolog, used to define
> > > new relations from other relations using deductive logic. Here, we just
> > > want to go from one model (RDB) to another one (RDF).
> > > 
> > > Alexandre.
> > > 
> > > > All the best, Ashok
> > > > 
> > > > On 10/15/2010 12:48 PM, Eric Prud'hommeaux wrote:
> > > > > * Michael Hausenblas<michael.hausenblas@deri.org>  [2010-10-15 13:44+0100]
> > > > >> All,
> > > > >>
> > > > >> Below the agenda for our next week's meeting. We now focus on addressing the
> > > > >> remaining issues (such as document structure, etc.) and the mapping
> > > > >> semantics in a high-level, non-formal way. The goal is to publish the FPWD
> > > > >> next week.
> > > > > I'd also like to get some feedback on whether
> > > > >    http://www.w3.org/2010/10/12-Direct-Tests
> > > > > match the WG's expectations of what the generated graphs would look like.
> > > > >
> > > > >> Cheers,
> > > > >>        Michael
> > > > >>
> > > > >> ---------------------------------------------------------
> > > > >> AGENDA Teleconference
> > > > >> W3C RDB2RDF Working Group telephone conference 2010-10-19
> > > > >> ----------------------------------------------------------
> > > > >> Tuesday, 19 October *16:00-17:00 UTC* Local time:
> > > > >> http://www.timeanddate.com/worldclock/fixedtime.html?month=10&day=19&year=2010&hour=16&min=00&sec=0
> > > > >> Bridge US: +1-617-761-6200 (Zakim) Conference code : 7322733# (spells
> > > > >> "RDB2RDF")
> > > > >> Duration : 60 minutes
> > > > >> -------------------------------------------------------------------
> > > > >> IRC channel : #RDB2RDF on irc.w3.org:6665 W3C IRC Web Client :
> > > > >> http://www.w3.org/2001/01/cgi-irc
> > > > >> Zakim information : http://www.w3.org/2002/01/UsingZakim
> > > > >> Zakim bridge monitor : http://www.w3.org/1998/12/bridge/Zakim.html
> > > > >> Zakim IRC bot : http://www.w3.org/2001/12/zakim-irc-bot.html
> > > > >> -------------------------------------------------------------------
> > > > >>
> > > > >> Chair: Michael
> > > > >> Scribe: Zakim, pick a victim
> > > > >>
> > > > >> 1. Admin
> > > > >> PROPOSAL: Accept the minutes of last meeting, see
> > > > >> http://www.w3.org/2010/10/12-rdb2rdf-minutes.html
> > > > >>
> > > > >> Review open actions, see
> > > > >> http://www.w3.org/2001/sw/rdb2rdf/track/actions/open
> > > > >>
> > > > >> 2. FPWD "Relational Database to RDF Mapping Language"
> > > > >> http://www.w3.org/2001/sw/rdb2rdf/r2rml/
> > > > >>
> > > > >> Comments see following threads:
> > > > >> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2010Oct/0028.html
> > > > >>
> > > > >> 3. AOB
> > > > >>
> > > > >> Cheers,
> > > > >>        Michael
> > > > >>
> > > > >> -- 
> > > > >> Dr. Michael Hausenblas
> > > > >> LiDRC - Linked Data Research Centre
> > > > >> DERI - Digital Enterprise Research Institute
> > > > >> NUIG - National University of Ireland, Galway
> > > > >> Ireland, Europe
> > > > >> Tel. +353 91 495730
> > > > >> http://linkeddata.deri.ie/
> > > > >> http://sw-app.org/about.html
> > > > >>
> > > > >>
> > > > >>
> > > > 
> > > > 
> > > 
> 
> 

-- 
-ericP

Received on Monday, 18 October 2010 18:23:02 UTC