Exposing Relational Data as Linked Data from Ashok Malhotra on 2013-12-06 (public-rdb2rdf-comments@w3.org from December 2013)

From: Ashok Malhotra <ashok.malhotra@oracle.com>
Date: Thu, 05 Dec 2013 19:49:04 -0500
To: public-rdb2rdf-comments@w3.org
Message-ID: <52A11F00.9040405@oracle.com>

This is an edited version of a conversation between Eric P, Alexandre Bertails and Ashok Malhotra.  I'm posting it here for archival purposes.

Ashok:
There has been a suggestion that we expose Relational data as Linked data.
Seems like an obvious idea.

Alexandre:
Yes, I'm pretty sure Eric already had that in mind. I certainly did
think about it several times.

Fortunately, we were able to secure Eric's idea to use relative URIs for
the mapping, so this is possible.

Also, the row nodes being mapped to URIs with '/' should make things
easier.

So we just need to map the database to the stem URI and the Direct
Mapping should unfold naturally.
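
As a rough illustration of how the Direct Mapping unfolds from a stem, here is a minimal Python sketch. It is not the normative algorithm: the function name direct_map_row and the stem http://example.org/db/ are hypothetical, percent-encoding and datatype handling are left out, and IRIs and literals are both shown as plain Python values.

```python
def direct_map_row(stem, table, pk, row, fks=None):
    """Return Direct Mapping style triples for one row (simplified sketch).

    stem  : base IRI the database is mapped to (hypothetical in the example)
    table : table name
    pk    : list of primary-key column names
    row   : dict of column name -> value
    fks   : dict of FK column -> (referenced table, referenced key column)
    """
    fks = fks or {}
    # Row node: <stem><table>/<col>=<value>, key columns joined by ';'
    key = ";".join(f"{c}={row[c]}" for c in pk)
    subject = f"{stem}{table}/{key}"
    triples = [(subject, "rdf:type", f"{stem}{table}")]
    for col, value in row.items():
        if value is None:
            continue  # a NULL produces no triple
        # Literal triple: predicate is <stem><table>#<column>
        triples.append((subject, f"{stem}{table}#{col}", value))
        if col in fks:
            # Reference triple: predicate is <stem><table>#ref-<column>
            ref_table, ref_col = fks[col]
            triples.append((subject, f"{stem}{table}#ref-{col}",
                            f"{stem}{ref_table}/{ref_col}={value}"))
    return triples


# Hypothetical stem; single-column foreign key only.
for t in direct_map_row("http://example.org/db/", "People", ["ID"],
                        {"ID": 7, "fname": "Bob", "addr": 18},
                        {"addr": ("Addresses", "ID")}):
    print(t)
```

With a relative stem, the same triples resolve against whatever URI the database is published at.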

=====

I think we'll need to define the POST and DELETE requests in terms
of SQL queries. Not difficult, but it will require some thinking.

Also, if PATCH gets into the specification (or is part of another
specification), this will be a bit more tricky, depending on the
expressive power of PATCH.
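
One possible reading of "POST and DELETE in terms of SQL queries" is sketched below in Python with sqlite3. Everything here is an assumption made for illustration: the SCHEMA whitelist, the function names post_row and delete_row, the requirement that a POST body already carries the ID, and the shape of the returned relative URI. Nothing in this thread or in the DM defines these.

```python
import sqlite3

# Hypothetical whitelist: table -> columns. Identifiers from a request are
# checked against it and never interpolated into SQL unchecked.
SCHEMA = {"People": ["ID", "fname", "addr"]}


def delete_row(conn, table, key_col, key_value):
    """DELETE <table>/<key_col>=<key_value>  ->  DELETE FROM ... WHERE ..."""
    if table not in SCHEMA or key_col not in SCHEMA[table]:
        raise ValueError("unknown table or column")
    cur = conn.execute(
        f'DELETE FROM "{table}" WHERE "{key_col}" = ?', (key_value,))
    return cur.rowcount  # number of rows removed


def post_row(conn, table, values):
    """POST to the table's container  ->  INSERT INTO ...

    Returns the relative URI of the new row (assumes the key is supplied).
    """
    if table not in SCHEMA or any(c not in SCHEMA[table] for c in values):
        raise ValueError("unknown table or column")
    cols = list(values)
    col_list = ", ".join(f'"{c}"' for c in cols)
    marks = ", ".join("?" for _ in cols)
    conn.execute(f'INSERT INTO "{table}" ({col_list}) VALUES ({marks})',
                 [values[c] for c in cols])
    return f"{table}/ID={values['ID']}"


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "People" '
             '("ID" INTEGER PRIMARY KEY, "fname" TEXT, "addr" INTEGER)')
print(post_row(conn, "People", {"ID": 7, "fname": "Bob", "addr": 18}))
print(delete_row(conn, "People", "ID", 7))
```

A real mapping would also have to decide how server-generated keys, foreign-key violations and multi-column keys surface as HTTP responses, which is presumably part of the thinking mentioned above.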

Eric:
I'd say that the DM was written to be compatible with a set of practices which have at various times been called "RDF", "Semantic Web" and "Linked Data". I'd characterize your interesting proposal below as "publishing tables as LDPCs".

Hmm, looking at example 1 of the DM
<http://www.w3.org/TR/rdb-direct-mapping/#lead-ex>, using "/…" as the
stem supplied to the DM, we have:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    </…/People/ID=7> rdf:type </…/People> .
    </…/People/ID=7> </…/People#ID> 7 .
    </…/People/ID=7> </…/People#fname> "Bob" .
    </…/People/ID=7> </…/People#addr> 18 .
    </…/People/ID=7> </…/People#ref-addr> </…/Addresses/ID=18> .
    </…/People/ID=8> rdf:type </…/People> .
    </…/People/ID=8> </…/People#ID> 8 .
    </…/People/ID=8> </…/People#fname> "Sue" .
    </…/Addresses/ID=18> rdf:type </…/Addresses> .
    </…/Addresses/ID=18> </…/Addresses#ID> 18 .
    </…/Addresses/ID=18> </…/Addresses#city> "Cambridge" .
    </…/Addresses/ID=18> </…/Addresses#state> "MA" .

From this, we'd have two containers: </…/People> and </…/Addresses>. The row
identifiers can legitimately be considered information resources, so we
can use an ldp:DirectContainer.

The data in /…/People is:
    @prefix dcterms:<http://purl.org/dc/terms/>   .
    @prefix dm:<http://www.w3.org/TR/rdb-direct-mapping/#>   .
    @prefix ldp:<http://www.w3.org/ns/ldp#>   .
    </…/People>
       a ldp:DirectContainer ;
       ldp:containerResource </…/People> ;
       ldp:containsRelation dm:row ;
       dm:row </…/People/ID=7>, </…/People/ID=8> .

and /…/Addresses has:
    </…/Addresses>
       a ldp:DirectContainer ;
       ldp:containerResource </…/Addresses> ;
       ldp:containsRelation dm:row ;
       dm:row </…/Addresses/ID=18> .

The data in /…/People/ID=7 is:
    </…/People/ID=7> rdf:type </…/People> .
    </…/People/ID=7> </…/People#ID> 7 .
    </…/People/ID=7> </…/People#fname> "Bob" .
    </…/People/ID=7> </…/People#addr> 18 .
    </…/People/ID=7> </…/People#ref-addr> </…/Addresses/ID=18> .

(That unfortunate type arc is problematic: it uses </…/People> as a
class, the same IRI used above for the container, potentially forcing
us to find another name for the container.)

This has the effect of taking the one DM for a database and splitting
it into tables+rows graphs, nice for Tabulator or Marbles but makes
queries chatty. If we use a perhaps naively simple ('cause it supports
no changes or provenance) mapping of a set of resources to a set of
graphs with those same names, we can construct a not-too pathological
example like:

    # Who lives in the same city?
    # => what two People resources have addrs with the same /…/Addresses#city?
    PREFIX dcterms:<http://purl.org/dc/terms/>
    PREFIX dm:<http://www.w3.org/TR/rdb-direct-mapping/#>
    PREFIX ldp:<http://www.w3.org/ns/ldp#>

    SELECT ?name1 ?name2 ?city WHERE {
      GRAPH </…/People> {
        ?People dm:row ?p1, ?p2
      }
      GRAPH ?p1 {
        ?p1 </…/People#fname> ?name1 ;
            </…/People#ref-addr> ?refaddr1
      }
      GRAPH ?p2 {
        ?p2 </…/People#fname> ?name2 ;
            </…/People#ref-addr> ?refaddr2
      }
      GRAPH ?refaddr1 {
        ?refaddr1 </…/Addresses#city> ?city
      }
      GRAPH ?refaddr2 {
        ?refaddr2 </…/Addresses#city> ?city
      }
      # don't pair a row with itself
      FILTER (?p1 != ?p2)
    }

vs. the current one-graph view:

    SELECT ?name1 ?name2 ?city WHERE {
      ?p1 </…/People#fname> ?name1 ;
          </…/People#ref-addr> ?refaddr1 .
      ?p2 </…/People#fname> ?name2 ;
          </…/People#ref-addr> ?refaddr2 .
      ?refaddr1 </…/Addresses#city> ?city .
      ?refaddr2 </…/Addresses#city> ?city .
      # don't pair a row with itself
      FILTER (?p1 != ?p2)
    }
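
The naively simple mapping of resources to graphs with the same names, described above, could be sketched roughly as follows in Python. The function name split_into_graphs and the shortened example IRIs are hypothetical, and only the dm:row arcs of each container are generated, not the ldp: triples.

```python
from collections import defaultdict


def split_into_graphs(triples):
    """Split one Direct Mapping graph into table and row graphs.

    Each row graph is named by the row IRI and holds the triples with that
    subject. Each table graph is named by the table IRI (the object of the
    row's type arc) and lists its rows via dm:row.
    """
    graphs = defaultdict(list)
    for s, p, o in triples:
        graphs[s].append((s, p, o))
        if p == "rdf:type":
            member = (o, "dm:row", s)
            if member not in graphs[o]:
                graphs[o].append(member)
    return dict(graphs)


# Shortened, hypothetical IRIs standing in for the ones in the example.
one_graph = [
    ("/People/ID=7", "rdf:type", "/People"),
    ("/People/ID=7", "/People#fname", "Bob"),
    ("/People/ID=8", "rdf:type", "/People"),
    ("/Addresses/ID=18", "rdf:type", "/Addresses"),
]
graphs = split_into_graphs(one_graph)
# One graph per table plus one per row: each is a separate resource a
# client has to fetch, which is where the chattiness comes from.
print(len(graphs), "graphs:", sorted(graphs))
```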

So that leaves us with a browsing and update protocol which would
probably be useful when there was no need to synchronize multiple
records at the same time. It'd be interesting to try this out on some
of the OSLC use cases which are backed by RDB systems which never
treat the server data as triples apart from printing LDP queries and
parsing updates out of PUTs.

Ashok:
I don't like 1 table -> 1 container.  It would be better if we did the join on the PK, FK
and created a single container but, perhaps, that's for the R2RML extension.

Eric:
Does that mean we turn:

      People             Addresses 
┌─────────────┐   ┌───────────────────┐
│ID│fname│addr│   │ID│   city   │state│
│7 │Sam  │17  │   │17│NYC       │NY   │
│7 │Bob  │18  │   │18│Cambridge │MA   │
│8 │Sue  │18  │   └──┴──────────┴─────┘
│9 │Joe  │18  │
└──┴─────┴────┘

into a join view (resolving potential name conflicts) like:
      PeopleAddresses
┌───────────────────────────┐
│ ID │fname│   city   │state│
│7-17│Sam  │NYC       │NY   │
│7-18│Bob  │Cambridge │MA   │
│8-18│Sue  │Cambridge │MA   │
│9-18│Joe  │Cambridge │MA   │
└────┴─────┴──────────┴─────┘

and assert LDPRs for these guys:
    </…/PeopleAddresses/ID=7-17>
    </…/PeopleAddresses/ID=7-18>
    </…/PeopleAddresses/ID=8-18>
    </…/PeopleAddresses/ID=9-18>
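
For what it's worth, the join view and row identifiers above can be reproduced with a small sketch like the following (Python with sqlite3). The view definition, the '-' separator for the combined ID, the function name ldpr_uris and the example stem are all assumptions made for illustration; the tables are loaded exactly as drawn above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ID INTEGER, fname TEXT, addr INTEGER);
CREATE TABLE Addresses (ID INTEGER, city TEXT, state TEXT);
INSERT INTO People VALUES (7,'Sam',17),(7,'Bob',18),(8,'Sue',18),(9,'Joe',18);
INSERT INTO Addresses VALUES (17,'NYC','NY'),(18,'Cambridge','MA');
-- Join on the foreign key; the combined ID avoids the ID name conflict.
CREATE VIEW PeopleAddresses AS
  SELECT p.ID || '-' || a.ID AS ID, p.fname, a.city, a.state
  FROM People p JOIN Addresses a ON p.addr = a.ID;
""")


def ldpr_uris(conn, stem):
    """List one row URI per row of the join view, under the given stem."""
    rows = conn.execute("SELECT ID FROM PeopleAddresses ORDER BY ID")
    return [f"{stem}PeopleAddresses/ID={row[0]}" for row in rows]


# Hypothetical stem, standing in for the elided one used in this thread.
for uri in ldpr_uris(conn, "http://example.org/db/"):
    print(uri)
```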

-- 
All the best, Ashok
Received on Friday, 6 December 2013 00:49:36 UTC