Re: ISSUE-9 Another question about Generate Blank Nodes

* ashok malhotra <ashok.malhotra@oracle.com> [2011-02-01 15:33-0800]
> Let me see if I understand the problem.
> 
> Suppose we have a table with no primary key and many columns.
> A triple would be generated for each column in the table and all the triples
> for a row would be anchored by a blank node.  Is this correct?

Yes, for
http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#2duplicates0nulls
we have
┌┤IOUs├─┬───────┬────────┐
│ fname │ lname │ amount │
│ Bob   │ Smith │     30 │
│ Sue   │ Jones │     20 │
│ Bob   │ Smith │     30 │
└───────┴───────┴────────┘
The direct graph Sören proposes would look like:

<IOUs/fname=Bob,lname=Smith,amount=30:1#_> <IOUs#fname> "Bob" ;
                                           <IOUs#lname> "Smith" ;
                                           <IOUs#amount> 30.0 .
<IOUs/fname=Sue,lname=Jones,amount=20:1#_> <IOUs#fname> "Sue" ;
                                           <IOUs#lname> "Jones" ;
                                           <IOUs#amount> 20.0 .
<IOUs/fname=Bob,lname=Smith,amount=30:2#_> <IOUs#fname> "Bob" ;
                                           <IOUs#lname> "Smith" ;
                                           <IOUs#amount> 30.0 .
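
As a rough illustration of how the row identifiers above could be produced, here is a minimal Python sketch. The function name `row_identifiers` and the percent-encoding of values are my assumptions, not part of Sören's proposal; the only thing taken from the example is the shape `table/col=val,...:N#_`, where N counts repeats of an identical row.

```python
from collections import Counter
from urllib.parse import quote

def row_identifiers(table, columns, rows):
    """Yield one identifier per row; the :N suffix numbers identical rows."""
    seen = Counter()
    for row in rows:
        # Percent-encode values so ',', '=' and ':' in data cannot clash
        # with the separators (an assumption of this sketch).
        key = ",".join(f"{c}={quote(str(v), safe='')}" for c, v in zip(columns, row))
        seen[key] += 1
        yield f"{table}/{key}:{seen[key]}#_"

columns = ["fname", "lname", "amount"]
rows = [("Bob", "Smith", 30), ("Sue", "Jones", 20), ("Bob", "Smith", 30)]
ids = list(row_identifiers("IOUs", columns, rows))
for identifier in ids:
    print(f"<{identifier}>")
```

For the IOUs table this prints the three subjects used in the graph above, with the second Bob Smith row ending in `:2`.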


> So, the problem is that we have a blank node anchoring the triples for each
> row but, really, the blank nodes for each row represent different entities.  Is this correct?

Yes, with the caveat that they represent different rows in the
database. The extent to which these rows represent different entities
is a matter of database modeling, to be kept in mind when formulating
queries.


> If so, then I'm with Soeren.  We can improve the details of his solution but his
> direction seems right.

I believe a motivation is to be linked-data-friendly; that is, to
identify things with URLs and to serve them when asked, as in
  GET <IOUs/fname=Bob,lname=Smith,amount=30:2#_> HTTP/1.0

The thing we specifically don't want to do is to give the world an
identifier which we have no way of resolving, either in response to
a GET, or in response to a SPARQL query, e.g.:
  ASK { <IOUs/fname=Bob,lname=Smith,amount=30:2#_> <IOUs#amount> 30.0 }

If I use something like an Oracle rownum to distinguish one of a set
of identical rows, or use it in the identifier even when the rows are
not identical, I can no longer honor queries about that row, even
though the row still exists. I believe it is easier to honor the LD
requirements if we tie the RDF identifiers directly to the SQL
identifiers, and don't advertise identifiers otherwise.
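
To make the "resolvable" point concrete, here is a hedged Python sketch of answering whether an advertised identifier still denotes a row, using only the values embedded in it. The helper names `parse_identifier` and `row_exists`, the use of SQLite, and the percent-encoding of values are all assumptions for illustration; nothing here is specified by the working group.

```python
import sqlite3
from urllib.parse import unquote

def parse_identifier(identifier):
    """Split 'IOUs/fname=Bob,lname=Smith,amount=30:2#_' into its parts."""
    path, _, _fragment = identifier.partition("#")
    body, _, ordinal = path.rpartition(":")
    table, _, pairs = body.partition("/")
    values = {}
    for pair in pairs.split(","):
        column, _, value = pair.partition("=")
        values[column] = unquote(value)
    return table, values, int(ordinal)

def row_exists(conn, identifier):
    """True if at least `ordinal` rows carry the values in the identifier."""
    table, values, ordinal = parse_identifier(identifier)
    # Table and column names are interpolated into the SQL text; a real
    # service would first check them against the known schema.
    where = " AND ".join(f'"{c}" = ?' for c in values)
    sql = f'SELECT COUNT(*) FROM "{table}" WHERE {where}'
    (count,) = conn.execute(sql, list(values.values())).fetchone()
    return count >= ordinal

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IOUs (fname TEXT, lname TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO IOUs VALUES (?, ?, ?)",
    [("Bob", "Smith", 30), ("Sue", "Jones", 20), ("Bob", "Smith", 30)],
)
second_bob = row_exists(conn, "IOUs/fname=Bob,lname=Smith,amount=30:2#_")
third_bob = row_exists(conn, "IOUs/fname=Bob,lname=Smith,amount=30:3#_")
```

Because the identifier carries the SQL values themselves, the lookup needs no side table mapping identifiers to rows; an identifier built from a per-query row number would offer nothing to look up.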


> All the best, Ashok
> 
> On 2/1/2011 2:11 PM, Sören Auer wrote:
> >Hi all,
> >
> >In today's telco several people (including Souri and me) supported the idea to abandon the use of blank nodes. Is there any fundamental reason (besides philosophical views) to use blank nodes?
> >If not I suggest we just generate IRIs for all resources. Of course this does not yet solve the problem of how they should be created, but we could follow the following strategy:
> >
> >* if there is a candidate key, use the candidate key,
> >* if there is no candidate key, but an internal row identifier (e.g. Virtuoso always has one), use this row identifier,
> >* if neither exists, generate an identifier using a hash function over all values of the row + an incremented counter in case duplicate rows exist
> >
> >Wouldn't this be a simple and effective solution to the problem?
> >
> >Best,
> >
> >Sören
> >
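
For reference, the three-step fallback Sören lists above could be sketched roughly as follows. This is only an illustration under my own assumptions: the function name `make_identifier_factory`, the `rowid=` path segment, and the choice of SHA-1 over a sorted `repr` of the row are hypothetical, not something the proposal specifies.

```python
import hashlib
from collections import Counter

def make_identifier_factory(table, candidate_key=None):
    """Return a function that names rows by key, row id, or hash + counter."""
    seen = Counter()  # counts rows sharing the same hash (duplicates)

    def identify(row, row_id=None):
        if candidate_key:
            # 1. A candidate key exists: use its column values.
            key = ",".join(f"{c}={row[c]}" for c in candidate_key)
            return f"{table}/{key}#_"
        if row_id is not None:
            # 2. No key, but the database exposes an internal row identifier.
            return f"{table}/rowid={row_id}#_"
        # 3. Neither: hash all values and add a counter for duplicate rows.
        digest = hashlib.sha1(repr(sorted(row.items())).encode("utf-8")).hexdigest()
        seen[digest] += 1
        return f"{table}/{digest}:{seen[digest]}#_"

    return identify

identify = make_identifier_factory("IOUs")
bob = {"fname": "Bob", "lname": "Smith", "amount": 30}
first = identify(bob)
second = identify(bob)  # same hash, counter incremented
keyed = make_identifier_factory("People", candidate_key=["id"])({"id": 7, "name": "Bob"})
```

Note that only the first branch yields an identifier that can be mapped back to the row from SQL values alone; the counter in the third branch depends on the order in which rows were visited.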

-- 
-ericP

Received on Wednesday, 2 February 2011 01:16:56 UTC