Re: Addressing ISSUE-64 and ISSUE-65 from Michael Hausenblas on 2011-08-16 (public-rdb2rdf-wg@w3.org from August 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 16 Aug 2011 13:36:16 +0100
To: Eric Prud'hommeaux <eric@w3.org>, Juan Sequeda <juanfederico@gmail.com>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <B6352FDD-7AF3-47CC-A884-BBDDD16BDD1A@deri.org>
> I believe Juan, Marcelo and myself now all endorse 1:
>
> PROPOSE to close ISSUE-64 noting that the current DM definition  
> generates triples for all foreign keys even if they are on the same  
> columns.
>
> PROPOSE to close ISSUE-65 noting that attempting to unify the  
> treatment of literal triples over unary foreign keys marginally  
> complicates the definition <http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition 
> > and either breaks the clustering of table predicates in a single  
> namespace or introduces ','s into localnames, which are difficult to  
> represent in SPARQL and Turtle.


Very good. Thank you.

Cheers,
 Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 16 Aug 2011, at 13:34, Eric Prud'hommeaux wrote:

> * Juan Sequeda <juanfederico@gmail.com> [2011-08-10 11:49-0500]
>> Below is a conversation I started with Eric which involves ISSUE-64  
>> and
>> ISSUE-65.
>>
>> Basically there are 3 options
>>
>> 1) Ignore ISSUE-64 and ISSUE-65
>> 2) Address ISSUE-65 and ignore ISSUE-64
>> 3) Address both issues.
>>
>> Each of these options have advantages/disadvantages. Eric is added  
>> more
>> comments on this.
>>
>> David, Souri,
>>
>> can you give me a real use-case where there is a need of multiple  
>> foreign
>> keys from the same columns.
>>
>>
>> At this moment, I'm leaning towards Option 2. Eric is leaning  
>> towards Option
>> 1.
>
> I believe Juan, Marcelo and myself now all endorse 1:
>
> PROPOSE to close ISSUE-64 noting that the current DM definition  
> generates triples for all foreign keys even if they are on the same  
> columns.
>
> PROPOSE to close ISSUE-65 noting that attempting to unify the  
> treatment of literal triples over unary foreign keys marginally  
> complicates the definition <http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition 
> > and either breaks the clustering of table predicates in a single  
> namespace or introduces ','s into localnames, which are difficult to  
> represent in SPARQL and Turtle.
>
>
>> Looking forward to this discussion to see if we can resolve this  
>> quickly.
>>
>> With this, I guess my ACTION-152 is closed.
>>
>> Juan Sequeda
>> +1-575-SEQ-UEDA
>> www.juansequeda.com
>>
>>
>> ---------- Forwarded message ----------
>> From: Eric Prud'hommeaux <eric@w3.org>
>> Date: Wed, Aug 10, 2011 at 6:10 AM
>> Subject: Re: Our different options
>> To: Juan Sequeda <juanfederico@gmail.com>
>>
>>
>> whoops, sorry, fell asleep before checking mail again.
>>
>> * Juan Sequeda <juanfederico@gmail.com> [2011-08-09 18:30-0500]
>>> Eric,
>>>
>>> What do you think about this:
>>>
>>>
>>> Consider the following database
>>>
>>> Person(pid, name, addr)
>>> Address(aid, title)
>>>
>>> where addr of Person is a FK to aid of Address
>>>
>>> Person(1, John, 2)
>>> Address(2, Cambridge)
>>
>> I like to see stuff as tables (helps me visualize):
>> ┌┤Person├─────┬──────┐   
>> ┌┤Address├──────────┐
>> │ id │ name   │ addr │  │ aid │ title       │
>> │  1 │ "John" │    2 │  │   2 │ "Cambridge" │
>> └────┴────────┴──────┘   
>> └─────┴─────────────┘
>>
>> though I think we can use the example from the current spec which  
>> will
>> help later in the conversation because we can speak of the concepts
>> and specific spec changes to the spec in the same breath:
>>
>> People(7, Bob, 18)
>> Addresses(18, Cambridge)
>>
>> ┌┤People├─────┬──────┐   
>> ┌┤Addresses├───────┐
>> │ ID │ fname  │ addr │  │ ID │ city        │
>> │  7 │ "Bob"  │   18 │  │ 18 │ "Cambridge" │
>> └────┴────────┴──────┘   
>> └────┴─────────────┘
>>
>>
>>> Option 1:
>>>
>>> Do not address ISSUE-64 or ISSUE-65.
>>>
>>> Advantage:
>>>
>>> - Keeping the DM very simple
>>> - The IRI for all predicates will be very simple:
>> <tableName#AttributeName>
>>> - IRIs are *nice*, except for foreign key IRIs which are:
>>
>>                     except for n-ary foreign key IRIs | n>1, which  
>> require
>> ','s:
>>
>>> <tableName#AttributeName1,AttributeName2,...>
>>>
>>> Disadvantage:
>>> - Not addressing ISSUE-64 and ISSUE-65
>>>
>>>
>>> The triples are the following:
>>>
>>> <People/ID=7> <People#fname> "BoB" .
>>> <People/ID=7> <People#addr> <Addresses/ID=18>
>>> <Addresses/ID=18> <Addresses#city> Cambridge
>>>
>>>
>>> Option 2:
>>>
>>> Address ISSUE-65 but not ISSUE-64
>>>
>>> Advantage
>>> - Avoid doing a join in order to get a the value of the foreign key
>>> attribute
>>> - All IRIs *nice*
>>> - If a foreign key is multi-column, then we would have a *nice*
>>> IRI <People#Department> instead of an *ugly* IRI
>> <People#deptName,deptCity>
>>> (having all the columns in the foreign key in the IRI separated by  
>>> commas)
>>>
>>> Disadvantage
>>> - Need to create two different IRIs for predicates: literal and  
>>> reference
>>
>>  - Ambiguous if there's more than one foreign key to the same  
>> table, e.g.
>>
>> ┌ 
>> ┤ 
>> People 
>> ├ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┬ 
>> ──────────┬──────────┐   
>> ┌┤Addresses├───────┐
>> │ ID │ fname  │ homeaddr │ workaddr │  │ ID │  
>> city        │
>> │  7 │ "Bob"  │       18 │       18 │  │ 18 │  
>> "Cambridge" │
>> └ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┴ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┴ 
>> ──────────┴──────────┘   
>> │ 23 │ "Arlington" │
>>                                       
>> └────┴─────────────┘
>> where (homeaddr) → (Addresses, (ID))
>>      (workaddr) → (Addresses, (ID))
>>
>> (can also be exemplified in one table, but it's arguably more  
>> awkward:
>> ┌ 
>> ┤ 
>> People 
>> ├ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┬ 
>> ──────┬───────────────┐
>> │ ID │ fname  │ boss │ officeManager │
>> │  1 │ "Amy"  │    8 │            13 │
>> │  7 │ "Bob"  │    8 │            13 │
>> │  8 │ "Sue"  │    1 │            13 │
>> │ 13 │ "Tom"  │    1 │            13 │
>> └ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┴ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ─ 
>> ┴ 
>> ──────┴───────────────┘
>> where (boss) → (People, (ID))
>>      (officeManager) → (People, (ID))
>> )
>>
>> I believe that there are way more cases where a table has more than
>> one foreign key to the same table than that a table has and needs  
>> more
>> than one foreign key constrain on the same columns. In databases I've
>> touched in the last week, protein-protein interaction tables come  
>> to mind.
>>
>>
>>> predicate IRIs
>>> - Not as simple anymore, but still pretty simple
>>>
>>> The two predicate IRIs are:
>>>
>>> literal predicate IRI: <tableName#attributename>
>>> reference predicate IRI: <tableName#referenceTableName>
>>>
>>> The triples are the following:
>>>
>>> <People/ID=7> <People#fname> "BoB" .
>>> <People/ID=7> <People#addr> 2
>>> <People/ID=7> <People#Addresses> <Addresses/ID=18>
>>> <Addresses/ID=18> <Addresses#city> Cambridge
>>>
>>> Option 3:
>>>
>>> Address ISSUE-64 and ISSUE-65
>>>
>>> Advantage
>>> - Avoid doing a join in order to get a the value of the foreign key
>>> attribute
>>> - Address the following use case: same column sequence may be used  
>>> for
>>> multiple foreign key constraints
>>>
>>> Disadvantage
>>> - Need to create two different IRIs for predicates: literal and  
>>> reference
>>> predicate IRIs
>>> - reference predicate IRIs are complicated and ugly:
>>>
>>> <People,Department#deptName,name;deptCity,city>
>>> or maybe
>>> <People#Department;deptName,name;deptCity,city>
>>>
>>> -----
>>>
>>> The issues of having these ugly IRIs are in prefixes for sparql  
>>> queries.
>>> With option 2, I could have a prefix
>>>
>>> PREFIX ex: <http://www.example.com/vocab/People#>
>>>
>>> SELECT *
>>> WHERE{
>>> ?s ex:Addresses ?o
>>> }
>>>
>>> With option 1 or 3, I would need to have the entire IRI in the query
>>>
>>> ?s <http://www.example.com/vocab/People#deptName,deptCity> ?o
>>>
>>> or
>>>
>>> ?s <
>>> http://www.example.com/vocab/ 
>>> People#Department;deptName,name;deptCity,city
>>>
>>> ?o
>>>
>>>
>>> Eric... what do you think about this? I'm leaning towards option 2
>>
>> Very nice summary.
>>
>> I'm still leaning heavily towards 1. I think that the current
>> situation isn't bad when you have more than one foreign key on a
>> column list. Given an access control scenario:
>>
>> CREATE TABLE Principles (ID INT PRIMARY KEY, created STRING);
>> INSERT INTO Principles (ID, created) VALUES (2, "2011-09-10");
>> INSERT INTO Principles (ID, created) VALUES (3, "2011-09-10");
>> CREATE TABLE Users (ID INT PRIMARY KEY, name STRING, FOREIGN KEY (ID)
>> REFERENCES Principles(ID));
>> INSERT INTO Users (ID, name) VALUES (2, "Bob");
>> CREATE TABLE IPAddrs (ID INT PRIMARY KEY, ip STRING, FOREIGN KEY (ID)
>> REFERENCES Principles(ID));
>> INSERT INTO IPAddrs (ID, ip) VALUES (3, "81.23.2.200");
>> CREATE TABLE Roles (ID INT PRIMARY KEY, permissions STRING, FOREIGN  
>> KEY (ID)
>> REFERENCES Users(ID), FOREIGN KEY (ID) REFERENCES Principles(ID));
>> INSERT INTO Roles (ID, permissions) VALUES (2, "rwx");
>>
>>           ┌┤Principles├─────┐
>>           │ ID │ created    │
>>           │  2 │ 2011-09-10 │
>>           │  3 │ 2011-09-10 │
>>           └────┴────────────┘
>>           /    \
>> ┌┤Users├─────┐   
>> ┌┤IPAddrs├─────────┐
>> │ ID │ name  │  │ ID │ ip          │
>> │  2 │ "Bob" │  │  3 │ 18.23.2.200 │
>> └────┴───────┘   
>> └────┴─────────────┘
>>
>> ┌┤Roles├─────────────┐
>> │ user │ permissions │
>> │    2 │       "rwx" │
>> └──────┴─────────────┘
>>
>> Roles could be argued to be a foreign key to both Users and  
>> Principles
>> (though presumably, Users.ID already has a foreign key constraint on
>> Principles.ID so (Roles.user) → (Principles (ID)) is redundant). At
>> present, the DM gives you multiple arcs for the foreign key name  
>> (ID):
>>
>> <Roles/ID.2> a <Roles> ;
>>       <Roles#ID> <Users/ID.2> , <Principles/ID.2> ;
>>       <Roles#permissions> "rwx" .
>>
>> which is just about what you're telling the system with your two
>> foreign keys. BTW, you can go to
>> <http://this-db-really.does-not-exist.org/>
>> and enter the above DDL and an identity CONSTRUCT:
>>
>> CONSTRUCT {
>> ?s ?p ?o .
>> } WHERE {
>> ?s ?p ?o .
>> }
>>
>> to see this in action.
>>
>> As to having to do a join to get the values, I don't think it's worth
>> the added user burden to optimize scalar access to foreign key  
>> values.
>>
>> I've rolled the changes into a doc called
>> http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK
>> and reverted EGP modulo
>>
>> Feel free to forward this to Marcelo, rdb2rdf-wg, the IRS or the
>> selective service.
>>
>>
>>> Juan Sequeda
>>> +1-575-SEQ-UEDA
>>> www.juansequeda.com
>>
>> --
>> -ericP
>
> -- 
> -ericP
>
Received on Tuesday, 16 August 2011 12:36:50 UTC