Re: Addressing ISSUE-64 and ISSUE-65 from Richard Cyganiak on 2011-08-10 (public-rdb2rdf-wg@w3.org from August 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 10 Aug 2011 19:22:32 +0100
To: Juan Sequeda <juanfederico@gmail.com>
Cc: public-rdb2rdf-wg@w3.org
Message-Id: <FF46493F-377A-4F6D-83A4-FB41E570921C@cyganiak.de>
Both ISSUE-64 and ISSUE-65 could be solved by creating the property URI for the foreign key triple in a different way.

The current property URI is:
http://base_uri/tablename#column1,column2

Souri's proposal is:
http://base_uri/tablename#column1,column2,referenced_table,column3,column4

Here's another proposal:
http://base_uri/constraint_name

All constraints in SQL are named. These names are implementation-dependent and often assigned automatically without the user's knowledge, so they may not make very nice properties. But on the other hand, this approach ensures uniqueness in a simple way, and it *is* actually the most direct and literal mapping from database to RDF, so it seems to be in the spirit of the direct mapping.

Best,
Richard
 

On 10 Aug 2011, at 17:49, Juan Sequeda wrote:

> Below is a conversation I started with Eric which involves ISSUE-64 and ISSUE-65.
> 
> Basically there are 3 options
> 
> 1) Ignore ISSUE-64 and ISSUE-65
> 2) Address ISSUE-65 and ignore ISSUE-64
> 3) Address both issues.
> 
> Each of these options have advantages/disadvantages. Eric is added more comments on this.
> 
> David, Souri,
> 
> can you give me a real use-case where there is a need of multiple foreign keys from the same columns. 
> 
> 
> At this moment, I'm leaning towards Option 2. Eric is leaning towards Option 1. 
> 
> Looking forward to this discussion to see if we can resolve this quickly.
> 
> With this, I guess my ACTION-152 is closed.
> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> ---------- Forwarded message ----------
> From: Eric Prud'hommeaux <eric@w3.org>
> Date: Wed, Aug 10, 2011 at 6:10 AM
> Subject: Re: Our different options
> To: Juan Sequeda <juanfederico@gmail.com>
> 
> 
> whoops, sorry, fell asleep before checking mail again.
> 
> * Juan Sequeda <juanfederico@gmail.com> [2011-08-09 18:30-0500]
> > Eric,
> >
> > What do you think about this:
> >
> >
> > Consider the following database
> >
> > Person(pid, name, addr)
> > Address(aid, title)
> >
> > where addr of Person is a FK to aid of Address
> >
> > Person(1, John, 2)
> > Address(2, Cambridge)
> 
> I like to see stuff as tables (helps me visualize):
> ┌┤Person├─────┬──────┐  ┌┤Address├──────────┐
> │ id │ name   │ addr │  │ aid │ title       │
> │  1 │ "John" │    2 │  │   2 │ "Cambridge" │
> └────┴────────┴──────┘  └─────┴─────────────┘
> 
> though I think we can use the example from the current spec which will
> help later in the conversation because we can speak of the concepts
> and specific spec changes to the spec in the same breath:
> 
> People(7, Bob, 18)
> Addresses(18, Cambridge)
> 
> ┌┤People├─────┬──────┐  ┌┤Addresses├───────┐
> │ ID │ fname  │ addr │  │ ID │ city        │
> │  7 │ "Bob"  │   18 │  │ 18 │ "Cambridge" │
> └────┴────────┴──────┘  └────┴─────────────┘
> 
> 
> > Option 1:
> >
> > Do not address ISSUE-64 or ISSUE-65.
> >
> > Advantage:
> >
> > - Keeping the DM very simple
> > - The IRI for all predicates will be very simple: <tableName#AttributeName>
> > - IRIs are *nice*, except for foreign key IRIs which are:
> 
>                     except for n-ary foreign key IRIs | n>1, which require ','s:
> 
> > <tableName#AttributeName1,AttributeName2,...>
> >
> > Disadvantage:
> >  - Not addressing ISSUE-64 and ISSUE-65
> >
> >
> > The triples are the following:
> >
> > <People/ID=7> <People#fname> "BoB" .
> > <People/ID=7> <People#addr> <Addresses/ID=18>
> > <Addresses/ID=18> <Addresses#city> Cambridge
> >
> >
> > Option 2:
> >
> > Address ISSUE-65 but not ISSUE-64
> >
> > Advantage
> > - Avoid doing a join in order to get a the value of the foreign key
> > attribute
> > - All IRIs *nice*
> > - If a foreign key is multi-column, then we would have a *nice*
> > IRI <People#Department> instead of an *ugly* IRI <People#deptName,deptCity>
> > (having all the columns in the foreign key in the IRI separated by commas)
> >
> > Disadvantage
> > - Need to create two different IRIs for predicates: literal and reference
> 
>  - Ambiguous if there's more than one foreign key to the same table, e.g.
> 
> ┌┤People├─────┬──────────┬──────────┐  ┌┤Addresses├───────┐
> │ ID │ fname  │ homeaddr │ workaddr │  │ ID │ city        │
> │  7 │ "Bob"  │       18 │       18 │  │ 18 │ "Cambridge" │
> └────┴────────┴──────────┴──────────┘  │ 23 │ "Arlington" │
>                                       └────┴─────────────┘
>  where (homeaddr) → (Addresses, (ID))
>       (workaddr) → (Addresses, (ID))
> 
> (can also be exemplified in one table, but it's arguably more awkward:
>  ┌┤People├─────┬──────┬───────────────┐
>  │ ID │ fname  │ boss │ officeManager │
>  │  1 │ "Amy"  │    8 │            13 │
>  │  7 │ "Bob"  │    8 │            13 │
>  │  8 │ "Sue"  │    1 │            13 │
>  │ 13 │ "Tom"  │    1 │            13 │
>  └────┴────────┴──────┴───────────────┘
>  where (boss) → (People, (ID))
>       (officeManager) → (People, (ID))
> )
> 
> I believe that there are way more cases where a table has more than
> one foreign key to the same table than that a table has and needs more
> than one foreign key constrain on the same columns. In databases I've
> touched in the last week, protein-protein interaction tables come to mind.
> 
> 
> > predicate IRIs
> > - Not as simple anymore, but still pretty simple
> >
> > The two predicate IRIs are:
> >
> > literal predicate IRI: <tableName#attributename>
> > reference predicate IRI: <tableName#referenceTableName>
> >
> > The triples are the following:
> >
> > <People/ID=7> <People#fname> "BoB" .
> > <People/ID=7> <People#addr> 2
> > <People/ID=7> <People#Addresses> <Addresses/ID=18>
> > <Addresses/ID=18> <Addresses#city> Cambridge
> >
> > Option 3:
> >
> > Address ISSUE-64 and ISSUE-65
> >
> > Advantage
> > - Avoid doing a join in order to get a the value of the foreign key
> > attribute
> > - Address the following use case: same column sequence may be used for
> > multiple foreign key constraints
> >
> > Disadvantage
> > - Need to create two different IRIs for predicates: literal and reference
> > predicate IRIs
> > - reference predicate IRIs are complicated and ugly:
> >
> > <People,Department#deptName,name;deptCity,city>
> > or maybe
> > <People#Department;deptName,name;deptCity,city>
> >
> > -----
> >
> > The issues of having these ugly IRIs are in prefixes for sparql queries.
> > With option 2, I could have a prefix
> >
> > PREFIX ex: <http://www.example.com/vocab/People#>
> >
> > SELECT *
> > WHERE{
> > ?s ex:Addresses ?o
> > }
> >
> > With option 1 or 3, I would need to have the entire IRI in the query
> >
> > ?s <http://www.example.com/vocab/People#deptName,deptCity> ?o
> >
> > or
> >
> > ?s <
> > http://www.example.com/vocab/People#Department;deptName,name;deptCity,city>
> > ?o
> >
> >
> > Eric... what do you think about this? I'm leaning towards option 2
> 
> Very nice summary.
> 
> I'm still leaning heavily towards 1. I think that the current
> situation isn't bad when you have more than one foreign key on a
> column list. Given an access control scenario:
> 
> CREATE TABLE Principles (ID INT PRIMARY KEY, created STRING);
> INSERT INTO Principles (ID, created) VALUES (2, "2011-09-10");
> INSERT INTO Principles (ID, created) VALUES (3, "2011-09-10");
> CREATE TABLE Users (ID INT PRIMARY KEY, name STRING, FOREIGN KEY (ID) REFERENCES Principles(ID));
> INSERT INTO Users (ID, name) VALUES (2, "Bob");
> CREATE TABLE IPAddrs (ID INT PRIMARY KEY, ip STRING, FOREIGN KEY (ID) REFERENCES Principles(ID));
> INSERT INTO IPAddrs (ID, ip) VALUES (3, "81.23.2.200");
> CREATE TABLE Roles (ID INT PRIMARY KEY, permissions STRING, FOREIGN KEY (ID) REFERENCES Users(ID), FOREIGN KEY (ID) REFERENCES Principles(ID));
> INSERT INTO Roles (ID, permissions) VALUES (2, "rwx");
> 
>            ┌┤Principles├─────┐
>            │ ID │ created    │
>            │  2 │ 2011-09-10 │
>            │  3 │ 2011-09-10 │
>            └────┴────────────┘
>            /    \
> ┌┤Users├─────┐  ┌┤IPAddrs├─────────┐
> │ ID │ name  │  │ ID │ ip          │
> │  2 │ "Bob" │  │  3 │ 18.23.2.200 │
> └────┴───────┘  └────┴─────────────┘
> 
> ┌┤Roles├─────────────┐
> │ user │ permissions │
> │    2 │       "rwx" │
> └──────┴─────────────┘
> 
> Roles could be argued to be a foreign key to both Users and Principles
> (though presumably, Users.ID already has a foreign key constraint on
> Principles.ID so (Roles.user) → (Principles (ID)) is redundant). At
> present, the DM gives you multiple arcs for the foreign key name (ID):
> 
>  <Roles/ID.2> a <Roles> ;
>        <Roles#ID> <Users/ID.2> , <Principles/ID.2> ;
>        <Roles#permissions> "rwx" .
> 
> which is just about what you're telling the system with your two
> foreign keys. BTW, you can go to
>  <http://this-db-really.does-not-exist.org/>
> and enter the above DDL and an identity CONSTRUCT:
> 
> CONSTRUCT {
>  ?s ?p ?o .
> } WHERE {
>  ?s ?p ?o .
> }
> 
> to see this in action.
> 
> As to having to do a join to get the values, I don't think it's worth
> the added user burden to optimize scalar access to foreign key values.
> 
> I've rolled the changes into a doc called
>  http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK
> and reverted EGP modulo
> 
> Feel free to forward this to Marcelo, rdb2rdf-wg, the IRS or the
> selective service.
> 
> 
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> 
> --
> -ericP
>
Received on Wednesday, 10 August 2011 18:23:01 UTC