Addressing ISSUE-64 and ISSUE-65

Below is a conversation I started with Eric which involves ISSUE-64 and
ISSUE-65.

Basically there are 3 options

1) Ignore ISSUE-64 and ISSUE-65
2) Address ISSUE-65 and ignore ISSUE-64
3) Address both issues.

Each of these options have advantages/disadvantages. Eric is added more
comments on this.

David, Souri,

can you give me a real use-case where there is a need of multiple foreign
keys from the same columns.


At this moment, I'm leaning towards Option 2. Eric is leaning towards Option
1.

Looking forward to this discussion to see if we can resolve this quickly.

With this, I guess my ACTION-152 is closed.

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


---------- Forwarded message ----------
From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, Aug 10, 2011 at 6:10 AM
Subject: Re: Our different options
To: Juan Sequeda <juanfederico@gmail.com>


whoops, sorry, fell asleep before checking mail again.

* Juan Sequeda <juanfederico@gmail.com> [2011-08-09 18:30-0500]
> Eric,
>
> What do you think about this:
>
>
> Consider the following database
>
> Person(pid, name, addr)
> Address(aid, title)
>
> where addr of Person is a FK to aid of Address
>
> Person(1, John, 2)
> Address(2, Cambridge)

I like to see stuff as tables (helps me visualize):
┌┤Person├─────┬──────┐  ┌┤Address├──────────┐
│ id │ name   │ addr │  │ aid │ title       │
│  1 │ "John" │    2 │  │   2 │ "Cambridge" │
└────┴────────┴──────┘  └─────┴─────────────┘

though I think we can use the example from the current spec which will
help later in the conversation because we can speak of the concepts
and specific spec changes to the spec in the same breath:

People(7, Bob, 18)
Addresses(18, Cambridge)

┌┤People├─────┬──────┐  ┌┤Addresses├───────┐
│ ID │ fname  │ addr │  │ ID │ city        │
│  7 │ "Bob"  │   18 │  │ 18 │ "Cambridge" │
└────┴────────┴──────┘  └────┴─────────────┘


> Option 1:
>
> Do not address ISSUE-64 or ISSUE-65.
>
> Advantage:
>
> - Keeping the DM very simple
> - The IRI for all predicates will be very simple:
<tableName#AttributeName>
> - IRIs are *nice*, except for foreign key IRIs which are:

                     except for n-ary foreign key IRIs | n>1, which require
','s:

> <tableName#AttributeName1,AttributeName2,...>
>
> Disadvantage:
>  - Not addressing ISSUE-64 and ISSUE-65
>
>
> The triples are the following:
>
> <People/ID=7> <People#fname> "BoB" .
> <People/ID=7> <People#addr> <Addresses/ID=18>
> <Addresses/ID=18> <Addresses#city> Cambridge
>
>
> Option 2:
>
> Address ISSUE-65 but not ISSUE-64
>
> Advantage
> - Avoid doing a join in order to get a the value of the foreign key
> attribute
> - All IRIs *nice*
> - If a foreign key is multi-column, then we would have a *nice*
> IRI <People#Department> instead of an *ugly* IRI
<People#deptName,deptCity>
> (having all the columns in the foreign key in the IRI separated by commas)
>
> Disadvantage
> - Need to create two different IRIs for predicates: literal and reference

  - Ambiguous if there's more than one foreign key to the same table, e.g.

┌┤People├─────┬──────────┬──────────┐  ┌┤Addresses├───────┐
│ ID │ fname  │ homeaddr │ workaddr │  │ ID │ city        │
│  7 │ "Bob"  │       18 │       18 │  │ 18 │ "Cambridge" │
└────┴────────┴──────────┴──────────┘  │ 23 │ "Arlington" │
                                      └────┴─────────────┘
 where (homeaddr) → (Addresses, (ID))
      (workaddr) → (Addresses, (ID))

(can also be exemplified in one table, but it's arguably more awkward:
 ┌┤People├─────┬──────┬───────────────┐
 │ ID │ fname  │ boss │ officeManager │
 │  1 │ "Amy"  │    8 │            13 │
 │  7 │ "Bob"  │    8 │            13 │
 │  8 │ "Sue"  │    1 │            13 │
 │ 13 │ "Tom"  │    1 │            13 │
 └────┴────────┴──────┴───────────────┘
 where (boss) → (People, (ID))
      (officeManager) → (People, (ID))
)

I believe that there are way more cases where a table has more than
one foreign key to the same table than that a table has and needs more
than one foreign key constrain on the same columns. In databases I've
touched in the last week, protein-protein interaction tables come to mind.


> predicate IRIs
> - Not as simple anymore, but still pretty simple
>
> The two predicate IRIs are:
>
> literal predicate IRI: <tableName#attributename>
> reference predicate IRI: <tableName#referenceTableName>
>
> The triples are the following:
>
> <People/ID=7> <People#fname> "BoB" .
> <People/ID=7> <People#addr> 2
> <People/ID=7> <People#Addresses> <Addresses/ID=18>
> <Addresses/ID=18> <Addresses#city> Cambridge
>
> Option 3:
>
> Address ISSUE-64 and ISSUE-65
>
> Advantage
> - Avoid doing a join in order to get a the value of the foreign key
> attribute
> - Address the following use case: same column sequence may be used for
> multiple foreign key constraints
>
> Disadvantage
> - Need to create two different IRIs for predicates: literal and reference
> predicate IRIs
> - reference predicate IRIs are complicated and ugly:
>
> <People,Department#deptName,name;deptCity,city>
> or maybe
> <People#Department;deptName,name;deptCity,city>
>
> -----
>
> The issues of having these ugly IRIs are in prefixes for sparql queries.
> With option 2, I could have a prefix
>
> PREFIX ex: <http://www.example.com/vocab/People#>
>
> SELECT *
> WHERE{
> ?s ex:Addresses ?o
> }
>
> With option 1 or 3, I would need to have the entire IRI in the query
>
> ?s <http://www.example.com/vocab/People#deptName,deptCity> ?o
>
> or
>
> ?s <
> http://www.example.com/vocab/People#Department;deptName,name;deptCity,city
>
> ?o
>
>
> Eric... what do you think about this? I'm leaning towards option 2

Very nice summary.

I'm still leaning heavily towards 1. I think that the current
situation isn't bad when you have more than one foreign key on a
column list. Given an access control scenario:

CREATE TABLE Principles (ID INT PRIMARY KEY, created STRING);
INSERT INTO Principles (ID, created) VALUES (2, "2011-09-10");
INSERT INTO Principles (ID, created) VALUES (3, "2011-09-10");
CREATE TABLE Users (ID INT PRIMARY KEY, name STRING, FOREIGN KEY (ID)
REFERENCES Principles(ID));
INSERT INTO Users (ID, name) VALUES (2, "Bob");
CREATE TABLE IPAddrs (ID INT PRIMARY KEY, ip STRING, FOREIGN KEY (ID)
REFERENCES Principles(ID));
INSERT INTO IPAddrs (ID, ip) VALUES (3, "81.23.2.200");
CREATE TABLE Roles (ID INT PRIMARY KEY, permissions STRING, FOREIGN KEY (ID)
REFERENCES Users(ID), FOREIGN KEY (ID) REFERENCES Principles(ID));
INSERT INTO Roles (ID, permissions) VALUES (2, "rwx");

           ┌┤Principles├─────┐
           │ ID │ created    │
           │  2 │ 2011-09-10 │
           │  3 │ 2011-09-10 │
           └────┴────────────┘
           /    \
┌┤Users├─────┐  ┌┤IPAddrs├─────────┐
│ ID │ name  │  │ ID │ ip          │
│  2 │ "Bob" │  │  3 │ 18.23.2.200 │
└────┴───────┘  └────┴─────────────┘

┌┤Roles├─────────────┐
│ user │ permissions │
│    2 │       "rwx" │
└──────┴─────────────┘

Roles could be argued to be a foreign key to both Users and Principles
(though presumably, Users.ID already has a foreign key constraint on
Principles.ID so (Roles.user) → (Principles (ID)) is redundant). At
present, the DM gives you multiple arcs for the foreign key name (ID):

 <Roles/ID.2> a <Roles> ;
       <Roles#ID> <Users/ID.2> , <Principles/ID.2> ;
       <Roles#permissions> "rwx" .

which is just about what you're telling the system with your two
foreign keys. BTW, you can go to
 <http://this-db-really.does-not-exist.org/>
and enter the above DDL and an identity CONSTRUCT:

CONSTRUCT {
 ?s ?p ?o .
} WHERE {
 ?s ?p ?o .
}

to see this in action.

As to having to do a join to get the values, I don't think it's worth
the added user burden to optimize scalar access to foreign key values.

I've rolled the changes into a doc called
 http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK
and reverted EGP modulo

Feel free to forward this to Marcelo, rdb2rdf-wg, the IRS or the
selective service.


> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com

--
-ericP

Received on Wednesday, 10 August 2011 16:50:20 UTC