Re: Addressing ISSUE-64 and ISSUE-65 from Juan Sequeda on 2011-08-17 (public-rdb2rdf-wg@w3.org from August 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Wed, 17 Aug 2011 15:36:03 -0500
To: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDxf7P=Fhe_jYHdxnGWzhXFwETvFba3u2S5E9u8GrttQwA@mail.gmail.com>
Group,

Per the last telcon, our mandate was to generate formula which avoid the odd
wrinkle of not exposing the scalar value for unary foreign keys,
except indirectly as an attribute of the object of the property named for
the foreign key column. All approaches that we conceived up involved either:


 • forming a name in a different namespace than the rest of the properties:

     <People,Addresses#addr,ID> vs. <People#addr>


 • adding punctuation to the localName which would have the same effect of
precluding the use of the namespace prefix:

     <People#addr,Addresses,ID> vs. <People#addr>


 • prefixing all IRI localnames with a syntactic discriminator:

     <People/ID=7> <People#Raddr> <Addresses/ID=18> .

     <People/ID=7> <People#Saddr> 18 .


All of these seemed to be more counterintuitive than the original assymetry
around unary foreign keys so we recommend that the spec stay as it is.
Notice that is still possible to know if a property IRI is for a foreign key
by looking at the object of the triple; if it's an IRI, then it's a foreign
key, otherwise, it's not.

We would like to send the DM off to last call. We plan to work on a separate
document mapping the relational schema to RDFS/OWL. A direct mapping of the
schema to RDFS/OWL would indicate that

<People#name> rdf:type owl:DatatypeProperty

<People#addr> rdf:type owl:ObjectProperty

Where anything that is an ObjectProperty comes from a foreign key.

This seems to be an adequate way of addressing this issue, instead of
relying on the creating of complex IRIs.


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Tue, Aug 16, 2011 at 7:36 AM, Michael Hausenblas <
michael.hausenblas@deri.org> wrote:

>
>  I believe Juan, Marcelo and myself now all endorse 1:
>>
>> PROPOSE to close ISSUE-64 noting that the current DM definition generates
>> triples for all foreign keys even if they are on the same columns.
>>
>> PROPOSE to close ISSUE-65 noting that attempting to unify the treatment of
>> literal triples over unary foreign keys marginally complicates the
>> definition <http://localhost/2001/sw/**rdb2rdf/directMapping/**
>> explicitFK#definition<http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition>>
>> and either breaks the clustering of table predicates in a single namespace
>> or introduces ','s into localnames, which are difficult to represent in
>> SPARQL and Turtle.
>>
>
>
> Very good. Thank you.
>
> Cheers,
>        Michael
> --
> Dr. Michael Hausenblas, Research Fellow
> LiDRC - Linked Data Research Centre
> DERI - Digital Enterprise Research Institute
> NUIG - National University of Ireland, Galway
> Ireland, Europe
> Tel. +353 91 495730
> http://linkeddata.deri.ie/
> http://sw-app.org/about.html
>
>
> On 16 Aug 2011, at 13:34, Eric Prud'hommeaux wrote:
>
>  * Juan Sequeda <juanfederico@gmail.com> [2011-08-10 11:49-0500]
>>
>>> Below is a conversation I started with Eric which involves ISSUE-64 and
>>> ISSUE-65.
>>>
>>> Basically there are 3 options
>>>
>>> 1) Ignore ISSUE-64 and ISSUE-65
>>> 2) Address ISSUE-65 and ignore ISSUE-64
>>> 3) Address both issues.
>>>
>>> Each of these options have advantages/disadvantages. Eric is added more
>>> comments on this.
>>>
>>> David, Souri,
>>>
>>> can you give me a real use-case where there is a need of multiple foreign
>>> keys from the same columns.
>>>
>>>
>>> At this moment, I'm leaning towards Option 2. Eric is leaning towards
>>> Option
>>> 1.
>>>
>>
>> I believe Juan, Marcelo and myself now all endorse 1:
>>
>> PROPOSE to close ISSUE-64 noting that the current DM definition generates
>> triples for all foreign keys even if they are on the same columns.
>>
>> PROPOSE to close ISSUE-65 noting that attempting to unify the treatment of
>> literal triples over unary foreign keys marginally complicates the
>> definition <http://localhost/2001/sw/**rdb2rdf/directMapping/**
>> explicitFK#definition<http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition>>
>> and either breaks the clustering of table predicates in a single namespace
>> or introduces ','s into localnames, which are difficult to represent in
>> SPARQL and Turtle.
>>
>>
>>  Looking forward to this discussion to see if we can resolve this quickly.
>>>
>>> With this, I guess my ACTION-152 is closed.
>>>
>>> Juan Sequeda
>>> +1-575-SEQ-UEDA
>>> www.juansequeda.com
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Eric Prud'hommeaux <eric@w3.org>
>>> Date: Wed, Aug 10, 2011 at 6:10 AM
>>> Subject: Re: Our different options
>>> To: Juan Sequeda <juanfederico@gmail.com>
>>>
>>>
>>> whoops, sorry, fell asleep before checking mail again.
>>>
>>> * Juan Sequeda <juanfederico@gmail.com> [2011-08-09 18:30-0500]
>>>
>>>> Eric,
>>>>
>>>> What do you think about this:
>>>>
>>>>
>>>> Consider the following database
>>>>
>>>> Person(pid, name, addr)
>>>> Address(aid, title)
>>>>
>>>> where addr of Person is a FK to aid of Address
>>>>
>>>> Person(1, John, 2)
>>>> Address(2, Cambridge)
>>>>
>>>
>>> I like to see stuff as tables (helps me visualize):
>>> ┌┤Person├─────┬──────┐  ┌┤Address├──────────┐
>>> │ id │ name   │ addr │  │ aid │ title       │
>>> │  1 │ "John" │    2 │  │   2 │ "Cambridge" │
>>> └────┴────────┴──────┘  └─────┴─────────────┘
>>>
>>> though I think we can use the example from the current spec which will
>>> help later in the conversation because we can speak of the concepts
>>> and specific spec changes to the spec in the same breath:
>>>
>>> People(7, Bob, 18)
>>> Addresses(18, Cambridge)
>>>
>>> ┌┤People├─────┬──────┐  ┌┤Addresses├───────┐
>>> │ ID │ fname  │ addr │  │ ID │ city        │
>>> │  7 │ "Bob"  │   18 │  │ 18 │ "Cambridge" │
>>> └────┴────────┴──────┘  └────┴─────────────┘
>>>
>>>
>>>  Option 1:
>>>>
>>>> Do not address ISSUE-64 or ISSUE-65.
>>>>
>>>> Advantage:
>>>>
>>>> - Keeping the DM very simple
>>>> - The IRI for all predicates will be very simple:
>>>>
>>> <tableName#AttributeName>
>>>
>>>> - IRIs are *nice*, except for foreign key IRIs which are:
>>>>
>>>
>>>                    except for n-ary foreign key IRIs | n>1, which require
>>> ','s:
>>>
>>>  <tableName#AttributeName1,**AttributeName2,...>
>>>>
>>>> Disadvantage:
>>>> - Not addressing ISSUE-64 and ISSUE-65
>>>>
>>>>
>>>> The triples are the following:
>>>>
>>>> <People/ID=7> <People#fname> "BoB" .
>>>> <People/ID=7> <People#addr> <Addresses/ID=18>
>>>> <Addresses/ID=18> <Addresses#city> Cambridge
>>>>
>>>>
>>>> Option 2:
>>>>
>>>> Address ISSUE-65 but not ISSUE-64
>>>>
>>>> Advantage
>>>> - Avoid doing a join in order to get a the value of the foreign key
>>>> attribute
>>>> - All IRIs *nice*
>>>> - If a foreign key is multi-column, then we would have a *nice*
>>>> IRI <People#Department> instead of an *ugly* IRI
>>>>
>>> <People#deptName,deptCity>
>>>
>>>> (having all the columns in the foreign key in the IRI separated by
>>>> commas)
>>>>
>>>> Disadvantage
>>>> - Need to create two different IRIs for predicates: literal and
>>>> reference
>>>>
>>>
>>>  - Ambiguous if there's more than one foreign key to the same table, e.g.
>>>
>>> ┌┤People├─────┬──────────┬────**──────┐  ┌┤Addresses├───────┐
>>> │ ID │ fname  │ homeaddr │ workaddr │  │ ID │ city        │
>>> │  7 │ "Bob"  │       18 │       18 │  │ 18 │ "Cambridge" │
>>> └────┴────────┴──────────┴────**──────┘  │ 23 │ "Arlington" │
>>>                                     └────┴─────────────┘
>>> where (homeaddr) → (Addresses, (ID))
>>>     (workaddr) → (Addresses, (ID))
>>>
>>> (can also be exemplified in one table, but it's arguably more awkward:
>>> ┌┤People├─────┬──────┬────────**───────┐
>>> │ ID │ fname  │ boss │ officeManager │
>>> │  1 │ "Amy"  │    8 │            13 │
>>> │  7 │ "Bob"  │    8 │            13 │
>>> │  8 │ "Sue"  │    1 │            13 │
>>> │ 13 │ "Tom"  │    1 │            13 │
>>> └────┴────────┴──────┴────────**───────┘
>>> where (boss) → (People, (ID))
>>>     (officeManager) → (People, (ID))
>>> )
>>>
>>> I believe that there are way more cases where a table has more than
>>> one foreign key to the same table than that a table has and needs more
>>> than one foreign key constrain on the same columns. In databases I've
>>> touched in the last week, protein-protein interaction tables come to
>>> mind.
>>>
>>>
>>>  predicate IRIs
>>>> - Not as simple anymore, but still pretty simple
>>>>
>>>> The two predicate IRIs are:
>>>>
>>>> literal predicate IRI: <tableName#attributename>
>>>> reference predicate IRI: <tableName#referenceTableName>
>>>>
>>>> The triples are the following:
>>>>
>>>> <People/ID=7> <People#fname> "BoB" .
>>>> <People/ID=7> <People#addr> 2
>>>> <People/ID=7> <People#Addresses> <Addresses/ID=18>
>>>> <Addresses/ID=18> <Addresses#city> Cambridge
>>>>
>>>> Option 3:
>>>>
>>>> Address ISSUE-64 and ISSUE-65
>>>>
>>>> Advantage
>>>> - Avoid doing a join in order to get a the value of the foreign key
>>>> attribute
>>>> - Address the following use case: same column sequence may be used for
>>>> multiple foreign key constraints
>>>>
>>>> Disadvantage
>>>> - Need to create two different IRIs for predicates: literal and
>>>> reference
>>>> predicate IRIs
>>>> - reference predicate IRIs are complicated and ugly:
>>>>
>>>> <People,Department#deptName,**name;deptCity,city>
>>>> or maybe
>>>> <People#Department;deptName,**name;deptCity,city>
>>>>
>>>> -----
>>>>
>>>> The issues of having these ugly IRIs are in prefixes for sparql queries.
>>>> With option 2, I could have a prefix
>>>>
>>>> PREFIX ex: <http://www.example.com/vocab/**People#<http://www.example.com/vocab/People#>
>>>> >
>>>>
>>>> SELECT *
>>>> WHERE{
>>>> ?s ex:Addresses ?o
>>>> }
>>>>
>>>> With option 1 or 3, I would need to have the entire IRI in the query
>>>>
>>>> ?s <http://www.example.com/vocab/**People#deptName,deptCity<http://www.example.com/vocab/People#deptName,deptCity>>
>>>> ?o
>>>>
>>>> or
>>>>
>>>> ?s <
>>>> http://www.example.com/vocab/**People#Department;deptName,**
>>>> name;deptCity,city<http://www.example.com/vocab/People#Department;deptName,name;deptCity,city>
>>>>
>>>> ?o
>>>>
>>>>
>>>> Eric... what do you think about this? I'm leaning towards option 2
>>>>
>>>
>>> Very nice summary.
>>>
>>> I'm still leaning heavily towards 1. I think that the current
>>> situation isn't bad when you have more than one foreign key on a
>>> column list. Given an access control scenario:
>>>
>>> CREATE TABLE Principles (ID INT PRIMARY KEY, created STRING);
>>> INSERT INTO Principles (ID, created) VALUES (2, "2011-09-10");
>>> INSERT INTO Principles (ID, created) VALUES (3, "2011-09-10");
>>> CREATE TABLE Users (ID INT PRIMARY KEY, name STRING, FOREIGN KEY (ID)
>>> REFERENCES Principles(ID));
>>> INSERT INTO Users (ID, name) VALUES (2, "Bob");
>>> CREATE TABLE IPAddrs (ID INT PRIMARY KEY, ip STRING, FOREIGN KEY (ID)
>>> REFERENCES Principles(ID));
>>> INSERT INTO IPAddrs (ID, ip) VALUES (3, "81.23.2.200");
>>> CREATE TABLE Roles (ID INT PRIMARY KEY, permissions STRING, FOREIGN KEY
>>> (ID)
>>> REFERENCES Users(ID), FOREIGN KEY (ID) REFERENCES Principles(ID));
>>> INSERT INTO Roles (ID, permissions) VALUES (2, "rwx");
>>>
>>>          ┌┤Principles├─────┐
>>>          │ ID │ created    │
>>>          │  2 │ 2011-09-10 │
>>>          │  3 │ 2011-09-10 │
>>>          └────┴────────────┘
>>>          /    \
>>> ┌┤Users├─────┐  ┌┤IPAddrs├─────────┐
>>> │ ID │ name  │  │ ID │ ip          │
>>> │  2 │ "Bob" │  │  3 │ 18.23.2.200 │
>>> └────┴───────┘  └────┴─────────────┘
>>>
>>> ┌┤Roles├─────────────┐
>>> │ user │ permissions │
>>> │    2 │       "rwx" │
>>> └──────┴─────────────┘
>>>
>>> Roles could be argued to be a foreign key to both Users and Principles
>>> (though presumably, Users.ID already has a foreign key constraint on
>>> Principles.ID so (Roles.user) → (Principles (ID)) is redundant). At
>>> present, the DM gives you multiple arcs for the foreign key name (ID):
>>>
>>> <Roles/ID.2> a <Roles> ;
>>>      <Roles#ID> <Users/ID.2> , <Principles/ID.2> ;
>>>      <Roles#permissions> "rwx" .
>>>
>>> which is just about what you're telling the system with your two
>>> foreign keys. BTW, you can go to
>>> <http://this-db-really.does-**not-exist.org/<http://this-db-really.does-not-exist.org/>
>>> >
>>> and enter the above DDL and an identity CONSTRUCT:
>>>
>>> CONSTRUCT {
>>> ?s ?p ?o .
>>> } WHERE {
>>> ?s ?p ?o .
>>> }
>>>
>>> to see this in action.
>>>
>>> As to having to do a join to get the values, I don't think it's worth
>>> the added user burden to optimize scalar access to foreign key values.
>>>
>>> I've rolled the changes into a doc called
>>> http://www.w3.org/2001/sw/**rdb2rdf/directMapping/**explicitFK<http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>
>>> and reverted EGP modulo
>>>
>>> Feel free to forward this to Marcelo, rdb2rdf-wg, the IRS or the
>>> selective service.
>>>
>>>
>>>  Juan Sequeda
>>>> +1-575-SEQ-UEDA
>>>> www.juansequeda.com
>>>>
>>>
>>> --
>>> -ericP
>>>
>>
>> --
>> -ericP
>>
>>
>
Received on Wednesday, 17 August 2011 20:36:52 UTC