Re: Addressing ISSUE-64 and ISSUE-65

* Richard Cyganiak <richard@cyganiak.de> [2011-08-17 23:01+0100]
> Juan,
> 
> You say that some proposals don't play well with namespace prefixes. You use this as an argument against these proposals. I think that's an invalid argument because namespaces are *already* entirely useless with the DM.
> 
> 1. Each table requires its own namespace, leading to an abundance of namespaces

In the use cases I've dealt with, this has been a feature rather than a bug. That is, ppl:ID and adr:ID are conveniently distinguished. Writing rules or queries is very intuitive with this partitioning:

    PREFIX ppl: <People#>
    PREFIX adr: <Addresses#>
    SELECT ?city WHERE { ?who ppl:fname "Bob" ;
                              ppl:addr ?addr .
                        ?addr adr:city ?city }
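
To make the partitioning concrete without needing a SPARQL endpoint, here is a rough Python sketch of the same lookup. The triples are the ones for the People/Addresses example from the spec; the tuple encoding, the `match` helper and the `cities_of` function are illustrative assumptions of mine, not anything the DM defines.

```python
# Direct-mapping triples for the People/Addresses example, as plain tuples.
# IRIs are kept relative, as in the examples in this thread.
TRIPLES = [
    ("People/ID=7", "People#fname", "Bob"),
    ("People/ID=7", "People#addr", "Addresses/ID=18"),
    ("Addresses/ID=18", "Addresses#city", "Cambridge"),
]

PPL = "People#"     # stands in for PREFIX ppl: <People#>
ADR = "Addresses#"  # stands in for PREFIX adr: <Addresses#>


def match(triples, s=None, p=None, o=None):
    """Yield triples matching the given terms; None acts as a variable."""
    for t in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), t)):
            yield t


def cities_of(fname):
    """Hand-rolled equivalent of the SELECT ?city query (hypothetical helper)."""
    cities = []
    for who, _, _ in match(TRIPLES, p=PPL + "fname", o=fname):
        for _, _, addr in match(TRIPLES, s=who, p=PPL + "addr"):
            for _, _, city in match(TRIPLES, s=addr, p=ADR + "city"):
                cities.append(city)
    return cities


print(cities_of("Bob"))
```

Each table contributes exactly one namespace constant, so it is obvious at a glance which table every predicate in the pattern comes from.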


> 2. The DM is not written by humans but by machines. The machine has to generate the namespace prefix. The only thing it can really do is either use the table name, or use the (unreadable) ns0, ns1, ns2 pattern.

The DM will be queried by humans. It will also be transformed to common ontologies by rules written by humans.


> 3. Generating prefixes automatically from the table name leads to all sorts of Fun with special characters. Basically, it is impossible because there are no escape mechanisms inside the *prefix*.

Most databases have only unary foreign keys. Most of these can be transformed to common vocabularies with rules which don't mention node identifiers, e.g.

    PREFIX ppl: <People#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    CONSTRUCT {
        ?who a foaf:Person ; foaf:givenName ?fname
    } WHERE { ?who ppl:fname ?fname }
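
For anyone who'd rather see that rule as running code than as SPARQL, here is a minimal Python sketch of the same transformation. The tuple encoding of triples and the function name `people_to_foaf` are assumptions for illustration only. The point is that the rule keys on the predicate and never inspects the shape of the node identifier.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def people_to_foaf(triples):
    """Mimic the CONSTRUCT rule: every ?who with a People#fname becomes a
    foaf:Person with a foaf:givenName. Subjects are passed through opaquely."""
    out = []
    for s, p, o in triples:
        if p == "People#fname":
            out.append((s, RDF_TYPE, FOAF + "Person"))
            out.append((s, FOAF + "givenName", o))
    return out


source = [
    ("People/ID=7", "People#fname", "Bob"),
    ("People/ID=7", "People#addr", "Addresses/ID=18"),
]
for triple in people_to_foaf(source):
    print(triple)
```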


> 4. The DM already produces many IRIs that cannot be abbreviated into prefixed names because they contain commas or equal signs in the local part.

Many queries and rules don't include specific node identifiers. For databases with only unary foreign keys, these queries can be tersely and conveniently expressed with the current algorithm.


> 5. Even if that's not the case, special characters in table and column names will often prevent abbreviation.

But again, most column names are BORING UPPER-CASE STRINGS (which fit the PN_LOCAL lexical pattern which SPARQL and Turtle use).
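
As a rough way to check which column names survive abbreviation, here is a small Python sketch. The regular expression is an ASCII-only approximation of the SPARQL PN_LOCAL production (the real production also admits a large set of non-ASCII characters), and `abbreviable` is a hypothetical helper name.

```python
import re

# ASCII-only approximation of SPARQL's PN_LOCAL:
#   first char: letter, digit or underscore
#   middle:     those plus '-' and '.'
#   last char:  anything allowed in the middle except '.'
PN_LOCAL_ASCII = re.compile(r"[A-Za-z_0-9](?:[A-Za-z_0-9.\-]*[A-Za-z_0-9\-])?")


def abbreviable(local_name):
    """True if local_name could follow a prefix, e.g. ppl:<local_name>."""
    return PN_LOCAL_ASCII.fullmatch(local_name) is not None


for name in ["FNAME", "addr", "deptName,deptCity", "ID=7"]:
    print(name, abbreviable(name))
```

Plain column names pass, while anything carrying a ',' or '=' does not, which is the distinction at stake in points 4 and 5.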


> 6. All RDF syntaxes that support prefixes, also support relative IRIs. Using the table name as a prefix is just as long as using a relative URI: People:addr vs. <People#addr>.

Perhaps it's a fault of the educators, but I've seen a surprisingly small number of SPARQL queries using base (like < 3%).


> 7. *All* URIs that can possibly occur in a DM graph can be nicely abbreviated with a single base URI.
> 
> My conclusion is that using prefixes with the DM is impossible to implement in any way that works and makes sense. Implementations that are interested in producing readable RDF should just use relative IRIs.
> 
> Therefore, I think you have not presented any valid arguments against using property IRIs such as these:
> 
>   <People,Addresses#addr,ID>
>   <People#addr,Addresses,ID>
>   <ref/People#addr>
> 
> Personally, I like the last option.

So worse than each table having its own namespace, each foreign key now gets a novel namespace of its own. Your first and third IRI forms ensure that no foreign key will be in the same namespace as the other properties of the table. The second makes many common queries like
    PREFIX ppl: <People#>
    PREFIX adr: <Addresses#>
    SELECT ?city WHERE { ?who ppl:fname "Bob" ;
                              ppl:addr ?addr .
                        ?addr adr:city ?city }
harder to write.

What is the justification for complicating these common cases? If it's just 'cause there's an exception in the spec, I don't see the trade-off justified at all. If it's to save an arc traversal { ?who ppl:addr ?addr . ?addr adr:ID ?id }, most graph patterns won't even touch db-encoding artifacts like the ID. If it's to save a SQL join, the relational schema already provides the DM processor with sufficient info to not do the join (i.e. FOREIGN KEY (addr) REFERENCES Addresses (ID) ).
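
To illustrate the last point, here is a minimal Python sketch of how a processor could emit the foreign key arc from the referencing row alone, using only the FOREIGN KEY declaration. The function names and the dict-per-row encoding are assumptions for illustration, and the ',' separator for multi-column row IRIs is likewise assumed. Only the unary case matters for the argument.

```python
def row_iri(table, key_cols, values):
    """Row IRI in the <Table/col=val> style used in this thread."""
    pairs = ",".join(f"{c}={v}" for c, v in zip(key_cols, values))
    return f"{table}/{pairs}"


def fk_triple(table, pk_cols, row, fk_cols, ref_table, ref_cols):
    """Build the foreign key arc without reading ref_table at all.

    The object IRI is derived from the referencing row's own FK column
    values plus the referenced column names from the schema.
    """
    subject = row_iri(table, pk_cols, [row[c] for c in pk_cols])
    predicate = f"{table}#{','.join(fk_cols)}"
    obj = row_iri(ref_table, ref_cols, [row[c] for c in fk_cols])
    return (subject, predicate, obj)


# FOREIGN KEY (addr) REFERENCES Addresses (ID)
bob = {"ID": 7, "fname": "Bob", "addr": 18}
print(fk_triple("People", ["ID"], bob, ["addr"], "Addresses", ["ID"]))
```

Nothing in `fk_triple` touches the Addresses table, so no join is needed to produce the arc.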


> Best,
> Richard
> 
> 
> On 17 Aug 2011, at 21:36, Juan Sequeda wrote:
> 
> > Group,
> > 
> > Per the last telcon, our mandate was to generate formulas which avoid the odd wrinkle of not exposing the scalar value for unary foreign keys, except indirectly as an attribute of the object of the property named for the foreign key column. All approaches that we came up with involved either:
> > 
> > 
> > 
> >  • forming a name in a different namespace than the rest of the properties:
> > 
> >      <People,Addresses#addr,ID> vs. <People#addr>
> > 
> > 
> > 
> >  • adding punctuation to the localName which would have the same effect of precluding the use of the namespace prefix:
> > 
> >      <People#addr,Addresses,ID> vs. <People#addr>
> > 
> > 
> > 
> >  • prefixing all IRI localnames with a syntactic discriminator:
> > 
> >      <People/ID=7> <People#Raddr> <Addresses/ID=18> .
> > 
> >      <People/ID=7> <People#Saddr> 18 .
> > 
> > 
> > 
> > All of these seemed to be more counterintuitive than the original asymmetry around unary foreign keys, so we recommend that the spec stay as it is. Notice that it is still possible to know if a property IRI is for a foreign key by looking at the object of the triple; if it's an IRI, then it's a foreign key, otherwise, it's not.
> > 
> > We would like to send the DM off to last call. We plan to work on a separate document mapping the relational schema to RDFS/OWL. A direct mapping of the schema to RDFS/OWL would indicate that 
> > 
> > <People#name> rdf:type owl:DatatypeProperty
> > 
> > <People#addr> rdf:type owl:ObjectProperty
> > 
> > Where anything that is an ObjectProperty comes from a foreign key.
> > 
> > This seems to be an adequate way of addressing this issue, instead of relying on the creation of complex IRIs.
> > 
> > 
> > 
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> > 
> > 
> > On Tue, Aug 16, 2011 at 7:36 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
> > 
> > I believe Juan, Marcelo and myself now all endorse 1:
> > 
> > PROPOSE to close ISSUE-64 noting that the current DM definition generates triples for all foreign keys even if they are on the same columns.
> > 
> > PROPOSE to close ISSUE-65 noting that attempting to unify the treatment of literal triples over unary foreign keys marginally complicates the definition <http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition> and either breaks the clustering of table predicates in a single namespace or introduces ','s into localnames, which are difficult to represent in SPARQL and Turtle.
> > 
> > 
> > Very good. Thank you.
> > 
> > Cheers,
> >        Michael
> > --
> > Dr. Michael Hausenblas, Research Fellow
> > LiDRC - Linked Data Research Centre
> > DERI - Digital Enterprise Research Institute
> > NUIG - National University of Ireland, Galway
> > Ireland, Europe
> > Tel. +353 91 495730
> > http://linkeddata.deri.ie/
> > http://sw-app.org/about.html
> > 
> > 
> > On 16 Aug 2011, at 13:34, Eric Prud'hommeaux wrote:
> > 
> > * Juan Sequeda <juanfederico@gmail.com> [2011-08-10 11:49-0500]
> > Below is a conversation I started with Eric which involves ISSUE-64 and
> > ISSUE-65.
> > 
> > Basically there are 3 options
> > 
> > 1) Ignore ISSUE-64 and ISSUE-65
> > 2) Address ISSUE-65 and ignore ISSUE-64
> > 3) Address both issues.
> > 
> > Each of these options has advantages/disadvantages. Eric has added more
> > comments on this.
> > 
> > David, Souri,
> > 
> > can you give me a real use-case where there is a need for multiple foreign
> > keys from the same columns?
> > 
> > 
> > At this moment, I'm leaning towards Option 2. Eric is leaning towards Option
> > 1.
> > 
> > I believe Juan, Marcelo and myself now all endorse 1:
> > 
> > PROPOSE to close ISSUE-64 noting that the current DM definition generates triples for all foreign keys even if they are on the same columns.
> > 
> > PROPOSE to close ISSUE-65 noting that attempting to unify the treatment of literal triples over unary foreign keys marginally complicates the definition <http://localhost/2001/sw/rdb2rdf/directMapping/explicitFK#definition> and either breaks the clustering of table predicates in a single namespace or introduces ','s into localnames, which are difficult to represent in SPARQL and Turtle.
> > 
> > 
> > Looking forward to this discussion to see if we can resolve this quickly.
> > 
> > With this, I guess my ACTION-152 is closed.
> > 
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> > 
> > 
> > ---------- Forwarded message ----------
> > From: Eric Prud'hommeaux <eric@w3.org>
> > Date: Wed, Aug 10, 2011 at 6:10 AM
> > Subject: Re: Our different options
> > To: Juan Sequeda <juanfederico@gmail.com>
> > 
> > 
> > whoops, sorry, fell asleep before checking mail again.
> > 
> > * Juan Sequeda <juanfederico@gmail.com> [2011-08-09 18:30-0500]
> > Eric,
> > 
> > What do you think about this:
> > 
> > 
> > Consider the following database
> > 
> > Person(pid, name, addr)
> > Address(aid, title)
> > 
> > where addr of Person is a FK to aid of Address
> > 
> > Person(1, John, 2)
> > Address(2, Cambridge)
> > 
> > I like to see stuff as tables (helps me visualize):
> > ┌┤Person├─────┬──────┐  ┌┤Address├──────────┐
> > │ id │ name   │ addr │  │ aid │ title       │
> > │  1 │ "John" │    2 │  │   2 │ "Cambridge" │
> > └────┴────────┴──────┘  └─────┴─────────────┘
> > 
> > though I think we can use the example from the current spec which will
> > help later in the conversation because we can speak of the concepts
> > and specific changes to the spec in the same breath:
> > 
> > People(7, Bob, 18)
> > Addresses(18, Cambridge)
> > 
> > ┌┤People├─────┬──────┐  ┌┤Addresses├───────┐
> > │ ID │ fname  │ addr │  │ ID │ city        │
> > │  7 │ "Bob"  │   18 │  │ 18 │ "Cambridge" │
> > └────┴────────┴──────┘  └────┴─────────────┘
> > 
> > 
> > Option 1:
> > 
> > Do not address ISSUE-64 or ISSUE-65.
> > 
> > Advantage:
> > 
> > - Keeping the DM very simple
> > - The IRI for all predicates will be very simple:
> > <tableName#AttributeName>
> > - IRIs are *nice*, except for foreign key IRIs which are:
> > 
> >                    except for n-ary foreign key IRIs | n>1, which require
> > ','s:
> > 
> > <tableName#AttributeName1,AttributeName2,...>
> > 
> > Disadvantage:
> > - Not addressing ISSUE-64 and ISSUE-65
> > 
> > 
> > The triples are the following:
> > 
> > <People/ID=7> <People#fname> "Bob" .
> > <People/ID=7> <People#addr> <Addresses/ID=18> .
> > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > 
> > 
> > Option 2:
> > 
> > Address ISSUE-65 but not ISSUE-64
> > 
> > Advantage
> > - Avoid doing a join in order to get the value of the foreign key
> > attribute
> > - All IRIs *nice*
> > - If a foreign key is multi-column, then we would have a *nice*
> > IRI <People#Department> instead of an *ugly* IRI
> > <People#deptName,deptCity>
> > (having all the columns in the foreign key in the IRI separated by commas)
> > 
> > Disadvantage
> > - Need to create two different IRIs for predicates: literal and reference
> > 
> >  - Ambiguous if there's more than one foreign key to the same table, e.g.
> > 
> > ┌┤People├─────┬──────────┬──────────┐  ┌┤Addresses├───────┐
> > │ ID │ fname  │ homeaddr │ workaddr │  │ ID │ city        │
> > │  7 │ "Bob"  │       18 │       18 │  │ 18 │ "Cambridge" │
> > └────┴────────┴──────────┴──────────┘  │ 23 │ "Arlington" │
> >                                     └────┴─────────────┘
> > where (homeaddr) → (Addresses, (ID))
> >     (workaddr) → (Addresses, (ID))
> > 
> > (can also be exemplified in one table, but it's arguably more awkward:
> > ┌┤People├─────┬──────┬───────────────┐
> > │ ID │ fname  │ boss │ officeManager │
> > │  1 │ "Amy"  │    8 │            13 │
> > │  7 │ "Bob"  │    8 │            13 │
> > │  8 │ "Sue"  │    1 │            13 │
> > │ 13 │ "Tom"  │    1 │            13 │
> > └────┴────────┴──────┴───────────────┘
> > where (boss) → (People, (ID))
> >     (officeManager) → (People, (ID))
> > )
> > 
> > I believe that there are way more cases where a table has more than
> > one foreign key to the same table than cases where a table has, and
> > needs, more than one foreign key constraint on the same columns. In
> > databases I've touched in the last week, protein-protein interaction
> > tables come to mind.
> > 
> > 
> > predicate IRIs
> > - Not as simple anymore, but still pretty simple
> > 
> > The two predicate IRIs are:
> > 
> > literal predicate IRI: <tableName#attributename>
> > reference predicate IRI: <tableName#referenceTableName>
> > 
> > The triples are the following:
> > 
> > <People/ID=7> <People#fname> "Bob" .
> > <People/ID=7> <People#addr> 18 .
> > <People/ID=7> <People#Addresses> <Addresses/ID=18> .
> > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > 
> > Option 3:
> > 
> > Address ISSUE-64 and ISSUE-65
> > 
> > Advantage
> > - Avoid doing a join in order to get the value of the foreign key
> > attribute
> > - Address the following use case: same column sequence may be used for
> > multiple foreign key constraints
> > 
> > Disadvantage
> > - Need to create two different IRIs for predicates: literal and reference
> > predicate IRIs
> > - reference predicate IRIs are complicated and ugly:
> > 
> > <People,Department#deptName,name;deptCity,city>
> > or maybe
> > <People#Department;deptName,name;deptCity,city>
> > 
> > -----
> > 
> > The issues of having these ugly IRIs are in prefixes for sparql queries.
> > With option 2, I could have a prefix
> > 
> > PREFIX ex: <http://www.example.com/vocab/People#>
> > 
> > SELECT *
> > WHERE{
> > ?s ex:Addresses ?o
> > }
> > 
> > With option 1 or 3, I would need to have the entire IRI in the query
> > 
> > ?s <http://www.example.com/vocab/People#deptName,deptCity> ?o
> > 
> > or
> > 
> > ?s <http://www.example.com/vocab/People#Department;deptName,name;deptCity,city> ?o
> > 
> > 
> > Eric... what do you think about this? I'm leaning towards option 2
> > 
> > Very nice summary.
> > 
> > I'm still leaning heavily towards 1. I think that the current
> > situation isn't bad when you have more than one foreign key on a
> > column list. Given an access control scenario:
> > 
> > CREATE TABLE Principles (ID INT PRIMARY KEY, created STRING);
> > INSERT INTO Principles (ID, created) VALUES (2, "2011-09-10");
> > INSERT INTO Principles (ID, created) VALUES (3, "2011-09-10");
> > CREATE TABLE Users (ID INT PRIMARY KEY, name STRING, FOREIGN KEY (ID)
> > REFERENCES Principles(ID));
> > INSERT INTO Users (ID, name) VALUES (2, "Bob");
> > CREATE TABLE IPAddrs (ID INT PRIMARY KEY, ip STRING, FOREIGN KEY (ID)
> > REFERENCES Principles(ID));
> > INSERT INTO IPAddrs (ID, ip) VALUES (3, "81.23.2.200");
> > CREATE TABLE Roles (ID INT PRIMARY KEY, permissions STRING, FOREIGN KEY (ID)
> > REFERENCES Users(ID), FOREIGN KEY (ID) REFERENCES Principles(ID));
> > INSERT INTO Roles (ID, permissions) VALUES (2, "rwx");
> > 
> >          ┌┤Principles├─────┐
> >          │ ID │ created    │
> >          │  2 │ 2011-09-10 │
> >          │  3 │ 2011-09-10 │
> >          └────┴────────────┘
> >          /    \
> > ┌┤Users├─────┐  ┌┤IPAddrs├─────────┐
> > │ ID │ name  │  │ ID │ ip          │
> > │  2 │ "Bob" │  │  3 │ 81.23.2.200 │
> > └────┴───────┘  └────┴─────────────┘
> > 
> > ┌┤Roles├─────────────┐
> > │   ID │ permissions │
> > │    2 │       "rwx" │
> > └──────┴─────────────┘
> > 
> > Roles could be argued to have a foreign key to both Users and Principles
> > (though presumably, Users.ID already has a foreign key constraint on
> > Principles.ID so (Roles.ID) → (Principles (ID)) is redundant). At
> > present, the DM gives you multiple arcs for the foreign key name (ID):
> > 
> > <Roles/ID.2> a <Roles> ;
> >      <Roles#ID> <Users/ID.2> , <Principles/ID.2> ;
> >      <Roles#permissions> "rwx" .
> > 
> > which is just about what you're telling the system with your two
> > foreign keys. BTW, you can go to
> > <http://this-db-really.does-not-exist.org/>
> > and enter the above DDL and an identity CONSTRUCT:
> > 
> > CONSTRUCT {
> > ?s ?p ?o .
> > } WHERE {
> > ?s ?p ?o .
> > }
> > 
> > to see this in action.
> > 
> > As to having to do a join to get the values, I don't think it's worth
> > the added user burden to optimize scalar access to foreign key values.
> > 
> > I've rolled the changes into a doc called
> > http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK
> > and reverted EGP modulo
> > 
> > Feel free to forward this to Marcelo, rdb2rdf-wg, the IRS or the
> > selective service.
> > 
> > 
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> > 
> > --
> > -ericP
> > 
> > -- 
> > -ericP
> > 
> > 
> > 
> 
> 

-- 
-ericP

Received on Wednesday, 17 August 2011 22:48:05 UTC