Re: Proposal for ISSUE-65

* Richard Cyganiak <richard@cyganiak.de> [2011-08-25 13:28+0100]
> On 23 Aug 2011, at 23:53, Juan Sequeda wrote:
> > HOWEVER, honestly, this in a way can be seen as a hack.
> 
> I wouldn't call it a hack. The two properties -- one based on a column, one based on a foreign key -- are two different things, so it's reasonable to model them as two different IRIs.
> 
> Quoting AWWW:
> [[
> Constraint: Assign distinct URIs to distinct resources.
> ]]
> http://www.w3.org/TR/webarch/#id-resources
> 
> > We would be
> > sticking the semantics inside the IRI which is really weird.
> > Nevertheless, it works.
> 
> No, we wouldn't be sticking the semantics inside the IRI. We would just give two different names to two different things.

These examples consistently compare against a model like
  <People/ID=5> <People#addr> 18 , <Addresses/ID=18> .
vs. the currently specified model which gives
  <People/ID=5> <People#addr> <Addresses/ID=18> .
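
To make the comparison concrete, here is a minimal Python sketch of how one row would be turned into triples under each model. It is an illustration only, not the DM algorithm: the helper names (literal, row_triples), the literal_for_fk switch, and the sample row are all assumptions made up for this example.

```python
def literal(value):
    """Render a column value as a Turtle-style literal (hypothetical helper)."""
    return str(value) if isinstance(value, int) else f'"{value}"'


def row_triples(table, pk, row, fks, literal_for_fk=False):
    """Return (subject, predicate, object) strings for one row.

    fks maps a column name to the (table, column) it references.
    literal_for_fk=False mimics the currently specified model;
    literal_for_fk=True mimics the alternative, which also emits
    the literal value of a unary foreign key column.
    """
    subject = f"<{table}/{pk}={row[pk]}>"
    triples = []
    for col, value in row.items():
        predicate = f"<{table}#{col}>"
        if col in fks:
            ref_table, ref_col = fks[col]
            if literal_for_fk:
                triples.append((subject, predicate, literal(value)))
            triples.append((subject, predicate, f"<{ref_table}/{ref_col}={value}>"))
        else:
            triples.append((subject, predicate, literal(value)))
    return triples


# Assumed sample data: a People row whose addr column references Addresses.ID.
row = {"ID": 5, "addr": 18}
fks = {"addr": ("Addresses", "ID")}

current = row_triples("People", "ID", row, fks)
alternative = row_triples("People", "ID", row, fks, literal_for_fk=True)
```

The only difference between the two results is the extra (subject, <People#addr>, 18) triple in the alternative.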


> > I would still like to hear more use-cases and motivations to why we
> > should generate a literal triple for foreign key columns. From Souri's
> > initial email, I have:
> > 
> > - Uniformity: For multi-column foreign keys we are already creating
> > literal triples, so why not keep it uniform and do it for unary-column
> > foreign keys.
> 
> The case for Uniformity is stronger than that: All columns, always, are mapped in the same predictable way; with the single exception of unary foreign keys.

Given that all values are currently extractable, I think this needs to be cast as a use case that includes a user and a learning curve:

  User Joe understands the pattern of querying database columns, but doesn't know the unary foreign key exception.
  He's querying a table of the outcomes of diseases to see which diseases have the worst outcomes.
  The disease identifier is a LOINC code, and forms a foreign key to a table of details about the disease.
  Joe selects the LOINC code and is surprised to get <Diseases#LOINC=22312-3> instead of "22312-3" (hep A).

Critical to this use case is that:
  The foreign key is a unary attribute (i.e. not a pair like ("LOINC", "22312-3")).
  The foreign key value is of interest on its own; it's not just an auto_increment value used only to convey the relational graph.
  The user doesn't know they have to (or is annoyed to have to) query:
    { ?encounter enc:test [ test:observedPathology [ path:LOINC ?loinc ] ] ;
                 enc:therapy [ therapy:outcome ?outcome ] }
    instead of:
    { ?encounter enc:test [ test:observedPathology ?loinc ] ;
                 enc:therapy [ therapy:outcome ?outcome ] }


> > - Performances: introduces need for unnecessary join with the parent
> > table to retrieve the value of the foreign key column.
> 
> I agree that performance shouldn't be a big deal. It's easy enough to recognize the case where a join is used to retrieve the ID, and optimize the join away.
> 
> Other reasons against having the exception:
> 
> 1. See above -- different things should have different IRIs. A single-column FK is not the same as a column.
> 
> 2. Some DB schemas don't contain explicit FKs. In this case, one has to do joins using the DM like this:
> 
> SELECT ?name ?city WHERE {
>     ?person <PERSON#NAME> ?name .
>     ?person <PERSON#ADDRESS> ?aid .
>     ?address <ADDRESS#CITYNAME> ?city .
>     ?address <ADDRESS#ID> ?aid .
> }
> 
> And this, while requiring one extra triple pattern, is actually the direct translation of how one does joins in SQL: by requiring that a referenced PK value is the same. So why does the DM stop me from using that approach if an FK happens to be declared?

It is my opinion that the current model is the most intuitive in terms of people understanding the graph structure embedded in the links in the relational database. I see this writing-SQL-as-SPARQL use case as worth some increased complexity in predicate labels, if we can find something still intuitive.


> 3. Adding an FK to a DB schema doesn't break SQL queries, so it shouldn't break SPARQL queries either.

This also I find worth some complexity.
An approach which syntactically distinguishes all predicates, while leaving them in the same namespace, would address this:

  SELECT ?name ?city WHERE {
      ?person <PERSON#LNAME> ?name .
      ?person <PERSON#LADDRESS> ?aid .
      ?address <ADDRESS#LCITYNAME> ?city .
      ?address <ADDRESS#LID> ?aid .
  }
(Note the 'L's preceding the column names.)
This is sort of ugly and unpleasant. Maybe we'll find something more attractive, but ultimately, we'll have to sacrifice some simplicity if we want to eliminate the unary foreign key exception.
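
A rough Python sketch of why such a scheme would keep existing queries working when a foreign key is declared later. Only the 'L' prefix comes from the example above; the 'R' prefix for the reference predicate and the function name are assumptions invented for illustration.

```python
def predicates(table, columns, fk_columns=()):
    """List the predicate IRIs a table would expose.

    Every column gets an 'L'-prefixed literal predicate. Each declared
    unary foreign key additionally gets an 'R'-prefixed reference
    predicate (the 'R' prefix is hypothetical).
    """
    preds = [f"<{table}#L{col}>" for col in columns]
    preds += [f"<{table}#R{col}>" for col in columns if col in fk_columns]
    return preds


# Same table before and after an FK is declared on ADDRESS.
before = predicates("PERSON", ["NAME", "ADDRESS"])
after = predicates("PERSON", ["NAME", "ADDRESS"], fk_columns={"ADDRESS"})

# Every predicate a query could use before the FK existed is still there.
still_valid = set(before) <= set(after)
```

Under these assumptions, declaring the FK only adds <PERSON#RADDRESS>; the 'L' predicates used in the query above are unaffected.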


> 4. You decided not to handle many-to-many relationships in the DM. The arguments were that the DM should be kept predictable and super-simple, and there shouldn't be any exceptions in the DM. The counter-argument that this reduces the usability of the DM was rejected as irrelevant – the DM isn't supposed to be usable, because it's not used directly. So why are you now insisting on an exception that makes the DM less predictable, only on a usability argument? And on a trivial usability argument – it just means adding more namespaces if one wants to do everything with namespaces?

The DM is definitely supposed to be usable. The queries and rules we've used as examples were just as intuitive as the analogous SQL queries. Further, the DM is quite reasonably described in a small bit of RDFS and OWL, which are likely to be the languages that inform query builders and user-facing interactive browsers. Any approach which requires a table with N foreign keys to have N+1 schema documents appears to be poor Semantic Web practice.

The proposals for the many-to-many tables broke every query on those tables if something as innocuous as a timestamp were added to the table.


> Best,
> Richard

-- 
-ericP

Received on Saturday, 27 August 2011 22:52:38 UTC