Re: Proposal for ISSUE-65 from Juan Sequeda on 2011-08-25 (public-rdb2rdf-wg@w3.org from August 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Thu, 25 Aug 2011 13:39:46 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-Id: <F50315D7-7225-4BCD-AE69-32A901DD3179@gmail.com>
On Aug 25, 2011, at 1:24 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 25 Aug 2011, at 18:26, Juan Sequeda wrote:
>> On a side note, I believe that the DM must be usable. And it's going to be the solution for db users to generate RDF from their RDB in the quickest fashion. They won't want to learn a mapping language; they want something automatic.
> 
> This is true for *some* users.

The future will tell us :)

> 
>> DM + SQL views has the same expressivity as R2RML (well, almost). At the end of the automatic process, it's a simple string substitution from automatically generated IRIs to well-known IRIs.
> 
> If you care about your instance identifiers then you'll have to translate from blank nodes to IRIs too (views don't have PKs).
> 

True

>> I've always been for having the many-to-many case. I believe that implementers will have the W3C DM standard and then an extension where many-to-many will be considered and the user can choose which tables/views to direct map.
> 
> The thing is, adding many-to-many outside of the standard is easy because you just add some additional triples. You don't need to remove any triples. It doesn't introduce an exception.

I agree
> 
> Not so with the FK triples as currently proposed.

Yes, so let's find a solution for ISSUE-65. Seems like we are almost there.
> 
>> Having said that, I guess you agree with the proposal of generating two different IRIs for literal and reference properties, which will allow us to address ISSUE-65.
>> 
>> Question is: how do we create those IRIs:
>> 
>> Literal: <Table#Lattr>
>> Reference: <Table#Rattr>
> 
> Perhaps:
> 
> Literal: <Table#attr>
> Reference: <Table#ref.attr>
> 
> and add the dot “.” to the list of characters that must be percent-encoded if they occur in column names. (Or say that “ref.” at the beginning of a column name must be turned into “ref%2E”, so you'd get “ref.ref%2Efoo” when you have an FK over column “ref.foo”.)
> 
> Advantage: the more common case (literal triples) remains simpler. I find this fairly intuitive *and* Eric can use prefixes. people:NAME vs people:ref.ADDRESS makes a certain amount of sense. (Again, ref.ADDRESS could be xxx.ADDRESS or fk.ADDRESS or join.ADDRESS or whatever works for the rest of the WG.)
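
To make the naming scheme above concrete, here is a minimal sketch of how the two property IRIs could be built. It assumes the first variant Richard describes (the dot is always percent-encoded in table and column names) and only covers single-column foreign keys. The function names `encode_name`, `literal_iri` and `reference_iri` are made up for illustration and are not from any draft of the DM.

```python
from urllib.parse import quote


def encode_name(name: str) -> str:
    """Percent-encode a table or column name for use in a DM IRI.

    Assumption: "." is added to the characters that must be encoded.
    quote() never escapes ".", so it is replaced explicitly afterwards.
    """
    return quote(name, safe="").replace(".", "%2E")


def literal_iri(table: str, column: str) -> str:
    """IRI of the property carrying the column's literal value."""
    return f"<{encode_name(table)}#{encode_name(column)}>"


def reference_iri(table: str, column: str) -> str:
    """IRI of the property pointing at the referenced row (single-column FK)."""
    return f"<{encode_name(table)}#ref.{encode_name(column)}>"


print(literal_iri("people", "NAME"))
print(reference_iri("people", "ADDRESS"))
print(reference_iri("people", "ref.foo"))
```

Because a dot inside a column name is always encoded, a literal property on a column that happens to be called "ref.foo" cannot collide with the reference property on a column called "foo".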

Fine by me. 

Eric, comments?

> 
> Best,
> Richard
> 
> 
> 
> 
>> 
>> Is that enough? Are people happy with it?
>> 
>> Juan Sequeda
>> www.juansequeda.com
>> 
>> On Aug 25, 2011, at 7:28 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
>> 
>>> On 23 Aug 2011, at 23:53, Juan Sequeda wrote:
>>>> HOWEVER, honestly, this in a way can be seen as a hack.
>>> 
>>> I wouldn't call it a hack. The two properties -- one based on a column, one based on a foreign key -- are two different things, so it's reasonable to model them as two different IRIs.
>>> 
>>> Quoting AWWW:
>>> [[
>>> Constraint: Assign distinct URIs to distinct resources.
>>> ]]
>>> http://www.w3.org/TR/webarch/#id-resources
>>> 
>>>> We would be
>>>> sticking the semantics inside the IRI which is really weird.
>>>> Nevertheless, it works.
>>> 
>>> No, we wouldn't be sticking the semantics inside the IRI. We would just give two different names to two different things.
>>> 
>>>> I would still like to hear more use-cases and motivations to why we
>>>> should generate a literal triple for foreign key columns. From Souri's
>>>> initial email, I have:
>>>> 
>>>> - Uniformity: For multi-column foreign keys we are already creating
>>>> literal triples, so why not keep it uniform and do it for unary-column
>>>> foreign keys.
>>> 
>>> The case for Uniformity is stronger than that: All columns, always, are mapped in the same predictable way; with the single exception of unary foreign keys.
>>> 
>>>> - Performance: introduces need for unnecessary join with the parent
>>>> table to retrieve the value of the foreign key column.
>>> 
>>> I agree that performance shouldn't be a big deal. It's easy enough to recognize the case where a join is used to retrieve the ID, and optimize the join away.
>>> 
>>> Other reasons against having the exception:
>>> 
>>> 1. See above -- different things should have different IRIs. A single-column FK is not the same as a column.
>>> 
>>> 2. Some DB schemas don't contain explicit FKs. In this case, one has to do joins using the DM like this:
>>> 
>>> SELECT ?name ?city WHERE {
>>>  ?person <PERSON#NAME> ?name .
>>>  ?person <PERSON#ADDRESS> ?aid .
>>>  ?address <ADDRESS#CITYNAME> ?city .
>>>  ?address <ADDRESS#ID> ?aid .
>>> }
>>> 
>>> And this, while requiring one extra triple pattern, is actually the direct translation of how one does joins in SQL: by requiring that a referenced PK value is the same. So why does the DM stop me from using that approach if an FK happens to be declared?
>>> 
>>> 3. Adding an FK to a DB schema doesn't break SQL queries, so it shouldn't break SPARQL queries either.
>>> 
>>> 4. You decided not to handle many-to-many relationships in the DM. The arguments were that the DM should be kept predictable and super-simple, and there shouldn't be any exceptions in the DM. The counter-argument that this reduces the usability of the DM was rejected as irrelevant – the DM isn't supposed to be usable, because it's not used directly. So why are you now insisting on an exception that makes the DM less predictable, only on a usability argument? And on a trivial usability argument – it just means adding more namespaces if one wants to do everything with namespaces?
>>> 
>>> Best,
>>> Richard
>> 
>
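
For reference, the SQL side of Richard's point 3 quoted above can be checked with a small sketch: the same value-based join is run against a schema without and with a declared FK. The table and column names follow the SPARQL example in the quoted text; the sample rows and the helper `run` are invented for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Join by requiring that the referencing value equals the referenced PK value,
# which is what the four-triple-pattern SPARQL query does.
QUERY = """
SELECT p.NAME, a.CITYNAME
FROM PERSON p, ADDRESS a
WHERE p.ADDRESS = a.ID
ORDER BY p.NAME
"""


def run(with_fk: bool):
    """Build a tiny in-memory database and run QUERY against it."""
    fk = " REFERENCES ADDRESS(ID)" if with_fk else ""
    con = sqlite3.connect(":memory:")
    con.executescript(f"""
        CREATE TABLE ADDRESS (ID INTEGER PRIMARY KEY, CITYNAME TEXT);
        CREATE TABLE PERSON (ID INTEGER PRIMARY KEY, NAME TEXT,
                             ADDRESS INTEGER{fk});
        INSERT INTO ADDRESS VALUES (18, 'Cambridge');  -- hypothetical data
        INSERT INTO PERSON VALUES (7, 'Bob', 18);      -- hypothetical data
    """)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


rows_without_fk = run(with_fk=False)
rows_with_fk = run(with_fk=True)
print(rows_without_fk == rows_with_fk)
```

The query text is identical in both runs; only the schema declaration differs.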
Received on Thursday, 25 August 2011 18:40:23 UTC