- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Tue, 2 Nov 2010 20:35:54 +0000
- To: Juan Sequeda <juanfederico@gmail.com>
- Cc: Marcelo Arenas <marcelo.arenas1@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Hi Juan,
Thanks for the reply! Some comments inline.
On 2 Nov 2010, at 19:19, Juan Sequeda wrote:
>> The approach in Section 2 defines URIs for columns and rows, but
>> not for
>> tables. This means one has to use hacks to do a SPARQL query for
>> all records
>> in a given table. The approach needs to define URIs for tables as
>> well, and
>> associate each row with the table it is from.
>
> If we understand correctly, we would need to create IRIs for Tables.
> Hence,
> there would be now three types of IRIs: Tuple, Columns and Tables.
Yes.
> However,
> if we are to create Table IRIs, then we also need to create a new
> type of
> triples: Table Triples:
>
> <TupleIRI, rdf:type, Table IRI>
>
> Do we agree?
I discussed this a bit with Eric at some point, and he had some
reservations about using rdf:type here because it could have
undesirable implications. I don't really have a strong opinion on the
choice of property. It could be rdf:type or some other property
especially defined for this task (xxx:table?). The important thing for
me: There should be a triple that relates a row to its table, to make
queries for all rows of a table easier. And all the important
components of a schema should have URIs, and tables are certainly
important, so they deserve a URI of their own.
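
To illustrate why a row-to-table triple helps, here is a minimal Python sketch. It is only an illustration under assumptions: the base URI is the http://foo.example/DB/ used later in this message, ROW_TABLE is a hypothetical stand-in for the undecided property (rdf:type or xxx:table), the row URI shape is still under discussion, and the People row is made-up sample data.

```python
BASE = "http://foo.example/DB/"  # assumed base URI
# Hypothetical property; the thread leaves the choice (rdf:type or other) open.
ROW_TABLE = "http://example.org/xxx#table"

def table_uri(table):
    """URI for the table itself, so that rows can point at it."""
    return BASE + table

def row_triples(table, pk_column, row):
    """Yield (subject, predicate, object) triples for one row."""
    subject = f"{BASE}{table}/{pk_column}={row[pk_column]}"
    # The extra triple that ties the row to its table.
    yield (subject, ROW_TABLE, table_uri(table))
    for column, value in row.items():
        yield (subject, f"{BASE}{table}#{column}", value)

def rows_of(triples, table):
    """All row subjects of a table, found with a single triple pattern."""
    target = table_uri(table)
    return sorted({s for s, p, o in triples if p == ROW_TABLE and o == target})

graph = list(row_triples("Addresses", "ID",
                         {"ID": 18, "city": "Cambridge", "state": "MA"}))
# Hypothetical second table, only there to show that the filter is selective.
graph += list(row_triples("People", "ID", {"ID": 7, "fname": "Bob"}))

print(rows_of(graph, "Addresses"))
```

Without the ROW_TABLE triple, finding all rows of Addresses would mean matching on the string shape of the subject URIs, which is the kind of hack the comment above wants to avoid.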
> Column IRI
>
> baseURI/table/column i.e baseURI/person/name
Here, the current approach (baseURI/person#name) sort of makes sense
to me, because it slightly simplifies HTTP deployment.
> Multicolumn IRI
>
> baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname
Hashes have a very special meaning in URI syntax, so I wouldn't use
them as generic separators. Having multiple hashes in a URI is almost
certainly a bad idea.
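
A quick way to see the problem with hashes is to run the proposed shape through a standard URI parser. The sketch below uses Python's urllib.parse; the base URI and the third column name (age) are assumptions added for illustration.

```python
from urllib.parse import urlsplit

# The multicolumn shape from the proposal, with a hypothetical third column.
parts = urlsplit("http://foo.example/DB/person/fname#lname#age")

# Only the part before the first '#' identifies the resource a server is asked for.
print(parts.path)
# Everything after the first '#' is treated as one fragment and stays client-side.
print(parts.fragment)
```

The parser splits at the first hash only, so fname ends up in the path while lname#age is lumped into the fragment. RFC 3986 also does not allow a raw "#" inside a fragment, so the second hash would have to be percent-encoded anyway.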
> Tuple IRI
>
> baseURI/table/column1:value i.e baseURI/person/id:12
The colon character is also quite special in URI syntax, and is
generally only used after the protocol part of the URI (http:...).
> Multicolumn Tuple IRI
>
> baseURI/table/column1:value1#column2:value2#... i.e
> baseURI/person/fname:Juan#lname:Sequeda
>
> This is our proposal. However, we are not aware of the best
> practices for
> IRIs. I propose that we open an Issue on "how to generate Table,
> Tuple and
> Column IRIs"
+1, we probably need to create an issue and first think about the
conditions that the solution has to satisfy. We need to take URI
syntax (RFC 3986), URI design best practices, and the requirements of
linked data deployment into account.
I haven't thought deeply about this, but spontaneously I would like to
see “=” for connecting column names to values, and “;” or “,” to
enumerate multiple items.
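
A rough Python sketch of that idea follows. Nothing here is agreed: the function name row_uri, the base URI, and the choice of ";" over "," are assumptions, and the percent-encoding step is one possible way to keep separators that occur inside the data from being misread.

```python
from urllib.parse import quote

BASE = "http://foo.example/DB/"  # assumed base URI

def row_uri(table, key):
    """Build a row URI such as BASE + 'person/fname=Juan;lname=Sequeda'.

    `key` maps key column names to values. Names and values are
    percent-encoded so that '=', ';', '/', '#' and ':' inside the data
    cannot be mistaken for separators.
    """
    pairs = ";".join(
        f"{quote(str(col), safe='')}={quote(str(val), safe='')}"
        for col, val in key.items()
    )
    return f"{BASE}{quote(table, safe='')}/{pairs}"

print(row_uri("person", {"id": 12}))
print(row_uri("person", {"fname": "Juan", "lname": "Sequeda"}))
print(row_uri("person", {"fname": "A;B=C"}))  # separators in the value get escaped
```

The last call shows why escaping matters: a value containing ";" or "=" would otherwise make the URI ambiguous.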
>> 2.2 is largely redundant as it only summarizes information that
>> follows in
>> more detail later. Thus the focus should be on giving a quick intro
>> to the
>> general idea, using simple language. The example is repeated twice
>> for no
>> reason.
>>
>
> We think it is important to state the different types of triples in
> the
> beginning. What is important is that somebody can initially figure
> out what
> the outcome is before diving into the whole document.
I'm ok with stating the different kinds of triples in 2.2.
>> The verbose textual rendering of the schema is unnecessary and
>> should be
>> removed. It says nothing that cannot be seen from the visual
>> representation.
>> Rather use that space for writing the table definition in SQL. Same
>> for
>> other places in the document where table schemas are spelled out
>> verbally.
>
> This is fine by me. But Marcelo would like to keep the verbose text.
> What do
> others think? But we should definitely have the SQL DDL
I would like to hear Marcelo's reasoning. If you have SQL DDL and a
visual rendering, then what does the text add?
>> I do not find the visual notation for unique keys and foreign keys
>> particularly clear. How about simply listing them underneath the
>> table?
>> “Foreign key: addr -> Addresses.ID”
>
> Could we consider taking away the visual notation for keys, and just
> have
> the table with data. We would also put in the SQL DDL and I'm
> wondering if
> this would be enough?
I think that would work for me, although I'd still have a slight
preference for *somehow* having the FKs and UKs present in the visual
rendering. Please keep the special color for the PK column(s), it is
helpful.
>> You write foreign keys as if they reference another *key*. I
>> believe that
>> doesn't reflect SQL. Foreign keys reference other *columns*. That's
>> the
>> mental model that a reader is going to have in their head, and
>> that's how it
>> should be presented in the spec.
>
> I agree. To make sure, what we currently have for example
> Address.PK, and we
> know that the PK of Address is ID, it should then be Address.ID (or
> something like that). Is that what you mean?
Exactly!
>> The content of 2.3.1 actually doesn't really match its title. The
>> title
>> talks about “information in PKs”. What follows is not only about
>> information
>> in PK columns.
>
> We will change the title. How about "Generating Triples from Primary
> Keys".
> Consequently, 2.3.2 could be "Generating Triples from Foreign Keys"
Well, but 2.3.1 is not just about generating stuff from PKs! It also
deals with all the columns that are not involved in any key. That's my
complaint -- from the title you wouldn't be able to guess that this is
the section that handles the translation of normal columns to literals.
>> 2.3.2: The rules for referencing tables without PKs state that the
>> object
>> is the target row's Tuple IRI. Earlier you said that such tables
>> don't have
>> Tuple IRIs but blank nodes.
>
> When we describe a Tuple IRI, we give the case if a table doesn't
> have a
> primary key, then a blank node should be created. So in a way, it
> may be
> understood that a blank node is a Tuple IRI, which I know is
> incorrect. Can
> you suggest how we should go about this?
In 2.2 you could introduce the concept of a “row RDF node”, which is
either a “row IRI” (what you now call tuple IRI) or a blank node. Then
you'd just have to state that the object of a reference triple is the
“row RDF node” of the target row, and refer to section 2.2 for
figuring out what the specific node would be.
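
The "row RDF node" idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not spec text: the class name RowNodes, the use of a row number to identify rows without a primary key, the "=" and ";" URI shape, and the Notes table are all hypothetical.

```python
class RowNodes:
    """Hands out the "row RDF node" for a row: a row IRI or a blank node."""

    def __init__(self, base):
        self.base = base
        self._blank = {}  # (table, row_number) -> blank node label

    def node(self, table, primary_key, row_number, row):
        if primary_key:
            # Table has a primary key: the row RDF node is a row IRI.
            key = ";".join(f"{col}={row[col]}" for col in primary_key)
            return f"<{self.base}{table}/{key}>"
        # No primary key: one blank node per row, reused on every lookup
        # so that reference triples point at the same node.
        fresh = f"_:b{len(self._blank) + 1}"
        return self._blank.setdefault((table, row_number), fresh)

nodes = RowNodes("http://foo.example/DB/")
with_pk = nodes.node("Addresses", ["ID"], 0, {"ID": 18, "city": "Cambridge"})
no_pk_first = nodes.node("Notes", [], 0, {"text": "hypothetical row"})
no_pk_again = nodes.node("Notes", [], 0, {"text": "hypothetical row"})
no_pk_other = nodes.node("Notes", [], 1, {"text": "another row"})
print(with_pk, no_pk_first, no_pk_again, no_pk_other)
```

A reference triple then only needs to say "use nodes.node(...) of the target row as the object", without caring which of the two cases applies.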
>> I object to the representation of simple string literals as
>> "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
>> equivalent under datatype semantics, so the simple form should be
>> used.
>
> We should create an issue on this: "Should a literal include xsd?"
> Should be
> discussed in group and come to a consensus.
+1
>> 18^^xsd:integer is not valid Turtle. This must either be
>> "18"^^xsd:integer,
>> or simply 18, which is just Turtle syntactic sugar for the former.
>> I would
>> highly prefer if the simple form was used throughout.
>>
>
> Yes, our mistake. However using simply 18 instead of having
> xsd:integer
> should be part of a group discussion. See previous comment about
> creating
> Issue
It's a different case from the previous one. "Foo" vs.
"Foo"^^xsd:string is actually a difference on the RDF graph level
(although in RDF semantics they are equivalent). 18 vs.
"18"^^xsd:integer are identical on the RDF graph level, it's just
syntactic sugar in Turtle.
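
The difference between the two cases can be shown with a toy model in Python. It follows the distinction as described in this message and handles only a tiny subset of Turtle literal syntax; the function name and the (lexical form, datatype) tuple representation are assumptions made for the sketch.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def turtle_literal_to_term(token):
    """Map a small subset of Turtle literal syntax to (lexical form, datatype)."""
    if "^^xsd:" in token:
        lexical, _, local = token.partition("^^xsd:")
        return (lexical.strip('"'), XSD + local)
    if token.startswith('"'):
        # Plain literal: no datatype recorded in the graph.
        return (token.strip('"'), None)
    if token.lstrip("+-").isdigit():
        # Bare integer: shorthand for the typed form.
        return (token, XSD + "integer")
    raise ValueError(f"unsupported token: {token}")

# Same term in the graph, only the surface syntax differs.
print(turtle_literal_to_term('18') == turtle_literal_to_term('"18"^^xsd:integer'))
# Two different terms in the graph, even if they mean the same string.
print(turtle_literal_to_term('"Foo"') == turtle_literal_to_term('"Foo"^^xsd:string'))
```

So the choice between 18 and "18"^^xsd:integer is purely about how the examples are written, while the choice between "Foo" and "Foo"^^xsd:string changes what the mapping actually emits.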
>> I'd like to see this:
>>
>> <Addresses/ID=18> <Addresses#ID> 18 .
>> <Addresses/ID=18> <Addresses#city> "Cambridge" .
>> <Addresses/ID=18> <Addresses#state> "MA" .
>>
>
> Do you mean that we should define a prefix:
>
> @prefix base: <http://foo.example/DB/> .
>
> and then everywhere have
>
> <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> <base:Addresses/ID=18> <base:Addresses#state> "MA" .
No -- I mean just write relative URIs instead of absolute URIs. When
there is a well-defined base URI, then this makes a lot of sense.
See also:
http://www.w3.org/TeamSubmission/turtle/#uris
So you could write this:
@base <http://foo.example/DB/> .
<Addresses/ID=18> <Addresses#ID> 18 .
<Addresses/ID=18> <Addresses#city> "Cambridge" .
<Addresses/ID=18> <Addresses#state> "MA" .
That's just a shorter form for using full absolute URIs like
<http://foo.example/DB/Addresses/ID=18>. Given that the base URI is just defined once as an input to the
default mapping, you wouldn't have to repeat it for each example, but
just explain in the beginning where you currently introduce the “stem
URI” that throughout the document, the examples will contain relative
URIs, and these are to be understood as relative to the base URI.
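
For readers who want to check how the relative forms expand, here is a small Python sketch using the standard library's urljoin. It only mirrors what a Turtle parser does with @base; the list of relative URIs is taken from the example above.

```python
from urllib.parse import urljoin

BASE = "http://foo.example/DB/"  # the base URI from the @base line above

# Resolve each relative URI from the example against the base URI.
for relative in ("Addresses/ID=18", "Addresses#ID",
                 "Addresses#city", "Addresses#state"):
    print(f"<{relative}> -> <{urljoin(BASE, relative)}>")
```

Note that the trailing slash on the base URI matters: without it, the last path segment (DB) would be replaced instead of extended.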
Keep up the good work! Looking forward to an updated version!
Richard
>> If you do it right, RDF can be simple ;-)
>>
>>
> :)
>
>
>>
>> Again, great work, and I'm very happy to see this spec moving
>> forward and
>> like the direction it is taking.
>>
>
> Thanks for your very insightful and direct comments.
>
> Marcelo and I will be working on this in the next couple of days and
> let
> everybody know when we have an update. Please keep the comments
> coming!!!!!
>
>
>> Richard
>>
>>
>>
>>
>>> All the best,
>>>
>>> Marcelo
>>>
>>>
>>
>>
Received on Tuesday, 2 November 2010 20:36:31 UTC