- From: Alexandre Bertails <bertails@w3.org>
- Date: Thu, 04 Nov 2010 09:57:29 -0400
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: Richard Cyganiak <richard@cyganiak.de>, Juan Sequeda <juanfederico@gmail.com>, Marcelo Arenas <marcelo.arenas1@gmail.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Here are my comments, based on Eric's document (I have not looked at Juan and Marcelo's).

First of all, I propose we don't use the term "transformation". The title says it's a mapping, so can we please stick with "mapping" instead of "transformation"? Also, "transformation" makes people think the data is actually transformed, which is not the case.

[ snip ]

> > > Stem URI: this should be called base URI, because that's a commonly
> > > understood term, and it enables the explanation of URI generation as
> > > resolution of a relative URI against a base URI.
> > >
> >
> > Ok. We will change this.
>
> I prefer to keep stem URI distinct from the relative URI, at least for
> FPWD. base, per 2396 <http://tools.ietf.org/html/rfc2396#section-1.4>
> has a specific behavior, for example, a relative URI <People/ID.7#_>
> resolved against a base URI of <http://foo.example/DB> yields
> <http://foo.example/People/ID.7#_>. If we remedy that with a leading
> slash, </People/ID.7#_>, we get something resolved against the root.

+1

> > > The document should use SQL terminology throughout. Relation, attribute and
> > > tuple should be table, column and row, etc.
> > >
> >
> > Ok. We will change this.
>
> I think this is addressed in §1-3. §4 uses a more traditional
> terminology, but relates it to SQL terms with the following text:
> [[
> There are many models for databases in SQL literature; because the
> Direct Mapping does not rely on column position, we use a model which
> assumes a 1:1 correspondence between attribute (column name) and
> value, i.e. a map.
> Starting with a traditional model of a relational database we define a
> Relation (a table) which has a name, a Header, Body and
> primary/foreign key details.
> The Body contains maps from attribute names to values and the Header
> provides the datatypes to interpret those values.
> ]]
>
> editorial suggestions?

Here is mine:

  s/i.e. a map/i.e. a map from Attribute to Values/

Or just get rid of it, as you describe it in the next sentence.
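For illustration, the base URI resolution behavior discussed earlier in this message can be reproduced with Python's standard `urllib.parse.urljoin`. This is only a sketch: the two URIs come from the thread, while the variable names and the third case (a base URI with a trailing slash) are assumptions added here for comparison.

```python
from urllib.parse import urljoin

# Base URI and relative URIs as given in the thread.
base = "http://foo.example/DB"

# The last path segment of the base ("DB") is dropped before merging,
# so the relative path lands next to "DB", not under it.
no_slash = urljoin(base, "People/ID.7#_")

# A leading slash resolves against the root of the authority.
leading_slash = urljoin(base, "/People/ID.7#_")

# Assumption (not in the thread): a base ending in "/" keeps "DB" in the result.
trailing_slash_base = urljoin(base + "/", "People/ID.7#_")

print(no_slash)
print(leading_slash)
print(trailing_slash_base)
```

The first two results drop the "DB" segment, which is the behavior that motivates keeping a separate "stem URI" notion; only the third keeps it.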
Question: in the formal definition you still use Relational terminology. Is there a reason for that?

> > > The approach in Section 2 defines URIs for columns and rows, but not for
> > > tables. This means one has to use hacks to do a SPARQL query for all records
> > > in a given table. The approach needs to define URIs for tables as well, and
> > > associate each row with the table it is from.
> > >
> >
> > If we understand correctly, we would need to create IRIs for Tables. Hence,
> > there would be now three types of IRIs: Tuple, Columns and Tables. However,
> > if we are to create Table IRIs, then we also need to create a new type of
> > triples: Table Triples:
> >
> > <TupleIRI, rdf:type, Table IRI>
> >
> > Do we agree?
>
> @@ will take a bit of work @@
>
> > > From the current description, it is impossible to work out what URIs for
> > > rows with multi-column primary keys would look like. What order? What
> > > separator characters?
> > >
> >
> > The order is the same order of the columns in the table. Marcelo and I have
> > the following proposal for creating IRIs:
> >
> > Table IRI
> >
> > baseURI/table i.e. baseURI/person
> >
> > Column IRI
> >
> > baseURI/table/column i.e. baseURI/person/name
> >
> > Multicolumn IRI
> >
> > baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname
> >
> > Tuple IRI
> >
> > baseURI/table/column1:value i.e. baseURI/person/id:12
> >
> > Multicolumn Tuple IRI
> >
> > baseURI/table/column1:value1#column2:value2#... i.e.
> > baseURI/person/fname:Juan#lname:Sequeda
> >
> >
> > This is our proposal. However, we are not aware of the best practices for
> > IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
> > Column IRIs"
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules
> now has text to specify the construction of IRIs in the predicate
> position and the subject/object position. Editorial suggestions
> welcome! (I'm not super-confident that there isn't a clearer way to
> express this.)
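The IRI shapes in Juan and Marcelo's quoted proposal can be written down as a small Python sketch. The function names, the `BASE` value, and the decision to do no escaping are assumptions made for illustration; this is not the text of the Direct Mapping document. The sketch reproduces the proposal literally, including the repeated "#" separator, even though RFC 3986 does not allow "#" inside a fragment, so a real rule would likely need a different separator or escaping.

```python
# Hypothetical base; the thread only writes "baseURI".
BASE = "http://foo.example/DB"

def table_iri(base, table):
    # baseURI/table
    return f"{base}/{table}"

def column_iri(base, table, columns):
    # baseURI/table/column, or column1#column2#... for several columns.
    return f"{base}/{table}/" + "#".join(columns)

def tuple_iri(base, table, key):
    # key is a list of (column, value) pairs, in table column order.
    # baseURI/table/column1:value1#column2:value2#...
    parts = "#".join(f"{col}:{val}" for col, val in key)
    return f"{base}/{table}/{parts}"

print(table_iri(BASE, "person"))
print(column_iri(BASE, "person", ["fname", "lname"]))
print(tuple_iri(BASE, "person", [("fname", "Juan"), ("lname", "Sequeda")]))
```

Passing the key as an ordered list of pairs mirrors the statement in the proposal that the order is the order of the columns in the table.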
I suggest you emphasize the RDB cases the mapping reacts to:

* non-NULL foreign key
* non-NULL column value
* columns which are also the sole column in a foreign key
* primary key is also a foreign key

I believe that the whole part about "concatenating" strings is unnecessary. It's hard to read and doesn't help in this informative section. You already have the same information in §5.

[ snip ]

> > This is what Eric had originally. So he can comment on his decision. Again,
> > we should create an Issue on this.
>
> I was following <http://www.w3.org/DesignIssues/RDB-RDF>, but decided to
> shorten "personnel/employees/1234#item" to "personnel/employees/1234#_".
> The choice of hash vs. slash is a real issue. I added an issue
> [[
> hash-vs-slash: This edition of this document presumes slash
> identifiers. LOD data identifiers tend to use slash, but that slightly
> increases implementation burden and round trips.
> ]]
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#hash-vs-slash

I suggest we ask Sandro on that point, as he has real expertise.

[ snip ]

> > > I do not find the visual notation for unique keys and foreign keys
> > > particularly clear. How about simply listing them underneath the table?
> > > Foreign key: addr -> Addresses.ID
> > >
> >
> > Could we consider taking away the visual notation for keys, and just have
> > the table with data. We would also put in the SQL DDL and I'm wondering if
> > this would be enough?

I don't like this visual notation either (maybe because it's not normalized?). I believe that the CREATE statements plus the data in the visual representation (without key information) should be enough.

>
> I added a bit of a key before the first example. The "empty primary
> key" example was, I believe, the worst of the lot.
> After struggling a bit with a notation, I gave up and copied the XSD from
> https://dvcs.w3.org/hg/FeDeRate/file/060df0861705/directmapping/src/test/scala/DirectMappingTest.scala#l105
> into
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#ref-no-pk
>
> > > You write foreign keys as if they reference another *key*. I believe that
> > > doesn't reflect SQL. Foreign keys reference other *columns*. That's the
> > > mental model that a reader is going to have in their head, and that's how it
> > > should be presented in the spec.
> > >
> >
> > I agree. To make sure: what we currently have is, for example, Address.PK, and
> > since we know that the PK of Address is ID, it should then be Address.ID (or
> > something like that). Is that what you mean?
>
> Actually, I disagree; foreign keys specifically reference candidate
> keys in other tables. If the system does not enforce that, and the
> data in foreign keys matches more than 1 row in the referenced table,
> then we have a pretty different graph to represent. My temptation
> is to start out conservative, and if we have energy and mandate,
> represent these cases which I believe are non-compliant.

+1

[ snip ]

> > > I object to the representation of simple string literals as
> > > "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> > > equivalent under datatype semantics, so the simple form should be used.
> > >
> >
> > We should create an issue on this: "Should a literal include xsd?" Should be
> > discussed in group and come to a consensus.
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#literaltriples
> now says
> [[
> Per XML Datatypes for SQL Datatypes, string datatypes are expressed as
> an RDF plain literal.
> ]]

Even if ^^xsd:string doesn't have to be part of the serialized RDF (for convenience), it will always be part of the RDF model.
I personally prefer to see it, for these reasons:

* the datatype mapping is an important step in the direct mapping, so making it explicit in the serialization is important IMO
* it doesn't change the meaning

> > > Again, please drop the concept of a stem URI and explain that the mapping
> > > uses relative URIs which are resolved against an environment-provided base
> > > URI. Instead of this:
> > >
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
> > >
> > > I'd like to see this:
> > >
> > > <Addresses/ID=18> <Addresses#ID> 18 .
> > > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > > <Addresses/ID=18> <Addresses#state> "MA" .

Same comments about datatypes as above.

Alexandre.

> Hmm, it's possible that this might all be do-able with relative URIs.
> But we'd better think about this carefully.
> For now, I've used Turtle's @base directive.
>
> > Do you mean that we should define a prefix:
> >
> > @prefix base: <http://foo.example/DB/> .
> >
> > and then everywhere have
> >
> > <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> > <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> > <base:Addresses/ID=18> <base:Addresses#state> "MA" .
> >
> > > If you do it right, RDF can be simple ;-)
> >
> > :)
> >
> > > Again, great work, and I'm very happy to see this spec moving forward and
> > > like the direction it is taking.
> > >
> >
> > Thanks for your very insightful and direct comments.
> >
> > Marcelo and I will be working on this in the next couple of days and let
> > everybody know when we have an update. Please keep the comments coming!!!!!
> >
> > > Richard
> > >
> > >> All the best,
> > >>
> > >> Marcelo
Received on Thursday, 4 November 2010 13:57:30 UTC