- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 4 Nov 2010 01:44:29 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Juan Sequeda <juanfederico@gmail.com>, Marcelo Arenas <marcelo.arenas1@gmail.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Per Richard's suggestion, I've incorporated most of Richard's comments
into Rev 1.45 of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ .

* Juan Sequeda <juanfederico@gmail.com> [2010-11-02 14:19-0500]
> Richard,
>
> Great comments! Thanks a lot. I'm glad you took the time to read the whole
> thing. You even found typos, which means that you really took a magnifying
> glass with you.
>
> Looking forward to other comments.
>
> Marcelo and I just had a long meeting and we discussed each of your points.
> Comments are inline.
>
>
> On Tue, Nov 2, 2010 at 9:10 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
>
> > Marcelo, Eric,
> >
> > I am commenting on Section 2 of
> > http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> > $Id: alt.xml,v 1.2 2010/10/30 03:39:01 marenas Exp $
> >
> > First of all, great work! Looks like we almost have an FPWD here.
> >
> > Detailed comments are below. A lot of it is editorial, but there are some
> > substantial comments too, as well as some pointers to oversights. I would
> > appreciate if each comment could be a) addressed in the text, or b)
> > reflected as an @@Issue in the text, or c) replied to in a response to this
> > email, or d) turned into an Issue in the W3C tracker.
>
> I'm commenting everything inline in this email. Some things would need to be
> turned into Issues in the W3C tracker (how to do this?)
>
> > "Stem URI": this should be called "base URI", because that's a commonly
> > understood term, and it enables the explanation of URI generation as
> > resolution of a relative URI against a base URI.
>
> Ok. We will change this.

I prefer to keep stem URI distinct from the relative URI, at least for
FPWD. A base URI, per RFC 2396
<http://tools.ietf.org/html/rfc2396#section-1.4>, has a specific
behavior: for example, a relative URI <People/ID.7#_> resolved against
a base URI of <http://foo.example/DB> yields
<http://foo.example/People/ID.7#_>.
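The resolution behavior described above can be checked with a minimal Python sketch. It uses the standard library's urllib.parse.urljoin, which follows RFC 3986 (the successor to RFC 2396; the two agree on these cases). The helper name resolve and the trailing-slash base are illustrative assumptions, not something proposed in this thread.

```python
from urllib.parse import urljoin


def resolve(base: str, relative: str) -> str:
    """Resolve a relative URI reference against a base URI."""
    return urljoin(base, relative)


# Base without a trailing slash: the last path segment ("DB") is dropped.
no_slash = resolve("http://foo.example/DB", "People/ID.7#_")

# Relative reference with a leading slash: resolved against the root.
leading_slash = resolve("http://foo.example/DB", "/People/ID.7#_")

# Base with a trailing slash (an assumption for comparison): "DB/" is kept.
trailing_slash = resolve("http://foo.example/DB/", "People/ID.7#_")

print(no_slash)        # http://foo.example/People/ID.7#_
print(leading_slash)   # http://foo.example/People/ID.7#_
print(trailing_slash)  # http://foo.example/DB/People/ID.7#_
```

Only the trailing-slash base keeps the DB segment, which is one reason plain concatenation onto a stem and relative-URI resolution are not interchangeable.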
If we remedy that with a leading slash, </People/ID.7#_>, we get
something resolved against the root.

> > The document should use SQL terminology throughout. Relation, attribute and
> > tuple should be table, column and row, etc.
>
> Ok. We will change this.

I think this is addressed in §1-3. §4 uses a more traditional
terminology, but relates it to SQL terms with the following text:
[[
There are many models for databases in SQL literature; because the
Direct Mapping does not rely on column position, we use a model which
assumes a 1:1 correspondence between attribute (column name) and value,
i.e. a map. Starting with a traditional model of a relational database
we define a Relation (a table) which has a name, a Header, Body and
primary/foreign key details. The Body contains maps from attribute
names to values and the Header provides the datatypes to interpret
those values.
]]
editorial suggestions?

> > The approach in Section 2 defines URIs for columns and rows, but not for
> > tables. This means one has to use hacks to do a SPARQL query for all records
> > in a given table. The approach needs to define URIs for tables as well, and
> > associate each row with the table it is from.
>
> If we understand correctly, we would need to create IRIs for Tables. Hence,
> there would be now three types of IRIs: Tuple, Columns and Tables. However,
> if we are to create Table IRIs, then we also need to create a new type of
> triples: Table Triples:
>
> <TupleIRI, rdf:type, Table IRI>
>
> Do we agree?

@@ will take a bit of work @@

> > From the current description, it is impossible to work out how URIs for
> > rows with multi-column primary keys would look like. What order? What
> > separator characters?
>
> The order is the same order of the columns in the table.
> Marcelo and I have
> the following proposal for creating IRIs:
>
> Table IRI
>
> baseURI/table i.e. baseURI/person
>
> Column IRI
>
> baseURI/table/column i.e. baseURI/person/name
>
> Multicolumn IRI
>
> baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname
>
> Tuple IRI
>
> baseURI/table/column1:value i.e. baseURI/person/id:12
>
> Multicolumn Tuple IRI
>
> baseURI/table/column1:value1#column2:value2#... i.e.
> baseURI/person/fname:Juan#lname:Sequeda
>
>
> This is our proposal. However, we are not aware of the best practices for
> IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
> Column IRIs"

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules now has text to
specify the construction of IRIs in the predicate position and the
subject/object position. Editorial suggestions welcome! (I'm not
super-confident that there isn't a clearer way to express this.)
http://www.w3.org/2001/sw/rdb2rdf/directGraph/#multi-key reinforces
that with:
[[
• The predicate for this key is formed from the stem and
  "deptName_deptCity", reflecting the order of the column names in the
  foreign key.
]]

> > I am uncomfortable with the use of the dot character as a separator in
> > generated URIs. The character typically used in URIs to indicate a
> > hierarchical relationship is "/". The character typically used to indicate
> > key-value pairs is "=".
>
> See previous comment. We should open Issue on creating IRIs and discuss this
> in group.

accepted: s/./=/

> > I am uncomfortable with the use of "#_" at the end of row URIs. I cannot
> > see any precedent for that, so I cannot call it good practice. It is also
> > unnecessary because the URI identifies a row in a database table and never a
> > person/address/organization or whatever other real-world object. Rows in
> > database tables are information resources and thus there is no problem at
> > all with identifying them using a plain fragment-less URI.
>
> This is what Eric had originally.
> So he can comment on his decision. Again,
> we should create an Issue on this.

I was following <http://www.w3.org/DesignIssues/RDB-RDF>, but decided
to shorten "personnel/employees/1234#item" to
"personnel/employees/1234#_". The choice of hash vs. slash is a real
issue. I added an issue
[[
hash-vs-slash: This edition of this document presumes slash
identifiers. LOD data identifiers tend to use slash, but that slightly
increases implementation burden and round trips.
]]
http://www.w3.org/2001/sw/rdb2rdf/directGraph/#hash-vs-slash

> > Special characters in table names, column names and PK values need to be
> > handled in the URI generation.
>
> Again... create an issue on this (I sound like a broken record)

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules states that table
names, attribute names and attribute lexical values are url-encoded per
http://www.w3.org/TR/wsdl#_http:urlEncoded .

> > 2.2 says: "with an XML Schema datatype corresponding to the SQL datatype of
> > that column". That obviously needs to be spelled out. There should be an
> > extra section to this, "Mapping SQL Datatypes to RDF Literals" or something
> > like that. The section can be a placeholder for FPWD, but should exist.
>
> Yes. There needs to be a table with the corresponding mapping

added
[[
...the pairs of url-encoded column name '=' url-encoded column value,
separated by a '_'. The column value is the lexical per SQL99 [SQL99].
]]

> > I do not find the visual notation for unique keys and foreign keys
> > particularly clear. How about simply listing them underneath the table?
> > Foreign key: addr -> Addresses.ID
>
> Could we consider taking away the visual notation for keys, and just have
> the table with data. We would also put in the SQL DDL and I'm wondering if
> this would be enough?

I added a bit of a key before the first example. The "empty primary
key" example was, I believe, the worst of the lot.
After struggling a bit with a notation, I gave up and copied the XSD from
https://dvcs.w3.org/hg/FeDeRate/file/060df0861705/directmapping/src/test/scala/DirectMappingTest.scala#l105
into http://www.w3.org/2001/sw/rdb2rdf/directGraph/#ref-no-pk

> > You write foreign keys as if they reference another *key*. I believe that
> > doesn't reflect SQL. Foreign keys reference other *columns*. That's the
> > mental model that a reader is going to have in their head, and that's how it
> > should be presented in the spec.
>
> I agree. To make sure, what we currently have for example Address.PK, and we
> know that the PK of Address is ID, it should then be Address.ID (or
> something like that). Is that what you mean?

Actually, I disagree; foreign keys specifically reference candidate
keys in other tables. If the system does not enforce that, and the data
in foreign keys matches more than 1 row in the referenced table, then
we have a pretty different graph to represent. My temptation is to
start out conservative, and if we have energy and mandate, represent
these cases which I believe are non-compliant.
[[
The columns in the referencing table must be the primary key or other
candidate key in the referenced table.
]] -- http://en.wikipedia.org/wiki/Foreign_key ¶1

> > The use of "_" as a separator between the column/value pairs in
> > multi-column PK row URIs is a bad idea, because the underscore character is
> > ubiquitous in table and column names. An obvious replacement would be ";".
>
> Again.. we need to create an Issue about generating IRIs :P

The prob here is that we don't want to step on either a valid fragment
identifier (for e.g. turtle) or xml local name (for RDF/XML). I've left
the "_" until we have a new idea. Note that url-encoding the column
names and lexical values protects us from seeing the "_"s in e.g.
f_name Bob_Smith.

> > http://foo.example/DB/Department#Manager -- why is Manager uppercase?
>
> typo

ditto

> > I object to the representation of simple string literals as
> > "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> > equivalent under datatype semantics, so the simple form should be used.
>
> We should create an issue on this: "Should a literal include xsd?" Should be
> discussed in group and come to a consensus.

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#literaltriples now says
[[
Per XML Datatypes for SQL Datatypes, string datatypes are expressed as
an RDF plain literal.
]]

> > Again, please drop the concept of a stem URI and explain that the mapping
> > uses relative URIs which are resolved against an environment-provided base
> > URI. Instead of this:
> >
> > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
> >
> > I'd like to see this:
> >
> > <Addresses/ID=18> <Addresses#ID> 18 .
> > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > <Addresses/ID=18> <Addresses#state> "MA" .

Hmm, it's possible that this might all be do-able with relative URIs.
But we'd better think about this carefully. For now, I've used turtle's
@base directive.

> Do you mean that we should define a prefix:
>
> @prefix base: <http://foo.example/DB/> .
>
> and then everywhere have
>
> <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> <base:Addresses/ID=18> <base:Addresses#state> "MA" .
>
> > If you do it right, RDF can be simple ;-)
>
> :)
>
> > Again, great work, and I'm very happy to see this spec moving forward and
> > like the direction it is taking.
>
> Thanks for your very insightful and direct comments.
>
> Marcelo and I will be working on this in the next couple of days and let
> everybody know when we have an update. Please keep the comments coming!!!!!
>
> > Richard
> >
> >> All the best,
> >>
> >> Marcelo

-- 
-ericP
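
The row-identifier construction quoted above from #rules (url-encoded column name, '=', url-encoded value, pairs separated by '_') can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the draft's algorithm: the function names are hypothetical, the stem is assumed to end in "/", and the fragment suffix is left out because hash-vs-slash is still open. Standard percent-encoding leaves "_" untouched, so the sketch additionally escapes it as %5F; that extra step is an assumption about what the draft intends when it says encoding keeps "_" out of names and values.

```python
from typing import Sequence, Tuple
from urllib.parse import quote


def encode_component(text: str) -> str:
    """Percent-encode a table name, column name or lexical value.

    quote() treats "_" as safe, so it is escaped separately here
    (assumption) to keep it unambiguous as the pair separator.
    """
    return quote(text, safe="").replace("_", "%5F")


def row_iri(stem: str, table: str, key: Sequence[Tuple[str, str]]) -> str:
    """Build a row identifier from a stem, a table name and key pairs.

    `key` lists (column name, lexical value) pairs in key-column order.
    """
    pairs = "_".join(
        f"{encode_component(column)}={encode_component(value)}"
        for column, value in key
    )
    return f"{stem}{encode_component(table)}/{pairs}"


print(row_iri("http://foo.example/DB/", "Addresses", [("ID", "18")]))
print(row_iri("http://foo.example/DB/", "People",
              [("f_name", "Bob_Smith"), ("city", "New York")]))
```

With the extra escape, every literal "_" in the result is a pair separator, so a consumer can split on it and percent-decode each half.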
Received on Thursday, 4 November 2010 05:45:15 UTC