Re: Detailed comments on new default mapping draft

Eric,

http://www.w3.org/2001/sw/rdb2rdf/directGraph/ is not working

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, Nov 4, 2010 at 12:44 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> Per Richard's suggestion, I've incorporated most of Richard's comments
> into Rev 1.45 of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ .
>
>
> * Juan Sequeda <juanfederico@gmail.com> [2010-11-02 14:19-0500]
> > Richard,
> >
> > Great comments! Thanks a lot. I'm glad you took the time to read the
> > whole thing. You even found typos, which means that you really took a
> > magnifying glass with you.
> >
> > Looking forward to other comments.
> >
> > Marcelo and I just had a long meeting and we discussed each of your
> > points. Comments are inline.
> >
> >
> > On Tue, Nov 2, 2010 at 9:10 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
> >
> > > Marcelo, Eric,
> > >
> > > I am commenting on Section 2 of
> > > http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> > > $Id: alt.xml,v 1.2 2010/10/30 03:39:01 marenas Exp $
> > >
> > > First of all, great work! Looks like we almost have an FPWD here.
> > >
> > > Detailed comments are below. A lot of it is editorial, but there are
> > > some substantial comments too, as well as some pointers to oversights.
> > > I would appreciate it if each comment could be a) addressed in the
> > > text, or b) reflected as an @@Issue in the text, or c) replied to in a
> > > response to this email, or d) turned into an Issue in the W3C tracker.
> > >
> >
> > I'm commenting on everything inline in this email. Some things would
> > need to be turned into Issues in the W3C tracker (how to do this?)
> >
> > >
> > >
> > > "Stem URI": this should be called "base URI", because that's a
> > > commonly understood term, and it enables the explanation of URI
> > > generation as resolution of a relative URI against a base URI.
> > >
> >
> > Ok. We will change this.
>
> I prefer to keep stem URI distinct from the relative URI, at least for
> FPWD. Base, per RFC 2396 <http://tools.ietf.org/html/rfc2396#section-1.4>,
> has a specific behavior: for example, a relative URI <People/ID.7#_>
> resolved against a base URI of <http://foo.example/DB> yields
> <http://foo.example/People/ID.7#_>. If we remedy that with a leading
> slash, </People/ID.7#_>, we get something resolved against the root.
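
The resolution behavior described above can be checked with a small sketch. It uses Python's standard urllib.parse.urljoin, which follows RFC 3986 (the successor to RFC 2396); for these inputs the two agree. The trailing-slash base in the last call is an added illustration and does not come from the draft.

```python
from urllib.parse import urljoin

relative = "People/ID.7#_"

# Base without a trailing slash: the last path segment ("DB") is dropped.
no_slash = urljoin("http://foo.example/DB", relative)

# Leading slash on the relative reference: resolved against the root.
rooted = urljoin("http://foo.example/DB", "/" + relative)

# Assumption for illustration: a base ending in "/" keeps the "DB" segment.
with_slash = urljoin("http://foo.example/DB/", relative)

print(no_slash)
print(rooted)
print(with_slash)
```

Only the base with a trailing slash keeps "DB" in the result, which is the difference between plain concatenation onto a stem and standard base resolution.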
>
>
> > > The document should use SQL terminology throughout. Relation,
> > > attribute and tuple should be table, column and row, etc.
> > >
> >
> > Ok. We will change this.
>
> I think this is addressed in §1-3. §4 uses a more traditional
> terminology, but relates it to SQL terms with the following text:
> [[
> There are many models for databases in SQL literature; because the
> Direct Mapping does not rely on column position, we use a model which
> assumes a 1:1 correspondence between attribute (column name) and
> value, i.e. a map.
> Starting with a traditional model of a relational database we define a
> Relation (a table) which has a name, a Header, Body and
> primary/foreign key details.
> The Body contains maps from attribute names to values and the Header
> provides the datatypes to interpret those values.
> ]]
>
> editorial suggestions?
>
>
> > > The approach in Section 2 defines URIs for columns and rows, but not
> > > for tables. This means one has to use hacks to do a SPARQL query for
> > > all records in a given table. The approach needs to define URIs for
> > > tables as well, and associate each row with the table it is from.
> > >
> >
> > If we understand correctly, we would need to create IRIs for Tables.
> > Hence, there would now be three types of IRIs: Tuple, Column and Table.
> > However, if we are to create Table IRIs, then we also need to create a
> > new type of triple, Table Triples:
> >
> > <TupleIRI, rdf:type, Table IRI>
> >
> > Do we agree?
>
> @@ will take a bit of work @@
>
>
> > > From the current description, it is impossible to work out what URIs
> > > for rows with multi-column primary keys would look like. What order?
> > > What separator characters?
> > >
> >
> > The order is the same as the order of the columns in the table. Marcelo
> > and I have the following proposal for creating IRIs:
> >
> > Table IRI
> >
> > baseURI/table, e.g. baseURI/person
> >
> > Column IRI
> >
> > baseURI/table/column, e.g. baseURI/person/name
> >
> > Multicolumn IRI
> >
> > baseURI/table/column1#column2#..., e.g. baseURI/person/fname#lname
> >
> > Tuple IRI
> >
> > baseURI/table/column1:value, e.g. baseURI/person/id:12
> >
> > Multicolumn Tuple IRI
> >
> > baseURI/table/column1:value1#column2:value2#..., e.g.
> > baseURI/person/fname:Juan#lname:Sequeda
> >
> >
> > This is our proposal. However, we are not aware of the best practices
> > for IRIs. I propose that we open an Issue on "how to generate Table,
> > Tuple and Column IRIs".
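
The proposed patterns can be written out as a minimal sketch. The function names are hypothetical, "baseURI" is the placeholder used in the proposal, and no escaping of special characters is attempted because the proposal does not specify any.

```python
def table_iri(base, table):
    # baseURI/table
    return f"{base}/{table}"

def column_iri(base, table, *columns):
    # baseURI/table/column1#column2#... (one or more columns)
    return f"{base}/{table}/" + "#".join(columns)

def tuple_iri(base, table, key):
    # key: (column, value) pairs, in the order of the columns in the table.
    # baseURI/table/column1:value1#column2:value2#...
    return f"{base}/{table}/" + "#".join(f"{c}:{v}" for c, v in key)

print(table_iri("baseURI", "person"))
print(column_iri("baseURI", "person", "fname", "lname"))
print(tuple_iri("baseURI", "person", [("fname", "Juan"), ("lname", "Sequeda")]))
```

Passing the key as an ordered list of pairs keeps the column order explicit, which is the point the original question about multi-column keys raised.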
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules
>
> now has text to specify the construction of IRIs in the predicate
> position and the subject/object position. Editorial suggestions
> welcome! (I'm not super-confident that there isn't a clearer way to
> express this.)
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#multi-key reinforces
> that with:
> [[
> • The predicate for this key is formed from the stem and
> "deptName_deptCity",
>  reflecting the order of the column names in the foreign key.
> ]]
>
> > > I am uncomfortable with the use of the dot character as a separator in
> > > generated URIs. The character typically used in URIs to indicate a
> > > hierarchical relationship is "/". The character typically used to
> > > indicate key-value pairs is "=".
> > >
> >
> > See previous comment. We should open an Issue on creating IRIs and
> > discuss this in the group.
>
> accepted s/=/./
>
>
> > >
> > > I am uncomfortable with the use of "#_" at the end of row URIs. I
> > > cannot see any precedent for that, so I cannot call it good practice.
> > > It is also unnecessary because the URI identifies a row in a database
> > > table and never a person/address/organization or whatever other
> > > real-world object. Rows in database tables are information resources
> > > and thus there is no problem at all with identifying them using a
> > > plain fragment-less URI.
> > >
> > This is what Eric had originally. So he can comment on his decision.
> > Again, we should create an Issue on this.
>
> I was following <http://www.w3.org/DesignIssues/RDB-RDF>, but decided to
> shorten "personnel/employees/1234#item" to "personnel/employees/1234#_".
> The choice of hash vs. slash is a real issue. I added an issue
> [[
> hash-vs-slash: This edition of this document presumes slash
> identifiers. LOD data identifiers tend to use slash, but that slightly
> increases implementation burden and round trips.
> ]]
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#hash-vs-slash
>
>
> > > Special characters in table names, column names and PK values need to
> > > be handled in the URI generation.
> > >
> >
> > Again... create an issue on this ( I sound like a broken record)
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules states that table
> names, attribute names and attribute lexical values are url-encoded
> per http://www.w3.org/TR/wsdl#_http:urlEncoded .
>
> > > 2.2 says: "with an XML Schema datatype corresponding to the SQL
> > > datatype of that column". That obviously needs to be spelled out.
> > > There should be an extra section for this, "Mapping SQL Datatypes to
> > > RDF Literals" or something like that. The section can be a placeholder
> > > for FPWD, but should exist.
> > >
> >
> > Yes. There needs to be a table with the corresponding mapping
>
> added [[
> ...the pairs of url-encoded column name '=' url-encoded column value,
> separated by a '_'. The column value is the lexical per SQL99 [SQL99].
> ]]
>
> > > I do not find the visual notation for unique keys and foreign keys
> > > particularly clear. How about simply listing them underneath the table?
> > >  Foreign key: addr -> Addresses.ID
> > >
> >
> > Could we consider taking away the visual notation for keys, and just
> > have the table with data? We would also put in the SQL DDL, and I'm
> > wondering if this would be enough.
>
> I added a bit of a key before the first example. The "empty primary
> key" example was, I believe, the worst of the lot. After struggling a
> bit with a notation, I gave up and copied the XSD from
>
> https://dvcs.w3.org/hg/FeDeRate/file/060df0861705/directmapping/src/test/scala/DirectMappingTest.scala#l105
> into
>  http://www.w3.org/2001/sw/rdb2rdf/directGraph/#ref-no-pk
>
>
> > >
> > > You write foreign keys as if they reference another *key*. I believe
> > > that doesn't reflect SQL. Foreign keys reference other *columns*.
> > > That's the mental model that a reader is going to have in their head,
> > > and that's how it should be presented in the spec.
> > >
> >
> > I agree. To make sure: what we currently have is, for example,
> > Address.PK, and since we know that the PK of Address is ID, it should
> > then be Address.ID (or something like that). Is that what you mean?
>
> Actually, I disagree; foreign keys specifically reference candidate
> keys in other tables. If the system does not enforce that, and the
> data in foreign keys matches more than 1 row in the referenced table,
> then we have a pretty different graph to represent. My temptation
> is to start out conservative, and if we have energy and mandate,
> represent these cases which I believe are non-compliant.
>
> [[
> The columns in the referencing table must be the primary key or other
> candidate key in the referenced table.
> ]] — http://en.wikipedia.org/wiki/Foreign_key ¶1
>
>
> > > The use of "_" as a separator between the column/value pairs in
> > > multi-column PK row URIs is a bad idea, because the underscore
> > > character is ubiquitous in table and column names. An obvious
> > > replacement would be ";".
> > >
> >
> >
> > Again.. we need to create an Issue about generating IRIs :P
>
> The prob here is that we don't want to step on either a valid fragment
> identifier (for e.g. turtle) or xml local name (for RDF/XML). I've left
> the "_" until we have a new idea.
>
> Note that url-encoding the column names and lexical values protects us
> from seeing the "_"s in e.g. f_name Bob_Smith.
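
One caveat is worth a small sketch: standard percent-encoding treats "_" as an unreserved character and leaves it unchanged, so the protection described above only holds if "_" is escaped explicitly. The helpers below are hypothetical and assume the "=" and "_" separators quoted elsewhere in this thread; the extra escaping step is an assumption, not text from the draft.

```python
from urllib.parse import quote

def encode_part(text):
    # quote() leaves "_" alone (it is an unreserved character), so escape it
    # explicitly. This extra step is an assumption, not text from the draft.
    return quote(text, safe="").replace("_", "%5F")

def key_segment(pairs):
    # pairs: (column name, lexical value) tuples in key order, joined as
    # name=value pairs separated by "_".
    return "_".join(f"{encode_part(c)}={encode_part(v)}" for c, v in pairs)

segment = key_segment([("f_name", "Bob_Smith"), ("dept", "R&D")])
print(segment)
```

With this escaping, every literal "_" left in the segment is a pair separator, so the segment can be split back into its pairs unambiguously.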
>
>
> > > http://foo.example/DB/Department#Manager -- why is Manager uppercase?
> > >
> >
> > typo
>
> ditto
>
> > >
> > > I object to the representation of simple string literals as
> > > "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> > > equivalent under datatype semantics, so the simple form should be used.
> > >
> >
> > We should create an issue on this: "Should a literal include xsd?" It
> > should be discussed in the group to reach a consensus.
>
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#literaltriples
> now says
> [[
> Per XML Datatypes for SQL Datatypes, string datatypes are expressed as
> an RDF plain literal.
> ]]
>
>
> > > Again, please drop the concept of a stem URI and explain that the
> > > mapping uses relative URIs which are resolved against an
> > > environment-provided base URI. Instead of this:
> > >
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
> > >
> > > I'd like to see this:
> > >
> > > <Addresses/ID=18> <Addresses#ID> 18 .
> > > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > > <Addresses/ID=18> <Addresses#state> "MA" .
>
> Hmm, it's possible that this might all be do-able with relative URIs.
> But we'd better think about this carefully.
> For now, I've used Turtle's @base directive.
>
>
> > Do you mean that we should define a prefix:
> >
> > @prefix base: <http://foo.example/DB/> .
> >
> > and then everywhere have
> >
> > <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> > <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> > <base:Addresses/ID=18> <base:Addresses#state> "MA" .
> >
> >
> >
> >
> >
> > > If you do it right, RDF can be simple ;-)
> > >
> > >
> > :)
> >
> >
> > >
> > > Again, great work, and I'm very happy to see this spec moving forward
> > > and like the direction it is taking.
> > >
> >
> > Thanks for your very insightful and direct comments.
> >
> > Marcelo and I will be working on this in the next couple of days and
> > will let everybody know when we have an update. Please keep the
> > comments coming!!!!!
> >
> >
> > > Richard
> > >
> > >
> > >
> > >
> > >> All the best,
> > >>
> > >> Marcelo
> > >>
> > >>
> > >
> > >
> --
> -ericP
>

Received on Thursday, 4 November 2010 13:06:59 UTC