Re: Addressing ISSUE-64 and ISSUE-65 from Juan Sequeda on 2011-08-22 (public-rdb2rdf-wg@w3.org from August 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Mon, 22 Aug 2011 16:07:22 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "Eric Prud'hommeaux" <eric@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDyNZqTbW_4dHYLssZMayMxJjQzA3N0wVc-nTfrEyGVW9A@mail.gmail.com>

Richard,

I really appreciate your insight. If I understand correctly, your proposal
is:

1) Prefer relative IRIs (<People#fname>) over prefixed names (ppl:fname).
2) When it comes to generating an IRI for a foreign key, you prefer
<ref/People#addr>.
What is the IRI that you recommend for multi-attribute foreign keys?

Am I missing something? I just want to be clear on your position.
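
To check my reading of point 1, here is a minimal Python sketch of the
trade-off. It rests on assumptions that are mine, not taken from the spec: the
helper names relative_iri and can_abbreviate are hypothetical, the pattern is
only an ASCII approximation of SPARQL 1.0's PN_LOCAL, and the percent-encoding
rule is a stand-in for whatever the DM finally specifies.

```python
import re
from urllib.parse import quote

# Simplified, ASCII-only approximation of the SPARQL 1.0 PN_LOCAL pattern.
# The real grammar also admits many non-ASCII characters (assumption: we
# only care about ASCII column names here).
PN_LOCAL_ASCII = re.compile(r"[A-Za-z0-9_]([A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def relative_iri(table: str, column: str) -> str:
    """Build a relative property IRI such as <People#fname>.

    Assumption: every character outside letters, digits and "_.-~"
    is percent-encoded.
    """
    return f"<{quote(table, safe='')}#{quote(column, safe='')}>"


def can_abbreviate(column: str) -> bool:
    """True if the column name could be the local part of a prefixed name."""
    return PN_LOCAL_ASCII.fullmatch(column) is not None


if __name__ == "__main__":
    # The relative IRI can always be built; the prefixed name cannot.
    for col in ["fname", "OBX_MODEL_PPL2", "first name", "addr,ID"]:
        print(col, relative_iri("People", col), can_abbreviate(col))
```

If the sketch is right, a user who relies on prefixed names still has to fall
back to the relative IRI form whenever can_abbreviate returns False, which is
how I understand your argument.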


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Fri, Aug 19, 2011 at 7:28 AM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 17 Aug 2011, at 23:48, Eric Prud'hommeaux wrote:
> >> You say that some proposals don't play well with namespace prefixes. You
> use this as an argument against these proposals. I think that's an invalid
> argument because namespaces are *already* entirely useless with the DM.
> >>
> >> 1. Each table requires its own namespace, leading to an abundance of
> namespaces
> >
> > In the use cases I've dealt with, this has been a feature rather than a
> bug. That is people:ID and addrs:ID are conveniently distinguished. Writing
> rules or queries is very intuitive with this partitioning:
> >
> >    PREFIX ppl: <People#>
> >    PREFIX adr: <Addresses#>
> >    SELECT ?city WHERE { ?who ppl:fname "Bob" ;
> >                              ppl:addr ?addr .
> >                        ?addr adr:city ?city }
>
> Most databases don't have neat and intuitive table names like that. They
> have "OBX_MODEL_PPL2" and "OBX_SHP_ADR_MAIN". Once you look beyond the MySQL
> webapp market and look at enterprisey stuff, many database schemas aren't
> even hand-designed, but look like they dropped out of some CASE tool or
> other monstrosity. Actually coming up with a neat intuitive three-letter
> abbreviation for each of these tables is *hard*. It is extra work. Most
> users won't bother, because they can get the job done without inventing
> prefixes, and for fear that their neat prefix doesn't quite capture the
> meaning of the table (which they probably didn't design themselves and only
> half-understand).
>
> >> 2. The DM is not written by humans but by machines. The machine has to
> generate the namespace prefix. The only thing it can really do is either use
> the table name, or use the (unreadable) ns0, ns1, ns2 pattern.
> >
> > The DM will be queried by humans. It will also be transformed to common
> ontologies by rules written by humans.
>
> So you assume that the prefixes will be written by humans?
>
> I don't believe that.
>
> No one wants to enter several lines of boilerplate before they can run a
> query. Either the processor will pre-configure the prefixes (which again
> raises the problem of machine-generated prefixes), or users will just make
> do without prefixes.
>
> Writing the prefixes manually, on the other hand, requires an understanding
> of the URI scheme used by the DM. Once one has acquired that understanding,
> one can just as well forget about prefixes and use the URIs straight.
>
> >> 3. Generating prefixes automatically from the table name leads to all
> sorts of Fun with special characters. Basically, it is impossible because
> there are no escape mechanisms inside the *prefix*.
> >
> > Most databases have only unary foreign keys.
>
> I doubt that. Many databases don't have any foreign keys at all. And many
> non-toy databases *do have* foreign keys with multiple columns.
>
> > Most of these can be transformed to common vocabularies with rules which
> don't mention node identifiers, e.g.
> >
> >    PREFIX ppl: <People#>
> >    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >    CONSTRUCT {
> >        ?who a foaf:Person ; foaf:givenName ?fname
> >    } WHERE { ?who ppl:fname ?fname }
>
> Have you tried a non-toy example? How did that go?
>
> >> 4. The DM already produces many IRIs that cannot be abbreviated into
> prefixed names because they contain commas or equal signs in the local part.
> >
> > Many queries and rules don't include specific node identifiers.
>
> It is very common for queries to ask for a specific identifier.
>
> > For databases with only unary foreign keys, these queries can be tersely
> and conveniently expressed with the current algorithm.
>
> Bullshit. Using namespaces in DM queries makes them more verbose, not more
> terse. Rewriting any of your examples here with relative URIs makes the
> queries more compact.
>
> >> 5. Even if that's not the case, special characters in table and column
> names will often prevent abbreviation.
> >
> > But again, most column names are BORING UPPER-CASE STRINGS (which fit the
> PN_LOCAL lexical pattern which SPARQL and Turtle use).
>
> Sure, BORING_UPPER_CASE_STRING works in 80% of all cases. Nevertheless,
> *every* user who does any real work with the DM will be confronted with
> situations where prefixed names don't work, so they will:
>
> - have to understand the relative URI approach anyway
> - be confused about why sometimes the one and sometimes the other is used
> - be confronted with unexpected errors when they try to use prefixed names
> but it doesn't work because there's some weird character in a column name
> - have to learn which characters are allowed in local names, so that they
> know whether to use prefixed name or relative URI when writing their queries
>
> And this is *in addition* to dealing with percent-encoding, which is
> confusing enough!
>
> I repeat this for you: When working with any non-toy database, you'll have
> to use the relative URI approach *anyway* in at least a few instances, so
> users have to learn that approach *anyway*, and have to learn what the heck
> the difference is, and when to use which.
>
> The relative URI approach works *always*, is *more terse*, and removes an
> entire layer of complexity.
>
> >> 6. All RDF syntaxes that support prefixes, also support relative IRIs.
> Using the table name as a prefix is just as long as using a relative URI:
> People:addr vs. <People#addr>.
> >
> > Perhaps it's a fault of the educators,
>
> Perhaps.
>
> > but I've seen a surprisingly small number of SPARQL queries using base
> (like < 3%).
>
> Well, the DM is quite different from your average RDF graph, so it's not
> surprising that queries against the DM will look different.
>
> BASE isn't even necessary. The processor can specify a default base.
>
> >> 7. *All* URIs that can possibly occur in a DM graph can be nicely
> abbreviated with a single base URI.
> >>
> >> My conclusion is that using prefixes with the DM is impossible to
> implement in any way that works and makes sense. Implementations that are
> interested in producing readable RDF should just use relative IRIs.
> >>
> >> Therefore, I think you have not presented any valid arguments against
> using property IRIs such as these:
> >>
> >>  <People,Addresses#addr,ID>
> >>  <People#addr,Addresses,ID>
> >>  <ref/People#addr>
> >>
> >> Personally, I like the last option.
> >
> > So worse than each table having its own namespace, each foreign key has
> again a novel namespace.
>
> Eh. My point is that *none* of them has a namespace declaration. You know,
> there is no law that states you can't use <IRIs> as property names in
> SPARQL.
>
> > 1 and 3 ensure that no foreign key will be in the same namespace as the
> other properties of the table. 2 renders many common queries like
> >    PREFIX ppl: <People#>
> >    PREFIX adr: <Addresses#>
> >    SELECT ?city WHERE { ?who ppl:fname "Bob" ;
> >                              ppl:addr ?addr .
> >                        ?addr adr:city ?city }
> > harder to write.
>
> How is the above harder to write than this?
>
>   SELECT ?city WHERE { ?who <People#fname> "Bob" ;
>                             <People#addr> ?addr .
>                       ?addr <Addresses#city> ?city }
>
> > What is the justification for complicating these common cases?
>
> I think of it as simplifying them. One less layer of complexity. More
> predictability. Less to learn.
>
> > If it's just 'cause there's an exception in the spec, I don't see the
> trade-off justified at all. If it's to save an arc traversal { ?who ppl:addr
> ?addr . ?addr adr:ID ?id }, most graph patterns won't even touch db-encoding
> artifacts like the ID. If it's to save a SQL join, the relational schema
> already provides the DM processor with sufficient info to not do the
> join (i.e. FOREIGN KEY (addr) REFERENCES Addresses (ID) ).
>
> None of the above.
>
> It's to make the DM more predictable, easier to use, easier to teach and
> easier to read for real-world applications.
>
> Your entire argument hinges on the use of namespace prefixes, and since I
> believe that use of namespace prefixes with the DM is a bad idea, I simply
> don't find your argument compelling at all.
>
> You're optimizing for the “hello, world” case at the expense of real-world
> usability. You're pretending that funky characters in identifiers are a rare
> corner case that doesn't really happen and that you don't need to worry
> about. I'm sorry but that doesn't work. Believe me, I've tried that approach
> in D2RQ and it doesn't work. Our second-most frequent class of bugs over the
> years has been the result of me assuming, “oh no one would ever be so stupid
> to put *that* character into a column name, right?”
>
> Best,
> Richard
Received on Monday, 22 August 2011 21:08:20 UTC