- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Mon, 22 Aug 2011 16:07:22 -0500
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDyNZqTbW_4dHYLssZMayMxJjQzA3N0wVc-nTfrEyGVW9A@mail.gmail.com>
Richard,

I really appreciate your insight. If I understand correctly, your proposal is:

1) Prefer to use relative IRIs (<People#fname>) instead of prefixes (ppl:fname)
2) When it comes to generating an IRI for a foreign key, you prefer <ref/People#addr>.

What is the IRI that you recommend for multi-attribute foreign keys?

Am I missing something? I just want to be clear on your position.

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Fri, Aug 19, 2011 at 7:28 AM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 17 Aug 2011, at 23:48, Eric Prud'hommeaux wrote:
> >> You say that some proposals don't play well with namespace prefixes. You use this as an argument against these proposals. I think that's an invalid argument because namespaces are *already* entirely useless with the DM.
> >>
> >> 1. Each table requires its own namespace, leading to an abundance of namespaces
> >
> > In the use cases I've dealt with, this has been a feature rather than a bug. That is, people:ID and addrs:ID are conveniently distinguished. Writing rules or queries is very intuitive with this partitioning:
> >
> > PREFIX ppl: <People#>
> > PREFIX adr: <Addresses#>
> > SELECT ?city WHERE { ?who ppl:fname "Bob" ;
> >                           ppl:addr ?addr .
> >                      ?addr adr:city ?city }
>
> Most databases don't have neat and intuitive table names like that. They have "OBX_MODEL_PPL2" and "OBX_SHP_ADR_MAIN". Once you look beyond the MySQL webapp market and look at enterprisey stuff, many database schemas aren't even hand-designed, but look like they dropped out of some CASE tool or other monstrosity. Actually coming up with a neat intuitive three-letter abbreviation for each of these tables is *hard*. It is extra work. Most users won't bother, because they can get the job done without inventing prefixes, and for fear that their neat prefix doesn't quite capture the meaning of the table (which they probably didn't design themselves and only half-understand).
>
> >> 2. The DM is not written by humans but by machines. The machine has to generate the namespace prefix. The only thing it can really do is either use the table name, or use the (unreadable) ns0, ns1, ns2 pattern.
> >
> > The DM will be queried by humans. It will also be transformed to common ontologies by rules written by humans.
>
> So you assume that the prefixes will be written by humans?
>
> I don't believe that.
>
> No one wants to enter several lines of boilerplate before they can run a query. Either the processor will pre-configure the prefixes (which again raises the problem of machine-generated prefixes), or users will just make do without prefixes.
>
> Writing the prefixes manually, on the other hand, requires an understanding of the URI scheme used by the DM. Once one has acquired that understanding, one can just as well forget about prefixes and use the URIs straight.
>
> >> 3. Generating prefixes automatically from the table name leads to all sorts of Fun with special characters. Basically, it is impossible because there are no escape mechanisms inside the *prefix*.
> >
> > Most databases have only unary foreign keys.
>
> I doubt that. Many databases don't have any foreign keys at all. And many non-toy databases *do have* foreign keys with multiple columns.
>
> > Most of these can be transformed to common vocabularies with rules which don't mention node identifiers, e.g.
> >
> > PREFIX ppl: <People#>
> > PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> > CONSTRUCT {
> >   ?who a foaf:Person ; foaf:givenName ?fname
> > } WHERE { ?who ppl:fname ?fname }
>
> Have you tried a non-toy example? How did that go?
>
> >> 4. The DM already produces many IRIs that cannot be abbreviated into prefixed names because they contain commas or equal signs in the local part.
> >
> > Many queries and rules don't include specific node identifiers.
>
> It is very common for queries to ask for a specific identifier.
>
> > For databases with only unary foreign keys, these queries can be tersely and conveniently expressed with the current algorithm.
>
> Bullshit. Using namespaces in DM queries makes them more verbose, not more terse. Rewriting any of your examples here with relative URIs makes the queries more compact.
>
> >> 5. Even if that's not the case, special characters in table and column names will often prevent abbreviation.
> >
> > But again, most column names are BORING UPPER-CASE STRINGS (which fit the PN_LOCAL lexical pattern which SPARQL and Turtle use).
>
> Sure, BORING_UPPER_CASE_STRING works in 80% of all cases. Nevertheless, *every* user who does any real work with the DM will be confronted with situations where prefixed names don't work, so they will:
>
> - have to understand the relative URI approach anyway
> - be confused about why sometimes the one and sometimes the other is used
> - be confronted with unexpected errors when they try to use prefixed names but it doesn't work because there's some weird character in a column name
> - have to learn which characters are allowed in local names, so that they know whether to use prefixed name or relative URI when writing their queries
>
> And this is *in addition* to dealing with percent-encoding, which is confusing enough!
>
> I repeat this for you: When working with any non-toy database, you'll have to use the relative URI approach *anyway* in at least a few instances, so users have to learn that approach *anyway*, and have to learn what the heck the difference is, and when to use which.
>
> The relative URI approach works *always*, is *more terse*, and removes an entire layer of complexity.
>
> >> 6. All RDF syntaxes that support prefixes, also support relative IRIs. Using the table name as a prefix is just as long as using a relative URI: People:addr vs. <People#addr>.
> >
> > Perhaps it's a fault of the educators,
>
> Perhaps.
>
> > but I've seen a surprisingly small number of SPARQL queries using base (like < 3%).
>
> Well, the DM is quite different from your average RDF graph, so it's not surprising that queries against the DM will look different.
>
> BASE isn't even necessary. The processor can specify a default base.
>
> >> 7. *All* URIs that can possibly occur in a DM graph can be nicely abbreviated with a single base URI.
> >>
> >> My conclusion is that using prefixes with the DM is impossible to implement in any way that works and makes sense. Implementations that are interested in producing readable RDF should just use relative IRIs.
> >>
> >> Therefore, I think you have not presented any valid arguments against using property IRIs such as these:
> >>
> >> <People,Addresses#addr,ID>
> >> <People#addr,Addresses,ID>
> >> <ref/People#addr>
> >>
> >> Personally, I like the last option.
> >
> > So worse than each table having its own namespace, each foreign key has again a novel namespace.
>
> Eh. My point is that *none* of them has a namespace declaration. You know, there is no law that states you can't use <IRIs> as property names in SPARQL.
>
> > 1 and 3 ensure that no foreign key will be in the same namespace as the other properties of the table. 2 renders many common queries like
> > PREFIX ppl: <People#>
> > PREFIX adr: <Addresses#>
> > SELECT ?city WHERE { ?who ppl:fname "Bob" ;
> >                           ppl:addr ?addr .
> >                      ?addr adr:city ?city }
> > harder to write.
>
> How is the above harder to write than this?
>
> SELECT ?city WHERE { ?who <People#fname> "Bob" ;
>                           <People#addr> ?addr .
>                      ?addr <Addresses#city> ?city }
>
> > What is the justification for complicating these common cases?
>
> I think of it as simplifying them. One less layer of complexity. More predictability. Less to learn.
>
> > If it's just 'cause there's an exception in the spec, I don't see the trade-off justified at all. If it's to save an arc traversal { ?who ppl:addr ?addr . ?addr adr:ID ?id }, most graph patterns won't even touch db-encoding artifacts like the ID. If it's to save a SQL join, the relational schema already provides the DM processor with sufficient info to not do the join (i.e. FOREIGN KEY (addr) REFERENCES Addresses (ID) ).
>
> None of the above.
>
> It's to make the DM more predictable, easier to use, easier to teach and easier to read for real-world applications.
>
> Your entire argument hinges on the use of namespace prefixes, and since I believe that use of namespace prefixes with the DM is a bad idea, I simply don't find your argument compelling at all.
>
> You're optimizing for the “hello, world” case at the expense of real-world usability. You're pretending that funky characters in identifiers are a rare corner case that doesn't really happen and that you don't need to worry about. I'm sorry but that doesn't work. Believe me, I've tried that approach in D2RQ and it doesn't work. Our second-most frequent class of bugs over the years has been the result of me assuming, “oh no one would ever be so stupid to put *that* character into a column name, right?”
>
> Best,
> Richard
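
The choice discussed under point 5 (prefixed name when the column name allows it, relative IRI otherwise) can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: the helper names `relative_iri` and `property_term` are hypothetical, the regular expression is only an ASCII approximation of SPARQL's PN_LOCAL rule, and the percent-encoding is the generic one from `urllib.parse.quote`, which may differ in detail from what the DM specifies.

```python
import re
from typing import Optional
from urllib.parse import quote

# ASCII-only approximation of the SPARQL PN_LOCAL production (assumption:
# the real grammar also admits many non-ASCII characters).
_PN_LOCAL_ASCII = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?")


def relative_iri(table: str, column: str) -> str:
    """Return a relative IRI such as <People#fname>, percent-encoding both parts."""
    # Assumption: everything outside the URI "unreserved" set is encoded; the
    # Direct Mapping's own encoding rules may differ in detail.
    return f"<{quote(table, safe='')}#{quote(column, safe='')}>"


def property_term(table: str, column: str, prefix: Optional[str] = None) -> str:
    """Use prefix:column when the column name allows it, else a relative IRI."""
    if prefix is not None and _PN_LOCAL_ASCII.fullmatch(column):
        return f"{prefix}:{column}"
    return relative_iri(table, column)


print(property_term("People", "fname", "ppl"))       # ppl:fname
print(property_term("People", "first name", "ppl"))  # <People#first%20name>
print(property_term("People", "fname"))              # <People#fname>
```

Only the fallback branch works for every column name, so a caller who always uses `relative_iri` never needs to know the PN_LOCAL rule at all.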
Received on Monday, 22 August 2011 21:08:20 UTC