- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Fri, 9 Sep 2011 09:11:14 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: Eric Prud'hommeaux <eric@w3.org>, public-rdb2rdf-wg@w3.org
> P.S. Honestly: this is the type of issue we could spend *weeks*
> discussing, mainly because it has a distinct flavour of taste. We
> should really try to avoid that:-)

+1

Cheers,
Michael

--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 9 Sep 2011, at 09:07, Ivan Herman wrote:

>
> On Sep 8, 2011, at 22:34 , Eric Prud'hommeaux wrote:
>
>> During the last meeting, we discussed picking a punctuation schema but
>> asking the community for feedback on picking from a set of choices
>> (perfectly legit in an LC document).
>
> Just an editorial issue. I think this WG must choose one of these
> that reflect WG consensus. It is then perfectly legit to add a note
> to the LC saying that alternative schemes are possible, that we
> explicitly seek feed back on this, and point to a mail like this one
> (or a wiki page) that lists the alternatives. But, again, we must
> make a choice on this (last?) issue as soon as possible.
>
> As for myself, I must admit I do not have any strong feeling neither
> pro or con with any of these schemes. As you say, there is a slider
> here, which I can translate that there is no solution that covers
> every requirement. So we have to take a compromise. If so, I can
> personally live with any of these, as long as we have (finally!)
> fixed it...
>
> Ivan
>
> P.S. Honestly: this is the type of issue we could spend *weeks*
> discussing, mainly because it has a distinct flavour of taste. We
> should really try to avoid that:-)
>
>
>
>> This can help us pick:
>>
>>
>> = Problem =
>> Define rules which create unambiguous identifiers for database rows,
>> columns and references (foreign keys).
>> Extra credit if they are easy to parse by human or machine and easy
>> to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
>>
>> These URIs are composed from table and attribute names, attribute
>> values, and miscellaneous punctuation. This email is about tweaking
>> the punctuation to get the most simplicity in the most use cases.
>>
>> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
>>   Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
>>   Column IRI: base + table + '#' + attr
>>   Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
>>
>> This uses the '.' separator between attributes in both row IRIs and
>> reference IRIs. The attrⁿ/valⁿ separator is '-' (for simplicity in
>> STRR). Outlining some popular choices:
>>
>>      row IRI                  ref IRI
>>   ①  attr¹-val¹.attrⁿ-valⁿ    ref-attr¹.attrⁿ
>>   ②  attr¹.val¹-attrⁿ.valⁿ    ref-attr¹-attrⁿ
>>   ③  attr¹-val¹.attrⁿ-valⁿ    ref-attr¹-attrⁿ
>>   ④  attr¹=val¹,attrⁿ=valⁿ    ref-attr¹-attrⁿ
>>   ⑤  attr¹.val¹.attrⁿ.valⁿ    ref.attr¹.attrⁿ
>>
>>
>> = Examples =
>> Given some tables with PKs:
>>
>>   ┌┤Simple├────┬───────┐
>>   │┌pk┐│       │       │
>>   │ PK │ attrA │ attrB │
>>   │ 1  │ valA1 │ valB2 │
>>   │ 2  │ valA2 │ valB2 │
>>   └────┴───────┴───────┘
>>
>>   ┌┤People├────┬─────────┐
>>   │┌──────────pk────────┐│
>>   │ fname      │ lname   │
>>   │ "Bob"      │ "Smith" │
>>   │ "Madonna"  │ ""      │
>>   │ "T in"     │ "Ya-Li" │
>>   │ "أكرم.عبد" │ "كور" │
>>   └────────────┴─────────┘
>>
>>   ┌┤Events├────┬────────────┬─────────┐
>>   │┌────pk────┐│┌─────↬People.pk─────┐│
>>   │ date       │ orgfn      │ orgln   │
>>   │ 2012-01-01 │ "Bob"      │ "Smith" │
>>   │ 2011-12-25 │ "Madonna"  │ ""      │
>>   │ 2012-04-06 │ "T in"     │ "Ya-Li" │
>>   │ 2011-10-01 │ "أكرم.عبد" │ "كور" │
>>   └────────────┴────────────┴─────────┘
>>
>> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
>> attribute names and values. ┤People├ and ┤Events├ have alphanum attribute
>> names.
>> (Attribute names which are not exclusively alpha-numeric are
>> horrible no matter what; they don't help us discriminate our
>> options.)
>>
>> == Example Row IRIs ==
>> We see these Row IRIs (eliding <base + ...>) for the first rows of
>> these tables, given the choices of punctuation listed above.
>>
>>   ① Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>>   ② Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>>   ③ Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>>   ④ Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
>>   ⑤ Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
>>
>> == Reference (predicate) IRIs ==
>> Reference (predicate) IRIs for ┤Simple├ are simple and boring:
>> table#ref-attr .
>> ┤Events├'s references to ┤People├ take two attributes:
>>
>>   ① Events/ref-orgfn.orgln
>>   ② Events/ref-orgfn-orgln
>>   ③ Events/ref-orgfn-orgln
>>   ④ Events/ref-orgfn-orgln
>>   ⑤ Events/ref.orgfn.orgln
>>
>>
>> = What needs escaping =
>> The character used to separate attr/value pairs dictates which
>> characters require escaping in values. ②③ require escaping '-'s;
>> ①⑤ require escaping '.'s and ④ requires escaping ','s. Row
>> identifiers for rows 3 and 4 of ┤People├ illustrate this:
>>
>>   ① People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Dعبد.lname-كور
>>   ② People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
>>   ③ People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
>>   ④ People/fname=T%20in,lname=Ya-Li   │ People/fname=أكرم.عبد,lname=كور
>>   ⑤ People/fname.T%20in.lname.Ya-Li   │ People/fname.أكرم%2Dعبد.lname.كور
>>
>> (We can also follow the HTML5, WSDL, ... url-encoding spec and
>> turn ' ' into '+' instead of '%20'.)
>>
>>
>> = SPARQL, Turtle, RIF, RDF/XML =
>> RDF Rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
>> predicates, without having to identify Row IRIs.
>> Queries include Row
>> identifiers a bit more (the savvy user or tool will select an entity
>> by identifier rather than distinguishing attributes) and Turtle (the
>> data) will of course include both.
>>
>> All of these languages allow the use of relative IRIs and prefixed
>> names. A prefixed query of a People table for ① looks like:
>>
>>   PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>>   PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>>   SELECT ?event
>>    WHERE {
>>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
>>   }
>>
>> And the relative IRI query looks like:
>>
>>   BASE <http://hr.myco.example/2011/schemas/>
>>   SELECT ?event
>>    WHERE {
>>     <People/fname-Bob.lname-Smith> <People#atEvent> ?event
>>   }
>>
>> Extending the use case to gain some SemWeb utility, we join two
>> databases, those of the HR and catering departments:
>>
>>   PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>>   PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>>   PREFIX cater: <http://hr.myco.example/2011/schemas/People#>
>>   SELECT ?start ?end
>>    WHERE {
>>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
>>     ?event cater:start ?start ; cater:end ?end
>>   }
>>
>> The customary URI escape character, '%', is not permitted in prefixed
>> names (nor are ',' and '=').
>> The various row ID schemas have different
>> impacts on the expressivity in prefixed names given different values:
>>
>>      row ID                  pos int  neg int  alphanum  date  float
>>   ①  attr¹-val¹.attrⁿ-valⁿ      ✓        ✓        ✓        ✓
>>   ②  attr¹.val¹-attrⁿ.valⁿ      ✓                 ✓              ✓
>>   ④  attr¹=val¹,attrⁿ=valⁿ
>>   ⑤  attr¹.val¹.attrⁿ.valⁿ      ✓        ✓        ✓        ✓
>>
>> (③ varies from ① only in the reference IRIs)
>>
>> For an example of negative integer primary keys, this table uses -2
>> and -1 to represent a couple of access control groups common to all
>> Apache servers:
>>
>>   ┌┤AccessRoles├───────┐
>>   │┌pk┐│               │
>>   │ ID │ desc          │
>>   │ -2 │ "known users" │
>>   │ -1 │ "world"       │
>>   │ 1  │ "marketing"   │
>>   │ 2  │ "management"  │
>>   └────┴───────────────┘
>>
>>
>> = The balance =
>> I see us as pushing a slider between optimizing for readability
>> ("attr¹=val¹,attrⁿ=valⁿ") and optimizing for usability (being able to
>> write/query the data with prefixed names). As Richard points out, we
>> can write/query the data for an individual database using an @base
>> directive and relative IRIs. This choice helps users write
>> data/queries as prefixed names (e.g. queries connecting multiple
>> databases).
>>
>> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
>> idea of the sweet spot. ⑤ gives us the simplest encoding rules and ②
>> is less likely to be confused with the '.' addressing scheme used in
>> SQL.
>>
>> --
>> -ericP
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
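
For reference, the scheme ① rules quoted above (row IRI: base + table + '/' + attr-val pairs joined by '.'; reference IRI: base + table + '#' + 'ref-' + attrs joined by '.') can be sketched in Python. This is only an illustration under stated assumptions, not code from the WG: the function names `row_iri` and `reference_iri` are hypothetical, and `urllib.parse.quote` percent-encodes non-ASCII characters, whereas the IRI examples in the quoted message leave them as-is.

```python
from urllib.parse import quote


def _escape(text):
    """Percent-encode a name or value for use inside a scheme-1 IRI.

    quote() leaves '-' and '.' untouched; '.' is the pair separator in
    scheme 1, so it is escaped by hand. (Assumption: '-' stays literal,
    as in the "Ya-Li" example of the quoted message.)
    """
    return quote(str(text), safe="").replace(".", "%2E")


def row_iri(base, table, key):
    """Hypothetical helper: base + table + '/' + attr-val pairs joined by '.'.

    `key` is a list of (attribute, value) pairs making up the primary key.
    """
    pairs = ".".join(f"{_escape(attr)}-{_escape(val)}" for attr, val in key)
    return f"{base}{_escape(table)}/{pairs}"


def reference_iri(base, table, attrs):
    """Hypothetical helper: base + table + '#ref-' + attribute names joined by '.'."""
    return f"{base}{_escape(table)}#ref-" + ".".join(_escape(a) for a in attrs)


if __name__ == "__main__":
    base = "http://hr.myco.example/2011/schemas/"
    print(row_iri(base, "People", [("fname", "T in"), ("lname", "Ya-Li")]))
    print(row_iri(base, "AccessRoles", [("ID", -2)]))
    print(reference_iri(base, "Events", ["orgfn", "orgln"]))
```

A float key value such as 1.5 comes out as `1%2E5` under this sketch; because '%' is not permitted in prefixed names, such a row can only be addressed with a full or relative IRI.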
Received on Friday, 9 September 2011 08:11:47 UTC