- From: Ivan Herman <ivan@w3.org>
- Date: Fri, 9 Sep 2011 10:07:49 +0200
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: public-rdb2rdf-wg@w3.org
On Sep 8, 2011, at 22:34, Eric Prud'hommeaux wrote:
> During the last meeting, we discussed picking a punctuation schema but
> asking the community for feedback on picking from a set of choices
> (perfectly legit in an LC document).

Just an editorial issue. I think this WG must choose one of these that reflects WG consensus. It is then perfectly legit to add a note to the LC saying that alternative schemes are possible, that we explicitly seek feedback on this, and point to a mail like this one (or a wiki page) that lists the alternatives. But, again, we must make a choice on this (last?) issue as soon as possible.

As for myself, I must admit I do not have any strong feeling, either pro or con, about any of these schemes. As you say, there is a slider here, which I translate as: there is no solution that covers every requirement. So we have to make a compromise. If so, I can personally live with any of these, as long as we have (finally!) fixed it...

Ivan

P.S. Honestly: this is the type of issue we could spend *weeks* discussing, mainly because it has a distinct flavour of taste. We should really try to avoid that :-)

> This can help us pick:
>
>
> = Problem =
> Define rules which create unambiguous identifiers for database rows,
> columns and references (foreign keys).
> Extra credit if they are easy to parse by human or machine and easy
> to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
>
> These URIs are composed from table and attribute names, attribute
> values, and miscellaneous punctuation. This email is about tweaking
> the punctuation to get the most simplicity in the most use cases.
>
> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
>   Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
>   Column IRI: base + table + '#' + attr
>   Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
>
> This uses the '-' separator between attributes in both row IRIs and
> reference IRIs.
> The attrⁿ/valⁿ separator is '.' (for simplicity in
> STRR). Outlining some popular choices:
>
>   row IRI                  ref IRI
> ① attr¹-val¹.attrⁿ-valⁿ    ref-attr¹.attrⁿ
> ② attr¹.val¹-attrⁿ.valⁿ    ref-attr¹-attrⁿ
> ③ attr¹-val¹.attrⁿ-valⁿ    ref-attr¹-attrⁿ
> ④ attr¹=val¹,attrⁿ=valⁿ    ref-attr¹-attrⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ    ref.attr¹.attrⁿ
>
>
> = Examples =
> Given some tables with PKs:
> ┌┤Simple├────┬───────┐ ┌┤People├────┬─────────┐ ┌┤Events├────┬────────────┬─────────┐
> │┌pk┐│       │       │ │┌──────────pk────────┐│ │┌────pk────┐│┌─────↬People.pk─────┐│
> │ PK │ attrA │ attrB │ │ fname      │ lname   │ │ date       │ orgfn      │ orgln   │
> │  1 │ valA1 │ valB2 │ │ "Bob"      │ "Smith" │ │ 2012-01-01 │ "Bob"      │ "Smith" │
> │  2 │ valA2 │ valB2 │ │ "Madonna"  │ ""      │ │ 2011-12-25 │ "Madonna"  │ ""      │
> └────┴───────┴───────┘ │ "T in"     │ "Ya-Li" │ │ 2012-04-06 │ "T in"     │ "Ya-Li" │
>                        │ "أكرم.عبد" │ "كور"   │ │ 2011-10-01 │ "أكرم.عبد" │ "كور"   │
>                        └────────────┴─────────┘ └────────────┴────────────┴─────────┘
>
> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
> attribute names and values. ┤People├ and ┤Events├ have alphanum attribute
> names. (Attribute names which are not exclusively alpha-numeric are
> horrible no matter what; they don't help us discriminate our options.)
>
> == Example Row IRIs ==
> We see these Row IRIs (eliding <base + ...>) for the first rows of
> these tables, given the choices of punctuation listed above.
>
> ① Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
> ② Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
> ③ Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
> ④ Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
> ⑤ Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
>
> == Reference (predicate) IRIs ==
> Reference (predicate) IRIs for ┤Simple├ are simple and boring: table#ref-attr .
> ┤Events├'s references to ┤People├ take two attributes:
>
> ① Events/ref-orgfn.orgln
> ② Events/ref-orgfn-orgln
> ③ Events/ref-orgfn-orgln
> ④ Events/ref-orgfn-orgln
> ⑤ Events/ref.orgfn.orgln
>
>
> = What needs escaping =
> The character used to separate attr/value pairs dictates which
> characters require escaping in values. ②③ require escaping '-'s;
> ①⑤ require escaping '.'s and ④ requires escaping ','s. Row
> identifiers for rows 3 and 4 of ┤People├ illustrate this:
>
> ① People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Dعبد.lname-كور
> ② People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
> ③ People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
> ④ People/fname=T%20in,lname=Ya-Li   │ People/fname=أكرم.عبد,lname=كور
> ⑤ People/fname.T%20in.lname.Ya-Li   │ People/fname.أكرم%2Dعبد.lname.كور
>
> (We can also follow the HTML5, WSDL, ... url-encoding spec and
> turn ' ' into '+' instead of '%20'.)
>
>
> = SPARQL, Turtle, RIF, RDF/XML =
> RDF Rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
> predicates, without having to identify Row IRIs. Queries include Row
> identifiers a bit more (the savvy user or tool will select an entity
> by identifier rather than distinguishing attributes) and Turtle (the
> data) will of course include both.
>
> All of these languages allow the use of relative IRIs and prefixed
> names.
> A prefixed query of a People table for ① looks like:
>
>   PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>   PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>   SELECT ?event
>   WHERE {
>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
>   }
>
> And the relative IRI query looks like:
>
>   BASE <http://hr.myco.example/2011/schemas/>
>   SELECT ?event
>   WHERE {
>     <People/fname-Bob.lname-Smith> <People#atEvent> ?event
>   }
>
> Extending the use case to gain some SemWeb utility, we join two
> databases, those of the HR and catering departments:
>
>   PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>   PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>   PREFIX cater: <http://hr.myco.example/2011/schemas/People#>
>   SELECT ?start ?end
>   WHERE {
>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
>     ?event cater:start ?start ; cater:end ?end
>   }
>
> The customary URI escape character, '%', is not permitted in prefixed
> names (nor are ',' and '='). The various row ID schemas have different
> impacts on the expressivity in prefixed names given different values:
>
>   row ID                 pos int  neg int  alphanum  date  float
> ① attr¹-val¹.attrⁿ-valⁿ     ✓        ✓        ✓       ✓
> ② attr¹.val¹-attrⁿ.valⁿ     ✓                 ✓              ✓
> ④ attr¹=val¹,attrⁿ=valⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ     ✓        ✓        ✓       ✓
>
> (③ varies from ① only in the reference IRIs)
>
> For an example of negative integer primary keys, this table uses -2
> and -1 to represent a couple of access control groups common to all
> apache servers:
>
> ┌┤AccessRoles├───────┐
> │┌pk┐│               │
> │ ID │ desc          │
> │ -2 │ "known users" │
> │ -1 │ "world"       │
> │  1 │ "marketing"   │
> │  2 │ "management"  │
> └────┴───────────────┘
>
>
> = The balance =
> I see us as pushing a slider around between optimizing for
> readability ("attr¹=val¹,attrⁿ=valⁿ") and for usability (being able to
> write/query the data with prefixed names). As Richard points out, we
> can write/query the data for an individual database using an @base
> directive and relative IRIs.
> This choice helps users write
> data/queries as prefixed names (e.g. queries connecting multiple
> databases).
>
> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
> idea of the sweet spot. ⑤ gives us the simplest encoding rules and ②
> is less likely to be confused with the '.' addressing scheme used in
> SQL.
>
> --
> -ericP
>

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 9 September 2011 08:07:47 UTC
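
For anyone wanting to compare the row IRI punctuation schemes ①–⑤ on their own key values, here is a minimal Python sketch. Only the separators and the rule that the pair separator must be percent-encoded inside values come from the thread. The names `SCHEMES`, `ALWAYS_ESCAPED`, `escape_value` and `row_iri`, and the choice of which other characters are always escaped, are assumptions made for illustration and are not part of the Direct Mapping draft. Scheme ③ is given the same row IRI form as ①, following the list of choices in the thread.

```python
# Sketch only: SCHEMES, ALWAYS_ESCAPED, escape_value and row_iri are
# hypothetical names, not part of any W3C specification.

# scheme number -> (attribute/value separator, pair separator)
SCHEMES = {
    1: ("-", "."),
    2: (".", "-"),
    3: ("-", "."),  # listed in the thread as differing from 1 only in reference IRIs
    4: ("=", ","),
    5: (".", "."),
}

# Assumption: characters percent-encoded under every scheme.
ALWAYS_ESCAPED = set(" %/#?")


def escape_value(value: str, pair_sep: str) -> str:
    """Percent-encode the pair separator and the always-escaped characters."""
    out = []
    for ch in value:
        if ch == pair_sep or ch in ALWAYS_ESCAPED:
            # Encode each UTF-8 byte of the character as %XX.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            # Everything else, including non-ASCII letters, passes through.
            out.append(ch)
    return "".join(out)


def row_iri(table: str, key: list[tuple[str, str]], scheme: int) -> str:
    """Build the relative row IRI (the part after the base) for a primary key."""
    attr_sep, pair_sep = SCHEMES[scheme]
    pairs = [attr + attr_sep + escape_value(val, pair_sep) for attr, val in key]
    return table + "/" + pair_sep.join(pairs)


if __name__ == "__main__":
    # Third row of the People table from the thread.
    key = [("fname", "T in"), ("lname", "Ya-Li")]
    for n in sorted(SCHEMES):
        print(n, row_iri("People", key, n))
```

Swapping in a date or a negative integer as the key value makes it easy to see which schemes force a '%' escape, and therefore which ones rule out prefixed names for that kind of value.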