- From: Ivan Herman <ivan@w3.org>
- Date: Fri, 9 Sep 2011 10:07:49 +0200
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: public-rdb2rdf-wg@w3.org
On Sep 8, 2011, at 22:34 , Eric Prud'hommeaux wrote:
> During the last meeting, we discussed picking a punctuation scheme but
> asking the community for feedback on picking from a set of choices
> (perfectly legit in an LC document).
Just an editorial issue. I think this WG must choose the one scheme that reflects WG consensus. It is then perfectly legit to add a note to the LC saying that alternative schemes are possible, that we explicitly seek feedback on this, and point to a mail like this one (or a wiki page) that lists the alternatives. But, again, we must make a choice on this (last?) issue as soon as possible.
As for myself, I must admit I do not have a strong feeling for or against any of these schemes. As you say, there is a slider here, which I take to mean that no solution covers every requirement. So we have to make a compromise. If so, I can personally live with any of these, as long as we have (finally!) fixed it...
Ivan
P.S. Honestly: this is the type of issue we could spend *weeks* discussing, mainly because it is largely a matter of taste. We should really try to avoid that :-)
> This can help us pick:
>
>
> = Problem =
> Define rules which create unambiguous identifiers for database rows,
> columns and references (foreign keys).
> Extra credit if they are easy to parse by human or machine and easy
> to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
>
> These URIs are composed from table and attribute names, attribute
> values, and miscellaneous punctuation. This email is about tweaking
> the punctuation to get the most simplicity in the most use cases.
>
> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
> Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
> Column IRI: base + table + '#' + attr
> Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
>
> This uses the '.' separator between attributes in both row IRIs and
> reference IRIs. The attrⁿ/valⁿ separator is '-' (for simplicity in
> STRR). Outlining some popular choices:
>
>   row IRI                  ref IRI
> ① attr¹-val¹.attrⁿ-valⁿ    ref-attr¹.attrⁿ
> ② attr¹.val¹-attrⁿ.valⁿ    ref-attr¹-attrⁿ
> ③ attr¹-val¹.attrⁿ-valⁿ    ref-attr¹-attrⁿ
> ④ attr¹=val¹,attrⁿ=valⁿ    ref-attr¹-attrⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ    ref.attr¹.attrⁿ
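
As a rough illustration of the five choices listed above, here is a minimal Python sketch that assembles row and reference IRIs from their parts. The names `SCHEMES`, `row_iri` and `ref_iri` are hypothetical, chosen only for this example, and the sketch assumes the values passed in have already been escaped as needed:

```python
# Hypothetical table of the five punctuation choices:
# (attr/value separator, pair separator, separator after 'ref', ref attribute separator)
SCHEMES = {
    1: ("-", ".", "-", "."),
    2: (".", "-", "-", "-"),
    3: ("-", ".", "-", "-"),
    4: ("=", ",", "-", "-"),
    5: (".", ".", ".", "."),
}


def row_iri(base, table, key, scheme):
    """Row IRI: base + table + '/' + attr/value pairs joined by the pair separator.

    `key` is a list of (attribute, value) tuples; values are assumed
    to be escaped already.
    """
    av_sep, pair_sep, _, _ = SCHEMES[scheme]
    pairs = [attr + av_sep + str(val) for attr, val in key]
    return base + table + "/" + pair_sep.join(pairs)


def ref_iri(base, table, attrs, scheme):
    """Reference IRI: base + table + '#' + 'ref' + separator + joined attribute names."""
    _, _, ref_sep, attr_sep = SCHEMES[scheme]
    return base + table + "#ref" + ref_sep + attr_sep.join(attrs)


if __name__ == "__main__":
    key = [("fname", "Bob"), ("lname", "Smith")]
    for n in sorted(SCHEMES):
        print(n, row_iri("", "People", key, n), ref_iri("", "Events", ["orgfn", "orgln"], n))
```

Keeping the separators in one table makes it easy to see that the choices differ only in punctuation, not in structure.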
>
>
> = Examples =
> Given some tables with PKs:
> ┌┤Simple├────┬───────┐
> │┌pk┐│       │       │
> │ PK │ attrA │ attrB │
> │ 1  │ valA1 │ valB2 │
> │ 2  │ valA2 │ valB2 │
> └────┴───────┴───────┘
>
> ┌┤People├────┬─────────┐
> │┌──────────pk────────┐│
> │ fname      │ lname   │
> │ "Bob"      │ "Smith" │
> │ "Madonna"  │ ""      │
> │ "T in"     │ "Ya-Li" │
> │ "أكرم.عبد" │ "كور"   │
> └────────────┴─────────┘
>
> ┌┤Events├────┬────────────┬─────────┐
> │┌────pk────┐│┌─────↬People.pk─────┐│
> │ date       │ orgfn      │ orgln   │
> │ 2012-01-01 │ "Bob"      │ "Smith" │
> │ 2011-12-25 │ "Madonna"  │ ""      │
> │ 2012-04-06 │ "T in"     │ "Ya-Li" │
> │ 2011-10-01 │ "أكرم.عبد" │ "كور"   │
> └────────────┴────────────┴─────────┘
>
> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
> attribute names and values. ┤People├ and ┤Events├ have alphanum attribute
> names. (Attribute names which are not exclusively alpha-numeric are
> horrible no matter what; they don't help us discriminate our options.)
>
> == Example Row IRIs ==
> We see these Row IRIs (eliding <base + ...>) for the first rows of
> these tables, given the choices of punctuation listed above.
>
> ① Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
> ② Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
> ③ Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
> ④ Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
> ⑤ Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
>
> == Reference (predicate) IRIs ==
> Reference (predicate) IRIs for ┤Simple├ are simple and boring: table#ref-attr .
> ┤Events├'s references to ┤People├ take two attributes:
>
> ① Events#ref-orgfn.orgln
> ② Events#ref-orgfn-orgln
> ③ Events#ref-orgfn-orgln
> ④ Events#ref-orgfn-orgln
> ⑤ Events#ref.orgfn.orgln
>
>
> = What needs escaping =
> The character used to separate attr/value pairs dictates which
> characters require escaping in values. ② requires escaping '-'s;
> ①③⑤ require escaping '.'s and ④ requires escaping ','s. Row
> identifiers for rows 3 and 4 of ┤People├ illustrate this:
>
> ① People/fname-T%20in.lname-Ya-Li │ People/fname-أكرم%2Eعبد.lname-كور
> ② People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname.كور
> ③ People/fname-T%20in.lname-Ya-Li │ People/fname-أكرم%2Eعبد.lname-كور
> ④ People/fname=T%20in,lname=Ya-Li │ People/fname=أكرم.عبد,lname=كور
> ⑤ People/fname.T%20in.lname.Ya-Li │ People/fname.أكرم%2Eعبد.lname.كور
>
> (We can also follow the HTML5, WSDL, ... url-encoding spec and
> turn ' ' into '+' instead of '%20'.)
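
To make the escaping rule concrete, here is a minimal Python sketch of the idea that the pair separator decides which characters get percent-encoded in a value. The function name `escape_value` is hypothetical, and the sketch deliberately escapes only the space, '%' and the pair separator; a real implementation would have to follow the full IRI rules:

```python
def escape_value(value, pair_sep):
    """Percent-encode the characters that would be ambiguous inside a row IRI.

    Simplifying assumption: only ' ', '%' and the scheme's pair separator
    are escaped; every other character, including non-ASCII, is kept as-is.
    """
    must_escape = {" ", "%", pair_sep}
    out = []
    for ch in value:
        if ch in must_escape:
            # Encode each UTF-8 byte of the character as %XX.
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # '-' as pair separator: hyphens in values are escaped.
    print(escape_value("Ya-Li", "-"))
    # '.' as pair separator: hyphens survive, dots are escaped.
    print(escape_value("Ya-Li", "."))
    print(escape_value("أكرم.عبد", "."))
    # ',' as pair separator: neither '-' nor '.' needs escaping.
    print(escape_value("T in", ","))
```

Note that '-' encodes as %2D, '.' as %2E and the space as %20.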
>
>
> = SPARQL, Turtle, RIF, RDF/XML =
> RDF Rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
> predicates, without having to identify Row IRIs. Queries include Row
> identifiers a bit more (the savvy user or tool will select an entity
> by identifier rather than distinguishing attributes) and Turtle (the
> data) will of course include both.
>
> All of these languages allow the use of relative IRIs and prefixed
> names. A prefixed query of a People table for ① looks like:
>
> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
> SELECT ?event
> WHERE {
> pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
> }
>
> And the relative IRI query looks like:
>
> BASE <http://hr.myco.example/2011/schemas/>
> SELECT ?event
> WHERE {
> <People/fname-Bob.lname-Smith> <People#atEvent> ?event
> }
>
> Extending the use case to gain some SemWeb utility, we join two
> databases, those of the HR and catering departments:
>
> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
> PREFIX cater: <http://hr.myco.example/2011/schemas/People#>
> SELECT ?start ?end
> WHERE {
> pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
> ?event cater:start ?start ; cater:end ?end
> }
>
> The customary URI escape character, '%', is not permitted in prefixed
> names (nor are ',' and '='). The various row ID schemes differ in
> which kinds of values they let us express as prefixed names:
>
>   row ID                   pos int  neg int  alphanum  date  float
> ① attr¹-val¹.attrⁿ-valⁿ      ✓        ✓         ✓       ✓
> ② attr¹.val¹-attrⁿ.valⁿ      ✓                  ✓              ✓
> ④ attr¹=val¹,attrⁿ=valⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ      ✓        ✓         ✓       ✓
>
> (③ varies from ① only in the reference IRIs)
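
The constraint behind this comparison can be approximated in a few lines of Python. This is only a sketch: it checks for the three characters named above ('%', ',' and '=') and ignores the rest of the prefixed-name grammar; the function name `usable_as_local_name` is hypothetical, and the sample row identifiers are taken from the examples earlier in this mail:

```python
# Characters named above as not permitted in prefixed names.
FORBIDDEN_IN_PREFIXED_NAME = set("%,=")


def usable_as_local_name(row_id):
    """Return True if the row identifier avoids the forbidden characters.

    Approximation only: the real grammar for prefixed names has further
    restrictions that this sketch does not check.
    """
    return not any(ch in FORBIDDEN_IN_PREFIXED_NAME for ch in row_id)


if __name__ == "__main__":
    samples = [
        "fname-Bob.lname-Smith",   # choice 1
        "date.2012%2D01%2D01",     # choice 2, date value needs '%'
        "fname=Bob,lname=Smith",   # choice 4
        "date.2012-01-01",         # choice 5
    ]
    for row_id in samples:
        print(row_id, usable_as_local_name(row_id))
```

Any value that forces a percent-escape under a given choice drops out of prefixed-name territory, which is why the pair separator matters so much here.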
>
> For an example of negative integer primary keys, this table uses -2
> and -1 to represent a couple of access control groups common to all
> Apache servers:
>
> ┌┤AccessRoles├───────┐
> │┌pk┐│               │
> │ ID │ desc          │
> │ -2 │ "known users" │
> │ -1 │ "world"       │
> │ 1  │ "marketing"   │
> │ 2  │ "management"  │
> └────┴───────────────┘
>
>
> = The balance =
> I see us as pushing a slider between optimizing for
> readability ("attr¹=val¹,attrⁿ=valⁿ") and for usability (being able to
> write/query the data with prefixed names). As Richard points out, we
> can write/query the data for an individual database using an @base
> directive and relative IRIs. The choice of punctuation matters for
> users who write data/queries as prefixed names (e.g. queries connecting
> multiple databases).
>
> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
> idea of the sweet spot. ⑤ gives us the simplest encoding rules and ②
> is less likely to be confused with the '.' addressing scheme used in
> SQL.
>
> --
> -ericP
>
----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 9 September 2011 08:07:47 UTC