Re: ACTION-158 punctuation schema for the DM from Ivan Herman on 2011-09-09 (public-rdb2rdf-wg@w3.org from September 2011)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 9 Sep 2011 10:07:49 +0200
To: Eric Prud'hommeaux <eric@w3.org>
Cc: public-rdb2rdf-wg@w3.org
Message-Id: <9C9964C1-668F-45BC-84EE-ABA170D195C5@w3.org>
On Sep 8, 2011, at 22:34 , Eric Prud'hommeaux wrote:

> During the last meeting, we discussed picking a punctuation schema but
> asking the community for feedback on picking from a set of choices
> (perfectly legit in an LC document).

Just an editorial issue. I think this WG must choose the one of these that reflects WG consensus. It is then perfectly legit to add a note to the LC saying that alternative schemes are possible, that we explicitly seek feedback on this, and point to a mail like this one (or a wiki page) that lists the alternatives. But, again, we must make a choice on this (last?) issue as soon as possible.

As for myself, I must admit I do not have any strong feeling either for or against any of these schemes. As you say, there is a slider here, which I take to mean that no solution covers every requirement. So we have to make a compromise. If so, I can personally live with any of these, as long as we have (finally!) fixed it...

Ivan

P.S. Honestly: this is the type of issue we could spend *weeks* discussing, mainly because it is to a large extent a matter of taste. We should really try to avoid that :-)



> This can help us pick:
> 
> 
> = Problem =
>  Define rules which create unambiguous identifiers for database rows,
>  columns and references (foreign keys).
>  Extra credit if they are easy to parse by human or machine and easy
>  to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
> 
> These URIs are composed from table and attribute names, attribute
> values, and miscellaneous punctuation. This email is about tweaking
> the punctuation to get the most simplicity in the most use cases.
> 
> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
> Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
> Column IRI: base + table + '#' + attr
> Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
> 
> This uses the '.' separator between attributes in both row IRIs and
> reference IRIs. The attrⁿ/valⁿ separator is '-' (for simplicity in
> STRR). Outlining some popular choices:
> 
>         row IRI              ref IRI
> ① attr¹-val¹.attrⁿ-valⁿ   ref-attr¹.attrⁿ
> ② attr¹.val¹-attrⁿ.valⁿ   ref-attr¹-attrⁿ
> ③ attr¹-val¹.attrⁿ-valⁿ   ref-attr¹-attrⁿ
> ④ attr¹=val¹,attrⁿ=valⁿ   ref-attr¹-attrⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ   ref.attr¹.attrⁿ
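
A minimal sketch of how the five options compose, in Python. The names SCHEMES, row_iri, column_iri and reference_iri are illustrative assumptions, not taken from the Direct Mapping draft; the IRIs are relative to the base, and values are assumed to be escaped already.

```python
# Separators per scheme: (attr/value, between pairs, after "ref", between ref attrs).
# Keys 1..5 stand for the options ① to ⑤ listed above.
SCHEMES = {
    1: ("-", ".", "-", "."),
    2: (".", "-", "-", "-"),
    3: ("-", ".", "-", "-"),
    4: ("=", ",", "-", "-"),
    5: (".", ".", ".", "."),
}


def row_iri(scheme, table, pairs):
    """Relative row IRI: table + '/' + the joined attr/value pairs."""
    attr_val_sep, pair_sep, _, _ = SCHEMES[scheme]
    return table + "/" + pair_sep.join(f"{a}{attr_val_sep}{v}" for a, v in pairs)


def column_iri(table, attr):
    """Relative column IRI: table + '#' + attr (the same in every scheme)."""
    return f"{table}#{attr}"


def reference_iri(scheme, table, attrs):
    """Relative reference IRI: table + '#' + 'ref' + separator + joined attrs."""
    _, _, ref_sep, attr_sep = SCHEMES[scheme]
    return f"{table}#ref{ref_sep}" + attr_sep.join(attrs)


bob = [("fname", "Bob"), ("lname", "Smith")]
for n in SCHEMES:
    print(n, row_iri(n, "People", bob), reference_iri(n, "Events", ["orgfn", "orgln"]))
```

Keeping the punctuation in one table means that switching schemes only changes data, which is roughly the point of the comparison: the rules themselves stay the same.
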
> 
> 
> = Examples =
> Given some tables with PKs:
> ┌┤Simple├────┬───────┐  ┌┤People├────┬─────────┐  ┌┤Events├────┬────────────┬─────────┐
> │┌pk┐│       │       │  │┌──────────pk────────┐│  │┌────pk────┐│┌─────↬People.pk─────┐│
> │ PK │ attrA │ attrB │  │   fname    │  lname  │  │    date    │    orgfn   │  orgln  │
> │  1 │ valA1 │ valB2 │  │      "Bob" │ "Smith" │  │ 2012-01-01 │      "Bob" │ "Smith" │
> │  2 │ valA2 │ valB2 │  │  "Madonna" │      "" │  │ 2011-12-25 │  "Madonna" │      "" │
> └────┴───────┴───────┘  │     "T in" │ "Ya-Li" │  │ 2012-04-06 │     "T in" │ "Ya-Li" │
>                        │ "أكرم.عبد" │   "كور" │  │ 2011-10-01 │ "أكرم.عبد" │   "كور" │
>                        └────────────┴─────────┘  └────────────┴────────────┴─────────┘
> 
> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
> attribute names and values. ┤People├ and ┤Events├ have alphanum attribute
> names. (Attribute names which are not exclusively alpha-numeric are
> horrible no matter what; they don't help us discriminate between our options.)
> 
> == Example Row IRIs ==
> We see these Row IRIs (eliding <base + ...>) for the first rows of
> these tables, given the choices of punctuation listed above.
> 
>  ①  Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>  ②  Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>  ③  Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>  ④  Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
>  ⑤  Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
> 
> == Reference (predicate) IRIs ==
> Reference (predicate) IRIs for ┤Simple├ are simple and boring: table#ref-attr .
> ┤Events├'s references to ┤People├ take two attributes:
> 
>  ①  Events#ref-orgfn.orgln
>  ②  Events#ref-orgfn-orgln
>  ③  Events#ref-orgfn-orgln
>  ④  Events#ref-orgfn-orgln
>  ⑤  Events#ref.orgfn.orgln
> 
> 
> = What needs escaping =
> The character used to separate attr/value pairs dictates which
> characters require escaping in values. ②③ require escaping '-'s;
> ①⑤ requires escaping '.'s and ④ requires escaping ','s. Row
> identifiers for rows 3 and 4 of ┤People├ illustrate this:
> 
>  ①  People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Dعبد.lname-كور
>  ②  People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
>  ③  People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname%2Dكور
>  ④  People/fname=T%20in,lname=Ya-Li   │ People/fname=أكرم.عبد,lname=كور
>  ⑤  People/fname.T%20in.lname.Ya-Li   │ People/fname.أكرم%2Dعبد.lname.كور
> 
> (We can also follow the HTML5, WSDL, ... url-encoding spec and
> turn ' ' into '+' instead of '%20'.)
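
The escaping rule can be sketched in Python as below. This is only an illustration under assumptions: escape_value is a hypothetical helper, and the set of always-escaped characters (space, '%', '/', '#', '?') is my guess at a sensible minimum rather than anything the draft specifies. Non-ASCII characters are left alone, since these are IRIs.

```python
def escape_value(value, pair_sep):
    """Percent-encode the characters that would make a row IRI ambiguous.

    pair_sep is the character that separates attr/value pairs in the
    chosen scheme: '.' or '-' or ','.
    """
    # Assumed minimum set, plus whatever separates the pairs.
    reserved = {" ", "%", "/", "#", "?", pair_sep}
    out = []
    for ch in value:
        if ch in reserved:
            # Encode each UTF-8 byte of the character as %XX.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


# Row 3 of People with '.' and then '-' as the pair separator.
for sep in (".", "-"):
    print(sep, escape_value("T in", sep), escape_value("Ya-Li", sep))
```

With '.' between pairs the hyphen in "Ya-Li" survives untouched, while with '-' between pairs it becomes "Ya%2DLi"; a '.' inside a value behaves the other way round.
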
> 
> 
> = SPARQL, Turtle, RIF, RDF/XML =
> RDF Rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
> predicates, without having to identify Row IRIs. Queries include Row
> identifiers a bit more (the savvy user or tool will select an entity
> by identifier rather than distinguishing attributes) and Turtle (the
> data) will of course include both.
> 
> All of these languages allow the use of relative IRIs and prefixed
> names. A prefixed query of a People table for ① looks like:
> 
>  PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>  PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>  SELECT ?event
>   WHERE {
>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
>   }
> 
> And the relative IRI query looks like:
> 
>  BASE <http://hr.myco.example/2011/schemas/>
>  SELECT ?event
>   WHERE {
>     <People/fname-Bob.lname-Smith> <People#atEvent> ?event
>   }
> 
> Extending the use case to gain some SemWeb utility, we join two
> databases, those of the HR and catering departments:
> 
>  PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>  PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>  PREFIX cater: <http://catering.myco.example/2011/schemas/Events#>
>  SELECT ?start ?end
>   WHERE {
>     pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
>     ?event cater:start ?start ; cater:end ?end
>   }
> 
> The customary URI escape character, '%', is not permitted in prefixed
> names (nor are ',' and '='). The various row ID schemas have different
> impacts on the expressivity in prefixed names given different values:
> 
>         row ID            pos int   neg int   alphanum   date   float
> ① attr¹-val¹.attrⁿ-valⁿ       ✓         ✓         ✓        ✓
> ② attr¹.val¹-attrⁿ.valⁿ       ✓                   ✓                ✓
> ④ attr¹=val¹,attrⁿ=valⁿ
> ⑤ attr¹.val¹.attrⁿ.valⁿ       ✓         ✓         ✓        ✓
> 
> (③ varies from ① only in the reference IRIs)
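
The table above can be reproduced with a small Python check. It is a sketch under stated assumptions: it only tests for the three characters named above ('%', ',' and '='), not the full SPARQL/Turtle prefixed-name grammar, and the sample values and the attribute name "ID" are made up for illustration.

```python
FORBIDDEN = set("%,=")  # characters said above to be disallowed in prefixed names

# (attr/value separator, pair separator) for schemes ①, ②, ④ and ⑤.
SCHEMES = {1: ("-", "."), 2: (".", "-"), 4: ("=", ","), 5: (".", ".")}

# One illustrative value per column of the table.
SAMPLES = {
    "pos int": "1",
    "neg int": "-2",
    "alphanum": "valA1",
    "date": "2012-01-01",
    "float": "1.5",
}


def local_name(scheme, attr, value):
    """Local part of a single-attribute row IRI, escaping the pair separator."""
    attr_val_sep, pair_sep = SCHEMES[scheme]
    escaped = "".join(
        f"%{ord(c):02X}" if c in (pair_sep, " ") else c for c in value
    )
    return f"{attr}{attr_val_sep}{escaped}"


def usable_as_prefixed_name(scheme, attr, value):
    """True if the local name avoids every forbidden character."""
    return not (FORBIDDEN & set(local_name(scheme, attr, value)))


for scheme in SCHEMES:
    marks = ["✓" if usable_as_prefixed_name(scheme, "ID", v) else " " for v in SAMPLES.values()]
    print(scheme, " ".join(marks))
```

A value fails as soon as it contains the scheme's pair separator, because escaping that separator introduces a '%'; ④ fails for every value because '=' is itself forbidden.
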
> 
> For an example of negative integer primary keys, this table uses -2
> and -1 to represent a couple of access control groups common to all
> Apache servers:
> 
> ┌┤AccessRoles├───────┐
> │┌pk┐│               │
> │ ID │  desc         │
> │ -2 │ "known users" │
> │ -1 │       "world" │
> │  1 │   "marketing" │
> │  2 │  "management" │
> └────┴───────────────┘
> 
> 
> = The balance =
> I see us as pushing a slider between optimizing for readability
> ("attr¹=val¹,attrⁿ=valⁿ") and for usability (being able to
> write/query the data with prefixed names). As Richard points out, we
> can write/query the data for an individual database using an @base
> directive and relative IRIs. This choice helps users write
> data/queries as prefixed names (e.g. queries connecting multiple
> databases).
> 
> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
> idea of the sweet spot. ⑤ gives us the simplest encoding rules and ②
> is less likely to be confused with the '.' addressing scheme used in
> SQL.
> 
> -- 
> -ericP
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 9 September 2011 08:07:47 UTC