Re: ACTION-158 punctuation schema for the DM from Michael Hausenblas on 2011-09-09 (public-rdb2rdf-wg@w3.org from September 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Fri, 9 Sep 2011 09:11:14 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Eric Prud'hommeaux <eric@w3.org>, public-rdb2rdf-wg@w3.org
Message-Id: <2B94937A-77DD-4C3A-BFCB-3A9A3C31D18D@deri.org>
> P.S. Honestly: this is the type of issue we could spend *weeks*  
> discussing, mainly because it has a distinct flavour of taste. We  
> should really try to avoid that:-)

+1

Cheers,
 Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 9 Sep 2011, at 09:07, Ivan Herman wrote:

>
> On Sep 8, 2011, at 22:34 , Eric Prud'hommeaux wrote:
>
>> During the last meeting, we discussed picking a punctuation schema but
>> asking the community for feedback on picking from a set of choices
>> (perfectly legit in an LC document).
>
> Just an editorial issue. I think this WG must choose the one of these
> that reflects WG consensus. It is then perfectly legit to add a note
> to the LC saying that alternative schemes are possible, that we
> explicitly seek feedback on this, and point to a mail like this one
> (or a wiki page) that lists the alternatives. But, again, we must
> make a choice on this (last?) issue as soon as possible.
>
> As for myself, I must admit I do not have any strong feeling, either
> pro or con, about any of these schemes. As you say, there is a slider
> here, which I take to mean that there is no solution that covers
> every requirement. So we have to make a compromise. If so, I can
> personally live with any of these, as long as we have (finally!)
> fixed it...
>
> Ivan
>
> P.S. Honestly: this is the type of issue we could spend *weeks*  
> discussing, mainly because it has a distinct flavour of taste. We  
> should really try to avoid that:-)
>
>
>
>> This can help us pick:
>>
>>
>> = Problem =
>> Define rules which create unambiguous identifiers for database rows,
>> columns and references (foreign keys).
>> Extra credit if they are easy to parse by human or machine and easy
>> to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
>>
>> These URIs are composed from table and attribute names, attribute
>> values, and miscellaneous punctuation. This email is about tweaking
>> the punctuation to get the most simplicity in the most use cases.
>>
>> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
>> Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
>> Column IRI: base + table + '#' + attr
>> Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
>>
>> This uses the '-' separator between attributes in both row IRIs and
>> reference IRIs. The attrⁿ/valⁿ separator is '.' (for simplicity in
>> STRR). Outlining some popular choices:
>>
>>        row IRI              ref IRI
>> ① attr¹-val¹.attrⁿ-valⁿ   ref-attr¹.attrⁿ
>> ② attr¹.val¹-attrⁿ.valⁿ   ref-attr¹-attrⁿ
>> ③ attr¹-val¹.attrⁿ-valⁿ   ref-attr¹-attrⁿ
>> ④ attr¹=val¹,attrⁿ=valⁿ   ref-attr¹-attrⁿ
>> ⑤ attr¹.val¹.attrⁿ.valⁿ   ref.attr¹.attrⁿ
>>
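
A minimal Python sketch of the three rules, parameterised by punctuation, may make the choices easier to compare. The separators are copied from the table of choices above; the `Scheme` class, the function names and the assumption that attribute values arrive already escaped are mine, not part of the Direct Mapping draft.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scheme:
    attr_val: str  # between an attribute name and its value in a row IRI
    pair: str      # between attr/value pairs in a row IRI
    ref_lead: str  # between the literal 'ref' and the first attribute
    ref_sep: str   # between attributes in a reference IRI

# Keys 1..5 stand for choices ① to ⑤ in the table above.
SCHEMES = {
    1: Scheme("-", ".", "-", "."),
    2: Scheme(".", "-", "-", "-"),
    3: Scheme("-", ".", "-", "-"),
    4: Scheme("=", ",", "-", "-"),
    5: Scheme(".", ".", ".", "."),
}

def row_iri(base, table, key, scheme):
    """key is a list of (attribute, value) pairs; values are assumed escaped."""
    pairs = scheme.pair.join(f"{a}{scheme.attr_val}{v}" for a, v in key)
    return f"{base}{table}/{pairs}"

def column_iri(base, table, attr):
    return f"{base}{table}#{attr}"

def reference_iri(base, table, attrs, scheme):
    return f"{base}{table}#ref{scheme.ref_lead}{scheme.ref_sep.join(attrs)}"

BASE = "http://hr.myco.example/2011/schemas/"
print(row_iri(BASE, "People", [("fname", "Bob"), ("lname", "Smith")], SCHEMES[1]))
print(reference_iri(BASE, "Events", ["orgfn", "orgln"], SCHEMES[1]))
```

Swapping `SCHEMES[1]` for another entry changes only the punctuation, which is the whole of the decision being discussed.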
>>
>> = Examples =
>> Given some tables with PKs:
>> ┌┤Simple├────┬───────┐
>> │┌pk┐│       │       │
>> │ PK │ attrA │ attrB │
>> │  1 │ valA1 │ valB2 │
>> │  2 │ valA2 │ valB2 │
>> └────┴───────┴───────┘
>>
>> ┌┤People├────┬─────────┐
>> │┌──────────pk────────┐│
>> │   fname    │  lname  │
>> │      "Bob" │ "Smith" │
>> │  "Madonna" │      "" │
>> │     "T in" │ "Ya-Li" │
>> │ "أكرم.عبد" │   "كور" │
>> └────────────┴─────────┘
>>
>> ┌┤Events├────┬────────────┬─────────┐
>> │┌────pk────┐│┌─────↬People.pk─────┐│
>> │    date    │    orgfn   │  orgln  │
>> │ 2012-01-01 │      "Bob" │ "Smith" │
>> │ 2011-12-25 │  "Madonna" │      "" │
>> │ 2012-04-06 │     "T in" │ "Ya-Li" │
>> │ 2011-10-01 │ "أكرم.عبد" │   "كور" │
>> └────────────┴────────────┴─────────┘
>>
>> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
>> attribute names and values. ┤People├ and ┤Events├ have alphanumeric
>> attribute names. (Attribute names which are not exclusively alphanumeric
>> are horrible no matter what; they don't help us discriminate between our
>> options.)
>>
>> == Example Row IRIs ==
>> We see these Row IRIs (eliding <base + ...>) for the first rows of
>> these tables, given the choices of punctuation listed above.
>>
>> ①  Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>> ②  Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>> ③  Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>> ④  Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
>> ⑤  Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
>>
>> == Reference (predicate) IRIs ==
>> Reference (predicate) IRIs for ┤Simple├ are simple and boring: table#ref-attr .
>> ┤Events├'s references to ┤People├ take two attributes:
>>
>> ①  Events#ref-orgfn.orgln
>> ②  Events#ref-orgfn-orgln
>> ③  Events#ref-orgfn-orgln
>> ④  Events#ref-orgfn-orgln
>> ⑤  Events#ref.orgfn.orgln
>>
>>
>> = What needs escaping =
>> The character used to separate attr/value pairs dictates which
>> characters require escaping in values. ②③ require escaping '-'s;
>> ①⑤ require escaping '.'s and ④ requires escaping ','s. Row
>> identifiers for rows 3 and 4 of ┤People├ illustrate this:
>>
>> ①  People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Eعبد.lname-كور
>> ②  People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname.كور
>> ③  People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname.كور
>> ④  People/fname=T%20in,lname=Ya-Li   │ People/fname=أكرم.عبد,lname=كور
>> ⑤  People/fname.T%20in.lname.Ya-Li   │ People/fname.أكرم%2Eعبد.lname.كور
>>
>> (We can also follow the HTML5, WSDL, ... url-encoding spec and
>> turn ' ' into '+' instead of '%20'.)
>>
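
The escaping rule can be sketched in a few lines of Python. This is only an illustration: the `escape_value` name and the `ALWAYS_ESCAPE` set are assumptions of mine (the email does not list which characters are escaped regardless of scheme), and non-ASCII characters are left alone because the result is meant to be an IRI.

```python
# Characters escaped under every scheme. This set is an assumption for the
# sketch: space, '%' itself, and the IRI delimiters '/', '#' and '?'.
ALWAYS_ESCAPE = set(" %/#?")

def escape_value(value: str, pair_sep: str) -> str:
    """Percent-encode the pair separator and the always-escaped characters."""
    out = []
    for ch in value:
        if ch == pair_sep or ch in ALWAYS_ESCAPE:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

# '-' is the pair separator in ②, '.' in ① and ⑤, ',' in ④.
print(escape_value("Ya-Li", "-"))
print(escape_value("Ya-Li", "."))
print(escape_value("T in", ","))
```

Only the pair separator varies between schemes, so the choice of punctuation decides which ordinary values (dates, negative integers, floats, names with hyphens) pick up a '%'.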
>>
>> = SPARQL, Turtle, RIF, RDF/XML =
>> RDF Rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
>> predicates, without having to identify Row IRIs. Queries include Row
>> identifiers a bit more (the savvy user or tool will select an entity
>> by identifier rather than distinguishing attributes) and Turtle (the
>> data) will of course include both.
>>
>> All of these languages allow the use of relative IRIs and prefixed
>> names. A prefixed query of a People table for ① looks like:
>>
>> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>> SELECT ?event
>>  WHERE {
>>    pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
>>  }
>>
>> And the relative IRI query looks like:
>>
>> BASE <http://hr.myco.example/2011/schemas/>
>> SELECT ?event
>>  WHERE {
>>    <People/fname-Bob.lname-Smith> <People#atEvent> ?event
>>  }
>>
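
To check that the prefixed form and the relative form name the same resources, here is a small Python sketch. The `expand` and `resolve` helpers are hypothetical names of mine; `urljoin` from the standard library stands in for the BASE resolution a SPARQL processor would do.

```python
from urllib.parse import urljoin

BASE = "http://hr.myco.example/2011/schemas/"
PREFIXES = {
    "pplinst": BASE + "People/",
    "pplschm": BASE + "People#",
}

def expand(pname: str) -> str:
    """Expand a prefixed name such as 'pplschm:atEvent' to a full IRI."""
    prefix, local = pname.split(":", 1)
    return PREFIXES[prefix] + local

def resolve(relative: str) -> str:
    """Resolve a relative IRI reference against BASE."""
    return urljoin(BASE, relative)

print(expand("pplinst:fname-Bob.lname-Smith"))
print(resolve("People/fname-Bob.lname-Smith"))
print(expand("pplschm:atEvent") == resolve("People#atEvent"))
```

Both spellings resolve to the same IRIs; the difference is only in what the query author is allowed to type.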
>> Extending the use case to gain some SemWeb utility, we join two
>> databases, those of the HR and catering departments:
>>
>> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>> PREFIX cater: <http://hr.myco.example/2011/schemas/People#>
>> SELECT ?start ?end
>>  WHERE {
>>    pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
>>    ?event cater:start ?start ; cater:end ?end
>>  }
>>
>> The customary URI escape character, '%', is not permitted in prefixed
>> names (nor are ',' and '='). The various row ID schemas have different
>> impacts on the expressivity in prefixed names given different values:
>>
>>        row ID            pos int   neg int   alphanum   date   float
>> ① attr¹-val¹.attrⁿ-valⁿ     ✓         ✓         ✓        ✓
>> ② attr¹.val¹-attrⁿ.valⁿ     ✓                   ✓                 ✓
>> ④ attr¹=val¹,attrⁿ=valⁿ
>> ⑤ attr¹.val¹.attrⁿ.valⁿ     ✓         ✓         ✓        ✓
>>
>> (③ varies from ① only in the reference IRIs)
>>
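
The table can be reproduced mechanically. The Python sketch below is an approximation under stated assumptions: `PN_LOCAL` is a simplified stand-in for the local-name production in the SPARQL grammar (word characters, with '.' and '-' allowed after the first character and no trailing '.'), the sample values are mine, and only the pair separator and '%' are escaped.

```python
import re

# Simplified approximation of a prefixed-name local part: no '%', ',' or '='.
PN_LOCAL = re.compile(r"^\w(?:[\w.\-]*[\w\-])?$")

# (attr/value separator, pair separator) for choices ①, ②, ④ and ⑤.
SEPARATORS = {1: ("-", "."), 2: (".", "-"), 4: ("=", ","), 5: (".", ".")}

def local_name(attr: str, value: str, scheme: int) -> str:
    """Build the part of a row IRI after 'table/' for a one-column key."""
    attr_val, pair = SEPARATORS[scheme]
    escaped = value.replace("%", "%25").replace(pair, f"%{ord(pair):02X}")
    return f"{attr}{attr_val}{escaped}"

def fits_prefixed_name(attr: str, value: str, scheme: int) -> bool:
    return PN_LOCAL.match(local_name(attr, value, scheme)) is not None

SAMPLES = {"pos int": "7", "neg int": "-2", "alphanum": "valA1",
           "date": "2012-01-01", "float": "1.5"}

for scheme in SEPARATORS:
    usable = [kind for kind, v in SAMPLES.items()
              if fits_prefixed_name("PK", v, scheme)]
    print(scheme, usable)
```

Under these assumptions a value type is usable in a prefixed name exactly when it does not contain the scheme's pair separator, and nothing is usable when the separators themselves are ',' and '='.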
>> For an example of negative integer primary keys, this table uses -2
>> and -1 to represent a couple access control groups common to all
>> apache servers:
>>
>> ┌┤AccessRoles├───────┐
>> │┌pk┐│               │
>> │ ID │  desc         │
>> │ -2 │ "known users" │
>> │ -1 │       "world" │
>> │  1 │   "marketing" │
>> │  2 │  "management" │
>> └────┴───────────────┘
>>
>>
>> = The balance =
>> I see us as pushing a slider between optimizing for readability
>> ("attr¹=val¹,attrⁿ=valⁿ") and for usability (being able to
>> write/query the data with prefixed names). As Richard points out, we
>> can write/query the data for an individual database using an @base
>> directive and relative IRIs. This choice helps users write
>> data/queries as prefixed names (e.g. queries connecting multiple
>> databases).
>>
>> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
>> idea of the sweet spot. ⑤ gives us the simplest encoding rules and ②
>> is less likely to be confused with the '.' addressing scheme used in
>> SQL.
>>
>> -- 
>> -ericP
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
>
Received on Friday, 9 September 2011 08:11:47 UTC