- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Fri, 9 Sep 2011 09:11:14 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: Eric Prud'hommeaux <eric@w3.org>, public-rdb2rdf-wg@w3.org
> P.S. Honestly: this is the type of issue we could spend *weeks*
> discussing, mainly because it has a distinct flavour of taste. We
> should really try to avoid that:-)
+1
Cheers,
Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html
On 9 Sep 2011, at 09:07, Ivan Herman wrote:
>
> On Sep 8, 2011, at 22:34 , Eric Prud'hommeaux wrote:
>
>> During the last meeting, we discussed picking a punctuation schema but
>> asking the community for feedback on picking from a set of choices
>> (perfectly legit in an LC document).
>
> Just an editorial issue. I think this WG must choose the one of these
> that reflects WG consensus. It is then perfectly legit to add a note
> to the LC saying that alternative schemes are possible, that we
> explicitly seek feedback on this, and point to a mail like this one
> (or a wiki page) that lists the alternatives. But, again, we must
> make a choice on this (last?) issue as soon as possible.
>
> As for myself, I must admit I do not have strong feelings either for
> or against any of these schemes. As you say, there is a slider here,
> which I take to mean that no solution covers every requirement. So we
> have to make a compromise. If so, I can personally live with any of
> these, as long as we have (finally!) fixed it...
>
> Ivan
>
> P.S. Honestly: this is the type of issue we could spend *weeks*
> discussing, mainly because it has a distinct flavour of taste. We
> should really try to avoid that:-)
>
>
>
>> This can help us pick:
>>
>>
>> = Problem =
>> Define rules which create unambiguous identifiers for database rows,
>> columns and references (foreign keys).
>> Extra credit if they are easy to parse by human or machine and easy
>> to express in SPARQL, Turtle, RIF, RDF/XML ("STRR" below).
>>
>> These URIs are composed from table and attribute names, attribute
>> values, and miscellaneous punctuation. This email is about tweaking
>> the punctuation to get the most simplicity in the most use cases.
>>
>> Rules in <http://www.w3.org/2001/sw/rdb2rdf/directMapping/explicitFK>:
>> Row IRI: base + table + '/' + attr¹ + '-' + val¹ + '.' … attrⁿ + '-' + valⁿ
>> Column IRI: base + table + '#' + attr
>> Reference IRI: base + table + '#' + 'ref-' + attr¹ + '.' … attrⁿ
>>
>> This uses '-' between each attribute and its value in row IRIs (and
>> after 'ref' in reference IRIs), and '.' between attribute/value pairs
>> and between the attributes of a reference IRI (for simplicity in
>> STRR). Outlining some popular choices:
>>
>>    row IRI                  ref IRI
>> ①  attr¹-val¹.attrⁿ-valⁿ   ref-attr¹.attrⁿ
>> ②  attr¹.val¹-attrⁿ.valⁿ   ref-attr¹-attrⁿ
>> ③  attr¹-val¹.attrⁿ-valⁿ   ref-attr¹-attrⁿ
>> ④  attr¹=val¹,attrⁿ=valⁿ   ref-attr¹-attrⁿ
>> ⑤  attr¹.val¹.attrⁿ.valⁿ   ref.attr¹.attrⁿ
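The rules and the five choices boil down to a handful of separators. Below is a minimal Python sketch, not the WG's implementation: `Scheme`, `row_iri`, `column_iri` and `reference_iri` are hypothetical names, attribute names are assumed alphanumeric, and the escaping rule (only '%', space and the pair separator are percent-encoded; non-ASCII characters are kept as IRI characters) is an assumption inferred from the examples below. Reference IRIs use the '#' form given in the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    kv: str          # between an attribute and its value (row IRIs)
    pair: str        # between attribute/value pairs (row IRIs)
    ref_prefix: str  # "ref-" or "ref." (reference IRIs)
    ref_sep: str     # between attributes (reference IRIs)


# Choices 1-5 from the table, keyed by number
SCHEMES = {
    1: Scheme("-", ".", "ref-", "."),
    2: Scheme(".", "-", "ref-", "-"),
    3: Scheme("-", ".", "ref-", "-"),
    4: Scheme("=", ",", "ref-", "-"),
    5: Scheme(".", ".", "ref.", "."),
}


def escape_value(value: str, scheme: Scheme) -> str:
    # Assumption: only '%', ' ' and the pair separator are escaped;
    # everything else (including non-ASCII) stays as an IRI character.
    specials = {"%", " ", scheme.pair}
    return "".join(f"%{ord(c):02X}" if c in specials else c for c in value)


def row_iri(base: str, table: str, pairs: list[tuple[str, str]], scheme: Scheme) -> str:
    body = scheme.pair.join(
        attr + scheme.kv + escape_value(val, scheme) for attr, val in pairs)
    return f"{base}{table}/{body}"


def column_iri(base: str, table: str, attr: str) -> str:
    return f"{base}{table}#{attr}"


def reference_iri(base: str, table: str, attrs: list[str], scheme: Scheme) -> str:
    return f"{base}{table}#{scheme.ref_prefix}{scheme.ref_sep.join(attrs)}"


if __name__ == "__main__":
    bob = [("fname", "Bob"), ("lname", "Smith")]
    for n, s in SCHEMES.items():
        print(n, row_iri("", "People", bob, s),
              reference_iri("", "Events", ["orgfn", "orgln"], s))
```

Passing an empty base mirrors the "eliding <base + ...>" convention used in the examples.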
>>
>>
>> = Examples =
>> Given some tables with PKs:
>> ┌┤Simple├────────────┐
>> │┌pk┐│       │       │
>> │ PK │ attrA │ attrB │
>> │ 1  │ valA1 │ valB2 │
>> │ 2  │ valA2 │ valB2 │
>> └────┴───────┴───────┘
>>
>> ┌┤People├──────────────┐
>> │┌─────────pk─────────┐│
>> │ fname      │ lname   │
>> │ "Bob"      │ "Smith" │
>> │ "Madonna"  │ ""      │
>> │ "T in"     │ "Ya-Li" │
>> │ "أكرم.عبد" │ "كور"   │
>> └────────────┴─────────┘
>>
>> ┌┤Events├───────────────────────────┐
>> │┌────pk────┐│┌─────↬People.pk─────┐│
>> │ date       │ orgfn      │ orgln   │
>> │ 2012-01-01 │ "Bob"      │ "Smith" │
>> │ 2011-12-25 │ "Madonna"  │ ""      │
>> │ 2012-04-06 │ "T in"     │ "Ya-Li" │
>> │ 2011-10-01 │ "أكرم.عبد" │ "كور"   │
>> └────────────┴────────────┴─────────┘
>>
>> ┤Simple├ has your run-of-the-mill integer primary key and alphanumeric
>> attribute names and values. ┤People├ and ┤Events├ have alphanumeric
>> attribute names, but their values include spaces, '-', '.', dates and
>> non-ASCII text. (Attribute names which are not exclusively alphanumeric
>> are horrible no matter what; they don't help us discriminate between
>> our options.)
>>
>> == Example Row IRIs ==
>> We see these Row IRIs (eliding <base + ...>) for the first rows of
>> these tables, given the choices of punctuation listed above.
>>
>> ① Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>> ② Simple/PK.1 │ People/fname.Bob-lname.Smith │ Events/date.2012%2D01%2D01
>> ③ Simple/PK-1 │ People/fname-Bob.lname-Smith │ Events/date-2012-01-01
>> ④ Simple/PK=1 │ People/fname=Bob,lname=Smith │ Events/date=2012-01-01
>> ⑤ Simple/PK.1 │ People/fname.Bob.lname.Smith │ Events/date.2012-01-01
>>
>> == Reference (predicate) IRIs ==
>> Reference (predicate) IRIs for ┤Simple├ are simple and boring:
>> table#ref-attr. ┤Events├'s reference to ┤People├ spans two attributes:
>>
>> ① Events#ref-orgfn.orgln
>> ② Events#ref-orgfn-orgln
>> ③ Events#ref-orgfn-orgln
>> ④ Events#ref-orgfn-orgln
>> ⑤ Events#ref.orgfn.orgln
>>
>>
>> = What needs escaping =
>> The character used to separate attr/value pairs dictates which
>> characters require escaping in values. ②③ require escaping '-'s;
>> ①⑤ requires escaping '.'s and ④ requires escaping ','s. Row
>> identifiers for rows 3 and 4 of ┤People├ illustrate this:
>>
>> ① People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Eعبد.lname-كور
>> ② People/fname.T%20in-lname.Ya%2DLi │ People/fname.أكرم.عبد-lname.كور
>> ③ People/fname-T%20in.lname-Ya-Li   │ People/fname-أكرم%2Eعبد.lname-كور
>> ④ People/fname=T%20in,lname=Ya-Li   │ People/fname=أكرم.عبد,lname=كور
>> ⑤ People/fname.T%20in.lname.Ya-Li   │ People/fname.أكرم%2Eعبد.lname.كور
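For ①, escaping the pair separator is what keeps row IRIs machine-parseable: split on '.', then on the first '-' of each chunk, then undo the percent-encoding. A minimal sketch of that inverse, assuming alphanumeric attribute names and that `parse_row_local` (a hypothetical helper) receives only the local part after `table/`:

```python
from urllib.parse import unquote


def parse_row_local(local: str) -> list[tuple[str, str]]:
    """Split the local part of a scheme-① row IRI into (attr, value) pairs.

    Assumes attribute names are alphanumeric, so the first '-' in each
    chunk always ends the attribute name, and that '.' in values was
    escaped as %2E when the IRI was built.
    """
    pairs = []
    for chunk in local.split("."):               # '.' separates attr/value pairs
        attr, _, value = chunk.partition("-")    # first '-' ends the attribute
        pairs.append((attr, unquote(value)))     # undo %20, %2E, ...
    return pairs


if __name__ == "__main__":
    print(parse_row_local("fname-T%20in.lname-Ya-Li"))
    print(parse_row_local("fname-أكرم%2Eعبد.lname-كور"))
    # Without escaping, the '.' inside the value splits the pair:
    print(parse_row_local("fname-أكرم.عبد.lname-كور"))
```

The last call shows the failure mode: with the '.' in "أكرم.عبد" left unescaped, the parser sees three pairs instead of two. A '-' inside a value ("Ya-Li", "2012-01-01", a negative key such as "ID--2") is harmless because only the first '-' is significant.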
>>
>> (We can also follow the HTML5, WSDL, ... url-encoding spec and
>> turn ' ' into '+' instead of '%20'.)
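Python's standard library offers both conventions, which makes the trade-off easy to see. `urllib.parse.quote` produces '%20' for a space, while `quote_plus` follows the HTML form convention and produces '+', which in turn forces a literal '+' to be escaped as '%2B'. Neither touches '-' or '.', since RFC 3986 treats them as unreserved, so the separator escaping in the row-ID schemes would still be a separate step. Both also percent-encode non-ASCII characters as UTF-8, unlike the IRI examples above. `percent_style` and `form_style` are just illustrative wrappers:

```python
from urllib.parse import quote, quote_plus


def percent_style(value: str) -> str:
    # RFC 3986 percent-encoding; safe="" so '/' would be escaped too
    return quote(value, safe="")


def form_style(value: str) -> str:
    # HTML form-style encoding: ' ' becomes '+', so a literal '+' becomes %2B
    return quote_plus(value)


if __name__ == "__main__":
    print(percent_style("T in"))  # T%20in
    print(form_style("T in"))     # T+in
    print(form_style("a+b"))      # a%2Bb
    # '-' and '.' are unreserved, so neither function escapes them
    print(percent_style("Ya-Li"), percent_style("1.5"))  # Ya-Li 1.5
```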
>>
>>
>> = SPARQL, Turtle, RIF, RDF/XML =
>> RDF rules (RIF BLD, SPARQL CONSTRUCT) generally express patterns over
>> predicates, without having to identify Row IRIs. Queries use Row
>> identifiers somewhat more (the savvy user or tool will select an entity
>> by its identifier rather than by distinguishing attributes), and Turtle
>> (the data) will of course include both.
>>
>> All of these languages allow the use of relative IRIs and prefixed
>> names. A prefixed query of a People table for ① looks like:
>>
>> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>> SELECT ?event
>> WHERE {
>> pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event
>> }
>>
>> And the relative IRI query looks like:
>>
>> BASE <http://hr.myco.example/2011/schemas/>
>> SELECT ?event
>> WHERE {
>> <People/fname-Bob.lname-Smith> <People#atEvent> ?event
>> }
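The prefixed and relative forms name the same IRIs. As a quick check, this sketch expands the prefixed names by plain concatenation and resolves the relative references with `urllib.parse.urljoin`, used here only as a stand-in for SPARQL's BASE resolution (an approximation, not a SPARQL engine). The `PREFIXES` table simply copies the two PREFIX declarations from the query:

```python
from urllib.parse import urljoin

BASE = "http://hr.myco.example/2011/schemas/"
PREFIXES = {
    "pplinst": "http://hr.myco.example/2011/schemas/People/",
    "pplschm": "http://hr.myco.example/2011/schemas/People#",
}


def expand(pname: str) -> str:
    """Expand a prefixed name by concatenating namespace and local part."""
    prefix, local = pname.split(":", 1)
    return PREFIXES[prefix] + local


def resolve(relative: str) -> str:
    """Resolve a relative IRI reference against BASE (RFC 3986 style)."""
    return urljoin(BASE, relative)


if __name__ == "__main__":
    print(resolve("People/fname-Bob.lname-Smith"))
    print(resolve("People#atEvent"))
```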
>>
>> Extending the use case to gain some SemWeb utility, we join two
>> databases, those of the HR and catering departments:
>>
>> PREFIX pplinst: <http://hr.myco.example/2011/schemas/People/>
>> PREFIX pplschm: <http://hr.myco.example/2011/schemas/People#>
>> PREFIX cater: <http://hr.myco.example/2011/schemas/People#>
>> SELECT ?start ?end
>> WHERE {
>> pplinst:fname-Bob.lname-Smith pplschm:atEvent ?event .
>> ?event cater:start ?start ; cater:end ?end
>> }
>>
>> The customary URI escape character, '%', is not permitted in prefixed
>> names (nor are ',' and '='). The row ID schemes therefore differ in
>> which kinds of values can be written as prefixed names:
>>
>> row ID                   pos int  neg int  alphanum  date  float
>> ① attr¹-val¹.attrⁿ-valⁿ     ✓        ✓        ✓       ✓      ✗
>> ② attr¹.val¹-attrⁿ.valⁿ     ✓        ✗        ✓       ✗      ✓
>> ④ attr¹=val¹,attrⁿ=valⁿ     ✗        ✗        ✗       ✗      ✗
>> ⑤ attr¹.val¹.attrⁿ.valⁿ     ✓        ✓        ✓       ✓      ✗
>>
>> (③ varies from ① only in the reference IRIs)
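The table can be re-derived mechanically. This sketch builds a one-attribute local part for a sample value of each kind and checks it against a simplified SPARQL 1.0 `PN_LOCAL` pattern. Python's `\w` stands in for the grammar's exact Unicode ranges, the escaping rule ('%', space and the pair separator become %XX) is an assumption matching the examples above, and the sample values and the attribute name `ID` are hypothetical:

```python
import re

# Simplified PN_LOCAL: starts with a word character, may contain word
# characters, '-' and '.', and may not end with '.'. Assumption: \w
# approximates PN_CHARS_BASE/'_'/digits; '%', ',' and '=' never match.
PN_LOCAL = re.compile(r"\w(?:[\w.\-]*[\w\-])?")

# (attr/value separator, pair separator) for the schemes in the table
SCHEMES = {1: ("-", "."), 2: (".", "-"), 4: ("=", ","), 5: (".", ".")}

SAMPLES = {"pos int": "1", "neg int": "-2", "alphanum": "valA1",
           "date": "2012-01-01", "float": "1.5"}


def local_part(attr: str, value: str, kv: str, pair: str) -> str:
    # Assumed escaping rule: '%', space and the pair separator become %XX
    escaped = "".join(f"%{ord(c):02X}" if c in {"%", " ", pair} else c
                      for c in value)
    return attr + kv + escaped


def prefixable(scheme: int) -> dict[str, bool]:
    kv, pair = SCHEMES[scheme]
    return {kind: PN_LOCAL.fullmatch(local_part("ID", v, kv, pair)) is not None
            for kind, v in SAMPLES.items()}


if __name__ == "__main__":
    for n in SCHEMES:
        row = prefixable(n)
        print(n, " ".join("✓" if ok else "✗" for ok in row.values()))
```

The float column falls out of the escaping rule: schemes that split pairs on '.' must write "1.5" as "1%2E5", which a prefixed name cannot hold, while ② leaves the '.' alone and instead has to escape the '-' in negative numbers and dates.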
>>
>> As an example of negative integer primary keys, this table uses -2
>> and -1 to represent a couple of access control groups common to all
>> Apache servers:
>>
>> ┌┤AccessRoles├───────┐
>> │┌pk┐│               │
>> │ ID │ desc          │
>> │ -2 │ "known users" │
>> │ -1 │ "world"       │
>> │ 1  │ "marketing"   │
>> │ 2  │ "management"  │
>> └────┴───────────────┘
>>
>>
>> = The balance =
>> I see us as pushing a slider between readability
>> ("attr¹=val¹,attrⁿ=valⁿ") and usability (being able to write/query
>> the data with prefixed names). As Richard points out, we can
>> write/query the data for an individual database using an @base
>> directive and relative IRIs. The punctuation choice matters most for
>> users who write data/queries as prefixed names (e.g. queries
>> connecting multiple databases).
>>
>> IMO, ④ is the most readable and ⑤ is the most usable, with ① being my
>> idea of the sweet spot. ⑤ gives us the simplest encoding rules, and ②
>> is less likely to be confused with the '.' addressing scheme used in
>> SQL.
>>
>> --
>> -ericP
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
>
Received on Friday, 9 September 2011 08:11:47 UTC