Re: PROPOSAL for %-encoding (was: Re: IMPORTANT: remaining issues for closing CR)

On 29 Apr 2012, at 21:49, Eric Prud'hommeaux wrote:
>>> Another nearby issue is that R2RML users are limited in the separator characters that they can safely use in templates. A user creating a template like "Department/{NAME}-{CITY}" may not first inspect his data to make sure there's no '-' in the NAME column.
>> 
>> Well, the frequent case is numeric columns, and finding a safe separator for them is trivial.
>> 
>> All of the RFC 3987 sub-delims are *always* safe:
>> 
>>   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
>>                  / "*" / "+" / "," / ";" / "="
> 
> Agreed, but there is some usability pressure to use separators which don't require escaping in prefixed names, which, as we see below, is impossible:
> 
> [166]  PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
> sub-delims             =                                "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

I already responded to that:

1. In my experience, users seem to be happy enough with instance IRIs that cannot be prefix-abbreviated

2. Users *can* choose delimiters that can be prefix-abbreviated, *except* in the corner case of multiple PK columns that can potentially contain arbitrary characters

3. In that case, they'll have to use escapes anyways to deal with characters that need escaping in the values, so delimiters that also need escaping shouldn't be a problem

4. Delimiters in R2RML can be multiple characters, and finding a multi-char delimiter that is safe and doesn't require escaping is easy
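
To make the delimiter argument concrete, here is a minimal sketch in Python. It is not an R2RML implementation: `expand_template` is a hypothetical helper, and `urllib.parse.quote(..., safe="")` only approximates R2RML's IRI-safe encoding (it matches for ASCII values, but it also percent-encodes non-ASCII characters, which R2RML leaves alone). It shows that '-' survives unencoded in column values and can therefore collide with a '-' separator, while a sub-delim such as ';' is always percent-encoded inside values and so stays unambiguous as a separator.

```python
from urllib.parse import quote


def expand_template(template, row):
    """Hypothetical helper: fill {COLUMN} placeholders with encoded values.

    quote(..., safe="") leaves only ASCII letters, digits and "-._~"
    unencoded, so every sub-delim inside a value becomes %XX.
    """
    out = template
    for column, value in row.items():
        out = out.replace("{" + column + "}", quote(str(value), safe=""))
    return out


row_a = {"NAME": "R-D", "CITY": "Paris"}
row_b = {"NAME": "R", "CITY": "D-Paris"}

# '-' is unreserved, so it is not encoded and the two rows collide.
hyphen_a = expand_template("Department/{NAME}-{CITY}", row_a)
hyphen_b = expand_template("Department/{NAME}-{CITY}", row_b)

# ';' is a sub-delim: it is encoded inside values, so the separator is unambiguous.
semi_a = expand_template("Department/{NAME};{CITY}", row_a)
semi_b = expand_template("Department/{NAME};{CITY}", row_b)
semi_c = expand_template("Department/{NAME};{CITY}", {"NAME": "a;b", "CITY": "c"})

print(hyphen_a, hyphen_b)  # both are Department/R-D-Paris
print(semi_a, semi_b)      # Department/R-D;Paris and Department/R;D-Paris
print(semi_c)              # Department/a%3Bb;c
```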

So I think the current R2RML design is quite excellent from a usability point of view.

>> The cost of a change in the DM, to align it with RFC 3986+3987, as proposed below, seems smaller to me.
> 
> Spec-wise, sure, but we're also trying to guess what will be most appealing to users. You argue that having simple rules is good for them. I argue that slightly more complex rules (escaping '.'s and '-'s) could produce a smoother experience for users.

See above — I think it would actually degrade the user experience because it introduces hard-to-understand corner cases and incompatibilities that are impossible to resolve except by trawling through various grammar documents. Making our specs work the same way as everything else out there is better for users *and* for implementers.

> I apparently wasn't paying attention when we decided on the rule for PN_LOCAL_ESC because right now it seems crazy to me to require escaping really common word separators like '_' | '~' | '.' | '-'.

'_', '-' and '.' *may* be escaped in PN_LOCAL but are also allowed unescaped (if I read the grammar in the Turtle ED correctly). I wouldn't call '~' a common word separator. '~' was never allowed in prefixed names before AFAIK.

> I'm not exactly psyched to bring this up in SPARQL and RDF, but I suppose I should. Barring relaxing those rules, the custom escaping rules in DM don't have the desired payoff for e.g. dates in a primary key logns:time-2012-29-04T01:23:45.

I believe this is a legal prefixed name in the Turtle ED.
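
A quick way to check claims like this is a small regular expression. The sketch below is an assumption on my part, not the ED text: it follows the PN_LOCAL production as it appears in the Turtle 1.1 Recommendation, restricted to ASCII (the non-ASCII ranges of PN_CHARS_BASE are omitted), and `is_pn_local` is a name made up for illustration. Under those assumptions, '_', '-' and '.' are accepted unescaped (though '.' cannot be the last character), while '~' is accepted only when escaped.

```python
import re

# PLX ::= PERCENT | PN_LOCAL_ESC
PLX = r"(?:%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=/?#@%])"
# First character: PN_CHARS_U | ':' | [0-9] | PLX   (ASCII subset only)
FIRST = rf"(?:[A-Za-z_:0-9]|{PLX})"
# Middle characters: PN_CHARS | '.' | ':' | PLX
MIDDLE = rf"(?:[A-Za-z_0-9\-.:]|{PLX})"
# Last character: PN_CHARS | ':' | PLX   (so no trailing '.')
LAST = rf"(?:[A-Za-z_0-9\-:]|{PLX})"

PN_LOCAL = re.compile(rf"{FIRST}(?:{MIDDLE}*{LAST})?")


def is_pn_local(local_part):
    """Return True if local_part matches the ASCII-only PN_LOCAL sketch."""
    return PN_LOCAL.fullmatch(local_part) is not None


print(is_pn_local("time-2012-29-04T01:23:45"))  # True
print(is_pn_local("a~b"))                       # False: '~' must be escaped
print(is_pn_local("a\\~b"))                     # True: escaped as \~
print(is_pn_local("a."))                        # False: trailing '.'
```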

> I guess I could also just blow it all off and accept your proposal unchallenged. I wish I could spin off a parallel universe to see which one ends up curing cancer and paving the roads.

I think there wouldn't be much of a difference between the two universes at all for most people. The main difference would be for future implementers, spec authors, validator developers, and standards advocates, who will have to deal with N incompatible mechanisms that are designed for the same purpose in the one universe, and with N+1 incompatible mechanisms in the other.

Best,
Richard

Received on Tuesday, 1 May 2012 11:22:37 UTC