Re: PROPOSAL for %-encoding (was: Re: IMPORTANT: remaining issues for closing CR)

* Richard Cyganiak <richard@cyganiak.de> [2012-04-27 14:28+0100]
> Michael,
> 
> Thanks for collecting the proposals. Here's a proposal for the %-encoding issue.
> 
> On 27 Apr 2012, at 13:19, Michael Hausenblas wrote:
> > 3. DM cannot be implemented as an R2RML mapping (period encoding)
> > 
> > + http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0021.html
> > 
> > --> NO PROPOSAL known
> 
> == WHY A CHANGE IS NECESSARY ==
> 
> The Direct Mapping spec states: “The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML].” But this is impossible — for many DB schemas, one cannot write a Direct Mapping in R2RML, due to the different %-encoding rules. We have three options:
> 
> 1) Forget about using the DM as a default mapping to R2RML, and delete the sentence above
> 2) DM adopts R2RML %-encoding rules
> 3) R2RML adopts DM %-encoding rules
> 
> The first is unacceptable to me. We *need* a default mapping for R2RML. The DM's usefulness as an R2RML default mapping is the only reason why I +1'd work on it in the first place.
> 
> Options 2 and 3 mean that one spec must adopt the encoding rules of the other. The R2RML encoding rules are based on RFC 3986 and RFC 3987 (the URI and IRI specs), while the DM encoding rules are custom and unlike anything else. Using established standards is good. Therefore, both specs should use the RFC 3986+3987 rules.
> 
> The knock-on effect is that the DM must use “safe” delimiter characters that are escaped by the standard RFC encoding rules. So I propose replacing the unsafe delimiters “.” and “-” with safe “;” and “=”. This gives us row IRIs like <Department/name=accounting;city=Cambridge>.

Recent changes to the SPARQL and Turtle specs mean that (once these changes are deployed) this URL can be written in SPARQL/Turtle as <Department/name\=accounting\;city\=Cambridge>. This doesn't make the eyes bleed, but it is a minor usability impediment 'cause you can't cut and paste those particular URLs from e.g. query results into another query.

Another nearby issue is that R2RML users are limited in the separator characters that they can safely use in templates. A user creating a template like "Department/{NAME}-{CITY}" may not first inspect the data to make sure there's no '-' in the NAME column. As it stands, users will probably want to use separators which need not be escaped in SPARQL/Turtle. As it turns out, that leaves no safe characters, i.e. chars which are escaped in {}s but not in SPARQL. One solution which may help users is the parameterized escape, e.g. "Department/{-|NAME}-{CITY}", which would also make DM URLs templatable. I guess we're in 2nd LC territory anyway.
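
To make the separator problem concrete, here is a minimal Python sketch. It is not R2RML's template processor: expand() and its `escape` argument are hypothetical names, the `escape` argument only models the idea behind "{-|NAME}", and urllib.parse.quote is used as a stand-in for the RFC 3986 rules (unlike R2RML's IRI-safe form it also encodes non-ASCII characters).

```python
import re
from urllib.parse import quote


def expand(template, row, escape=""):
    """Fill {COLUMN} placeholders with percent-encoded values.

    `escape` is a hypothetical parameter: extra characters to
    percent-encode inside values even though RFC 3986 treats them
    as unreserved (for example '-').
    """
    def fill(match):
        # quote() with safe="" leaves only ALPHA, DIGIT, '-', '.', '_', '~'.
        value = quote(row[match.group(1)], safe="")
        for ch in escape:
            value = value.replace(ch, "%{:02X}".format(ord(ch)))
        return value

    return re.sub(r"\{(\w+)\}", fill, template)


a = {"NAME": "R-D", "CITY": "Cambridge"}
b = {"NAME": "R", "CITY": "D-Cambridge"}

# '-' is unreserved, so it survives encoding and the two rows collide.
print(expand("Department/{NAME}-{CITY}", a))  # Department/R-D-Cambridge
print(expand("Department/{NAME}-{CITY}", b))  # Department/R-D-Cambridge

# Escaping the separator inside the values keeps the rows apart.
print(expand("Department/{NAME}-{CITY}", a, escape="-"))  # Department/R%2DD-Cambridge
print(expand("Department/{NAME}-{CITY}", b, escape="-"))  # Department/R-D%2DCambridge
```

With ';' as the separator no extra escaping is needed, since ';' inside a value is already encoded as %3B.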


> == CHANGE PROPOSAL ==
> 
> SUMMARY: “Direct Mapping: Change %-escaping rules to be compatible with R2RML. Change two delimiter characters to ones that are safe under these rules.”
> 
> 
> PROPOSAL: In the Direct Mapping spec, do the following changes:
> 
> 
> REMOVE: [[
> These identifiers are separated by the punctuation characters '#', '.', '/' and '-'. All SQL identifiers are escaped following URL-encoding HTML form data except that only the above punctuation and the characters not permitted in RDF IRIs are escaped.
> ]]
> ADD: [[
> These identifiers are separated by the punctuation characters '#', ';', '/' and '='. All SQL identifiers are escaped following R2RML's escaping rules.
> ]]
> 
> 
> In “Definition percent-encode”, REMOVE the following bullet points:
> [[
>   • For attribute names, replace each HYPHEN-MINUS character ('-', U+002d) with the string "%2D".
>   • For attribute values, replace each FULL STOP character ('.', U+002e) with the string "%2E".
> ]]
> 
> 
> In “Definition row node”, replace two bullet points:
> REMOVE: [[
>   • a HYPHEN-MINUS character '-',
>   • if it is not the last column in the foreign key, a FULL STOP character '.'
> ]]
> ADD: [[
>   • an EQUALS SIGN character '=',
>   • if it is not the last column in the foreign key, a SEMICOLON character ';'
> ]]
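
A rough Python sketch of how the revised row node rule might be applied, assuming urllib.parse.quote as an approximation of the R2RML escaping (it also encodes non-ASCII characters, which R2RML's IRI-safe form does not). row_node() is a hypothetical helper, not something defined in either spec.

```python
from urllib.parse import quote


def row_node(table, key):
    """Build a relative row IRI from (column, value) pairs.

    Each pair becomes name '=' value; pairs are joined with ';'.
    Names and values are percent-encoded first, so any '=' or ';'
    inside the data cannot be mistaken for a delimiter.
    """
    def enc(text):
        return quote(str(text), safe="")

    pairs = ";".join(enc(col) + "=" + enc(val) for col, val in key)
    return enc(table) + "/" + pairs


print(row_node("Department", [("name", "accounting"), ("city", "Cambridge")]))
# Department/name=accounting;city=Cambridge

print(row_node("Department", [("name", "a=b;c")]))
# Department/name=a%3Db%3Bc
```

The first call reproduces the example row IRI from the proposal; the second shows delimiter characters in a value being escaped.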
> 
> 
> In “Definition reference property IRI”:
> REMOVE: [[
>   • if it is not the last column in the foreign key, a FULL STOP character '.'
> ]]
> ADD: [[
>   • if it is not the last column in the foreign key, a SEMICOLON character ';'
> ]]
> 
> 
> Change all examples in Section 2 accordingly.
> 
> Change rules [37] and [40] in Appendix A.4 accordingly.
> 
> Change all DM test cases accordingly.
> 
> 
> 
> Best,
> Richard

-- 
-ericP

Received on Friday, 27 April 2012 17:01:56 UTC