PROPOSAL for %-encoding (was: Re: IMPORTANT: remaining issues for closing CR)

Michael,

Thanks for collecting the proposals. Here's a proposal for the %-encoding issue.

On 27 Apr 2012, at 13:19, Michael Hausenblas wrote:
> 3. DM cannot be implemented as an R2RML mapping (period encoding)
> 
> + http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0021.html
> 
> --> NO PROPOSAL known

== WHY A CHANGE IS NECESSARY ==

The Direct Mapping spec states: “The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML].” But this is impossible — for many DB schemas, one cannot write a Direct Mapping in R2RML, due to the different %-encoding rules. We have three options:

1) Forget about using the DM as a default mapping for R2RML, and delete the sentence above
2) DM adopts R2RML %-encoding rules
3) R2RML adopts DM %-encoding rules

The first is unacceptable to me. We *need* a default mapping for R2RML. The DM's usefulness as an R2RML default mapping is the only reason why I +1'd work on it in the first place.

Options 2 and 3 mean that one spec must adopt the encoding rules of the other. The R2RML encoding rules are based on RFC 3986 and RFC 3987 (the URI and IRI specs), while the DM encoding rules are custom and unlike anything else. Using established standards is good. Therefore, both specs should use the RFC 3986+3987 rules.

The knock-on effect is that the DM must use “safe” delimiter characters that are escaped by the standard RFC encoding rules. So I propose replacing the unsafe delimiters “.” and “-” with safe “;” and “=”. This gives us row IRIs like <Department/name=accounting;city=Cambridge>.
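
To illustrate why “;” and “=” are safe, here is a minimal Python sketch. It is not taken from either spec. The helper names iri_safe and row_iri are made up for this example, and urllib.parse.quote with safe="" is only an approximation of the R2RML rule: it leaves the ASCII unreserved characters (letters, digits, “-”, “.”, “_”, “~”) alone and escapes everything else, including non-ASCII characters that R2RML would keep as-is.

```python
from urllib.parse import quote


def iri_safe(value):
    # Approximation of R2RML's "IRI-safe" escaping (assumption: ASCII only).
    # Only letters, digits, '-', '.', '_' and '~' are left unescaped.
    return quote(value, safe="")


def row_iri(table, key):
    # Hypothetical helper: join column/value pairs with the proposed
    # delimiters '=' (between name and value) and ';' (between pairs).
    pairs = ";".join(
        f"{iri_safe(column)}={iri_safe(value)}" for column, value in key.items()
    )
    return f"{iri_safe(table)}/{pairs}"


# The example from this mail.
print(row_iri("Department", {"name": "accounting", "city": "Cambridge"}))
# Department/name=accounting;city=Cambridge

# Delimiters occurring inside a value are escaped, so they cannot be
# confused with the structural ';' and '='.
print(row_iri("Department", {"name": "R&D;x=1", "city": "St. Ives"}))
# Department/name=R%26D%3Bx%3D1;city=St.%20Ives
```

Note that iri_safe("a-b.c") returns "a-b.c" unchanged: under the RFC rules “.” and “-” are never escaped, which is why they cannot serve as unambiguous delimiters.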


== CHANGE PROPOSAL ==

SUMMARY: “Direct Mapping: Change %-escaping rules to be compatible with R2RML. Change two delimiter characters to ones that are safe under these rules.”


PROPOSAL: In the Direct Mapping spec, make the following changes:


REMOVE: [[
These identifiers are separated by the punctuation characters '#', '.', '/' and '-'. All SQL identifiers are escaped following URL-encoding HTML form data except that only the above punctuation and the characters not permitted in RDF IRIs are escaped.
]]
ADD: [[
These identifiers are separated by the punctuation characters '#', ';', '/' and '='. All SQL identifiers are escaped following R2RML's escaping rules.
]]


In “Definition percent-encode”, REMOVE the following bullet points:
[[
  • For attribute names, replace each HYPHEN-MINUS character ('-', U+002d) with the string "%2D".
  • For attribute values, replace each FULL STOP character ('.', U+002e) with the string "%2E".
]]


In “Definition row node”, replace two bullet points:
REMOVE: [[
  • a HYPHEN-MINUS character '-',
  • if it is not the last column in the foreign key, a FULL STOP character '.'
]]
ADD: [[
  • an EQUALS SIGN character '=',
  • if it is not the last column in the foreign key, a SEMICOLON character ';'
]]


In “Definition reference property IRI”:
REMOVE: [[
  • if it is not the last column in the foreign key, a FULL STOP character '.'
]]
ADD: [[
  • if it is not the last column in the foreign key, a SEMICOLON character ';'
]]


Change all examples in Section 2 accordingly.

Change rules [37] and [40] in Appendix A.4 accordingly.

Change all DM test cases accordingly.



Best,
Richard

Received on Friday, 27 April 2012 13:28:50 UTC