- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Fri, 05 Aug 2005 11:41:03 +0100
- To: andy.seaborne@hp.com
- Cc: Dan Connolly <connolly@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Thu, 2005-08-04 at 16:56 +0100, Seaborne, Andy wrote:
>
> Dan Connolly wrote:
> > On Thu, 2005-08-04 at 14:36 +0100, Seaborne, Andy wrote:
> > [...]
> >
> >>>Turtle follows N-Triples and picks just uppercase for hex \u & \U
> >>>escapes (I think there was something in the older charmod drafts about
> >>>having just one way to encode it). I'd prefer to follow that [0-9]
> >>>[A-F].
> >>
> >>Can't find anything in current charmod.
> >
> >
> > There are several requirements/guidelines near
> > http://www.w3.org/TR/2005/REC-charmod-20050215/#def-char-escape
> >
> > e.g.
> >
> >
> > C042 [S] Specifications SHOULD NOT invent a new escaping mechanism if
> > an appropriate one already exists.
>

\u is reasonably common (N-TRIPLES, Java, Python). When I added it
along with \U to N-Triples I think I did it as it matched the style of
the language - other \-escapes existed - and there was no other existing
escape mechanism such as & as mentioned:

> When in a XML protocol request &...; applies anyway.
>
> >
> > and here's one that might bite a little harder:
> >
> > C046 ... In particular, if a character is acceptable in identifiers and
> > comments, then its escaped form should also be acceptable.
> >
>
> > so charmod says this should work:
> >
> > SELECT ?foo\x0045bar WHERE { ?foo\x0045bar dc:title ?xyz }.
>
> We can do one of:
> 1/ Extend \u and \U to apply to variables
> 2/ Make it so \u is on input processing (before tokenizing)
>    so it works everywhere including comments and is transparent to
>    parsing proper
>
> 1/ is probably clearer 2/ is probably easier to implement

I just replied that 1/ is much easier for me:
http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0199.html

> A test is "SELEC\u0054"
>
> Another is the comment
>
> # \u escapes need thinking about.
>
> which is illegal by 2 but legal by 1.
>
>
> C046 also says "this does not preclude that syntax-significant characters, when
> escaped, lose their significance in the syntax." so I think we don't have to do
> it for keywords or things like "?" but 2) would allow it.
>
> For \t etc it only applies in strings. I guess that comments are undefined so
> \t is ambiguous as to whether it is "\" and "t" or a tab.

\-escapes don't matter in comments as we provide no interpretation of
them. They are allowed as it's in the sequence of characters allowed in
comments.

Dave
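
For illustration only, here is a minimal Python sketch of what option 2/ (decoding \u and \U on input, before tokenizing) might look like. The function name decode_unicode_escapes is hypothetical and is not taken from any implementation discussed in this thread. It assumes uppercase-only hex digits, as in N-Triples and Turtle, and it deliberately leaves other \-escapes such as \t untouched. It does not try to handle an escaped backslash followed by "u".

```python
import re

# Assumption: \uXXXX takes exactly 4 hex digits and \UXXXXXXXX exactly 8,
# uppercase only, following the N-Triples/Turtle convention from the thread.
# The digit groups are optional so that a bare "\u" still matches and can
# be reported as an error instead of being silently skipped.
_ESCAPE = re.compile(r"\\u([0-9A-F]{4})?|\\U([0-9A-F]{8})?")


def _replace(match):
    digits = match.group(1) or match.group(2)
    if digits is None:
        # A "\u" or "\U" that is not followed by the required hex digits.
        raise ValueError("malformed escape at offset %d" % match.start())
    # chr() itself raises ValueError for code points above 0x10FFFF.
    return chr(int(digits, 16))


def decode_unicode_escapes(text):
    """Hypothetical pre-tokenizing pass: decode every \\u and \\U escape."""
    return _ESCAPE.sub(_replace, text)


if __name__ == "__main__":
    print(decode_unicode_escapes(r"SELEC\u0054 ?foo\u0045bar"))
    try:
        decode_unicode_escapes(r"# \u escapes need thinking about.")
    except ValueError as err:
        print("rejected:", err)
```

Under this sketch the test string SELEC\u0054 becomes SELECT before the tokenizer ever sees it, and the comment "# \u escapes need thinking about." is rejected, which is the behaviour described for 2/ above. Under 1/ the decoding would instead live inside the tokenizer rules for the specific tokens that permit escapes, so the same comment would pass through unchanged.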
Received on Friday, 5 August 2005 10:41:09 UTC