Re: [Turtle] Please show/explain hexadecimal-encoded characters in comments

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sat, 2 Nov 2013 19:46:15 -0400
To: David Booth <david@dbooth.org>
Cc: public-rdf-comments <public-rdf-comments@w3.org>
Message-ID: <20131102234614.GE13691@w3.org>
* David Booth <david@dbooth.org> [2013-05-08 14:45-0400]
> Regarding
> http://www.w3.org/TR/2013/CR-turtle-20130219/
> 
> As an RDF author I frequently refer to the EBNF syntax rules in
> section 6.5 to check a detail of the Turtle syntax, such as figuring
> out whether a particular character is permitted in a local name.
> For the most part the rules are easy to read.  But several of the
> rules specify unicode characters using hexadecimal, such as:
> 
> [161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA
> [163s] 	PN_CHARS_BASE 	::= 	[A-Z] | [a-z] | [#x00C0-#x00D6] |
> [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] |
> [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
> [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
> [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
> [166s] 	PN_CHARS 	::= 	PN_CHARS_U | '-' | [0-9] | #x00B7 |
> [#x0300-#x036F] | [#x203F-#x2040]
> 
> Clearly it is necessary for clarity to use the hexadecimal notation
> in the production rules, so I certainly don't object to their use.
> But as a reader, it drives me bananas trying to figure out what
> those hexadecimal characters are -- searching the web, etc.
> 
> Please add some simple comments to the production rules, indicating
> what the hexadecimal-encoded characters are, so that readers don't
> have to go searching to figure it out.  Something like the following
> would be a big help:
> 
> /* See @@ add link to unicode table @@ */
> /*  #x20 = SPACE, #x9 = TAB, #xD = Carriage return, #xA = Line feed */
> [161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA
> 
> /* #x00B7 = Middle dot, #x0300 = ??? (couldn't find that one) */
> [166s] 	PN_CHARS 	::= 	PN_CHARS_U | '-' | [0-9] | #x00B7 |
> [#x0300-#x036F] | [#x203F-#x2040]

To really help someone wondering whether a given character is permitted
in a given terminal, such comments would have to describe the full
character ranges, not just their boundaries. As a compromise, I have
provisionally added comments for the ASCII characters, specifically:

[18] 	IRIREF 	::= 	'<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>' /* #x00=NULL #x01-#x1F=control codes #x20=space */
[22] 	STRING_LITERAL_QUOTE 	::= 	'"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"' /* #x22=" #x5C=\ #xA=new line #xD=carriage return */
[23] 	STRING_LITERAL_SINGLE_QUOTE 	::= 	"'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'" /* #x27=' #x5C=\ #xA=new line #xD=carriage return */
[161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA /* #x20=space #x9=character tabulation #xD=carriage return #xA=new line */

and will ask if the WG considers this extra text to be a net help.
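
For readers who want to check a specific character in the meantime, here
is a minimal sketch in Python. It is not part of the Turtle spec. The
range table is transcribed by hand from the PN_CHARS_BASE rule quoted
above, and the helper names (in_pn_chars_base, describe) are invented
for this illustration. It uses only the standard unicodedata module.

```python
import unicodedata

# Ranges transcribed from the quoted PN_CHARS_BASE rule [163s].
# [A-Z] and [a-z] are written as code point pairs like the rest.
PN_CHARS_BASE_RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x02FF),
    (0x0370, 0x037D), (0x037F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def in_pn_chars_base(ch: str) -> bool:
    """Return True if the single character ch falls in any listed range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PN_CHARS_BASE_RANGES)


def describe(cp: int) -> str:
    """Return the Unicode character name for a code point.

    Control characters such as #x9 have no formal name in the Unicode
    database, so a placeholder string is returned for them.
    """
    return unicodedata.name(chr(cp), "<no Unicode name>")


if __name__ == "__main__":
    # Name the hexadecimal code points that appear in the PN_CHARS rule.
    for cp in (0x20, 0x00B7, 0x0300):
        print(f"#x{cp:04X} = {describe(cp)}")
    # Check whether individual characters are allowed by PN_CHARS_BASE.
    for ch in ("a", "\u00e9", "\u00d7", "-"):
        print(repr(ch), in_pn_chars_base(ch))
```

With this, #x0300 (the one David could not find) comes back as
COMBINING GRAVE ACCENT. A lookup like this answers "is this character
permitted?" directly, which boundary-only comments cannot.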

If this response addresses your comment, please reply with the subject
prefixed by "[RESOLVED]".


> Thanks,
> David
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Saturday, 2 November 2013 23:46:47 UTC
