- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sat, 2 Nov 2013 19:10:45 -0400
- To: Dave Beckett <dave@dajobe.org>
- Cc: public-rdf-comments@w3.org
* Dave Beckett <dave@dajobe.org> [2013-03-04 09:40-0800]
> http://www.w3.org/TR/2013/CR-turtle-20130219/#grammar-production-IRIREF
>
> What characters (Unicode code points) are allowed in an IRIREF in turtle?
>
> the IRIREF grammar rule is: [^#x00-#x20<>\"{}|^`\] | UCHAR)
>
> implies that for example U+007F is allowed since it's not in the
> escaped range. Taking a look at the IRI RFC 3987 it has a more
> restricted range and taking the example U+007F is not allowed.
> There are many other Unicode codepoints that are not allowed.
>
> See the RFC 3987 rule 'ipchar' and its expansion to 'ucschar'
>
> This rule should probably be completed so either it lists all the
> allowed characters or lists all the excluded ones (if the [^...]
> form remains)

The actual RFC 3987 grammar is quite complex, and the WG was unwilling to copy that grammar (which is not LALR(1)/LL(1)) into the Turtle grammar. I proposed some changes to reflect some of the characters prohibited by RFC 3987 in the Turtle grammar, but the WG never reached consensus on that.
<http://lists.w3.org/Archives/Public/public-rdf-wg/2013Mar/thread#msg244>
Short of copying the full grammar, any measure to restrict IRIREF would be incomplete.

Finally, on 30 Oct, we resolved "WG will not copy the RFC3987 production for IRIs into Turtle"
<https://www.w3.org/2013/meeting/rdf-wg/2013-10-30#resolution_5>

If you feel that we've addressed this comment, please reply with "[RESOLVED]" in the subject.

> Dave
>
> --

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Saturday, 2 November 2013 23:11:15 UTC
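The gap Dave describes can be checked mechanically. The Python sketch below compares the character class of the Turtle IRIREF rule, assuming its full form is `'<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'`, with a coarse, position-independent reading of RFC 3987: the ASCII characters an IRI may contain plus the `ucschar` and `iprivate` ranges. The names `TURTLE_IRIREF`, `turtle_allows` and `rfc3987_allows` are hypothetical helpers for illustration only, not part of any Turtle or IRI library, and the RFC 3987 check deliberately ignores positional rules.

```python
import re
import string

# Turtle IRIREF, assumed full form of the production:
#   IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
#   UCHAR  ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
TURTLE_IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]'  # any code point outside the excluded set
    r'|\\u[0-9A-Fa-f]{4}'          # UCHAR, \uXXXX form
    r'|\\U[0-9A-Fa-f]{8})*>'       # UCHAR, \UXXXXXXXX form
)


def turtle_allows(cp: int) -> bool:
    """True if code point cp may appear unescaped inside a Turtle IRIREF."""
    return TURTLE_IRIREF.fullmatch("<" + chr(cp) + ">") is not None


# Coarse reading of RFC 3987: accept a code point if it may appear *somewhere*
# in an IRI. Positional rules are ignored (e.g. iprivate is only legal in
# iquery, and '%' must begin a pct-encoded triplet).
IRI_ASCII = frozenset(
    string.ascii_letters + string.digits
    + "-._~"          # unreserved
    + ":/?#[]@"       # gen-delims
    + "!$&'()*+,;="   # sub-delims
    + "%"             # pct-encoded introducer
)

UCSCHAR = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]
IPRIVATE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def rfc3987_allows(cp: int) -> bool:
    """True if cp is permitted somewhere in an RFC 3987 IRI (approximation)."""
    if cp < 0x80:
        return chr(cp) in IRI_ASCII
    return any(lo <= cp <= hi for lo, hi in UCSCHAR + IPRIVATE)


if __name__ == "__main__":
    # Sample code points: DEL, NEL, two noncharacters, a language-tag character.
    for cp in (0x7F, 0x85, 0xFDD0, 0xFFFE, 0xE0001):
        print(f"U+{cp:04X}: Turtle={turtle_allows(cp)} RFC3987={rfc3987_allows(cp)}")
```

Each sample code point, including Dave's U+007F example, is admitted by the Turtle character class but rejected by the RFC 3987 approximation. Closing that gap exactly would mean encoding all of the ranges above, and their positional rules, in the Turtle grammar.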