Re: Which characters are allowed in IRIREF in Turtle 2013?

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sat, 2 Nov 2013 19:10:45 -0400
To: Dave Beckett <dave@dajobe.org>
Cc: public-rdf-comments@w3.org
Message-ID: <20131102231044.GD13691@w3.org>
* Dave Beckett <dave@dajobe.org> [2013-03-04 09:40-0800]
> http://www.w3.org/TR/2013/CR-turtle-20130219/#grammar-production-IRIREF
> What characters (Unicode code points) are allowed in an IRIREF in turtle?
> the IRIREF grammar rule is:   '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
> implies that for example U+007F is allowed since it's not in the
> escaped range.  Taking a look at the IRI RFC 3987 it has a more
> restricted range and taking the example U+007F is not allowed.
> There are many other Unicode codepoints that are not allowed.
> See the RFC 3987 rule 'ipchar' and its expansion to 'ucschar'
> This rule should probably be completed so either it lists all the
> allowed characters or lists all the excluded ones (if the [^...]
> form remains)
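
The gap described in the quoted comment can be checked mechanically. What
follows is only a rough sketch in Python, not anything taken from the Turtle
spec or agreed by the WG: it compares the negated character class in IRIREF
against a simplified, position-independent reading of RFC 3987 (the ASCII
characters that can appear somewhere in an IRI, plus the 'ucschar' and
'iprivate' ranges). The function names turtle_allows, rfc3987_allows and
ascii_gap are made up for illustration, and UCHAR escapes are ignored.

```python
import re

# Negated character class from the Turtle IRIREF production quoted above:
# everything except U+0000-U+0020 and < > " { } | ^ ` \
TURTLE_IRIREF_CHAR = re.compile(r'[^\x00-\x20<>"{}|^`\\]')

# 'ucschar' ranges from RFC 3987, section 2.2.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13 (0x10000-0x1FFFD ... 0xD0000-0xDFFFD).
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# 'iprivate' ranges (RFC 3987 only permits these in the query component).
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

# ASCII characters that can appear somewhere in an IRI:
# unreserved, gen-delims, sub-delims, and '%' for pct-encoded.
ASCII_ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-._~"          # rest of unreserved
    ":/?#[]@"       # gen-delims
    "!$&'()*+,;="   # sub-delims
    "%"             # start of a pct-encoded triplet
)


def turtle_allows(ch: str) -> bool:
    """True if a literal ch is matched by the IRIREF character class."""
    return TURTLE_IRIREF_CHAR.fullmatch(ch) is not None


def rfc3987_allows(ch: str) -> bool:
    """Simplified check: may ch appear anywhere at all in an RFC 3987 IRI?

    Assumption: position within the IRI is ignored, so this is more
    permissive than the real grammar.
    """
    cp = ord(ch)
    if cp < 0x80:
        return ch in ASCII_ALLOWED
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES + IPRIVATE_RANGES)


def ascii_gap() -> list:
    """ASCII code points accepted by IRIREF but not by the RFC 3987 check."""
    return [cp for cp in range(0x80)
            if turtle_allows(chr(cp)) and not rfc3987_allows(chr(cp))]
```

Under these assumptions, turtle_allows("\x7f") is true while
rfc3987_allows("\x7f") is false, matching the U+007F example above.
Non-ASCII code points outside the 'ucschar' ranges, such as U+FFFF, fall
into the same gap.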

The actual RFC3987 grammar is quite complex and the WG was unwilling
to copy that grammar (which is not LALR(1)/LL(1)) into the Turtle
grammar. I proposed some changes to surface in Turtle some of the
characters prohibited by RFC3987 but the WG never reached consensus
on that.

Any measure to restrict IRIREF would be incomplete. Finally, on 30
Oct, we resolved "WG will not copy the RFC3987 production for IRIs
into Turtle".

If you feel that we've addressed this comment, please reply with
"[RESOLVED]" in the subject.

> Dave


office: +1.617.599.3509
mobile: +

Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Saturday, 2 November 2013 23:11:15 UTC