Re: RDF-ISSUE-123 (localName chars): PN_CHARS_BASE permits up to U+EFFFF but RFC-3987 stops at U+EFFFD [RDF Turtle]

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Mon, 25 Mar 2013 09:08:41 -0700
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <F1B10F36-F9F6-48FC-817B-D0E2D39FC1AF@greggkellogg.net>
To: Eric Prud'hommeaux <eric@w3.org>

On Mar 24, 2013, at 6:23 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Gregg Kellogg <gregg@greggkellogg.net> [2013-03-24 12:34-0700]
>> Just an FYI, there were some more tests that had characters outside of the limit allowed by RFC-3987, in particular:
>> 
>> localName_with_PN_CHARS_BASE_character_boundaries.ttl
> 
> Did you mean to change \uf900 to \ufd90? in this test?

This must have been an accident; it's a pain working with these character codes! Indeed, ucschar includes %xF900-FDCF followed by %xFDF0-FFEF, so to really test the ranges, the tests would need to use the character-set intersection of the Turtle grammar's PN_CHARS_BASE and ucschar from RFC-3987. Feel free to adjust accordingly, but the current test probably works reasonably well.
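
For reference, the intersection mentioned above can be sketched in a few lines of Python. This is only an illustration, not part of the test suite: the range tables are transcribed by hand from the PN_CHARS_BASE production in the Turtle grammar and the ucschar production in RFC-3987, the ASCII letters are left out because RFC-3987 covers them through ALPHA rather than ucschar, and the helper names (intersect, contains) are made up for this example.

```python
# Non-ASCII part of Turtle's PN_CHARS_BASE (hand-transcribed; an assumption
# of this sketch, check against the grammar before relying on it).
PN_CHARS_BASE = [
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# ucschar from RFC-3987: three BMP ranges, planes 1-13 up to xFFFD each,
# and plane 14 from E1000 to EFFFD.
UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
UCSCHAR.append((0xE1000, 0xEFFFD))


def intersect(a, b):
    """Return the sorted list of inclusive ranges common to a and b."""
    out = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)


def contains(ranges, cp):
    """True if code point cp falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


# Code points that are legal in both a Turtle localName and an RFC-3987 IRI.
TESTABLE = intersect(PN_CHARS_BASE, UCSCHAR)

if __name__ == "__main__":
    for lo, hi in TESTABLE:
        print(f"U+{lo:04X}-U+{hi:04X}")
```

Boundary characters for a test file would then be picked from the ends of each range in TESTABLE; for example, contains(TESTABLE, 0xFFEF) holds while contains(TESTABLE, 0xFFF0) does not.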

Gregg

>> localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries.ttl
>> localName_with_assigned_nfc_bmp_PN_CHARS_BASE_character_boundaries.ttl
>> localName_with_nfc_PN_CHARS_BASE_character_boundaries.ttl
>> 
>> Used characters after #FFEF. I took the liberty of updating the test files accordingly.
> 
> tx kindly
> 
> 
>> Gregg Kellogg
>> gregg@greggkellogg.net
>> 
>> On Mar 24, 2013, at 4:26 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>> 
>>> 
>>> 
>>> On 24/03/13 05:40, RDF Working Group Issue Tracker wrote:
>>>> RDF-ISSUE-123 (localName chars): PN_CHARS_BASE permits up to U+EFFFF but RFC-3987 stops at U+EFFFD [RDF Turtle]
>>>> 
>>>> http://www.w3.org/2011/rdf-wg/track/issues/123
>>>> 
>>>> Raised by: Eric Prud'hommeaux
>>>> On product: RDF Turtle
>>>> 
>>>> Gregg Kellogg pointed out in http://www.w3.org/mid/49EB390E-BCA6-401B-98EC-F4DD6A44AD0B@greggkellogg.net that Turtle's localNames overrun RFC-3987 iri by two characters. These two Unicode characters are reserved for process-internal use and thus don't make sense in a global identification scheme.
>>>> 
>>>> Should we shave PN_CHARS_BASE down to [#x10000-#xEFFFD]? If this is a bug fix, can we do that without another LC?
>>>> 
>>>> 
>>>> 
>>> 
>>> I prefer Gregg's solution of making the IRIs in tests legal by RFC 3987.  The grammar may be wider - it is anyway because we don't include an RFC 3986/3987 parser (or scheme specific rules).
>>> 
>>> 	Andy
>>> 
>> 
> 
> -- 
> -ericP
Received on Monday, 25 March 2013 16:09:12 UTC
