
Re: Unicode normalization in Turtle

From: Gavin Carothers <gavin@carothers.name>
Date: Tue, 8 May 2012 08:46:37 -0700
Message-ID: <CAPqY83wY8auc_iKrubQ++Yq94_xka6DUu3Y6wbLcNZ2QgX7A1w@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: David Wood <david@3roundstones.com>, Richard Cyganiak <richard@cyganiak.de>, RDF Working Group WG <public-rdf-wg@w3.org>
On Tue, May 8, 2012 at 7:40 AM, Ivan Herman <ivan@w3.org> wrote:
>
> On May 8, 2012, at 16:35 , David Wood wrote:
>
>> On May 8, 2012, at 10:06, Richard Cyganiak wrote:
>>
>>> Dear WG,
>>>
>>> The Turtle ED doesn't say anything about Unicode normalization. Should it?

I... don't think so?

>>>
>>> RDF Concepts says that the lexical forms of literals SHOULD be in Unicode Normal Form C (NFC). And in the XML-based syntaxes, XML itself explains Unicode normalization issues (at least XML 1.1 does). So I'd expect Turtle to also say something about it.
>>
>> Consistency of issues across serializations makes sense.
>>
>> Unfortunately, I don't see anything related to Unicode in RDFa 1.0 or 1.1 [1, 2].  Perhaps the intent for RDFa is to defer that to the HTML container.
>
> yes.

Errr... HTML doesn't do Unicode normalization. At least the current
WHATWG spec and existing browsers don't. See
http://web.lookout.net/2012/03/unicode-normalization-in-urls.html

Also, I believe one of the differences between XML 1.0 5th Edition and
XML 1.1 is Normalization Form C, which is not part of most XML parsers
today. Instead this tends to be handled elsewhere, for example:
http://www.w3.org/TR/xpath-functions/#func-normalize-unicode

Turtle could perhaps mention normalization, but the reality is that
just about all content everywhere is in NFC already.
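
To make the NFC point concrete, here is a minimal sketch in Python of
what checking and applying NFC looks like. It uses only the standard
library's unicodedata module; the helper names to_nfc and is_nfc are
made up for illustration and are not from any spec or parser.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text converted to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if text is already in NFC (normalizing leaves it unchanged)."""
    return to_nfc(text) == text


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, not NFC.
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9.
composed = to_nfc(decomposed)

print(is_nfc(decomposed))              # False
print(is_nfc(composed))                # True
print(len(decomposed), len(composed))  # 2 1
```

The two strings render identically but compare unequal as literals,
which is the kind of mismatch the SHOULD in RDF Concepts is aimed at.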

Background reading for those who want to think about this more:

http://unicode.org/reports/tr15/
http://www.macchiato.com/unicode/nfc-faq
http://web.archive.org/web/20100510152617/http://diveintomark.org/archives/2004/07/06/nfc

Related note while in Unicode hell: do we wish to define parsing
Turtle in terms of UTF-8 with error recovery? See
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8
That would give better compatibility with random goo that gets
screwed up by web servers etc.
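
For anyone unsure what "with error recovery" means in practice, here
is a rough Python sketch of the difference between strict and lenient
UTF-8 decoding. It uses Python's built-in codec, which I'm only
assuming is close enough to the WHATWG algorithm for illustration; the
function name decode_lenient is hypothetical.

```python
def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, replacing each malformed sequence with U+FFFD
    instead of aborting the whole document."""
    return data.decode("utf-8", errors="replace")


# A Latin-1 encoded "é" (0xE9) dropped into otherwise valid UTF-8,
# the sort of thing a misconfigured server produces.
damaged = b"caf\xe9 ok"

try:
    damaged.decode("utf-8")  # strict decoding is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

recovered = decode_lenient(damaged)

print(strict_failed)      # True
print(repr(recovered))    # 'caf\ufffd ok'
```

A strict parser rejects the whole file on the first bad byte, while
the lenient one keeps going and leaves a visible U+FFFD marker behind.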

>
> Ivan
>
>
>>
>> Regards,
>> Dave
>>
>> [1] RDFa in XHTML: http://www.w3.org/TR/rdfa-syntax/
>> [2] RDFa Core 1.1: http://www.w3.org/TR/rdfa-core/
>>
>>
>>>
>>> (But I don't know enough about the issue to have any clue what it should say.)
>>>
>>> Best,
>>> Richard
>>
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
>
Received on Tuesday, 8 May 2012 15:47:09 GMT
