
Re: [Moderator Action] Re: Unicode NFC - status, and RDF Concepts

From: Ivan Herman <ivan@w3.org>
Date: Thu, 13 Oct 2011 06:48:54 +0100
Message-Id: <49260A6F-9455-4CAE-8164-F00AFEA34F03@w3.org>
To: John Cowan <cowan@mercury.ccil.org>, RDF Working Group WG <public-rdf-wg@w3.org>

On Oct 12, 2011, at 04:28, Leif Halvard Silli wrote:

> John Cowan, Tue, 11 Oct 2011 10:57:45 -0400:
>> Phillips, Addison scripsit:
>>> XML is an interesting case because it makes the opposite decision
>>> consciously: two canonically equivalent but unequal identifiers are
>>> not equal.
>> And this applies to both XML names and to namespace URIs.
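The distinction above (canonically equivalent vs. identical code points) can be made concrete with a minimal Python sketch. It uses only the standard-library `unicodedata` module. The helper names `codepoint_equal` and `canonically_equivalent` are hypothetical illustrations, not part of any XML API. Precomposed U+00E5 and the sequence U+0061 U+030A are canonically equivalent, yet an XML-style comparison by code points treats them as different.

```python
import unicodedata

# U+00E5 LATIN SMALL LETTER A WITH RING ABOVE (precomposed)
composed = "\u00e5"
# U+0061 LATIN SMALL LETTER A followed by U+030A COMBINING RING ABOVE
decomposed = "a\u030a"


def codepoint_equal(a: str, b: str) -> bool:
    """Compare by code points only, as XML does for names and namespace URIs."""
    return a == b


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(codepoint_equal(composed, decomposed))         # False
print(canonically_equivalent(composed, decomposed))  # True
```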
> One reason, probably a strong one, why HTML5 could end up with the same
> solution as XML is that HTML5 has XML 1.0 compatibility as a design goal.
> For that reason, it is probably also smart to focus on XML 1.0 if one
> wants to drive HTML5 in a particular direction ...
> Btw, I filed bug 12839 on 1 June asking for the HTML5 spec to say that
> normalization should be performed on @id attributes before establishing
> whether they are unique.[1] If the proposal went through,
> then <p id='&#xe5;'> and <p id='a&#x30a;'> would be considered to have
> the same value and would thus make the document invalid due to duplicate
> @id-s.
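The bug 12839 proposal (normalize first, then test uniqueness) can be sketched roughly as follows. This is a minimal Python illustration, not anything from the HTML5 spec or a real validator; the function name `duplicate_ids` and its `normalize` flag are hypothetical. It assumes NFC as the normalization form and groups @id values that collide once normalized.

```python
import unicodedata
from collections import defaultdict


def duplicate_ids(ids: list[str], normalize: bool = True) -> dict[str, list[str]]:
    """Group @id values that collide, optionally after NFC normalization.

    Returns a mapping from the comparison key to the original spellings
    sharing it, keeping only keys that occur more than once.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for value in ids:
        # Under the proposal, the comparison key is the NFC form of the value.
        key = unicodedata.normalize("NFC", value) if normalize else value
        groups[key].append(value)
    return {key: values for key, values in groups.items() if len(values) > 1}


# The two @id values from the example: &#xe5; and a&#x30a;
ids = ["\u00e5", "a\u030a", "intro"]
print(len(duplicate_ids(ids)))                   # 1 collision group: document invalid
print(len(duplicate_ids(ids, normalize=False)))  # 0: raw code points all differ
```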
> In the discussion in the bug report, the others, including Henri,
> wanted @id-s that differ only w.r.t. NFC and NFD to be considered
> unique. Even so, Validator.nu would flag the @id variant with the
> decomposed character as invalid because it isn't NFC-normalized. Still,
> I think HTML5 says nothing about normalization yet. So I think this at
> best shows what Henri thinks HTML5 should say: that only early
> normalization should occur (read: @id values not in NFC should be
> illegal). But if two equivalent variants of the same character occur in
> the same document, parsers should still consider them different.
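The "early normalization" position described above (flag non-NFC values, but compare raw code points when checking uniqueness) could look roughly like this minimal Python sketch. It is an assumption-laden illustration, not Validator.nu's actual logic; the function name `check_ids` and its message strings are hypothetical. `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata


def check_ids(ids: list[str]) -> list[str]:
    """Report @id problems under an early-normalization policy (a sketch).

    Values not in NFC are flagged as errors. Uniqueness is checked on the
    raw code points, so canonically equivalent spellings still count as
    distinct values.
    """
    errors: list[str] = []
    seen: set[str] = set()
    for value in ids:
        if not unicodedata.is_normalized("NFC", value):
            # ascii() escapes the combining mark so the message is unambiguous.
            errors.append(f"not NFC: {ascii(value)}")
        if value in seen:
            errors.append(f"duplicate: {ascii(value)}")
        seen.add(value)
    return errors


for problem in check_ids(["\u00e5", "a\u030a"]):
    print(problem)  # not NFC: 'a\u030a'
```

Note that the two spellings are not reported as duplicates here; only the decomposed one is rejected, for not being in NFC.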
> With respect to the CharmodNormSummary document, for C005 I'd like to
> suggest three examples of when the author might want to avoid NFC. First,
> the author may want to style different parts of a composed character
> differently, e.g. in different colors. HTML5 just made this legal; see
> bug 13502.
> Another example: some tests I made showed that, apart from file
> searching (with IE, again, as an exception), 'accént' in decomposed
> form was treated more meaningfully than 'accént' in composed form.
> Among other things, I tested the screen readers Jaws, VoiceOver and
> NVDA to reach that (to me) surprising conclusion. Simply put, the
> decomposed variant was the only variant that all of them read aloud
> meaningfully.
> A third example could be authors who want to take advantage of NFD's
> symmetrical shape, e.g. if you want to sort words by length in a
> primitive fashion.
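One possible reading of the "primitive" length sort is counting code points. The following minimal Python sketch assumes that reading; the helper names `nfd_length` and `sort_by_length` are hypothetical. Decomposing every word to NFD first means each accented letter contributes its base letter plus its combining mark, so composed and decomposed spellings of the same word get the same length. A raw `len()` would not give them the same length.

```python
import unicodedata


def nfd_length(word: str) -> int:
    """Count code points after canonical decomposition (NFD)."""
    return len(unicodedata.normalize("NFD", word))


def sort_by_length(words: list[str]) -> list[str]:
    """Primitive length sort: fewest NFD code points first (stable)."""
    return sorted(words, key=nfd_length)


# "accent", then "accént" composed (U+00E9), then "accént" decomposed (e + U+0301)
words = ["accent", "acc\u00e9nt", "acce\u0301nt"]
print([len(w) for w in words])         # [6, 6, 7]  raw code points disagree
print([nfd_length(w) for w in words])  # [6, 7, 7]  both accented spellings agree
```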
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=12839
> [2] http://www.w3.org/International/wiki/
> -- 
> leif halvard silli

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Thursday, 13 October 2011 05:47:37 UTC
