- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 12 Oct 2011 05:27:35 +0200
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: "Phillips, Addison" <addison@lab126.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Jeremy Carroll <jeremy@topquadrant.com>, "www-international@w3.org" <www-international@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
John Cowan, Tue, 11 Oct 2011 10:57:45 -0400:
> Phillips, Addison scripsit:
>
>> XML is an interesting case because it makes the opposite decision
>> consciously: two canonically-equivalent but unequal identifiers are
>> not equal.
>
> And this applies to both XML names and to namespace URIs.

One - probably strong - reason why HTML5 could end up with the same solution as XML is that HTML5 has XML 1.0 compatibility as a design goal. For that reason, it is also probably smart to focus on XML 1.0 if one wants to drive HTML5 in a particular direction ...

Btw, I filed bug 12839 on the 1st of June to make the HTML5 spec say that normalization should be performed on @id attributes before establishing whether they are unique or not.[1] If the proposal went through, then <p id='å'> and <p id='å'> (one precomposed, the other decomposed) would be considered to have the same value, and the document would thus be invalid due to identical @id-s.

In the discussion inside the bug report, the others, including Henri, wanted @id-s that differ only w.r.t. NFC and NFD to be considered unique. Still, Validator.nu would consider the @id variant with the decomposed character invalid, because it isn't NFC normalized. And I think HTML5 says nothing yet about normalization. So I think this at best speaks about what Henri thinks HTML5 should say: that only early normalization should occur (read: @id values not in NFC form should be illegal), but that if two equivalent variants of the same character occur in the same document, parsers should still consider them different.

W.r.t. the CharmodNormSummary document: for C005, I'd like to suggest a few examples of when the author might want to avoid NFC.

One example: the author wants to style different parts of a composed character differently - e.g. in different colors. HTML5 just made this legal - see bug 13502.
Another example: some tests I made showed that, apart from file searching (with IE as an exception to that again), 'accént' in decomposed form was treated more meaningfully than 'accént' in composed form. I tested, amongst other things, the screenreaders Jaws, VoiceOver and NVDA to come to that - to myself - surprising conclusion. Simply put, the decomposed variant was the only variant that was universally meaningfully 'screen-read'.

A third example could be authors who want to take advantage of NFD's symmetrical shape: e.g. if you want to sort words based on word length in a primitive fashion.

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=12839
[2] http://www.w3.org/International/wiki/
--
leif halvard silli
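
For illustration only: a minimal Python sketch of the distinction discussed above, using the standard `unicodedata` module. It is not taken from the bug report, the HTML5 spec, or Validator.nu; the helper names `ids_equal_raw`, `ids_equal_normalized` and `is_nfc` are hypothetical. It contrasts code-point comparison (the XML behaviour), comparison after normalization (roughly what bug 12839 asks for), and an "early normalization" check that only accepts values already in NFC.

```python
import unicodedata

# The same letter written two ways (escapes used so the difference is visible).
composed = "\u00e5"      # LATIN SMALL LETTER A WITH RING ABOVE (one code point)
decomposed = "a\u030a"   # "a" followed by COMBINING RING ABOVE (two code points)


def ids_equal_raw(a: str, b: str) -> bool:
    """Compare code point by code point, with no normalization."""
    return a == b


def ids_equal_normalized(a: str, b: str) -> bool:
    """Compare after normalizing both values to NFC (hypothetical policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_nfc(value: str) -> bool:
    """Early-normalization check: is the value already in NFC form?"""
    return unicodedata.normalize("NFC", value) == value


print(ids_equal_raw(composed, decomposed))         # False: distinct ids
print(ids_equal_normalized(composed, decomposed))  # True: duplicate ids
print(is_nfc(composed), is_nfc(decomposed))        # True False

# Length differs by form: NFD counts base letter and combining mark separately.
word = "acc\u00e9nt"
print(len(unicodedata.normalize("NFC", word)))     # 6
print(len(unicodedata.normalize("NFD", word)))     # 7
```

Under these assumptions, a checker could reject the decomposed id as non-NFC while a parser that compares raw code points would still treat the two ids as different, which is the combination described above.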
Received on Wednesday, 12 October 2011 03:28:22 UTC