- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Wed, 27 Jun 2007 19:38:22 +0900
- To: Richard Tobin <richard@inf.ed.ac.uk>, "Grosso, Paul" <pgrosso@ptc.com>
- Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, public-i18n-core@w3.org
Hello Richard,

Very good catch, thanks. John Cowan has already explained most of this, so I don't have much to add.

The problem with Unicode is that it contains (on purpose) very many characters: genuinely needed characters, but also oddities. In other cases it might be easy to draw a clear line between useful and oddity, but the large number of characters/code points effectively makes this a slippery slope, and different specs therefore easily get out of sync.

But I guess that in this area it would also be possible to adapt the IRI spec slightly, if there are very specific preferences from the XML side.

Regards, Martin.

At 18:23 07/06/26, Richard Tobin wrote:
>Can I clarify the status of some of the characters Martin
>listed, please?
>
>> http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
>> in newer versions, discourages):
>> [#xFDD0-#xFDDF],
>> [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
>> [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
>> [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
>> [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
>> [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
>> [#x10FFFE-#x10FFFF]
>>
>> In the IRI spec, these are excluded:
>> ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>>         / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>>         / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>>         / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>>         / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>>         / %xD0000-DFFFD / %xE1000-EFFFD
>
>I see XML discourages FDD*, but the ucschar excludes both FDD* and
>FDE*. Does anyone know the reason for this discrepancy? FDE* seem to
>be also "not a character".
>
>ucschar also excludes E0***, which seem to be "tags" - what does that
>mean?
>
>ucschar also excludes FFF*, but XML makes no mention of them, except
>of course FFFE and FFFF which aren't allowed in XML at all.
>
>-- Richard

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
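
For readers who want to check individual code points against the two definitions discussed above, here is a minimal sketch in Python. The `ucschar` ranges are copied from the ABNF rule quoted in the message. The XML ranges are an assumption on my part: they are transcribed from the Char production in section 2.2 of XML 1.0, which the message only links to. The names `UCSCHAR_RANGES`, `XML_CHAR_RANGES`, `is_ucschar` and `is_xml_char` are hypothetical helpers, not part of either spec. Note that `ucschar` is only one rule of the IRI grammar (ASCII and private-use characters are handled by other rules), so "not ucschar" does not by itself mean "not allowed in an IRI".

```python
# Sketch only: compares the IRI ucschar rule (as quoted in the message)
# with the XML 1.0 Char production (assumed from XML 1.0, section 2.2).

# ucschar: BMP ranges first, then U+n0000..U+nFFFD for planes 1..D,
# then the shortened plane E range %xE1000-EFFFD.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR_RANGES += [((p << 16), (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
#               | [#x10000-#x10FFFF]
XML_CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def _in_ranges(cp, ranges):
    """Return True if code point cp falls in any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ucschar(cp):
    """True if cp matches the ucschar rule quoted in the message."""
    return _in_ranges(cp, UCSCHAR_RANGES)


def is_xml_char(cp):
    """True if cp matches the XML 1.0 Char production (assumed ranges)."""
    return _in_ranges(cp, XML_CHAR_RANGES)


if __name__ == "__main__":
    # Code points raised in the thread: FDD*, FDE*, E0***, FFF*, nFFFE.
    for cp in (0xFDD0, 0xFDE0, 0xE0001, 0xFFF0, 0xFFFE, 0x1FFFE):
        print(f"U+{cp:04X}  xml_char={is_xml_char(cp)}  ucschar={is_ucschar(cp)}")
```

Running the script prints one line per code point, which makes the discrepancies Richard asks about (FDD* versus FDE*, the E0*** block, and FFF*) easy to see side by side.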
Received on Thursday, 28 June 2007 01:10:13 UTC