RE: Fwd: Re: HRRIs, IRIs, etc

Can I clarify the status of some of the characters Martin listed,
please?

> http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
> in newer versions, discourages):
> [#xFDD0-#xFDDF],
> [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
> [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
> [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
> [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
> [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
> [#x10FFFE-#x10FFFF]
> 
> In the IRI spec, these are excluded:
>    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>                   / %xD0000-DFFFD / %xE1000-EFFFD

I see XML discourages FDD*, but ucschar excludes both FDD* and
FDE*.  Does anyone know the reason for this discrepancy?  FDE* also
seem to be "not a character".

ucschar also excludes E0***, which seem to be "tags" - what does that
mean?

ucschar also excludes FFF*, but XML makes no mention of them, except
of course FFFE and FFFF, which aren't allowed in XML at all.
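
To double-check my reading of the ranges, here is a rough Python sketch that
lists the gaps between the ucschar ranges quoted above. The names UCSCHAR,
in_ucschar and ucschar_gaps are just mine for illustration; the ranges are
transcribed from the quoted ABNF, so any transcription slip is my own.

```python
# Inclusive code point ranges of ucschar, copied from the ABNF quoted above.
UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 to 13 each contribute %xP0000-PFFFD.
UCSCHAR += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
# Plane 14 starts at E1000 rather than E0000.
UCSCHAR.append((0xE1000, 0xEFFFD))


def in_ucschar(cp):
    """Return True if code point cp falls inside any ucschar range."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR)


def ucschar_gaps():
    """Return the inclusive gaps between consecutive ucschar ranges."""
    ranges = sorted(UCSCHAR)
    gaps = []
    for (_, prev_hi), (next_lo, _) in zip(ranges, ranges[1:]):
        # A gap exists only if at least one code point lies between them.
        if prev_hi + 1 < next_lo:
            gaps.append((prev_hi + 1, next_lo - 1))
    return gaps


gaps = ucschar_gaps()
for lo, hi in gaps:
    print(f"{lo:X}-{hi:X}")
```

Among the gaps it prints are FDD0-FDEF, FFF0-FFFF and DFFFE-E0FFF, which are
the three exclusions I am asking about.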

-- Richard

Received on Tuesday, 26 June 2007 09:23:29 UTC