RE: Fwd: Re: HRRIs, IRIs, etc

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Tue, 26 Jun 2007 10:23:04 +0100 (BST)
To: Martin Duerst <duerst@it.aoyama.ac.jp>, "Grosso, Paul" <pgrosso@ptc.com>
Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, public-i18n-core@w3.org
Message-Id: <20070626092304.242DF2278D9@macpro.inf.ed.ac.uk>

Can I clarify the status of some of the characters Martin
listed, please?

> http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
> in newer versions, discourages):
> [#xFDD0-#xFDDF],
> [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
> [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
> [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
> [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
> [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
> [#x10FFFE-#x10FFFF]
> 
> In the IRI spec, these are excluded:
>    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>                   / %xD0000-DFFFD / %xE1000-EFFFD

I see XML discourages FDD*, but ucschar excludes both FDD* and
FDE*.  Does anyone know the reason for this discrepancy?  FDE* also
seem to be "not a character".
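
For reference, Unicode defines U+FDD0..U+FDEF as noncharacters, along
with the last two code points of each of the 17 planes.  A minimal
sketch of that definition follows; the helper name is_noncharacter is
hypothetical and chosen only for illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of Unicode's noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The contiguous block U+FDD0..U+FDEF covers both FDD* and FDE*.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: xxFFFE and xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


for cp in (0xFDD0, 0xFDDF, 0xFDE0, 0xFDEF, 0xFDF0):
    print(f"U+{cp:04X} noncharacter: {is_noncharacter(cp)}")
```

On this definition FDE* falls in the same block as FDD*.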

ucschar also excludes E0***, which seem to be "tags" - what does that
mean?

ucschar also excludes FFF*, but XML makes no mention of them, except
of course FFFE and FFFF, which aren't allowed in XML at all.
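
To make the comparison concrete, here is a rough sketch that tests a
few of the code points in question against both grammars.  The ucschar
ranges are transcribed from the ABNF quoted above; the XML_CHAR table
is an assumption taken from the Char production of XML 1.0, which is
not quoted in this thread.  The names UCSCHAR, XML_CHAR and in_ranges
are illustrative only.

```python
# ucschar ranges, transcribed from the ABNF quoted above.
UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 0x1..0xD each contribute x0000-xFFFD.
UCSCHAR += [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)]
# Plane 0xE starts at E1000, so E0000-E0FFF (the "tags" area) is left out.
UCSCHAR.append((0xE1000, 0xEFFFD))

# Char production of XML 1.0 (assumption: not quoted in this thread).
XML_CHAR = [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
            (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def in_ranges(cp, ranges):
    """Return True if cp lies inside any inclusive (low, high) pair."""
    return any(low <= cp <= high for low, high in ranges)


for cp in (0xFDD0, 0xFDE0, 0xFFF0, 0xFFFE, 0xE0001):
    print(f"U+{cp:04X}  XML Char: {in_ranges(cp, XML_CHAR)}  "
          f"ucschar: {in_ranges(cp, UCSCHAR)}")
```

Note that ucschar is only one production of the IRI grammar, so a code
point missing from it is not necessarily forbidden everywhere in an IRI.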

-- Richard
Received on Tuesday, 26 June 2007 09:23:29 GMT
