
[whatwg] several messages about handling encodings in HTML

From: Øistein E. Andersen <html5@xn--istein-9xa.com>
Date: Tue, 04 Mar 2008 01:53:55 +0100
Message-ID: <E1JWLPn-0009u4-Dy@node1-4.ouvaton.local>
On Fri, 29 Feb 2008 01:21:20 +0000 (UTC), Ian Hickson wrote:

> (I've made the characters not allowed in XML also not allowed in HTML, 
> with the exception of some of the space characters which we need to have 
> allowed for legacy reasons.)

The C1 character U+0085 NEXT LINE (NEL) is also a Unicode space character,
and as far as I can tell it is neither disallowed nor discouraged in XML.
I am not sure we really want to support this character, though:
Opera, Safari and Firefox do not seem to recognise it at all, and one IE7
installation seems to treat it as a wide non-breaking space, although this may
well be font-dependent.  (Allowing this character could also be confusing given
that &#x85; does not refer to U+0085 but rather to U+2026 HORIZONTAL ELLIPSIS,
for compatibility with Windows-1252.)
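To make the &#x85; point concrete, here is a minimal Python sketch of a numeric-reference resolver that reinterprets the values 0x80 to 0x9F as Windows-1252 bytes. The function name `resolve_numeric_reference` is hypothetical, and deriving the remapping from Python's built-in `cp1252` codec is an assumption for illustration, not the draft's exact table.

```python
def resolve_numeric_reference(code_point: int) -> str:
    """Hypothetical helper: return the character a numeric reference such as
    &#x85; would yield, reinterpreting 0x80-0x9F as Windows-1252 bytes."""
    if 0x80 <= code_point <= 0x9F:
        try:
            # Treat the value as a single Windows-1252 byte (0x85 -> U+2026).
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Bytes Python's cp1252 codec leaves undefined (e.g. 0x81)
            # fall back to the C1 control character itself.
            return chr(code_point)
    # Outside that range the reference simply names the code point.
    return chr(code_point)


print(hex(ord(resolve_numeric_reference(0x85))))  # 0x2026
```

The consequence is that the only way to get a literal U+0085 into a document is to type the raw character, since the reference &#x85; never produces it.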

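More importantly, the current draft seems to allow C0 control characters (not
only the white-space ones) and U+007F DELETE, as well as U+FDD0 to U+FDDF and the
non-characters ending in FFFE and FFFF (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so
on), when these are expressed as character references, even though they are not
allowed as literal characters.  Would it be possible to allow or disallow the same
set of characters in both cases?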
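As a rough illustration of what "the same set in both cases" could mean for a checker, here is a Python sketch in which literal characters and numeric character references go through one shared predicate. The names `is_disallowed` and `disallowed_in` are hypothetical, and the choice of TAB, LF, FF and CR as the permitted C0 space characters is an assumption; the other ranges are the ones listed above.

```python
import re

# Assumption: the C0 space characters kept for legacy reasons are TAB, LF, FF, CR.
ALLOWED_C0_SPACES = {0x09, 0x0A, 0x0C, 0x0D}

# Decimal or hexadecimal numeric character references, e.g. &#127; or &#x7F;.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def is_disallowed(cp: int) -> bool:
    """Hypothetical predicate shared by literal characters and references."""
    if cp < 0x20 and cp not in ALLOWED_C0_SPACES:
        return True  # C0 controls other than the permitted spaces
    if cp == 0x7F:
        return True  # DELETE
    if 0xFDD0 <= cp <= 0xFDDF:
        return True  # the U+FDD0..U+FDDF range named in the message
    if (cp & 0xFFFE) == 0xFFFE:
        return True  # U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF
    return False


def disallowed_in(text: str) -> list[int]:
    """Collect disallowed code points, whether written literally or as references."""
    found = [ord(ch) for ch in text if is_disallowed(ord(ch))]
    for m in NUMERIC_REF.finditer(text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if is_disallowed(cp):
            found.append(cp)
    return found


print(disallowed_in("a\x01b"), disallowed_in("a&#x1;b"))  # [1] [1]
```

The only design point here is that both paths call `is_disallowed`, so a control character is rejected identically whether it is typed directly or written as a reference.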

-- 
Øistein E. Andersen
Received on Monday, 3 March 2008 16:53:55 UTC
