[whatwg] Valid Unicode from Henri Sivonen on 2006-12-01 (public-whatwg-archive@w3.org from December 2006)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 1 Dec 2006 18:22:54 +0200
Message-ID: <5D6035EC-82E5-477B-A6BD-4CBE1267DC59@iki.fi>

On Dec 1, 2006, at 14:38, Elliotte Harold wrote:

> 1. Are private use characters allowed?

I think the answer should be "Yes", because not allowing them could  
make people subvert Unicode and use e.g. Latin-1 code points for a  
different purpose with a bogus font. Also, not allowing them would be  
a violation of Charmod requirements for specs.

> 2. Are control characters allowed (probably yes, based on other  
> parts of the spec).

Personally, I'd like the control characters that XML 1.0 disallows to
be non-conforming (in order to keep conforming HTML5 documents
convertible to XHTML5), as well as C1 controls (because they have no
legitimate use in HTML and are a sign of a common bug).
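
A minimal sketch of such a check, assuming Python. The function names
are hypothetical and this is not taken from any spec text. It flags the
C0 controls that fall outside the XML 1.0 Char production (everything
below U+0020 except tab, LF and CR) plus the C1 range U+0080..U+009F:

```python
def is_xml10_disallowed_control(cp: int) -> bool:
    # XML 1.0 Char allows only U+0009, U+000A and U+000D below U+0020.
    return cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)


def is_c1_control(cp: int) -> bool:
    # C1 controls occupy U+0080..U+009F.
    return 0x80 <= cp <= 0x9F


def flagged_controls(text: str) -> list:
    # Return (index, "U+XXXX") pairs for every flagged control character.
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if is_xml10_disallowed_control(cp) or is_c1_control(cp):
            hits.append((i, f"U+{cp:04X}"))
    return hits
```

For example, flagged_controls("a\x00b\tc\x85") reports the NUL at index
1 and the U+0085 at index 5, and leaves the tab alone.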

> 3. Are surrogate characters allowed? (probably no)

Surrogates are an artifact of UTF-16. They have no place on the  
character level. So I'd say "No".
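
To illustrate the point, here is a rough sketch, assuming Python, whose
str type happens to be able to hold lone surrogate code points. The
helper names are made up for this example:

```python
def is_surrogate(cp: int) -> bool:
    # Surrogate code points occupy U+D800..U+DFFF.
    return 0xD800 <= cp <= 0xDFFF


def has_lone_surrogate(text: str) -> bool:
    # True if any code point in the string is a surrogate.
    return any(is_surrogate(ord(ch)) for ch in text)


def encodes_as_utf8(text: str) -> bool:
    # Python's strict UTF-8 codec refuses to encode surrogates.
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```

A string such as "a\ud800b" is caught by has_lone_surrogate() and fails
encodes_as_utf8(), whereas an astral character written as a single code
point passes both.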

> 6. Are noncharacters U+FDD0..U+FDEF allowed (?)
> 7. Are the noncharacters from the last two characters of each plane  
> allowed (?)

I don't have particularly strong feelings here. Putting those
characters in HTML is a bad idea, but allowing them is not a problem
for HTML5 to XHTML5 conversion and they aren't a common problem like
C1 controls.
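
For reference, the two groups of noncharacters from questions 6 and 7
can be tested with a couple of comparisons. This is only a sketch,
assuming Python, with a hypothetical function name:

```python
def is_noncharacter(cp: int) -> bool:
    # Question 6: the contiguous block U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Question 7: the last two code points of each of the 17 planes,
    # i.e. U+xxFFFE and U+xxFFFF up to U+10FFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
```

Counting over the whole code space gives 32 + 2 * 17 = 66 noncharacters.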

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Friday, 1 December 2006 08:22:54 UTC