Re: [VE][html5] Add Subject Here Getting an error for using Unicode PUAs!

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 03 May 2013 17:42:05 +0300
Message-ID: <5183CCBD.5030109@cs.tut.fi>
To: Anon SU <anonymous84327@gmail.com>
CC: www-validator@w3.org
2013-05-03 2:04, Anon SU wrote:

> I'm getting the following error: *Document uses the Unicode Private Use
> Area(s), which should not be used in publicly exchanged documents.
> (Charmod C073)*

It is a warning, not an error message. A minimal document that triggers 
the warning is

<!DOCTYPE html>

As far as I can see, there is nothing in the HTML5 CR or in the WHATWG 
Living HTML document that justifies the warning. I cannot find any 
statement about the allowed set of characters in HTML serialization. For 
XHTML serialization, generic XML rules apply, and they do not disallow 
Private Use characters (on the contrary, the explicit rule for allowed 
characters allows them, and there is no recommendation against them 
either in XML, as far as data characters are considered).

> Why shouldn't Unicode PUA be used? What's wrong with them??

Apparently "Charmod" in the message refers to "Character Model for the 
World Wide Web 1.0: Fundamentals", http://www.w3.org/TR/charmod/ which 
is a W3C Recommendation and contains clause 4.5 about Private Use code 
points. There item C073 says: "Publicly interchanged content SHOULD NOT 
use codepoints in the private use area." This is fairly natural on the 
basis of the very concept of Private Use: private use code points are 
meaningless outside the scope of a private agreement, and different 
agreements may have different definitions for them.
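
To see which characters in a document would trigger the warning, one can scan the text for code points in the three Private Use ranges that Unicode defines (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD). The following is only a minimal sketch; the function names are made up for illustration, and this is not how the validator itself is implemented.

```python
# Private Use ranges defined by the Unicode Standard:
# the BMP Private Use Area plus Supplementary Private Use Areas A and B.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(ch):
    """Return True if the single character ch is a Private Use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_private_use(text):
    """Return (offset, 'U+XXXX') pairs for every Private Use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ch)
    ]
```

For example, find_private_use("a\ue600b") returns [(1, "U+E600")]; U+E600 is used here only as an arbitrary Private Use code point, not as a statement about what any particular icon font assigns.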

However, "Charmod" is about the WWW, and HTML5 is not limited to the 
WWW. So the warning should be read as relating to possible use of a 
document on the WWW or in other public interchange.

> I'm using font-based icons from IcoMoon ( http://icomoon.io/app/ ).

Well, they shouldn't use Private Use code points.

Checking what the validator http://validator.w3.org/nu/ says about some 
code points, I made the following observations:

Code points U+0005, U+000B, U+000E, U+007F, U+0086, U+FDD0, U+FFFE are 
reported as "forbidden", in error messages. I cannot find a 
justification for this in HTML5 CR for HTML serialization. (I can see 
many reasons why they *should* be avoided and perhaps even be made 
forbidden, but that's a different issue.)

For XHTML serialization, the report is partly correct, but U+007F, 
U+0086, U+FDD0 are not forbidden in XML, just discouraged.
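
The distinction for XHTML serialization can be checked against XML 1.0 (Fifth Edition), section 2.2: the Char production lists the allowed code points, and a note after it lists ranges that document authors are "encouraged to avoid". A rough sketch of that classification follows; the function name is hypothetical, and it covers XML 1.0 only, not XML 1.1 or the HTML serialization.

```python
def xml10_status(cp):
    """Classify a code point as 'forbidden', 'discouraged' or 'allowed'
    according to XML 1.0 (Fifth Edition), section 2.2."""
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    #        | [#x10000-#x10FFFF]
    allowed = (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    if not allowed:
        return "forbidden"
    # Ranges the specification discourages: C0/C1-style controls
    # (except U+0085), the noncharacters U+FDD0..U+FDEF, and the last
    # two code points of each supplementary plane.
    discouraged = (
        0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
        or 0xFDD0 <= cp <= 0xFDEF
        or (cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE)
    )
    return "discouraged" if discouraged else "allowed"
```

With this sketch, U+0005, U+000B, U+000E and U+FFFE come out as "forbidden", U+007F, U+0086 and U+FDD0 as "discouraged", and a Private Use code point such as U+E000 as "allowed".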

Received on Friday, 3 May 2013 14:42:32 UTC
