RE: Proposal for Guideline 1.1 [definition of text]

From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.be>
Date: Tue, 24 May 2005 19:01:43 +0200
To: <w3c-wai-gl@w3.org>

Hi Gregg and all,

At 17:02 24/05/2005, Gregg Vanderheiden wrote:
>If I remember right - the reason for referencing Unicode is that it is not
>only important that the text be in characters but that they be in a form
>that can be machine readable by standard (not proprietary) machines.
>
>Unicode was the standard code - so it was cited.
>
>Your definition does not disallow proprietary text code. Can you think of a
>way other than citing Unicode to make sure that the characters are encoded
>in - well - Unicode?

No, I cannot think of any other way than citing Unicode (or ISO/IEC 10646).
The loophole for proprietary text code which you claim to see in my
definition is probably rather theoretical. Text that uses a proprietary
encoding or a proprietary character set cannot be read by anyone
(regardless of disability) except the producer of the text. Such text
would only be put on public web sites by accident because it is quite
useless to anyone outside the organisation.
If we really must require that text only consists of characters that
are defined in the Unicode standard, it is important to maintain the
reference to the "Unicode / ISO/IEC 10646 repertoire" in the definition
(as Wendy's proposal does) because of a loophole in Unicode.
Unless I misunderstand the Unicode specification, it is not possible to
disallow "proprietary text code" by requiring "Unicode", because Unicode
defines "private use areas" where Unicode implementations may put
whatever characters they want:
"Private-use code points are considered to be assigned characters, but the
abstract characters associated with them have no interpretation specified
by this standard. They can be given any interpretation by conformant
processes." [4]
"Private-use characters are assigned Unicode code points whose interpretation
is not specified by this standard and whose use may be determined by private
agreement among cooperating users. These characters are designated for
private use and do not have defined, interpretable semantics except by
private agreement.
All code points in the blocks of private-use characters in the Unicode Standard
are permanently designated for private use — no assignment to a particular,
standard set of characters will ever be endorsed or documented by the
Unicode Consortium for any of these code points." [5]
Private use areas are meant to be used in closed systems only, so the
"Unicode repertoire" is important if we want to keep the reference to Unicode.

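The private-use loophole can be made concrete with a minimal Python sketch. It assumes the standard library's unicodedata module, in which the general category "Co" marks private-use code points; the helper name private_use_code_points is hypothetical and only serves as illustration.

```python
import unicodedata

def private_use_code_points(text):
    """Return the private-use code points (general category 'Co') found in text."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]

# U+E000 is the first code point of the Private Use Area in the BMP.
# The string is valid Unicode, yet U+E000 has no standard interpretation.
sample = "caf\u00e9 \ue000 ok"
print(private_use_code_points(sample))
```

A check like this flags text that is "Unicode" in the formal sense but still relies on a private agreement to be interpreted.
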
Whatever we decide regarding Unicode, I would still avoid the reference
to XML 1.1.

>Are we trying to allow things to be encoded in something other than Unicode?
>(i.e. were the crafters of this SC missing something?)

The discussion was about the *set of characters* (not to be confused with
"encoding") that can be used in *text* (not markup).
Unicode defines both a "character repertoire" or character set (to which new
characters are added with each new version) and a number of "encoding forms"
(UTF-8, UTF-16 and UTF-32, which don't change when new characters are added to
the repertoire).
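
As a small illustration of the difference between repertoire and encoding form, the following Python sketch encodes one character from the repertoire in the three encoding forms. The helper name encoded_forms is hypothetical; the codec names are those of Python's standard library, with big-endian variants chosen so that no byte order mark is emitted.

```python
def encoded_forms(ch):
    """Encode one character in UTF-8, UTF-16 and UTF-32 (big-endian, no BOM)."""
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

euro = "\u20ac"  # EURO SIGN, a single code point (U+20AC) in the repertoire
for name, data in encoded_forms(euro).items():
    # Same character, different byte sequences; each decodes back to U+20AC.
    print(name, data.hex(), data.decode(name) == euro)
```

The code point stays the same in all three cases; only the byte representation differs.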



[4] The Unicode Standard, Version 4: chapter 3: "Conformance". 
[5] The Unicode Standard, Version 4: chapter 15: "Special Areas and Format 
Characters" (15.7: Private Use Characters). 

>  -- ------------------------------
>Gregg C Vanderheiden Ph.D.
>Professor - Ind. Engr. & BioMed Engr.
>Director - Trace R & D Center
>University of Wisconsin-Madison
>-----Original Message-----
>From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf
>Of Christophe Strobbe
>Sent: Tuesday, May 24, 2005 9:29 AM
>To: w3c-wai-gl@w3.org
>Subject: Re: Proposal for Guideline 1.1 [definition of text]
>At 04:35 27/04/2005, Wendy wrote:
> >Attached is an html file with the issue summary for Guideline 1.1 as well
>as proposed text for the guideline and related definitions.
> >(...)
>        Proposed definition: A sequence of characters. Characters are
>    those included in the Unicode / ISO/IEC 10646 repertoire. Refer to
>    Characters (in Extensible Markup Language (XML) 1.1) for more
>    information about the accepted character range.
>        [@@what about functional text content? e.g., links?] [@@refer to
>    XML 1.0 or 1.1 - Christophe felt 1.0 is safer, but yet it's dated
>    and not as "internationalized" - ala Richard's talk at the Technical
>    Plenary]
>        Current definition: none
>My reasons for avoiding a reference to XML 1.1 are the following:
>- XML 1.1 is not backward compatible with XML 1.0: XML 1.1 requires that
>control characters #x7F through #x9F, which were freely allowed in XML 1.0
>documents, now must also appear only as character references, and it allows
>the control characters #x1 through #x1F, most of which are forbidden in XML
>1.0, through the use of character references.
>- Because XML 1.1 is not backward compatible, the XML WG recommended
>"that applications that produce XML documents keep using XML 1.0 as much as
>possible, and only use XML 1.1 when necessary" [1].
>- XML 1.0 is less "internationalised" only in important parts of XML such as
>element and attribute names, enumerated attribute values, or processing
>instruction targets, but characters that were not present in Unicode 2.0
>(the Unicode version that XML 1.0 referred to) can be used in XML 1.0
>character data. Since we are talking about the definition of text (i.e. not
>markup such as element or attribute names) there is no need to refer to XML
>1.1. More importantly, because of the compatibility issue, I would like to
>avoid the impression that WCAG recommends XML 1.1 over XML 1.0.
>If there are concerns about the accepted character range in a document,
>these can be covered by guideline 4.1 (use technologies according to
>Joe Clark [2] and Mike Barta [3] have argued that there is no need to
>require that characters exist in the Unicode standard.
>The following definition avoids both the references to XML 1.0/1.1 and
>      Proposed definition:
>      Any sequence of characters that exist in the writing systems of
>      the world's natural languages.
>Note: this does not mean that programming code is not text, because
>programming code is written with characters that already exist in a writing
>        Proposed definition: Unicode is a universal character set that
>    defines all the characters needed for writing the majority of living
>    languages in use on computers. For more information refer to the
>    Unicode Consortium or to Tutorial: Character sets & encodings in
>    XHTML, HTML and CSS produced by the W3C Internationalization
>    Activity. [Additional optional clarification: This does not mean
>    that all documents should be encoded in Unicode. It means that
>    documents should only contain characters defined by Unicode. Any
>    encoding may be used for your document as long as it is properly
>    declared and is a subset of the Unicode repertoire. ]
>        Current definition: none.
>I find the inclusion of the additional clarification useful.
>I apologise for the late response; I hope these comments are still useful.
>Best regards,
>[1] http://www-128.ibm.com/developerworks/xml/library/x-xmlns11.html
>[2] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0211.html
>[3] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0320.html

Christophe Strobbe
K.U.Leuven - Departement of Electrical Engineering - Research Group
on Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
Received on Tuesday, 24 May 2005 17:02:25 UTC