
Re: [Issue 673] Proposed definitions for text, Unicode, non-text content

From: Wendy Chisholm <wendy@w3.org>
Date: Thu, 16 Sep 2004 13:20:01 -0400
Message-ID: <4149CB41.8080205@w3.org>
To: Martin Duerst <duerst@w3.org>
Cc: wai-gl <w3c-wai-gl@w3.org>, Richard Ishida <ishida@w3.org>

Hello Martin and Richard,

Thank you for your quick responses. 

>> Proposed definitions to address issue 673 [1]. Notes and references 
>> at [2].  These are not perfect, but lay the basis for tomorrow's 
>> teleconference.
>> text
>> A sequence of characters included in the Unicode character set. Refer 
>> to Characters in Extensible Markup Language (XML) 1.0 (Third Edition) 
>> for more specific information about the accepted character range.
> I think it would be better to define 'text' as a sequence of characters,
> and then say that characters are those included in Unicode, rather than
> to do all this in a single sentence, in order to separate the different
> issues. The first part ('text is a sequence of characters') is the 'real'
> definition, the second part is pinning down the term 'character' with
> some operational means. You could also just refer to e.g. a definition
> of 'character' in an abstract sense.
You describe the approach taken in the XML specification. I chose not to 
do it that way (and instead to refer to the XML definition) because WCAG 2.0 
is a less technical document than the XML specification. For our audience, 
I think a single-sentence definition makes sense. Is there a technical issue 
with doing it this way?
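To make Martin's two-part structure concrete, here is a minimal sketch. The function names `is_character` and `is_text` are hypothetical, and pinning "character" to "any Unicode scalar value" (every code point except surrogates) is an assumption chosen for illustration. The point is that the "real" definition of text only delegates to whatever definition of character is plugged in.

```python
def is_character(cp: int) -> bool:
    """Operational pin-down of 'character' (assumption for illustration):
    any Unicode scalar value, i.e. a code point that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_text(s: str) -> bool:
    """'Text is a sequence of characters': this definition stays the same
    no matter how is_character is defined."""
    return all(is_character(ord(ch)) for ch in s)


print(is_text("héllo"))   # a normal string
print(is_text("\ud800"))  # a lone surrogate is not a character here
```

Swapping `is_character` for a different operational rule (for example, an XML `Char` check) would change which strings count as text without touching `is_text` itself.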

> Also, for the accepted character range, you might want to point
> to a specific production, production [2]. But you then have the
> problem that this also includes a lot of unassigned codepoints,
> which I'm not sure you want to include.
No, we don't want to include unassigned codepoints.  My understanding of 
the XML spec is that it excludes these and that is why I propose 
referencing the XML spec.  In response to Richard's comment, I propose 
referencing XML 1.1 instead of 1.0: 

Is this correct?
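As a way to check the question above, here is a sketch that encodes production [2] (`Char`) from XML 1.0 and from XML 1.1 as plain range tests and compares them against the Unicode general category `Cn` (unassigned, which also covers noncharacters). The helper names are hypothetical, and the `Cn` result depends on the Unicode version bundled with the Python build. U+FDD0 is a permanently reserved noncharacter, so it is `Cn` in every version; the sketch suggests that both productions still admit it.

```python
import unicodedata


def is_xml10_char(cp: int) -> bool:
    # XML 1.0, production [2]:
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    # XML 1.1, production [2]:
    # Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_unassigned(cp: int) -> bool:
    # General category Cn = unassigned code point (includes noncharacters).
    return unicodedata.category(chr(cp)) == "Cn"


for cp in (0x41, 0xFDD0, 0x10FFFF):
    print(f"U+{cp:04X}", is_xml10_char(cp), is_xml11_char(cp), is_unassigned(cp))
```

If both `Char` tests return true for a `Cn` code point, then referencing either production alone would not exclude unassigned code points, and a separate restriction would be needed.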

>> In this document, we use "Unicode" to refer to the Unicode character set
> Unicode
> Please don't use the term 'character set' as such. It has been misused
> too often. Better e.g. use "coded character set".
How about:
Unicode: "Unicode provides a unique number for every character, no 
matter what the platform, no matter what the program, no matter what the 
language." (The Unicode Consortium)
There are at least three encodings for Unicode: UTF-8, UTF-16, and UTF-32.
[not sure if we need to mention encodings?]
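In case it helps decide whether encodings need mentioning, here is a small sketch of the distinction: the "unique number" (the code point) stays fixed, while the bytes depend on the encoding. The helper name `encoded_forms` is hypothetical. The big-endian codecs are used because Python's plain `utf-16` and `utf-32` codecs prepend a byte order mark, which would obscure the comparison.

```python
def encoded_forms(ch: str) -> dict[str, bytes]:
    # Same code point, three different byte sequences.
    return {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


# U+20AC EURO SIGN (BMP) and U+10348 GOTHIC LETTER HWAIR (supplementary plane)
for ch in ("\u20ac", "\U00010348"):
    print(f"U+{ord(ch):04X}")
    for enc, data in encoded_forms(ch).items():
        print(f"  {enc:10} {data.hex(' ')} ({len(data)} bytes)")
```

For U+10348, UTF-16 needs a surrogate pair (D800 DF48), which is one reason a definition of "character" in terms of code points, rather than encoded units, is simpler for readers.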


wendy a chisholm
world wide web consortium
web accessibility initiative
Received on Thursday, 16 September 2004 17:21:10 UTC
