Re: [Issue 673] Proposed definitions for text, Unicode, non-text content

Hello Martin and Richard,

Thank you for your quick responses. 

>> Proposed definitions to address issue 673 [1]. Notes and references 
>> at [2].  These are not perfect, but lay the basis for tomorrow's 
>> teleconference.
>>
>> text
>> A sequence of characters included in the Unicode character set. Refer 
>> to Characters in Extensible Markup Language (XML) 1.0 (Third Edition) 
>> for more specific information about the accepted character range.
>
>
> I think it would be better to define 'text' as a sequence of characters,
> and then say that characters are those included in Unicode, rather than
> to do all this in a single sentence, in order to separate the different
> issues. The first part ('text is a sequence of characters') is the 'real'
> definition, the second part is pinning down the term 'character' with
> some operational means. You could also just refer to e.g. a definition
> of 'character' in an abstract sense.
>
You describe the approach taken in the XML specification. I chose not to 
do it that way, and to refer to the XML definition instead, because WCAG 
2.0 is a less technical document than XML. For our audience I think a 
single-sentence definition makes sense. Is there a technical issue with 
doing it this way?

> Also, for the accepted character range, you might want to point
> to a specific production, production [2]. But you then have the
> problem that this also includes a lot of unassigned codepoints,
> which I'm not sure you want to include.
>
No, we don't want to include unassigned codepoints. My understanding of 
the XML spec is that it excludes these, which is why I propose 
referencing it. In response to Richard's comment, I propose referencing 
XML 1.1 instead of 1.0: 
http://w3.org/TR/2004/REC-xml11-20040204/#charsets

Is this correct?
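
One way to check this empirically is sketched below. It transcribes the 
Char productions of XML 1.0 and XML 1.1 as range tests and looks for the 
lowest code point that a production accepts but that Python's 
unicodedata module reports as unassigned (general category Cn, which 
also covers noncharacters). The function names are my own, and 
"unassigned" here depends on the Unicode version bundled with the 
interpreter, so treat the result as an indication rather than a ruling.

```python
import sys
import unicodedata


def is_xml10_char(cp: int) -> bool:
    """True if cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if cp matches the Char production of XML 1.1."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_unassigned_xml_char(is_char=is_xml10_char):
    """Lowest code point accepted by is_char whose category is Cn.

    Returns None if the local Unicode tables contain no such code point.
    """
    for cp in range(sys.maxunicode + 1):
        if is_char(cp) and unicodedata.category(chr(cp)) == "Cn":
            return cp
    return None


if __name__ == "__main__":
    for label, production in (("XML 1.0", is_xml10_char),
                              ("XML 1.1", is_xml11_char)):
        cp = first_unassigned_xml_char(production)
        shown = "none found" if cp is None else f"U+{cp:04X}"
        print(label, shown, "(Unicode", unicodedata.unidata_version + ")")
```

If either call returns a code point, then the production on its own does 
not exclude unassigned code points, and the definition would need to say 
so separately.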

>> In this document, we use "Unicode" to refer to the Unicode character set
>
> Unicode
>
> Please don't use the term 'character set' as such. It has been misused
> too often. Better e.g. use "coded character set".
>
>
How about:
Unicode: "Unicode provides a unique number for every character, no 
matter what the platform, no matter what the program, no matter what the 
language." The Unicode Consortium 
http://www.unicode.org/standard/WhatIsUnicode.html
Unicode has at least three encoding forms: UTF-8, UTF-16, and UTF-32. 
[not sure if we need to mention encodings?]
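
To illustrate the distinction between the number Unicode assigns to a 
character and the bytes an encoding produces, here is a small sketch. 
The choice of the euro sign and the helper name encode_all are mine; the 
big-endian codec variants are used only so that no byte order mark is 
prepended.

```python
def encode_all(ch: str) -> dict:
    """Encode one character in UTF-8, UTF-16 and UTF-32 (big-endian)."""
    return {name: ch.encode(name)
            for name in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    euro = "\u20ac"  # EURO SIGN, code point U+20AC
    print(f"code point: U+{ord(euro):04X}")
    for name, data in encode_all(euro).items():
        # Same character, same number, different byte sequences.
        print(f"{name:10} {data.hex(' ')}  ({len(data)} bytes)")
```

The code point stays the same in every case; only the byte sequence 
differs, which may be an argument for leaving encodings out of the 
definition.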


Best,
--wendy

-- 
wendy a chisholm
world wide web consortium
web accessibility initiative
http://www.w3.org/WAI/
/--

Received on Thursday, 16 September 2004 17:21:10 UTC