- From: Wendy Chisholm <wendy@w3.org>
- Date: Thu, 16 Sep 2004 13:20:01 -0400
- To: Martin Duerst <duerst@w3.org>
- Cc: wai-gl <w3c-wai-gl@w3.org>, Richard Ishida <ishida@w3.org>
Hello Martin and Richard,
Thank you for your quick responses.
>> Proposed definitions to address issue 673 [1]. Notes and references
>> at [2]. These are not perfect, but lay the basis for tomorrow's
>> teleconference.
>>
>> text
>> A sequence of characters included in the Unicode character set. Refer
>> to Characters in Extensible Markup Language (XML) 1.0 (Third Edition)
>> for more specific information about the accepted character range.
>
>
> I think it would be better to define 'text' as a sequence of characters,
> and then say that characters are those included in Unicode, rather than
> to do all this in a single sentence, in order to separate the different
> issues. The first part ('text is a sequence of characters') is the 'real'
> definition, the second part is pinning down the term 'character' with
> some operational means. You could also just refer to e.g. a definition
> of 'character' in an abstract sense.
>
You describe the approach taken in the XML specification. I chose not to
do it that way, and instead to refer to the XML definition, because
WCAG 2.0 is a less technical document than XML. Thus, for our audience I
think this makes sense. Is there a technical issue with doing it this way?
> Also, for the accepted character range, you might want to point
> to a specific production, production [2]. But you then have the
> problem that this also includes a lot of unassigned codepoints,
> which I'm not sure you want to include.
>
No, we don't want to include unassigned codepoints. My understanding of
the XML spec is that it excludes these and that is why I propose
referencing the XML spec. In response to Richard's comment, I propose
referencing XML 1.1 instead of 1.0:
http://w3.org/TR/2004/REC-xml11-20040204/#charsets
Is this correct?
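One way to check this is to test code points against the Char production
directly. The sketch below is only illustrative: it hard-codes the ranges
of the Char production from XML 1.0 (production [2]), the helper names are
hypothetical, and the answer depends on the Unicode version bundled with
the Python interpreter that runs it. It looks for the first code point that
the production accepts but that Unicode leaves unassigned (general
category "Cn").

```python
import unicodedata


def is_xml10_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production (ranges only)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_unassigned_xml_char():
    """Return the lowest code point accepted by Char whose general
    category is 'Cn' (unassigned) in this interpreter's Unicode data,
    or None if there is no such code point."""
    for cp in range(0x110000):
        if is_xml10_char(cp) and unicodedata.category(chr(cp)) == "Cn":
            return cp
    return None


if __name__ == "__main__":
    cp = first_unassigned_xml_char()
    if cp is None:
        print("Char accepts no unassigned code points")
    else:
        print("Char accepts unassigned code point U+%04X" % cp)
```

If this prints a code point, the production is defined purely by ranges
and does not by itself exclude unassigned code points, so the definition
of 'text' would need to say so separately.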
>> In this document, we use "Unicode" to refer to the Unicode character set
>
> Unicode
>
> Please don't use the term 'character set' as such. It has been misused
> too often. Better e.g. use "coded character set".
>
>
How about:
Unicode: "Unicode provides a unique number for every character, no
matter what the platform, no matter what the program, no matter what the
language." The Unicode Consortium
http://www.unicode.org/standard/WhatIsUnicode.html
Unicode can be encoded in at least three forms: UTF-8, UTF-16, and UTF-32.
[not sure if we need to mention encodings?]
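To illustrate the difference between the number Unicode assigns to a
character and its encodings, here is a minimal sketch. The sample character
is my own choice and not from the proposal. The code point stays the same,
and only the bytes differ between encoding forms.

```python
# One character, one code point, three different byte sequences.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)              # the "unique number" for this character
utf8 = ch.encode("utf-8")         # variable width, 1 to 4 bytes
utf16 = ch.encode("utf-16-be")    # 2 or 4 bytes, big-endian, no BOM
utf32 = ch.encode("utf-32-be")    # always 4 bytes, big-endian, no BOM

if __name__ == "__main__":
    print("U+%04X" % code_point)
    print(len(utf8), len(utf16), len(utf32))
```

Decoding any of the three byte sequences with the matching codec gives back
the same character, which suggests the definition of 'text' can stay at the
level of characters without mentioning encodings.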
Best,
--wendy
--
wendy a chisholm
world wide web consortium
web accessibility initiative
http://www.w3.org/WAI/
/--
Received on Thursday, 16 September 2004 17:21:10 UTC