Re: [Issue 673] Proposed definitions for text, Unicode, non-text content from Martin Duerst on 2004-09-09 (w3c-wai-gl@w3.org from July to September 2004)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 09 Sep 2004 09:19:10 +0900
To: wendy@w3.org, wai-gl <w3c-wai-gl@w3.org>
Cc: Richard Ishida <ishida@w3.org>
Message-Id: <4.2.0.58.J.20040909085933.055176d8@localhost>

Hello Wendy,

Some comments below.

At 19:49 04/09/08 -0400, Wendy Chisholm wrote:
>Proposed definitions to address issue 673 [1]. Notes and references at 
>[2].  These are not perfect, but lay the basis for tomorrow's teleconference.
>
>text
>A sequence of characters included in the Unicode character set. Refer to 
>Characters in Extensible Markup Language (XML) 1.0 (Third Edition) for 
>more specific information about the accepted character range.

I think it would be better to define 'text' as a sequence of characters,
and then say that characters are those included in Unicode, rather than
doing all of this in a single sentence, so that the two issues stay
separate. The first part ('text is a sequence of characters') is the 'real'
definition; the second part pins down the term 'character' by
operational means. You could also just refer to, e.g., a definition
of 'character' in an abstract sense.

Also, for the accepted character range, you might want to point
to a specific production, namely production [2] (Char) of the XML 1.0
spec. But then you have the problem that it also includes a lot of
unassigned code points, which I'm not sure you want to include.
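To make the unassigned-code-point problem concrete, here is a minimal sketch in Python (the language is my choice; nothing in this thread prescribes one). It tests a code point against the ranges of the XML 1.0 Char production and then asks Python's `unicodedata` module whether the code point is assigned. Assumptions: the helper names (`is_xml_char`, `is_assigned`, `classify`) are hypothetical. `unicodedata` reflects whatever Unicode version the interpreter ships with, not the 2004 one. General category "Cn" lumps unassigned code points together with noncharacters.

```python
import unicodedata

# Ranges of XML 1.0 (Third Edition), production [2]:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(cp: int) -> bool:
    """Hypothetical helper: True if cp matches the XML Char production."""
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def is_assigned(cp: int) -> bool:
    """Hypothetical helper: True if cp is assigned in this Python's Unicode data.

    General category 'Cn' covers unassigned code points and noncharacters.
    """
    return unicodedata.category(chr(cp)) != "Cn"


def classify(cp: int) -> str:
    """Hypothetical helper combining the two checks above."""
    if not is_xml_char(cp):
        return "not an XML Char"
    if not is_assigned(cp):
        return "XML Char, but unassigned in Unicode"
    return "XML Char, assigned"


if __name__ == "__main__":
    # 'A', a C0 control, a gap in the Greek block, U+FFFE, and U+10FFFF
    for cp in (0x41, 0x1, 0x0378, 0xFFFE, 0x10FFFF):
        print(f"U+{cp:04X}: {classify(cp)}")
```

U+0378 and U+10FFFF both pass the Char production but are not assigned characters. That is the gap between "matches production [2]" and "is a Unicode character" described above.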

>Unicode
>In this document, we use "Unicode" to refer to the Unicode character set

Please don't use the term 'character set' on its own. It has been misused
too often. It is better to use, e.g., "coded character set".

>and not the character encoding

'Unicode' could also refer to other things, such as the book or the
standard. I would therefore remove any negative part of the definition,
including the parenthetical below.
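The distinction the parenthetical below tries to draw, between the characters and the way they are encoded, can be illustrated with a short Python sketch (language and helper names `encodings_of` and `decode` are hypothetical, chosen for illustration only). The same character sequence is serialized as UTF-8, as UTF-16, and as plain ASCII. In the ASCII case, everything outside U+0000..U+007F is written as a numeric character reference via Python's `xmlcharrefreplace` error handler. The three byte sequences differ, but decoding each one yields the same characters.

```python
import html


def encodings_of(text: str) -> dict[str, bytes]:
    """Hypothetical helper: serialize one character sequence three ways."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16": text.encode("utf-16-le"),
        # ASCII covers only U+0000..U+007F; other characters become
        # decimal numeric character references such as &#233;
        "ascii+ncr": text.encode("ascii", "xmlcharrefreplace"),
    }


def decode(label: str, data: bytes) -> str:
    """Hypothetical helper: recover the character sequence from the bytes."""
    if label == "ascii+ncr":
        # Resolve the numeric character references back into characters
        return html.unescape(data.decode("ascii"))
    if label == "utf-16":
        return data.decode("utf-16-le")
    return data.decode(label)


if __name__ == "__main__":
    text = "caf\u00e9 \u2013 \u65e5\u672c"
    for label, data in encodings_of(text).items():
        print(label, data, decode(label, data) == text)
```

This also shows why listing ASCII next to UTF-8 and UTF-16 is a little loose. ASCII can carry the full repertoire only with the help of numeric character references.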

Regards,    Martin.

>(the Unicode character set may be encoded in ASCII, UTF-8, UTF-16, etc. 
>characters in the Unicode set can be created with numeric character 
>references - so there is a clear separation between the character encoding 
>and those characters defined in the unicode character set that we consider 
>"text"- @@provide reference to I18N list?)
>
>non-text content
>Non-text content is content that can not be represented by a Unicode 
>character or sequence of unicode characters. Non-text content includes but 
>is not limited to
>* images and graphics,
>* sound clips, movies, and animations,
>* ASCII art (which may use several unicode characters to create an image)
>Providing text alternatives for non-text content is addressed in Guideline 
>1.2, providing captions and audio descriptions of multimedia is addressed 
>in guideline 1.2, and interacting with non-text content via scripts, 
>applets, and programmatic objects is addressed in guideline 4.2 .
>
>[1] <http://trace.wisc.edu/bugzilla_wcag/show_bug.cgi?id=673>
>[2] <http://www.w3.org/2004/09/wcag-unicode.html>
>
>--
>wendy a chisholm
>world wide web consortium
>web accessibility initiative
>http://www.w3.org/WAI/
>/--

Received on Thursday, 9 September 2004 00:21:35 UTC