RE: Proposal for Guideline 1.1 [definition of text]

If I remember right, the reason for referencing Unicode is that it is not
only important that the text be in characters, but also that the characters
be in a form that is machine readable by standard (not proprietary)
machines/software.

Unicode was the standard code, so it was cited.

Your definition does not disallow proprietary text code.  Can you think of a
way other than citing Unicode to make sure that the characters are encoded
in - well - Unicode?

Are we trying to allow things to be encoded in something other than Unicode?
(i.e. were the crafters of this SC missing something?)

Thanks.


 
Gregg

 -- ------------------------------ 
Gregg C Vanderheiden Ph.D. 
Professor - Ind. Engr. & BioMed Engr.
Director - Trace R & D Center 
University of Wisconsin-Madison 


-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf
Of Christophe Strobbe
Sent: Tuesday, May 24, 2005 9:29 AM
To: w3c-wai-gl@w3.org
Subject: Re: Proposal for Guideline 1.1 [definition of text]


Hi,

At 04:35 27/04/2005, Wendy wrote:
>Attached is an html file with the issue summary for Guideline 1.1 as well
>as proposed text for the guideline and related definitions.
>(...)


<blockquote>
text

       Proposed definition: A sequence of characters. Characters are
   those included in the Unicode / ISO/IEC 10646 repertoire. Refer to
   Characters (in Extensible Markup Language (XML) 1.1) for more
   information about the accepted character range.

       [@@what about functional text content? e.g., links?] [@@refer to
   XML 1.0 or 1.1 - Christophe felt 1.0 is safer, but yet it's dated
   and not as "internationalized" - ala Richard's talk at the Technical
   Plenary]

       Current definition: none
</blockquote>

My reasons for avoiding a reference to XML 1.1 are the following:
- XML 1.1 is not backward compatible with XML 1.0: the control characters
#x7F through #x9F (other than #x85), which were freely allowed in XML 1.0
documents, may now appear only as character references, and XML 1.1 allows
the control characters #x1 through #x1F, most of which are forbidden in XML
1.0, through the use of character references.
- Because XML 1.1 is not backward compatible, the XML WG recommended "that
applications that produce XML documents keep using XML 1.0 as much as
possible, and only use XML 1.1 when necessary" [1].
- XML 1.0 is less "internationalised" only in important parts of XML such as
element and attribute names, enumerated attribute values, or processing
instruction targets, but characters that were not present in Unicode 2.0
(the Unicode version that XML 1.0 referred to) can be used in XML 1.0
character data. Since we are talking about the definition of text (i.e. not
markup such as element or attribute names) there is no need to refer to XML
1.1. More importantly, because of the compatibility issue, I would like to
avoid the impression that WCAG recommends XML 1.1 over XML 1.0.
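
The compatibility points above can be sketched in a few lines of Python.
This is only an illustration, assuming the Char production of XML 1.0 and
the Char and RestrictedChar productions of XML 1.1 as I read them; the
function names are made up for this example and are not part of any library.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: code points that may appear in a document at all."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except #x0, surrogates, #xFFFE and #xFFFF."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only when written as a character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def xml10_status(cp: int) -> str:
    return "allowed" if is_xml10_char(cp) else "forbidden"


def xml11_status(cp: int) -> str:
    if not is_xml11_char(cp):
        return "forbidden"
    if is_xml11_restricted(cp):
        return "character reference only"
    return "allowed"


if __name__ == "__main__":
    # #x80: literal in XML 1.0, reference-only in XML 1.1 (the incompatibility).
    # #x1: forbidden in XML 1.0, reference-only in XML 1.1.
    # #x1F600: assigned long after Unicode 2.0, yet inside the XML 1.0 range.
    for cp in (0x1, 0x80, 0x85, 0x1F600):
        print(f"#x{cp:X}: XML 1.0 {xml10_status(cp)}, XML 1.1 {xml11_status(cp)}")
```

The last sample value is the relevant one for the definition of text: XML 1.0
character data is defined by code point ranges, so characters added to
Unicode after version 2.0 are already covered.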

If there are concerns about the accepted character range in a document,
these can be covered by guideline 4.1 (use technologies according to
specification).

Joe Clark [2] and Mike Barta [3] have argued that there is no need to
require that characters exist in the Unicode standard.
The following definition avoids both the references to XML 1.0/1.1 and
Unicode:

text
     Proposed definition:
     Any sequence of characters that exist in the writing systems of
     the world's natural languages.

Note: this does not mean that programming code is not text, because
programming code is written with characters that already exist in a writing
system.



<blockquote>
unicode

       Proposed definition: Unicode is a universal character set that
   defines all the characters needed for writing the majority of living
   languages in use on computers. For more information refer to the
   Unicode Consortium or to Tutorial: Character sets & encodings in
   XHTML, HTML and CSS produced by the W3C Internationalization
   Activity. [Additional optional clarification: This does not mean
   that all documents should be encoded in Unicode. It means that
   documents should only contain characters defined by Unicode. Any
   encoding may be used for your document as long as it is properly
   declared and is a subset of the Unicode repertoire. ]

       Current definition: none.
</blockquote>

I find the inclusion of the additional clarification useful.
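
To illustrate what the clarification means in practice, here is a small
Python sketch. It assumes a document whose bytes are in ISO-8859-1 and whose
encoding is declared correctly; the helper names are hypothetical. The point
is only that a properly declared non-Unicode encoding still yields characters
from the Unicode repertoire once decoded.

```python
def decode_declared(raw: bytes, declared_encoding: str) -> str:
    """Decode document bytes using the encoding the document declares."""
    return raw.decode(declared_encoding)


def code_points(text: str) -> list:
    """List the Unicode code points of a decoded string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "café" stored as ISO-8859-1: the last character is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# With the correct declaration, every byte maps to a Unicode character.
text = decode_declared(latin1_bytes, "iso-8859-1")

if __name__ == "__main__":
    print(latin1_bytes)       # b'caf\xe9'
    print(code_points(text))  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']
```

If the same bytes were declared as UTF-8 instead, decoding would fail with a
UnicodeDecodeError, which is why the "properly declared" condition matters as
much as the choice of encoding.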

I apologise for the late response; I hope these comments are still useful.

Best regards,

Christophe



[1] http://www-128.ibm.com/developerworks/xml/library/x-xmlns11.html
[2] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0211.html
[3] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0320.html

Received on Tuesday, 24 May 2005 15:02:45 UTC