- From: Gregg Vanderheiden <gv@trace.wisc.edu>
- Date: Tue, 24 May 2005 10:02:43 -0500
- To: "'Christophe Strobbe'" <christophe.strobbe@esat.kuleuven.be>, <w3c-wai-gl@w3.org>
If I remember right - the reason for referencing Unicode is that it is not only important that the text be in characters but that they be in a form that can be machine readable by standard (not proprietary) machines/software. Unicode was the standard code - so it was cited.

Your definition does not disallow proprietary text code. Can you think of a way other than citing Unicode to make sure that the characters are encoded in - well - Unicode?

Are we trying to allow things to be encoded in something other than Unicode? (i.e. were the crafters of this SC missing something?)

Thanks.

Gregg

--
------------------------------
Gregg C Vanderheiden Ph.D.
Professor - Ind. Engr. & BioMed Engr.
Director - Trace R & D Center
University of Wisconsin-Madison

-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf Of Christophe Strobbe
Sent: Tuesday, May 24, 2005 9:29 AM
To: w3c-wai-gl@w3.org
Subject: Re: Proposal for Guideline 1.1 [definition of text]

Hi,

At 04:35 27/04/2005, Wendy wrote:
>Attached is an html file with the issue summary for Guideline 1.1 as well as proposed text for the guideline and related definitions.
>(...)

<blockquote>
text
Proposed definition: A sequence of characters. Characters are those included in the Unicode / ISO/IEC 10646 repertoire. Refer to Characters (in Extensible Markup Language (XML) 1.1) for more information about the accepted character range.
[@@what about functional text content? e.g., links?]
[@@refer to XML 1.0 or 1.1 - Christophe felt 1.0 is safer, but yet it's dated and not as "internationalized" - ala Richard's talk at the Technical Plenary]
Current definition: none
</blockquote>

My reasons for avoiding a reference to XML 1.1 are the following:
- XML 1.1 is not backward compatible with XML 1.0: XML 1.1 requires that control characters #x7F through #x9F, which were freely allowed in XML 1.0 documents, now appear only as character references, and it allows the control characters #x1 through #x1F, most of which are forbidden in XML 1.0, through the use of character references.
- Because XML 1.1 is not backward compatible, the XML WG recommended "that applications that produce XML documents keep using XML 1.0 as much as possible, and only use XML 1.1 when necessary" [1].
- XML 1.0 is less "internationalised" only in important parts of XML such as element and attribute names, enumerated attribute values, or processing instruction targets, but characters that were not present in Unicode 2.0 (the Unicode version that XML 1.0 referred to) can be used in XML 1.0 character data.

Since we are talking about the definition of text (i.e. not markup such as element or attribute names) there is no need to refer to XML 1.1. More importantly, because of the compatibility issue, I would like to avoid the impression that WCAG recommends XML 1.1 over XML 1.0. If there are concerns about the accepted character range in a document, these can be covered by guideline 4.1 (use technologies according to specification).

Joe Clark [2] and Mike Barta [3] have argued that there is no need to require that characters exist in the Unicode standard. The following definition avoids both the references to XML 1.0/1.1 and Unicode:

text
Proposed definition: Any sequence of characters that exist in the writing systems of the world's natural languages.
Note: this does not mean that programming code is not text, because programming code is written with characters that already exist in a writing system.

<blockquote>
unicode
Proposed definition: Unicode is a universal character set that defines all the characters needed for writing the majority of living languages in use on computers. For more information refer to the Unicode Consortium or to Tutorial: Character sets & encodings in XHTML, HTML and CSS produced by the W3C Internationalization Activity.
[Additional optional clarification: This does not mean that all documents should be encoded in Unicode. It means that documents should only contain characters defined by Unicode. Any encoding may be used for your document as long as it is properly declared and is a subset of the Unicode repertoire.]
Current definition: none.
</blockquote>

I find the inclusion of the additional clarification useful.

I apologise for the late response; I hope these comments are still useful.

Best regards,

Christophe

[1] http://www-128.ibm.com/developerworks/xml/library/x-xmlns11.html
[2] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0211.html
[3] http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0320.html
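
The control-character differences between XML 1.0 and XML 1.1 described in the quoted message can be sketched roughly as follows. This is only an illustration, not part of either proposal: the function names and the three status labels ("literal", "reference-only", "forbidden") are made up for this sketch, and the ranges are transcribed from the Char and RestrictedChar productions of the two XML Recommendations as I understand them. Note that in XML 1.1 #x85 is not in the restricted range, so the "#x7F through #x9F" rule has that one exception.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0: tab, LF, CR, then everything from #x20 up
    (minus surrogates, #xFFFE and #xFFFF)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1: everything from #x1 up
    (minus surrogates, #xFFFE and #xFFFF)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """RestrictedChar production of XML 1.1: allowed only as character references."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def xml10_status(cp: int) -> str:
    """Hypothetical helper: how a code point may appear in XML 1.0 character data.
    In XML 1.0 a character reference must also match Char, so anything outside
    Char is forbidden outright."""
    return "literal" if is_xml10_char(cp) else "forbidden"


def xml11_status(cp: int) -> str:
    """Hypothetical helper: how a code point may appear in XML 1.1 character data."""
    if not is_xml11_char(cp):
        return "forbidden"
    if is_xml11_restricted(cp):
        return "reference-only"
    return "literal"


if __name__ == "__main__":
    # A few sample code points; 0x1F600 was assigned long after Unicode 2.0.
    for cp in (0x1, 0x9, 0x80, 0x85, 0x1F600):
        print(f"U+{cp:04X}: XML 1.0 = {xml10_status(cp)}, XML 1.1 = {xml11_status(cp)}")
```

The last sample is meant to illustrate the third point in the quoted message: a character that did not exist in Unicode 2.0 is still acceptable in XML 1.0 character data, because the Char production is defined by code point ranges rather than by a list of assigned characters.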
Received on Tuesday, 24 May 2005 15:02:45 UTC