
RE: Please review Language techniques changes

From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Date: Sun, 6 Feb 2005 18:44:28 -0000
Message-ID: <418B7E44473AC34488C9E730D09FF3CF05179024@bbcxue204.national.core.bbc.co.uk>
To: "Richard Ishida" <ishida@w3.org>, "GEO" <public-i18n-geo@w3.org>
Hi Richard & all
 
See my comments below.
 
Best regards
 
Deborah
 
-----------------------------------
 
Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0
 
I've made some comments on the first part of this document, but I find myself getting confused despite working on this in my day job & being involved with this document.  :(
 
I think there is still confusion over character encoding & language specification; I recognise that for some these are orthogonal. I think more mention of the former (as a contrast) would be useful, perhaps in terms of 'what this techniques document does not address & why character encoding is out of scope'.
 
Great to see character encoding addressed in the W3C HTML validation tool. Could they add a check for language of content specification? It could flag a warning, rather than an error, if the declaration is missing, as we say that's not necessarily bad.
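
To make the suggested validator check concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how the W3C validator works: the names `RootLangFinder` and `language_warnings` are hypothetical, and it only looks at the html element.

```python
from html.parser import HTMLParser


class RootLangFinder(HTMLParser):
    """Records the language declared on the first <html> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            attr_map = dict(attrs)
            # Accept either attribute; XHTML served as text/html usually carries both.
            self.lang = attr_map.get("lang") or attr_map.get("xml:lang")


def language_warnings(markup):
    """Return a list of warnings; the list is empty when a language is declared."""
    finder = RootLangFinder()
    finder.feed(markup)
    finder.close()
    if not finder.lang:
        # A warning rather than an error: a missing declaration is not necessarily bad.
        return ["Warning: no language declared on the html element"]
    return []
```

Calling `language_warnings` on a page without a lang or xml:lang attribute on its html element returns a single warning; a page that declares one returns an empty list.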
 
Some of my comments are very minor.
 
I would like to re-address the structure of the document; see the revised table of contents below (it could be fleshed out further). Generally I think shorter sub-titles would be better, though perhaps the longer sentences were interim? I think the functions of a TOC are:
  1) learning, in terms of pyramid model, where user gains overview & can drill down.
  2) scope
  3) navigation, where user can quickly go to the info needed to solve a particular problem or fill a knowledge gap.
-------------------
Table of Contents
1 Introduction
    1.1 Who should use this document
    1.2 How to use this document

2 Technologies
    HTML
    XHTML
    CSS

3 User agents & testing
    Which user agents
    Test approach
    Test document

4 Definitions of language in web pages
    Language of content vs character encoding
    Multiple languages in a page vs character encoding
    Definitions of language of content
        Primary language
        Text processing language

5 Why specify language of content

6 Mechanisms to specify language: what they are for & what they are not for
    Content-language meta tag
    Language attributes
    HTTP header

7 How to specify language of content
    Single-language page
    Pages with more than one language
    Language changes within a page
    Language of content of link destination
        Pros & cons
        CSS & hreflang attribute
        National flag icons

8 Tips & techniques
    Do not declare the language of a document in the body tag
    Follow the guidelines in RFC3066 or its successors for language attribute values
        Use the two-letter ISO 639 codes for the language code where there are both 2- and 3-letter codes
        Consider using the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively

9 Resources
    Overview documents: developer tasks
    IANA character codes
 
-------------------
NB In my example text I use the abbreviation I18N, which it may be clearer to expand.
 
 
Abstract
 
 + Improved - language clearer & more direct, less jargon.
 + Last sentence of 1st paragraph clumsy - "Marking up language meta information is something that can and should be done today. Without it, none of these applications can be taken advantage of." Suggestion: "Without it, advantage cannot be gained from such current and future applications". 
 
 
Table of contents
 
 + Definitions, which are crucial to this document, should show the sub-sections, eg, Primary language, Text processing language.
 + Section 6 (Specifying language): title seems too vague. Should this not be higher up in the document? Or is it a case of setting out the more abstract concepts, then showing how to implement them? 
 
 
1 Introduction
  
+ Add quotes to the term specified. Easier to read & conventional, ie: The term 'author' is used in the sense described by the HTML 4.01 specification, ie. as a person or program that writes or generates HTML documents. Maybe also quote 'person' & 'program'.
+ How many organisations have localization groups, or even anyone with a localization role? Who do we think we address here? Do we address monolingual, English-only sites? In the later sentence, it would be useful to touch on I18N architecture.
"This document provides guidance for developers of HTML that enables support for international deployment. Enabling international deployment is the responsibility of all content authors, not just the localization group, and is relevant from the very start of development. Ignoring the advice in this document, or relegating it to a later phase in the development process, will only add unnecessary costs and resource issues at a later date."

1.2 How to use this document
+ Not clear: "Information is also available about the applicability of recommendations to user agents (see below)." Suggestion: "User agent support for I18N features is provided." Why 'see below'? 'User agent support' is important to developers & should have its own section; this would make it clearly available in this document from the table of contents.

1.3 Technologies addressed
+ What does 'the right editing tools' mean?
+ This section (below quoted) is too woolly - what are we trying to say?
 + XHTML 1.0 can be served as XML (using MIME types application/xhtml+xml, application/xml or text/xml) or HTML (using the MIME type text/html).
 + This document focuses on the latter, as the former is less common & user agent support for XHTML served as XML is still patchy. Serving XHTML as HTML is valid when following the compatibility guidelines in Appendix C of the XHTML 1.0 specification.
"Note that XHTML 1.0 source can be served as XML (using MIME types application/xhtml+xml, application/xml or text/xml) or HTML (using the MIME type text/html).
"It is very common for XHTML to be served as HTML, following the compatibility guidelines in Appendix C of the XHTML 1.0 specification. This allows authors with the right editing tools to produce valid XML code, which therefore. HTML represented as valid XML code lends itself to processing with such things as scripting or XSLT, but is also well supported for display by most mainstream browsers. (XHTML served as application/xhtml+xml is not well supported for browser display at the moment.)
"In this document we wish to reflect practical reality for content authors, so we cover XHTML served as text/html in the techniques. Indeed we encourage the use of XHTML, and all the examples (unless trying to make a specific point about HTML 4.01) are written in XHTML.
"For XHTML served as XML, this document limits its advice to documents served as application/xhtml+xml. Note that user agent support for XHTML served as XML is still patchy."

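The distinction drawn in section 1.3 between XHTML served as XML and XHTML served as HTML depends only on the MIME type in the Content-Type header. A minimal Python sketch of that classification follows; the function name `xhtml_serving_mode` and the return labels are hypothetical, chosen only for illustration.

```python
# MIME types under which XHTML 1.0 is treated as XML, as listed in the quoted text.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def xhtml_serving_mode(content_type):
    """Classify a Content-Type header value as 'xml', 'html' or 'other'."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_TYPES:
        return "xml"
    if media_type == "text/html":
        # The case the techniques concentrate on (XHTML following Appendix C).
        return "html"
    return "other"
```

For example, `xhtml_serving_mode("text/html; charset=utf-8")` gives `"html"`, while `xhtml_serving_mode("application/xhtml+xml")` gives `"xml"`.
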
1.4 User agents addressed
+ We should cut to the chase & lose phrases such as "In order to improve the value of this information to the user we try to " & "In an attempt to make manageable the task of tracking browser applicability manageable"; they sound a bit like we're making excuses. A standard developer approach is to focus on specific browser versions, base & new.
"In order to improve the value of this information to the user we try to ground techniques with information about their applicability to particular user agents. User agents, in the current version of this document, means a number of mainstream browsers. (The scope may grow as resources and test results become available for other user agents.)
"In an attempt to make manageable the task of tracking browser applicability manageable, we have chosen a 'base version' for each of the user agents we are tracking. This base version represents a fairly recent, standards-compliant version of the browser, but nonetheless a version that we might expect many people to be using. Where a browser operates in both standards- and quirks-mode, standards-mode is assumed (ie. you should use a DOCTYPE statement)."
+ Suggestion: 
"User agents in this document means selected mainstream browsers against which we have run I18N tests. We hope to expand this scope particularly to other sorts of user agents, eg, voice, as our resources grow. 
"We have chosen a 'base version' for each of the user agents we are tracking. This base version represents a fairly recent, standards-compliant version of the browser, but nonetheless a version that we might expect many people to be using. Where a browser operates in both standards- and quirks-mode, standards-mode is assumed (ie. you should use a DOCTYPE statement)."
+ Too vague, possibly at odds with the rest of the document. Also, on what basis are techniques applicable/available for immediate use? To draw an analogy with CSS, we may embrace the concept of degrading presentation, but this is confounded by the market dominance of MS IE. In this environment, if a CSS technique, eg rounded corners, is supported by Firefox but not by IE, can it be used? More importantly, perhaps, is this acceptable to designers & client commissioners? Re I18N, in this document we do actually state where a particular technique is or is not supported by the dominant browser, & where it is important for developers to support a feature, eg, lang attributes, to encourage more applications of that feature.
The test icons are really useful.
"Generally, the techniques described will be applicable for immediate use. However we may also recommend things that are not yet widely supported, but are described by the standards, and hopefully will be supported given a little time. Where issues of this kind exist, or other issues related to user agent support, these will be flagged by small graphics immediately after the technique summary:"

2 Definitions
+ I still get confused about these... :(
+ Should have their own sections, visible from the table of contents.
+ The last sentence of the first paragraph should come first, possibly in its own paragraph for emphasis. 
+ Could we call 'primary' language, 'audience' language?
Primary language
    "Primary language is metadata about the document as a whole. Such metadata may be used for searching, serving the right language version, classification, etc. It is not specific enough to indicate the language of a particular run of text in the document for text processing - for example, in a way that would be needed for the application of text-to-speech, styling, automatic font assignment, etc. It typically describes the language of the intended audience of the document."
    
+ Where best to declare primary language confuses me too: I thought previously we had recommended meta over header?
"Primary language metadata is usually best declared outside the document in the HTTP Content-Language header, although there may be situations where an internal declaration using the meta element may be appropriate."
+ I get confused about different language uses, ie, primary & text processing vs the orthogonally related charset & lang attribute.
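
To help keep the two primary-language mechanisms apart, here is a minimal Python sketch that reads the primary language from the HTTP Content-Language header and, failing that, from a meta element. The names `MetaContentLanguage`, `split_language_list` and `primary_languages` are hypothetical, and giving the header precedence over the meta element is an assumption made for this sketch, not something the draft states. Note that neither value has anything to do with charset.

```python
from html.parser import HTMLParser


class MetaContentLanguage(HTMLParser):
    """Finds the first <meta http-equiv="Content-Language" content="..."> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("http-equiv") or "").lower() == "content-language":
            self.value = attr_map.get("content")


def split_language_list(value):
    """Split a comma-separated list of language tags, dropping empty items."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def primary_languages(headers, markup):
    """Return the declared primary (audience) languages, or [] if none are declared."""
    # Assumption for this sketch: the HTTP header wins over the meta element.
    for name, value in headers.items():
        if name.lower() == "content-language":
            return split_language_list(value)
    finder = MetaContentLanguage()
    finder.feed(markup)
    finder.close()
    return split_language_list(finder.value or "")
```

Because primary language is metadata about the document as a whole, the result can list several languages (for example `["en", "fr"]`), which is exactly what makes it unsuitable for labelling a particular run of text.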
Text processing language
+ Is there a defining sentence such as for 'primary' language, eg "The 'primary' language typically describes the language of the intended audience of the document"? 
"When specifying the text processing language you are declaring the language in which a particular range of text is actually written, so that user agents or applications that manipulate the text, such as voice browsers, spell checkers, or style processors can effectively handle the text in question. So we are, by necessity, talking about associating a single language with a specific range of text."
+ Could the first sentence of this 'text processing' language section be:
"The text processing language is the language of a particular range of text, which can be different to the primary language, eg where in a phrase book context. Information is required so that user agents or applications can manipulate that text appropriately, eg voice browsers, spell checkers or style processors."

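By contrast with primary language, the text processing language associates a single language with a specific range of text. The Python sketch below illustrates this by walking a page and pairing each run of text with the language inherited from the nearest enclosing lang or xml:lang attribute. The names `LanguageRuns` and `language_runs` are hypothetical, and preferring xml:lang over lang when both are present follows Appendix C of XHTML 1.0; treat it as a sketch rather than a description of how any user agent works.

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class LanguageRuns(HTMLParser):
    """Collects (language, text) pairs, inheriting the language from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective language) for each open element
        self.runs = []   # (language or None, text)

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attr_map = dict(attrs)
        # An element without its own declaration inherits from its parent.
        lang = attr_map.get("xml:lang") or attr_map.get("lang") or self.current_lang()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.current_lang(), text))


def language_runs(markup):
    parser = LanguageRuns()
    parser.feed(markup)
    parser.close()
    return parser.runs
```

In a phrase book page whose html element declares English and which wraps a French phrase in a span with lang="fr", the English text is reported with "en" and the French phrase with "fr".
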
3 Declaring the language of a page
+ Should this be 'how' to declare the language of a page? With another section (preceding) on 'why' declare the language of a page?
 
 

	-----Original Message----- 
	From: public-i18n-geo-request@w3.org on behalf of Richard Ishida 
	Sent: Fri 1/28/2005 18:53 
	To: GEO 
	Cc: 
	Subject: Please review Language techniques changes
	
	


	Chaps,
	
	I spent much of today updating the Language techniques doc in an attempt to move it yet closer to publication.
	http://www.w3.org/International/geo/html-tech/tech-lang.html

	
	I made a lot of editorial changes.
	
	I also added two new techniques.
	
	All changes are marked up as revisions.
	
	Please take a look at the changes and send me email about any problems you spot.  Please look in particular at the new techniques. 
	
	Also, please look especially at the editor's notes included.  I would like to discuss these at the next telecon.
	
	Cheers,
	RI
	
	
	============
	Richard Ishida
	W3C
	
	contact info:
	http://www.w3.org/People/Ishida/

	
	W3C Internationalization:
	http://www.w3.org/International/

	
	Publication blog:
	http://people.w3.org/rishida/blog/

	
	
	
	


Received on Sunday, 6 February 2005 18:44:30 GMT
