
ATeXHI 1.0 or Babel Scribe 1.0

From: Karl Dubost <karl@w3.org>
Date: Fri, 24 Oct 2003 17:57:13 -0600
To: www-i18n-comments@w3.org
Message-Id: <C9E96576-067D-11D8-80FA-000A95718F82@w3.org>


Here are a few comments on your first WD:
	ATeXHI 1.0 or Babel Scribe 1.0
Authoring Techniques for XHTML & HTML Internationalization 1.0

First of all, thank you very much for this work; it was much needed. I
hope you will have success and good reviews for each of your versions.

* QA Spec Guidelines - http://www.w3.org/TR/qaframe-spec

The QA Spec Guidelines are entering the CR phase, which is an
implementation phase for the QA WG. It would be a wonderful opportunity
for both WGs, GEO and QA, to implement these guidelines, and for the QA
WG to help and to create tools where needed.

The following review is not a review against the QA Spec Guidelines.

I discussed this with Richard Ishida on IRC, and he told me that some of
the verbiage repeats the same principles throughout the document,
because the document is meant to be read by outline. I would encourage
the editors to write an atomic statement for each feature and, instead
of repeating the same verbiage, to point to these atomic statements from
the different outlines.

	This would be like having modules that address particular problems,
and profiles that collect a set of modules or features applied to
specific problems or readers. It would also make the document easier
for the editors to maintain, and less confusing for the reader in
certain cases.

* Abstract
	You limit your scope to XHTML 1.0/HTML 4.01. XHTML 1.1 is already a
specification and includes Ruby, which is an interesting technology for
the Web and I18N. XHTML 2.0 is in development; this may be an
opportunity to get more I18N features into XHTML and, where XHTML 2.0
does not address certain I18N issues, to cover them in this document.

* Status
	"""These are techniques that need to be
	addressed from the start of content development
	if unnecessary costs and resource issues
	are to be avoided later on."""

	It's never too late to improve a Web site or a document. It might be
beneficial to point out that a site which does not follow simple I18N
principles can still improve its overall quality step by step.
	See http://www.w3.org/QA/2003/03/web-kit, where we mentioned I18N.

* 1.3 Standards addressed
	"""Note that XHTML source can be served as XML
	(using MIME types application/xhtml+xml,
	application/xml or text/xml) or HTML
	(using the MIME type text/html)."""

	text/xml might be deprecated in the future. There is a lot of
discussion around it, so its use is at risk.

* 1.4 User agents addressed.
	Netscape 7 is a frozen/dead product and will not be developed any
further; I would encourage you to focus on Mozilla rather than Netscape.
	If you want your document to stay fresh and evolve along with the
tools, you may want to move the compatibility charts outside of your
main document.

* 2.1 Internationalizing the page header

The recommendation is good, but your example is not. If you serve your
document as text/html, you do not need the XML declaration:
	<?xml version="1.0" encoding="UTF-8"?>
And if you serve it as application/xhtml+xml, there is no need for the
XML declaration when your document is in UTF-8 or UTF-16. It is even
discouraged, because IE 6 on Windows has problems with the XML
declaration and switches to quirks mode when it is present.
	It's good to encourage UTF-8, and there is an incentive to do so: if
you use UTF-8, you don't need the XML declaration, and therefore IE 6
will be friendly with you.
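
	A minimal Python sketch of this rule. The helper name
needs_xml_declaration() is hypothetical. It only encodes the advice
above and is not a validator:

```python
def needs_xml_declaration(mime_type: str, encoding: str) -> bool:
    """Return True if the document should carry an XML declaration.

    Hypothetical helper encoding the advice above:
    - text/html: the declaration is irrelevant and sends IE 6 into quirks mode;
    - XML MIME types: UTF-8 and UTF-16 are the XML defaults, so no declaration
      is needed; other encodings should be declared, because XML 1.0 requires
      it when no external charset information (such as HTTP) is available.
    """
    mime = mime_type.split(";")[0].strip().lower()
    if mime == "text/html":
        return False
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return encoding.strip().lower() not in ("utf-8", "utf-16")
    raise ValueError(f"unexpected MIME type: {mime_type}")
```

	For example, needs_xml_declaration("application/xhtml+xml",
"iso-8859-1") returns True, while every text/html case returns False.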

	"""In case of conflict, the Content-Type
	charset declaration and the XML declaration
	have precedence over the meta charset statement,
	according to the HTML 4.01 and XHTML 1.0
	specifications. [Ed. note: Is this true in
	practise? esp wrt IE?]"""
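
	The order stated in this quoted text (HTTP Content-Type charset, then
XML declaration, then meta charset) can be sketched in Python as follows.
The function resolve_charset() and its "utf-8" fallback are
hypothetical. The sketch does not claim to describe what any browser,
IE included, actually does:

```python
from typing import Optional


def resolve_charset(http_charset: Optional[str],
                    xml_decl_encoding: Optional[str],
                    meta_charset: Optional[str],
                    default: str = "utf-8") -> str:
    """Return the effective encoding, following the order stated in the draft:
    HTTP Content-Type charset, then XML declaration, then meta charset.

    The "utf-8" default is an illustrative assumption, not a spec value.
    """
    for candidate in (http_charset, xml_decl_encoding, meta_charset):
        if candidate:  # skip missing (None) or empty declarations
            return candidate.strip().lower()
    return default
```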

See CUAP (http://www.w3.org/TR/cuap), which gives the precedence order.

	"""Use meta charset declarations as early as possible in the head."""

	When the browser does not get the encoding from the HTTP headers, it
has to parse the beginning of the document to find the encoding
information. So it is indeed preferable to put it at the start, so that
the user agent can display the page with the correct encoding. It might
be useful, though, to test, or to ask vendors, at what point they stop
parsing the head to find this information.
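
	Here is a simplified Python sketch of what a user agent has to do
when the HTTP header gives no charset. The 1024-byte limit is a
placeholder assumption; the real figure is what vendors should be asked
about. The regular expression is a rough stand-in for a real HTML parser:

```python
import re
from typing import Optional

# Placeholder assumption: how far into the document the user agent looks.
SNIFF_LIMIT = 1024

# Finds charset=... inside the content attribute of a meta element.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+content\s*=\s*["'][^"']*charset\s*=\s*([A-Za-z0-9._-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(document: bytes, limit: int = SNIFF_LIMIT) -> Optional[str]:
    """Return the meta charset if it appears within the first `limit` bytes."""
    match = _META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

	The same meta element placed after the limit is simply not seen,
which is why it belongs at the top of the head.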

	"""For HTML use the lang attribute, and for
	XHTML use the lang and xml:lang attributes
	in the html tag. """
	There's an incentive to use XHTML rather than HTML: it smooths your
transition to XHTML 1.1 or XHTML 2.0. In XHTML 1.0, you can use only
xml:lang if you wish, and you will then have no problem switching to
XHTML 1.1 or XHTML 2.0, where xml:lang is the only possible attribute.
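
	The following small Python check confirms that the lang and xml:lang
attributes on the html element do not contradict each other. It assumes
well-formed XHTML and uses hypothetical function names. As suggested
above, xml:lang alone is accepted:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def root_languages(xhtml: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (lang, xml:lang) values of the root html element."""
    root = ET.fromstring(xhtml)
    return root.get("lang"), root.get(XML_LANG)


def languages_consistent(xhtml: str) -> bool:
    """True if a language is declared and lang/xml:lang do not disagree."""
    lang, xml_lang = root_languages(xhtml)
    if lang is None and xml_lang is None:
        return False  # no language declared at all
    if lang is None or xml_lang is None:
        return True  # a single attribute, e.g. xml:lang alone
    return lang.lower() == xml_lang.lower()  # language tags are case-insensitive
```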

	One of the reasons for using the xml:lang or lang attributes in a
document is the behaviour of CSS rules. For example, in IE5 Macintosh,
if you use a "q" element for a quotation, the quotation marks differ
depending on the enclosing language: « blabla » in French, “blabla” in
English, etc. CSS 2 also has selectors that depend on the language,
such as :lang().
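
	This Python sketch shows how a CSS 2 :lang(C) selector matches: the
tag equals C, or starts with C followed by "-". The matching is then
used to pick quotation marks the way a q:lang(fr) style rule would. The
quote table and function names are illustrative assumptions:

```python
def lang_matches(element_lang: str, selector_lang: str) -> bool:
    """Mimic CSS 2 :lang(C): the tag equals C or starts with C followed by "-".

    Language tags are compared case-insensitively.
    """
    tag, wanted = element_lang.lower(), selector_lang.lower()
    return tag == wanted or tag.startswith(wanted + "-")


# Hypothetical table, in the spirit of q:lang(fr) { quotes: ... } style rules.
QUOTES = {
    "fr": ("« ", " »"),
    "en": ("“", "”"),
}


def quote(text: str, element_lang: str) -> str:
    """Wrap `text` in the quotation marks of the first matching language."""
    for lang, (opening, closing) in QUOTES.items():
        if lang_matches(element_lang, lang):
            return f"{opening}{text}{closing}"
    return f'"{text}"'  # fallback when no language rule matches
```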
	Another good point to make: if the document is read by a translating
agent (automatic translation), the agent will not have to guess the main
language of the document heuristically, which improves processing
performance.
	The meta statement should be consistent with the html element,
though that is not mandatory. I guess the html element should take
precedence over the meta element.
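
	An agent might determine the main language as in the Python sketch
below, giving the html element precedence over a meta Content-Language
statement as guessed above. The precedence and the helper names are
assumptions, not a specified algorithm:

```python
from html.parser import HTMLParser
from typing import Optional


class _LangFinder(HTMLParser):
    """Collects the language declared on <html> and in a meta statement."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.meta_lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        if tag == "html":
            self.html_lang = attrs.get("xml:lang") or attrs.get("lang")
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            self.meta_lang = attrs.get("content")


def document_language(source: str) -> Optional[str]:
    """Main language of a page: the html element first, then the meta statement."""
    finder = _LangFinder()
    finder.feed(source)
    finder.close()
    return finder.html_lang or finder.meta_lang
```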

* 2.2 International Layout considerations
	right/left versus before/after
	An interesting issue appeared when we designed a QA stylesheet for
right-to-left languages. We have small red arrows in the menu, and for
left-to-right languages the arrow points to the right. Luckily, the
arrow was specified with a CSS :before rule, in the stylesheet rather
than in the HTML as an img element, so we were able to create another
stylesheet for right-to-left languages. That was much less painful than
modifying thousands of pages.
	It is interesting to realize that even a simple arrow may have
internationalization problems.

* 3.1 Choose a page encoding
	Choose UTF-8 or another Unicode encoding for all content.

	- Give the list or a reference to a list of Unicode encodings

	"""* Unicode (UTF-8) forms will be easier to migrate to XForms."""
	You could add, for the reasons I gave before:
	* Unicode (UTF-8) forms will be easier to migrate to XHTML 1.1/XHTML 2.0.

	"""If you don't use a Unicode encoding, select an encoding that best 
supports the languages / characters to be included in the page text."""

	This is not testable per se. You might recommend:
		Use an encoding that supports the languages/characters included in
the page text.
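
	With this wording, the technique becomes testable: an encoding
supports the page text if every character can be encoded in it. Here is
a minimal Python sketch (the function names are hypothetical):

```python
def encoding_supports(text: str, encoding: str) -> bool:
    """True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def unsupported_characters(text: str, encoding: str) -> list:
    """Distinct characters of `text` that `encoding` cannot represent."""
    return sorted({ch for ch in text if not encoding_supports(ch, encoding)})
```

	For example, unsupported_characters("a€", "iso-8859-1") returns
["€"], since the euro sign is not part of Latin-1.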

	"""Check that user agents (all agents that must render the page) 
adequately support the page encoding that you have selected. If not,
you might need to use a more widely supported encoding to achieve an
adequate degree of user agent support."""

	In a sense, this contradicts a principle of accessibility and of the
Web, which says that you should be able to access the content whatever
your user agent. That said, it doesn't solve the problem. I would also
not encourage people to do browser sniffing, because it challenges

	It's the same for the next technique: """Use character sets and
encodings that will be accessible and common to your users.""" When you
recommend such techniques, you have to temper them by explaining the
constraints or difficulties they might create for other users.

* 3.2 Specifying a page encoding

	"""Where practical, declare the page's character
	encoding by setting the charset parameter in the
	HTTP Content-Type header."""

	Not just where practical: do it all the time. Whenever you have the
opportunity to serve your document with the right encoding in the HTTP
header, just do it. It spares the user agent from having to guess, or
to parse the beginning of the HTML document, to know how to display it.
	This does not prevent you from also specifying the encoding inside
the document, for the reasons you gave (saving locally, etc.).

	You could give an example for Apache (httpd.conf and/or .htaccess)
and an example for Jigsaw.

	Apache httpd.conf and .htaccess

	AddCharset utf-8 .html

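	You can also override it for one part of the site, in httpd.conf:

	<Directory "/somewhere/europe">
		AddCharset iso-8859-1 .html
	</Directory>

	or in a .htaccess file placed in /somewhere/europe:

	AddCharset iso-8859-1 .html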

Ask Yves Lafon about the method for Jigsaw.
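
	To make the effect of the Apache example concrete, here is a Python
sketch of the Content-Type header each file would get. It is only an
illustration and not how Apache resolves its configuration: it uses URL
path prefixes where <Directory> uses filesystem paths, and the table and
function names are hypothetical:

```python
# Hypothetical charset table mirroring the Apache example above: a default
# for the whole site and an override for one directory (URL path prefixes).
CHARSETS = {
    "/": "utf-8",
    "/somewhere/europe/": "iso-8859-1",
}


def charset_for(path: str) -> str:
    """Charset of the longest matching prefix, so the directory override wins.

    Assumes `path` is an absolute URL path (starting with "/").
    """
    best = max((prefix for prefix in CHARSETS if path.startswith(prefix)), key=len)
    return CHARSETS[best]


def content_type(path: str) -> str:
    """Value of the HTTP Content-Type header sent for an .html file."""
    return f"text/html; charset={charset_for(path)}"
```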

"""For XHTML served as text/html, where practical use an XML 
declaration with an encoding attribute."""

	No. When XHTML is served as text/html, the XML declaration becomes
completely irrelevant and, as I said, causes problems for IE6, which you
explain just afterwards. Visual checking is not a good recommendation
:) even if it is done often.

* 4.1 Choosing & specifying fonts
	"""Do not use <font> tags - use CSS styles instead."""
	I see in the Ed. note: """[Ed. note: Describe the evils of using
<font> to cheat on the charset and represent other scripts.]""" It would
be good to give techniques and examples showing how Webmasters can
switch from the font element to CSS styles.
	"""Always use the serif and sans-serif fallbacks"""
	I would add: "in the font or font-family property in CSS".
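
	The following Python sketch uses hypothetical helpers to check that a
CSS font-family value ends with a generic family, and to append one if it
does not. It splits on commas, so it assumes no font name contains a
comma:

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def ends_with_generic(font_family: str) -> bool:
    """True if the last family in the list is an unquoted generic keyword."""
    # A quoted "serif" names a font called serif, not the generic family,
    # so quotes are deliberately not stripped before the comparison.
    last = font_family.split(",")[-1].strip().lower()
    return last in GENERIC_FAMILIES


def with_fallback(font_family: str, generic: str = "serif") -> str:
    """Append the generic family unless the list already ends with one."""
    if ends_with_generic(font_family):
        return font_family
    return f"{font_family}, {generic}"
```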

* 5.3 Specifying the language of a link destination

	"""Use the hreflang attribute on the a element."""
	CSS can make use of it :) You should read my entry about it:

	http://www.la-grange.net/2002/09/03#hreflang (in French)

	CSS rule for it
	/* display of the language you linked to */
	a[hreflang]:after { content: " [" attr(hreflang) "] "; }

	What are the benefits of that?

	1) Strong usability benefits: while browsing your Web site, users
will know the language of the resource you are linking to. Imagine you
are in a document written in French and you link to a reference in
English, but some of your readers do not know English at all. They will
not have to follow the link only to discover afterwards that they can't
read it. They save time and bandwidth.

	2) It would be good if the I18N Activity reviewed the CUAP note and
added comments or new checkpoints to it. Why? Because you might
encourage or recommend user agent behaviours. For example, you might
recommend that a user agent which is an automatic translator respect
the "lang" and "xml:lang" attributes in a document, so that it doesn't
translate things it should not (like sometimes trying to translate
French into French... silly), and use the hreflang attribute
intelligently. Where this attribute is present, the automatic
translator knows beforehand the main language used if the user follows
the link, and can offer to translate the target document.
	Also, indexing search engines like Google benefit from knowing the
language of a page before indexing it, and so can index it more
effectively.
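
	To illustrate the usability point, here is a Python sketch of how a
tool might list the links whose hreflang differs from the page language.
This is the same information the CSS rule above displays. The class and
function names are hypothetical, and only the primary language subtag is
compared:

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects (href, hreflang) for every <a> element that has hreflang."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("hreflang"):
            self.links.append((attrs.get("href"), attrs["hreflang"]))


def foreign_links(source: str, page_lang: str) -> list:
    """Links whose hreflang primary subtag differs from the page language."""
    collector = _LinkCollector()
    collector.feed(source)
    collector.close()
    primary = page_lang.split("-")[0].lower()
    return [(href, lang) for href, lang in collector.links
            if lang.split("-")[0].lower() != primary]
```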

* Do not add dir="rtl" to the body tag.
	"""According to the Microsoft article Authoring HTML
	for Middle Eastern Content, the following behaviors
	can only be expected in Internet Explorer 5 if the
	dir attribute is on the html element, rather than the
	body element."""

	Please specify which version of IE: Mac or Windows?
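
	Here is a quick Python lint for the situation described above. It
flags pages that put dir="rtl" on body but not on html. The helper names
are hypothetical:

```python
from html.parser import HTMLParser


class _DirFinder(HTMLParser):
    """Records the dir attribute of the html and body start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.dirs: dict = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "body"):
            self.dirs[tag] = (dict(attrs).get("dir") or "").lower()


def rtl_only_on_body(source: str) -> bool:
    """True if dir="rtl" is set on body but not on the html element."""
    finder = _DirFinder()
    finder.feed(source)
    finder.close()
    return finder.dirs.get("body") == "rtl" and finder.dirs.get("html") != "rtl"
```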

This is it for a first review ;)

Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager
*** Be Strict To Be Cool ***
Received on Friday, 24 October 2003 19:57:09 UTC
