- From: Karl Dubost <karl@w3.org>
- Date: Fri, 24 Oct 2003 17:57:13 -0600
- To: www-i18n-comments@w3.org
Hi,
here are a few comments on your first WD.
ATeXHI 1.0 or Babel Scribe 1.0
Authoring Techniques for XHTML & HTML Internationalization 1.0
First of all, thank you very much for this work; it was much needed. I
hope you will have success and good reviews for each of your versions.
* QA Spec Guidelines - http://www.w3.org/TR/qaframe-spec
The QA Spec Guidelines are entering the CR phase, which is an
implementation phase for the QA WG. It seems a wonderful opportunity
for both WGs, GEO and QA, to implement these guidelines, and for the
QA WG to help and create tools where needed.
The following review is not a review against the QA Spec Guidelines.
I have discussed this with Richard Ishida on IRC and he told me that
some of the verbiage repeats the same principles throughout the
document. The document is meant to be read by the outline. I would
encourage the editors to write an atomic statement for each feature and
not to repeat the same verbiage, BUT to point to these atomic
statements from the different outlines. It would be like having modules
addressing particular problems, and profiles collecting a set of
modules or features applied to specific problems or readers. It would
also be easier for the editors to maintain, and less confusing for the
reader in certain circumstances.
* Abstract
You limit your scope to XHTML 1.0/HTML 4.01. XHTML 1.1 is already a
specification and includes Ruby, which is an interesting technology for
the Web and I18N. XHTML 2.0 is in development; it may be an opportunity
to get more I18N features into XHTML and, where XHTML 2.0 does not
address certain I18N issues, to cover them in this document.
* Status
"""These are techniques that need to be
addressed from the start of content development
if unnecessary costs and resource issues
are to be avoided later on."""
It's never too late to improve a Web site or a document. It might be
beneficial to point out that a site that does not respect simple I18N
principles can still improve its overall quality step by step.
See http://www.w3.org/QA/2003/03/web-kit where we mentioned I18N.
* 1.3 Standards addressed
"""Note that XHTML source can be served as XML
(using MIME types application/xhtml+xml,
application/xml or text/xml) or HTML
(using the MIME type text/html)."""
text/xml might be deprecated in the future. There's a lot of
discussion around that, so it's at risk.
* 1.4 User agents addressed.
Netscape 7 is a frozen/dead product and will not be developed anymore;
I would encourage you to focus on Mozilla rather than Netscape.
If you want your document to stay fresh and evolve with the tools, you
may want to keep the compatibility charts outside of your main
document.
* 2.1 Internationalizing the page header
The recommendation is good but your example is not very good. If you
serve your document as text/html, you do not need the XML declaration
<?xml version="1.0" encoding="UTF-8"?>
And if you serve it as application/xhtml+xml, there's no need for the
XML declaration when your document is utf-8 or utf-16. It's even not
recommended, because IE 6 on Windows has problems with the XML
declaration and switches to quirks mode when it's there.
It's good to encourage utf-8, and there's an incentive to do so: if you
use utf-8, you don't need the XML declaration, and therefore IE 6 will
be friendly with you.
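A minimal sketch of that rule in Python, under stated assumptions: the
function name is hypothetical, and it only encodes the XML 1.0 rule
that the encoding declaration may be left out for UTF-8 or UTF-16
documents. It ignores encodings supplied externally (for example by an
HTTP header) and does not model any browser behaviour.

```python
def xml_declaration_needed(encoding: str) -> bool:
    """Return True when an XML document in this encoding must declare it.

    Hypothetical helper: XML 1.0 allows the encoding declaration to be
    omitted only for UTF-8 and UTF-16 documents (external encoding
    information, such as an HTTP header, is not considered here).
    """
    normalized = encoding.strip().lower().replace("_", "-")
    return normalized not in {"utf-8", "utf8", "utf-16", "utf16"}
```

For instance, xml_declaration_needed("iso-8859-1") is True, while
xml_declaration_needed("UTF-8") is False.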
"""In case of conflict, the Content-Type
charset declaration and the XML declaration
have precedence over the meta charset statement,
according to the HTML 4.01 and XHTML 1.0
specifications. [Ed. note: Is this true in
practise? esp wrt IE?]"""
See CUAP - http://www.w3.org/TR/cuap for the precedence order.
"""Use meta charset declarations as early as possible in the head
element."""
When the browser does not find the encoding in the HTTP headers, it has
to parse the beginning of the document to get the encoding information.
So it is indeed preferable to have it at the start, so that the user
agent can display the document with the correct encoding.
It might be useful, though, to test or to ask vendors when they stop
parsing the header to find this information.
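To make the lookup concrete, here is a rough Python sketch of how a
user agent could resolve the declared encoding. Everything in it is an
assumption for illustration: the function name, the 1024-byte scan
limit (when real user agents stop scanning is exactly the open
question), and the order HTTP header, then XML declaration, then meta
statement, which follows the sentence quoted from the draft rather than
any tested browser. Byte order marks are not handled.

```python
import re

# How many leading bytes to scan; an arbitrary value chosen for this sketch.
SCAN_LIMIT = 1024

CHARSET_PARAM = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE
)


def declared_encoding(content_type, body):
    """Return the declared encoding in lower case, or None if none is found."""
    # 1. charset parameter of the HTTP Content-Type header
    if content_type:
        match = CHARSET_PARAM.search(content_type)
        if match:
            return match.group(1).lower()
    head = body[:SCAN_LIMIT]
    # 2. encoding pseudo-attribute of the XML declaration
    match = XML_DECL.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. meta charset statement within the scanned prefix
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

A meta statement placed beyond the scanned prefix is simply not seen,
which is why declaring it early matters.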
"""For HTML use the lang attribute, and for
XHTML use the lang and xml:lang attributes
in the html tag. """
There's an incentive to use XHTML over HTML: it smooths your
transition to XHTML 1.1 or XHTML 2.0. In XHTML 1.0, you can use only
xml:lang if you wish, and you will have no problem switching to
XHTML 1.1 or XHTML 2.0, where xml:lang is the only possible
attribute.
One of the reasons for using xml:lang or lang attributes in a document
is the behaviour of CSS rules. For example, in IE5 Macintosh, if you
use a "q" element for a quotation, the quotes will differ depending on
the surrounding language: « blabla » in French, “ blabla ” in English,
etc. CSS 2 also has selection rules that depend on the language.
Another good point to make: if the document is read by a translating
agent (automatic translation), the agent will not have to guess the
main language of the document by a heuristic, which improves
processing performance.
The meta statement should be consistent with the html element, though
it's not mandatory. I guess the html element should take precedence
over the meta element.
* 2.2 International Layout considerations
right, left and before, after
An interesting issue appeared when we designed a QA stylesheet for
right-to-left languages. We have small red arrows in the menu, and for
left-to-right languages the arrow points to the right. Luckily, the
arrow was specified in the CSS with a :before rule and not in the HTML
with an img element, so we were able to create another stylesheet for
right-to-left languages. It was less painful than having thousands of
pages to modify.
Still, it's interesting to realize that a simple arrow may have
internationalization problems.
* 3.1 Choose a page encoding
Choose UTF-8 or another Unicode encoding for all content.
- Give the list or a reference to a list of Unicode encodings
"""* Unicode (UTF-8) forms will be easier to migrate to XForms."""
You can add for the reasons I gave before:
* Unicode (UTF-8) forms will be easier to migrate to XHTML 1.1/XHTML
2.0
"""If you don't use a Unicode encoding, select an encoding that best
supports the languages / characters to be included in the page text."""
This is not testable per se. You might recommend: Use an encoding
that supports the languages/characters included in the page text.
"""Check that user agents (all agents that must render the page)
adequately support the page encoding that you have selected. If not,
you might need to use a more widely
supported encoding to achieve an adequate degree of user agent
support."""
In a sense it contradicts a principle of accessibility and of the Web,
which says that whatever your user agent, you should be able to access
the content. That said, it doesn't solve the problem. I would not
encourage people to do browser sniffing either, because it challenges
It's the same for the next technique: """Use character sets and
encodings that will be accessible and common to your users.""" When you
recommend such techniques, you have to qualify them by explaining the
constraints/difficulties they might create for other users.
* 3.2 Specifying a page encoding
"""Where practical, declare the page's character
encoding by setting the charset parameter in the
HTTP Content-Type header."""
Not where practical; do it all the time. Each time you have the
opportunity to serve your document with the right encoding in the HTTP
header, just do it. It saves the user agent from having to guess, or to
parse the beginning of the HTML document, to know how to display it.
It's not incompatible with also specifying it inside the document, for
the reasons you gave (saving locally, etc.).
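For content generated by an application rather than served from files,
the same principle applies. A minimal Python sketch of a WSGI
application that always names the charset in the Content-Type header;
the function name and the default charset are assumptions made for the
illustration:

```python
def make_app(body_text, charset="utf-8"):
    """Build a tiny WSGI app that always names its charset in the header."""
    # Encode once with the same charset that the header will announce.
    payload = body_text.encode(charset)

    def app(environ, start_response):
        headers = [
            ("Content-Type", f"text/html; charset={charset}"),
            ("Content-Length", str(len(payload))),
        ]
        start_response("200 OK", headers)
        return [payload]

    return app
```

It can be tried locally with make_server from the standard
wsgiref.simple_server module. The point of the design is that the
bytes and the announced charset come from the same variable, so they
cannot drift apart.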
You may give an example for httpd.conf and/or .htaccess for Apache and
an example for Jigsaw.
Apache httpd.conf and .htaccess
AddCharset utf-8 .html
You can also do things like this in httpd.conf (Directory rather than
FilesMatch, because FilesMatch tests only the file name, not its path):
<Directory "/somewhere/europe">
AddCharset iso-8859-1 .html
</Directory>
Ask Yves Lafon about the method for Jigsaw.
"""For XHTML served as text/html, where practical use an XML
declaration with an encoding attribute."""
No. When XHTML is served as text/html, the XML declaration becomes
completely irrelevant and, as I said, causes problems in IE6. And you
explain it just after. Visual checking is not a good
recommendation :) even if it's done often.
* 4.1 Choosing & specifying fonts
"""Do not use <font> tags - use CSS styles instead."""
I see in the Ed Note """Ed. note: Describe the evils of using <font>
to cheat on the charset and represent other
scripts.]""". It would be good to give techniques and examples
showing how the Webmaster can switch from the use of font to other
techniques.
"""Always use the serif and sans-serif fallbacks"""
Add: "in the font property in CSS".
* 5.3 Specifying the language of a link destination
"""Use the hreflang attribute on the a element."""
It is supported by CSS :)))) You should read my entry about it.
http://www.la-grange.net/2002/09/03#hreflang (in French)
A CSS rule for it:
/* display of the language you linked to */
a[hreflang]:after { content: " [" attr(hreflang) "] "; }
What are the benefits of that?
1) Strong usability benefits: users browsing your Web site will know
the language of the resource you are linking to. Imagine you are in a
document written in French and you link to a reference in English, but
some of your readers do not know English at all. They will not have to
follow the link only to discover afterwards that they can't read it.
They save time and bandwidth.
2) It would be good if the I18N activity reviewed the CUAP note and
added comments to it or new checkpoints. Why? Because you might
encourage or recommend behaviours of user agents. For example, you
might recommend that a user agent which is an automatic translator
respect the attributes "lang" and "xml:lang" in a document, so it
doesn't translate things it should not (like trying to translate French
to French sometimes... silly), and use the hreflang attribute
intelligently. Where this attribute is present, the automatic
translator will know beforehand the main language of the link target
if the user follows the link, and can offer to translate adequately.
Also, for indexing search engines like Google, it has the benefit of
knowing the language before indexing the page, and so of indexing it
more effectively.
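As an illustration of what such an agent could do, here is a short
Python sketch that collects the declared language of each link target
before following it. The class and function names are hypothetical; it
uses only the standard html.parser module and makes no claim about how
any real translator or search engine works.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (href, hreflang) pairs from the a elements of a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is not None:
            # hreflang is None when the author did not declare it.
            self.links.append((href, attributes.get("hreflang")))


def link_languages(html_text):
    """Return the list of (href, hreflang) pairs found in html_text."""
    collector = HreflangCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

An agent could then skip, flag, or pre-select a translation for links
whose hreflang differs from the reader's languages, and fall back to
guessing only where the value is None.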
* Do not add dir="rtl" to the body tag.
"""According to the Microsoft article Authoring HTML
for Middle Eastern Content, the following behaviors
can only be expected in Internet Explorer 5 if the
dir attribute is on the html element, rather than the
body element."""
Please specify which version of IE: Mac or Windows?
That's it for a first review ;)
--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager
*** Be Strict To Be Cool ***
Received on Friday, 24 October 2003 19:57:09 UTC