Re: Internationalised HTML

On Sat, 1 Feb 2003, Charles McCathieNevile wrote:

> On the other hand, the names of tags isn't very important - they are
> not meant to be read by people, but by machines (although it is meant
> to be possible for people to read/write them) - and like C source code,
> it isn't very much more meaningful for english speakers.

The element (sic) names - or generic identifiers, to be exact - are meant
to convey a general idea of the meanings of markup constructs and,
besides, they are used in the definition of a markup system, or "language"
as the misleading parlance goes. It is impossible to define what, say,
a <blockquote> element means without referring to the element name. Thus,
element names are meant to be readable and understandable to human beings,
if only to authors. Whether they are written by humans, in the sense of
typing some characters, is relatively immaterial here.

They are important to anyone designing a user style sheet, too. To tell my
browser to highlight all block quotations in some way I like, I need to
use the element name. Similar considerations apply to attribute names
and keyword-like attribute values.

There are examples of actual confusion around element, attribute, and
value names, caused by the fact that not all people speak the same dialect
of English. The <cite> element is famous: roughly half of the people who
try to learn HTML seriously have mistaken it as meaning quotation, partly
because the descriptions in the specifications have been (and are)
somewhat vague, but largely because of the tag name. To take another
example, British people have often complained about misspellings like
"color" (e.g. in <font color="...">). Using CSS instead of presentational
markup would naturally take this problem out of HTML, but not out of the
authoring world. There are some naming decisions that might look a bit
headless: <head>, <h1>, <thead>, and <th> all reflect the word "head",
in confusingly similar but varying meanings, and <title> elements and
title attributes add to the confusion, and so does <caption>, since they
are heading-like too, but not _called_ headings.

However, I would suggest taking all of this as a fait accompli: something to
be learned from for the future and in some general sense, not something to
be fixed now in HTML. It's seldom a good idea to paint a car while it is
in actual use, moving fast.

> It is now possible, using XML Schemas, to create an xml language where
> the elements can be named/described in multiple languages. Using RDF
> Ontology and Web Services we can expect it to be possible to write our
> own version of "HTML" using whatever tags we like, declaring it's
> relationship to HTML, and have it work

I wonder what that would mean - in the worst case, everyone inventing tags
of his own and "defining" them by providing a style sheet. There's little
need to try and guess how many authors will pay the slightest attention to
anything beyond _their_ fixed idea of how the page should look.

If it just means the possibility of _seeing_ tag and attribute names in
your preferred language, when viewing or editing HTML source, well, _that_
could actually be implemented without much Xfuss and *logies.
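
To make that last point concrete, here is a rough sketch of what a viewer
or editor might do when displaying source. Everything in it is an
assumption for illustration: the function name localize_tags, the table
LABELS_FI, and the Finnish labels themselves are invented here and come
from no specification. The regular expression is also deliberately naive
(it ignores comments, attribute names, and a stray "<" in text), so take
it as an idea rather than a parser.

```python
import re

# Hypothetical display labels for a few HTML element names (illustrative
# only; a real tool would need a complete, agreed-upon table).
LABELS_FI = {
    "blockquote": "lainaus",
    "cite": "viite",
    "title": "otsikko",
}

# Matches "<" or "</" followed by an element name.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def localize_tags(source, labels):
    """Return a display copy of `source` with known element names replaced.

    The stored document is not modified; unknown names are left as-is.
    """
    def swap(match):
        name = match.group(2)
        return match.group(1) + labels.get(name.lower(), name)

    return TAG_NAME.sub(swap, source)


if __name__ == "__main__":
    print(localize_tags("<blockquote><p>Text</p></blockquote>", LABELS_FI))
```

The document itself keeps the standard names; only the displayed copy
changes, so browsers and style sheets are unaffected.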

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 1 February 2003 02:46:00 UTC