hreflang

Dear members of www-html,

I was really afraid I'd be at least 2 years late to re-discuss @hreflang
in XHTML 2 now. Browsing the lists' archives every now and then, though,
I see there are others sharing my concerns. So I sat down and made one
more attempt:

Why @hreflang Should Be Handled As In HTML 4.01 
===============================================
(by me, written in a foreign language late at night. Please be
indulgent.)

In short, it shouldn't change to what the current public working draft
proposes (multi-value, changing the accept-language request header)
because
- it's superfluous that way
- it's difficult to handle with CSS
- it's http only and denies XHTML's use as a general markup language
- it leads to bad user experience
- it's based on a questionable concept that is hardly used
- browser behaviour isn't specified 

As an alternative, I propose @hreflang to be used the same way it was
meant in HTML 4.01 plus the introduction of an additional @acceptlang
(or @getlang) to satisfy those who really, really want it the way the
XHTML 2.0 draft reads now.

The Details:


A.) What Do We Have (HTML 4.x, XHTML 1.x)?

@hreflang is metadata. A user agent will normally not act upon it; it's
meant as additional information for the human reader. 

Example:
<a href="http://members.aon.at/neumair/index_de.htm"
hreflang="de">Bed&amp;Breakfast</a>

It says: "This link will take you to a site about a B&B, but beware: the
document you'll get is in German, so you might not understand
it." (Actually, it says: "When the author of the page you're reading
last checked this link, the document was in German." This will be
important later.)
Firefox displays this nicely in the properties of the link. More likely,
there will be a stylesheet that does something like

a[hreflang|="de"]:after	{ content: " [German]"; vertical-align: super;}

All of this isn't vital, as meta-information usually isn't. 


B.) What Will We Get (XHTML 2, Public Working Draft 27 May 2005)?

@hreflang changes the UA's accept-language request header - provided
we're using http. It is not metadata any more, it actively influences
which document we get.

<a href="http://members.aon.at/neumair/" hreflang="de">Info in
German</a>
<a href="http://members.aon.at/neumair/" hreflang="en">Info in
English</a>

Useful. 
There's even more:

<a href="http://members.aon.at/neumair/" hreflang="de;q=0.5,
en">Info</a>

Tell the server which languages we accept, weighted by preference (here:
English first, German as a fallback).
Even more useful.
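
A minimal sketch in Python of what the draft asks of a user agent, as I
read it. The function name build_request and the default preference
string are made up for illustration; the request is only built, never
sent.

```python
from urllib.request import Request

def build_request(href, hreflang=None,
                  user_languages="de-AT, de;q=0.8, en;q=0.5"):
    """Build (but do not send) the request a UA might make for a link.

    user_languages stands in for whatever the user configured in the
    browser; the default here is an assumption for the example.
    """
    # Under the draft, the link's @hreflang replaces the user's own list.
    accept_language = hreflang if hreflang else user_languages
    return Request(href, headers={"Accept-Language": accept_language})

req = build_request("http://members.aon.at/neumair/",
                    hreflang="de;q=0.5, en")
# urllib stores header names capitalized as "Accept-language".
print(req.get_header("Accept-language"))
```

Without an @hreflang on the link, the same function falls back to the
user's own list, which is what HTML 4 user agents do today.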

We even have the former metadata-thingie back in XHTML 2, even though
the markup is a bit painful:

<meta about="http://members.aon.at/neumair/index_de.htm"
property="dc:language" content="de"/>
<a href="http://members.aon.at/neumair/index_de.htm">Bed&amp;Breakfast</a>

Perfect, isn't it? - No. Not at all.


C.) Why I'm Not Happy With It. 

1.) It's superfluous.

The way XHTML 2 uses @hreflang is elegant, but superfluous. In most
cases, it simply needn't be there. If I (=the web author) link to a
resource, I usually know which languages it is available in and which of
these languages I want. So in the above examples,
XHTML2's <a href="http://members.aon.at/neumair/" hreflang="de">Info in
German</a> 
is equal to
<a href="http://members.aon.at/neumair/index_de.htm">Info in German</a>
The new @hreflang isn't needed, it's all in the @href.
Oh, wait a minute, not all of it: The new @hreflang is to be a
multi-value attribute. 
<a href="http://members.aon.at/neumair/" hreflang="de;q=0.5,
en">Info</a> is new to XHTML2 and can't be expressed without the
proposed new meaning of @hreflang. 
Right. In this case, frankly, I wonder why anybody would want to do this.
Either the author relies on content negotiation and doesn't care about
the language or he does care about the language. If he does care, he
will point to a specific language version (using whatever method), not
to 2 or 3 languages.
(As I re-read this, there's one single situation in which the new
version could make sense, but for other reasons I doubt it's practical:
In combination with @hreftype, expressing something like "I need either
mime-type a or mime-type b, but exactly this one language")

2.) It's difficult to handle with CSS

Given a multi-value @hreflang containing, among others, "de-at de-ch
en-us en-gb ....":
How do I match this in CSS? [hreflang|="en"] will not work, neither will
[hreflang~="en"]. CSS and (X)HTML should work well together.
Not being able to CSS-style a document depending on an attribute value
is a huge drawback and must be traded for an even bigger advantage -
which I don't see...
(And, of course: The only way in XHTML 2 to express the old
metadata-meaning of @hreflang is:
<meta about="http://members.aon.at/neumair/index_de.htm"
property="dc:language" content="de"/>
<a href="http://members.aon.at/neumair/index_de.htm">Bed&amp;Breakfast</a>
Try to do CSS on this one...)
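
To show why neither selector matches, here is a small Python sketch that
mimics the matching rules of the two CSS attribute selectors. The
function names are mine, and the rules are simplified (for instance, I
ignore the special case of an empty selector value).

```python
def matches_dash(value, target):
    """Mimic [attr|="target"]: value is target, or starts with target + "-"."""
    return value == target or value.startswith(target + "-")

def matches_includes(value, target):
    """Mimic [attr~="target"]: target is one of the space-separated words."""
    return target in value.split()

single = "en-gb"                      # HTML 4 style, one language
multi = "de-at de-ch en-us en-gb"     # multi-value style from the example

print(matches_dash(single, "en"))        # True
print(matches_dash(multi, "en"))         # False: value starts with "de-at"
print(matches_includes(multi, "en"))     # False: no word is exactly "en"
print(matches_includes(multi, "en-us"))  # True, but only for the full tag
```

So a stylesheet would have to enumerate every full language tag it wants
to catch, instead of writing one rule per language.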

3.) It's http only

Being very picky (and believe me, you're not the only one who hates me
for this), I still believe that XHTML should - as far as possible - be
agnostic of dirty networking stuff it might float around in. It's a
principle. 
XHTML is a markup language. XHTML2-documents will end up on CD-ROM and
link to other offline resources or non-negotiating services.
With the new @hreflang relying on content negotiation, the attribute is
useless as soon as the target is file:///something or
telnet://thismachine. (The old metadata-version was useful when pointing
to a telnet-service, indicating that its user interface is this or that
language.)
Even worse and indeed dangerous:
Navigation within a set of documents using relative links will produce
different results, depending on which medium you're retrieving the
documents from. Browse them online (http) and the <....href="next"
hreflang="en">next</a> will take you to the correct place. Do the same
thing offline (CD-ROM, hard disk), and you could get a different document.
This is, from my point of view, one of the most severe issues. 
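
A rough Python sketch of the difference, under the assumption that a
user agent can only apply the new @hreflang where there are request
headers to put it in. The helper fetch_plan and the example.org and
CD-ROM paths are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

def fetch_plan(base, href, hreflang):
    """Return the resolved target and the headers a UA could send for it."""
    target = urljoin(base, href)
    scheme = urlsplit(target).scheme
    # Only HTTP(S) requests carry an Accept-Language header; for file:
    # (and telnet:, etc.) there is nowhere to put the language list.
    if scheme in ("http", "https"):
        headers = {"Accept-Language": hreflang}
    else:
        headers = {}
    return target, headers

# Same relative link, same @hreflang, two different media.
print(fetch_plan("http://example.org/docs/intro", "next", "en"))
print(fetch_plan("file:///media/cdrom/docs/intro", "next", "en"))
```

Online, the server can pick the English variant of "next"; offline, the
language information is silently dropped and whatever file happens to be
called "next" is opened.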

4.) It leads to bad user experience

Example:
The author of a German-language website links to a site that is
available in Japanese and English. He proudly uses the new @hreflang to
indicate he wants the English version, as he assumes his German readers
usually know English better than Japanese.
Then he forgets about this link for a while.
In the meantime, the site he linked to is translated into German, so a
third version becomes available. The human reader will still be directed
to the English version, even though a German version he'd feel more
comfortable with now exists. He might never learn there's a German
version at all.
With @hreflang as it is now (HTML 4), the author would still mark the
link as English (@hreflang="en"). The reader would quickly find that
this (meta)information was wrong but - who cares! It's only a change for
the better. (The same would be true if I thought I had to communicate in
English on this mailing list, only to find out that everybody prefers
German.)
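
The scenario can be played through with a toy negotiation routine in
Python. This is only a sketch: the function names are mine, the parser
is simplified, and languages are matched exactly (no prefix matching),
which real servers may handle differently.

```python
def parse_accept_language(header):
    """Return (language, q) pairs from an Accept-Language value, best first."""
    prefs = []
    for item in header.split(","):
        parts = item.strip().split(";")
        lang = parts[0].strip().lower()
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, val = param.strip().partition("=")
            if name == "q":
                q = float(val)
        if lang:
            prefs.append((lang, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available, default):
    """Pick the most preferred available language, else the default."""
    for lang, q in parse_accept_language(header):
        if q > 0 and lang in available:
            return lang
    return default

reader = "de, en;q=0.5"           # what the German reader configured
link = "en"                       # what the forgotten link forces
later = {"ja", "en", "de"}        # the site after the German translation

print(negotiate(reader, later, "ja"))  # de
print(negotiate(link, later, "ja"))    # en
```

With his own preferences the reader would get the new German version;
through the old link he is stuck with English.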

5.) It's Based On a Questionable Concept That Is Hardly Used

I don't have exact figures, but I made a quick check and examined a few
sites that do have multi-language versions. I chose sites I regularly
visit and websites of bigger companies. Only two of them use language
negotiation: hotmail and google (google only to display a tiny link to
the real localized version). 
This might have several reasons, but one benefit: Language negotiation
is a questionable concept anyway, and the lesser it's getting used, the
better. Content negotiation in general is machine-to-machine
communication.
The natural language that the human user might prefer/understand is
generally unknown to the user agent. People are meant to set this option
in the configuration dialogue, but how many really do? And even if they
do: Does the person who uses the browser right now prefer the same
language as the person who configured it?
It's reasonable to ask why there should be an attribute that completely
relies on a 'broken-by-design' feature hardly used in real life. 

6.) Browser Behaviour Isn't Specified 

The spec now reads: "The user agent must use this list as the field
value of the accept-language request header when requesting the resource
using HTTP." - My interpretation is: The UA must do this _when following
the link_. What about the user interaction following immediately
afterwards?
Take the example from point 4.: The link forces my browser to use an
accept-language request header of "en" even though I prefer German and
the website is available in German. 
What if I follow a link on this website? Say I click on a headline to
read the article? Will I get the English version that corresponds to the
English headline? Or will my browser suddenly surprise me by presenting a
German text?
What if I make my browser re-load the page right after I clicked the
link that got me there? Will I still get the English version? Or will my
browser re-load the document based on its URI and my personal
accept-language request header, resulting in the German version of the
document? (Which would not be what 'reload' means to me; on the other
hand, I don't see any other way to get the German version and correct
the 'broken' link.)



Thank you for reading this to the end. Sorry it's so long, but I really
do believe this is an important and underrated issue; I'd highly
appreciate any input. 

Regards,

Oskar

Received on Friday, 11 November 2005 06:22:44 UTC