- From: Oskar Welzl <lists@welzl.info>
- Date: Thu, 10 Nov 2005 23:01:17 +0100
- To: www-html@w3.org
Dear members of www-html,

I was really afraid I'd be at least 2 years late to re-discuss @hreflang in XHTML 2 now. Browsing the list archives every now and then, though, I see there are others sharing my concerns. So I sat down and made one more attempt at

Why @hreflang Should Be Handled As In HTML 4.01
===============================================
(by me, written in a foreign language late at night. Please be indulgent.)

In short, it shouldn't change to what the current public working draft proposes (multi-value, changing the accept-language request header) because
- it's superfluous that way
- it's difficult to handle with CSS
- it's HTTP only and denies XHTML's use as a general markup language
- it leads to bad user experience
- it's based on a questionable concept that is hardly used
- browser behaviour isn't specified

As an alternative, I propose that @hreflang be used the same way it was meant in HTML 4.01, plus the introduction of an additional @acceptlang (or @getlang) to satisfy those who really, really want it the way the XHTML 2.0 draft reads now.

The Details:

A.) What Do We Have (HTML 4.x, XHTML 1.x)?

@hreflang is metadata. A user agent will normally not act upon it; it's meant as additional information for the human reader. Example:

  <a href="http://members.aon.at/neumair/index_de.htm" hreflang="de">Bed&amp;Breakfast</a>

It says: "This link will take you to a site about a B&B, but beware: the document you'll get is in German, so you might not understand it." (Actually, it says: "When the author of the page you're reading last checked this link, the document was in German." This will be important later.)

Firefox displays this nicely in the properties of the link. More likely, there will be a stylesheet that does something like

  a[hreflang|="de"]:after { content: " [German]"; vertical-align: super; }

None of this is vital, as meta-information usually isn't.

B.) What Will We Get (XHTML 2, Public Working Draft 27 May 2005)?
@hreflang changes the UA's accept-language request header - provided we're using HTTP. It is not metadata any more; it actively influences which document we get.

  <a href="http://members.aon.at/neumair/" hreflang="de">Info in German</a>
  <a href="http://members.aon.at/neumair/" hreflang="en">Info in English</a>

Useful. There's even more:

  <a href="http://members.aon.at/neumair/" hreflang="de;q=0.5, en">Info</a>

Tell the server which languages we prefer, weighted by q-value. Even more useful.

We even have the former metadata-thingie back in XHTML 2, even though the markup is a bit painful:

  <meta about="http://members.aon.at/neumair/index_de.htm" property="dc:language" content="de"/>
  <a href="http://members.aon.at/neumair/index_de.htm">Bed&amp;Breakfast</a>

Perfect, isn't it? - No. Not at all.

C.) Why I'm Not Happy With It.

1.) It's superfluous

The way XHTML 2 uses @hreflang is elegant, but superfluous. In most cases, it simply needn't be there. If I (= the web author) link to a resource, I usually know which languages it is available in and which of these languages I want. So in the above examples, XHTML 2's

  <a href="http://members.aon.at/neumair/" hreflang="de">Info in German</a>

is equal to

  <a href="http://members.aon.at/neumair/index_de.htm">Info in German</a>

The new @hreflang isn't needed, it's all in the @href.

Oh, wait a minute, not all of it: the new @hreflang is to be a multi-value attribute.

  <a href="http://members.aon.at/neumair/" hreflang="de;q=0.5, en">Info</a>

is new to XHTML 2 and can't be expressed without the proposed new meaning of @hreflang. Right. In this case, frankly, I wonder why anybody would want to do this. Either the author relies on content negotiation and doesn't care about the language, or he does care about the language. If he does care, he will point to a specific language version (using whatever method), not to 2 or 3 languages.
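
A note on the multi-value form: since the draft sends the attribute value as the accept-language header, it follows that header's syntax, where an entry without a q parameter counts as q=1. The sketch below (Python, purely illustrative; the function name parse_language_list is made up and belongs to no draft or library) shows that "de;q=0.5, en" ranks English above German, regardless of the order in which the two are written.

```python
def parse_language_list(value: str) -> list[tuple[str, float]]:
    """Parse an accept-language style list into (language, q) pairs, best first.

    Hypothetical helper for illustration only; it ignores wildcards and
    does no validation of the language tags.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # an entry without a q parameter has full weight
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        entries.append((lang.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their written order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_language_list("de;q=0.5, en"))
```

So the example link asks for English first and accepts German as a second choice.
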
(As I re-read this, there's one single situation in which the new version could make sense, but for other reasons I doubt it's practical: in combination with @hreftype, expressing something like "I need either mime-type a or mime-type b, but exactly this one language".)

2.) It's difficult to handle with CSS

Given a multi-value @hreflang containing, among others, "de-at de-ch en-us en-gb ....": How do I match this in CSS? [hreflang|="en"] will not work, because the value as a whole doesn't begin with "en"; neither will [hreflang~="en"], because none of the listed words is exactly "en". CSS and (X)HTML should work well together. Not being able to CSS-style a document depending on an attribute value is a huge drawback and must be traded for an even bigger advantage - which I don't see...

(And, of course: the only way in XHTML 2 to express the old metadata meaning of @hreflang is:

  <meta about="http://members.aon.at/neumair/index_de.htm" property="dc:language" content="de"/>
  <a href="http://members.aon.at/neumair/index_de.htm">Bed&amp;Breakfast</a>

Try to do CSS on this one...)

3.) It's HTTP only

Being very picky (and believe me, you're not the only one who hates me for this), I still believe that XHTML should - as far as possible - be agnostic of the dirty networking stuff it might float around in. It's a principle. XHTML is a markup language. XHTML 2 documents will end up on CD-ROM and link to other offline resources or non-negotiating services. With the new @hreflang relying on content negotiation, the attribute is useless as soon as the target is file:///something or telnet://thismachine. (The old metadata version was useful when pointing to a telnet service, indicating that its user interface is in this or that language.)

Even worse and indeed dangerous: navigation within a set of documents using relative links will produce different results, depending on which medium you're retrieving the documents from. Browse them online (HTTP) and the <....href="next" hreflang="en">next</a> will take you to the correct place.
Do the same thing offline (CD-ROM, hard disk), and you could get a different document. This is, from my point of view, one of the most severe issues.

4.) It leads to bad user experience

Example: The author of a German-language website links to a site that is available in Japanese and English. He proudly uses the new @hreflang to indicate he wants the English version, as he assumes his German readers usually know English better than Japanese. Then he forgets about this link for a while. In the meantime, the site he linked to gets translated into German; a third version becomes available. The human reader will still be directed to the English version, even though there is now a German version he'd feel more comfortable with. He might never learn there's a German version at all.

With @hreflang as it is now (HTML 4), the author would still mark the link as English (@hreflang="en"). The reader would quickly find that this (meta)information was wrong but - who cares! It's only a change for the better. (Same if I thought I had to communicate in English on this mailing list, only to find out that everybody prefers German.)

5.) It's based on a questionable concept that is hardly used

I don't have exact figures, but I made a quick check and examined a few sites that do have multi-language versions. I chose sites I regularly visit and websites of bigger companies. Only two of them use language negotiation: hotmail and google (google only to display a tiny link to the real localized version). This might have several reasons, but one benefit: language negotiation is a questionable concept anyway, and the less it is used, the better.

Content negotiation in general is machine-to-machine communication. The natural language that the human user might prefer/understand is generally unknown to the user agent. People are meant to set this option in the configuration dialogue, but how many really do?
And even if they do: Does the person who uses the browser right now prefer the same language as the person who configured it? It's reasonable to ask why there should be an attribute that completely relies on a 'broken-by-design' feature hardly used in real life.

6.) Browser behaviour isn't specified

The spec now reads: "The user agent must use this list as the field value of the accept-language request header when requesting the resource using HTTP." - My interpretation is: the UA must do this _when following the link_. What about the user interaction following immediately afterwards?

Take the example from point 4.: The link forces my browser to use an accept-language request header of "en" even though I prefer German and the website is available in German. What if I follow a link on this website? Say I click on a headline to read the article: will I get the English version that corresponds to the English headline? Or will my browser suddenly surprise me by presenting a German text?

What if I make my browser re-load the page right after I clicked the link that got me there? Will I still get the English version? Or will my browser re-load the document based on its URI and my personal accept-language request header, resulting in the German version of the document? (Which would not be what 'reload' means to me; on the other hand, I don't see any other way to get the German version and correct the 'broken' link.)

Thank you for reading this to the end. Sorry it's so long, but I really do believe this is an important and underrated issue; I'd highly appreciate any input.

Regards,
Oskar
Received on Friday, 11 November 2005 06:22:44 UTC