
Re: How do I say "this is not in any language" in XHTML/HTML

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 14 Mar 2007 12:24:58 +0900
Message-Id: <6.0.0.20.2.20070314121401.07c0c3a0@localhost>
To: John Cowan <cowan@ccil.org>, Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org

At 05:45 07/03/14, John Cowan wrote:
>
>Richard Ishida scripsit:
>
>>    1. A few years ago we introduced into the XML spec the idea
>>    that xml:lang="" conveys that 'there is no language information
>>    available'. (See 2.12 Language Identification[2])
>> 
>>    2. An alternative is to use the value 'und', for 'undetermined'.
>> 
>>    3. In the IANA Subtag Registry[3] there is another tag, 'zxx', that
>>    means 'No linguistic content'. Perhaps this is a better choice. It
>>    has my vote at the moment.
>
>Rightly so.  The other two choices indicate slightly different flavors
>of ignorance about the content; if you *know* the content is nonlinguistic,
>you should use "zxx".

+1

>> I'm not clear whether the HTML DTD supports an empty string value for
>> lang. If so, then presumably the validator needs to be fixed. If not,
>> then this is not a viable option, since you'd really want both lang
>> and xml:lang to have the same values.
>
>Neither the HTML 4 nor the XHTML 1.0 DTDs permit an empty value for the
>lang attribute; XHTML 1.0 does not permit an empty value for the xml:lang
>attribute either.  IMHO XHTML 1.0 is obsolete in its treatment of xml:lang.
>Whether you want the validator to override the DTD in this respect
>is a question.

Tweaking the validator to accept this even if the DTD doesn't allow
it is a very dangerous start down a slippery slope.

But both HTML4 (or its successor) and XHTML 1.0 (or its successor)
should clearly be fixed. Richard, can you ask the Core WG to contact
the relevant WGs and make sure this is on their radar?

But all is not lost for validation. There is an easy fix, using
the internal subset in an XHTML document. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
 [<!ENTITY % LanguageCode "CDATA">]
>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="" lang="">
  <head xml:lang="" lang="">
    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
    <title xml:lang="" lang="">Empty lang/xml:lang attribute test</title>
  </head>
  <body xml:lang="" lang="">
    <h1 xml:lang="" lang="">Test Using DTD Tweaking to Allow Empty
      <code>lang</code>/<code>xml:lang</code> Attributes</h1>
    <p xml:lang="" lang="">The HTML and XHTML DTDs don't allow empty
      <code>lang</code>/<code>xml:lang</code> attributes.
      But simply adding [&lt;!ENTITY % LanguageCode "CDATA">]
      as the internal subset (see source of this document)
      makes sure that the document is still valid.</p>
  </body>
</html>

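It has empty values all over the place, but validates without
problems. The trick is the line
   [<!ENTITY % LanguageCode "CDATA">]
which overrides the definition of the LanguageCode entity
in the XHTML DTD with something that allows empty values.
This works because the internal subset is read before the
external DTD, and when an entity is declared more than once,
the first declaration is binding.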
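A toy Python model of the two XML 1.0 rules the trick relies on may help. It is not a real DTD parser or validator. It assumes the XHTML 1.0 DTD declares LanguageCode as NMTOKEN, and it uses a simplified, ASCII-only stand-in for the NameChar production. The function names `bind_parameter_entities` and `attribute_value_ok` are hypothetical.

```python
import re

# Simplified, ASCII-only stand-in for the XML NameChar production
# (the real production also admits many non-ASCII characters).
NAME_CHAR = r"[A-Za-z0-9._:\-]"


def bind_parameter_entities(internal_subset, external_subset):
    """Return the binding value of each parameter entity.

    The internal subset is read before the external subset, and when an
    entity is declared more than once, the first declaration is binding.
    """
    bindings = {}
    for name, value in list(internal_subset) + list(external_subset):
        bindings.setdefault(name, value)  # later declarations are ignored
    return bindings


def attribute_value_ok(attr_type, value):
    """Check a value against the two attribute types involved here."""
    if attr_type == "CDATA":
        return True  # any string, including "", is allowed
    if attr_type == "NMTOKEN":
        # one or more name characters, so "" can never match
        return re.fullmatch(NAME_CHAR + "+", value) is not None
    raise ValueError("unsupported attribute type: " + attr_type)


# Declaration assumed to appear in the external XHTML 1.0 DTD
EXTERNAL = [("LanguageCode", "NMTOKEN")]
# The declaration added in the document's internal subset
INTERNAL = [("LanguageCode", "CDATA")]

without_override = bind_parameter_entities([], EXTERNAL)["LanguageCode"]
with_override = bind_parameter_entities(INTERNAL, EXTERNAL)["LanguageCode"]

print(without_override, attribute_value_ok(without_override, ""))  # NMTOKEN False
print(with_override, attribute_value_ok(with_override, ""))  # CDATA True
```

Note that under either binding, values such as "zxx" or "und" pass. The DTD only checks the attribute type, not whether the value is a registered language tag.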

Regards,     Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 14 March 2007 03:43:06 GMT
