RE: ISSUE-88 / Re: what's the language of a document ?

> From: Ian Hickson [mailto:ian@hixie.ch]
> Sent: 11 March 2010 01:14
> To: Richard Ishida
> Cc: www-international@w3.org; public-html@w3.org; 'Maciej Stachowiak'
> Subject: RE: ISSUE-88 / Re: what's the language of a document ?
> 
> On Wed, 24 Feb 2010, Richard Ishida wrote:
> >
> > It's significant that the thing we're calling the pragma is a use of a
> > <meta> element.
> 
> It's the <meta> element for historical reasons. I don't think it's
> particularly significant to the discussion at hand,

Then that's probably why we are not in agreement.  We feel that the fact
that it was specified using a mechanism for metadata is exactly what makes
it relevant to the discussion at hand.


> > It's metadata, and the view of the i18n WG is that it should be
> > available for use to specify metadata if you need to do so *in the
> > document*.
> 
> If you need to specify the language, the lang="" attribute seems to
> provide a significantly better solution than this pragma. If it wasn't for
> compatibility with legacy content, I think the pragma would be best
> removed from the language altogether.

I'm confused. If compatibility with legacy content is important (and I
believe it is), why are we talking about changing the pragma to only support
single language values?

I totally agree that the lang="..." attribute is a significantly better
solution than this pragma for specifying the language of content at the
element level (i.e. for applications like spellcheckers, font assignment,
first-letter handling, voice browsers, etc.), because those applications
need to know specifically which language is associated with which piece of
text.  But the pragma is for specifying *metadata* about the languages of
*the intended audience of the document as a whole* (which for a truly
multilingual document is aimed at speakers of more than one language).  This
is why it is useful that lang="..." can and should only support a single
language as its value, whereas the pragma can and should (as metadata)
support a value with more than one language.  I've said all that several
times before, and several other people are saying the same thing on this
thread, and I'm not sure why we seem not to be making those distinctions
clear.
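
To make the distinction concrete, here is a minimal sketch of how a tool
outside the browser might read the two declarations separately.  It is
for illustration only: the sample markup and the LanguageInfo class are
hypothetical, not taken from any spec or existing tool, and I am assuming the
pragma value is a comma-separated list of language tags, as in the HTTP
Content-Language header.

```python
from html.parser import HTMLParser

# Hypothetical sample document: one default text-processing language on <html>,
# per-element overrides via lang, and a pragma listing the intended audience.
SAMPLE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Language" content="en, fr, de">
<title>Trilingual notice</title>
</head>
<body>
<p>Welcome.</p>
<p lang="fr">Bienvenue.</p>
<p lang="de">Willkommen.</p>
</body>
</html>"""


class LanguageInfo(HTMLParser):
    """Collects the root lang attribute and the Content-Language pragma separately."""

    def __init__(self):
        super().__init__()
        self.root_lang = None      # single value: language of the text itself
        self.audience_langs = []   # zero or more values: metadata about the document

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.root_lang = attrs.get("lang")
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            # Assumption: the value is a comma-separated list of language tags.
            content = attrs.get("content") or ""
            self.audience_langs = [v.strip() for v in content.split(",") if v.strip()]


info = LanguageInfo()
info.feed(SAMPLE)
print(info.root_lang)       # en
print(info.audience_langs)  # ['en', 'fr', 'de']
```

The point of keeping the two fields apart is that a spellchecker or voice
browser would only ever consult the single lang value, while a content
management or translation workflow could use the list from the pragma.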


> > It's true that a lot of people misunderstood the use of this pragma in
> > the past, but that's what we're trying to clarify here (and btw I've
> > seen evidence that that is changing).
> 
> Can you share this evidence? If people really are learning how to use this
> pragma, that changes matters significantly.

I looked but was unable to find the slides I remember seeing at a
conference.  They showed trends in usage of lang vs pragma for declaring the
language of the document, i.e. they showed that people appear to be declaring
the language of content at the element level (not metadata) in the right way.

Wrt the pragma, though, unless you have conclusive evidence that no one has
ever used, or currently intends to use, the pragma as a declaration of
metadata about the document, you cannot change how it works, because you'll
break things. I think one could argue for deprecation of the use of the
pragma going forward, if you were able to convince me that it is not useful
for in-document metadata declarations, but we shouldn't just change the
syntax in a way that would cause problems for people who may have been
following the HTML4 spec in good faith.


> > The i18n WG agrees that authors should be discouraged from using the
> > pragma for the purposes that the lang attribute should be used, but we
> > are also saying that, its use should be *encouraged* for cases where you
> > want to specify metadata inside the document.
> 
> Could you elaborate on what use cases this would be intended to address? I
> don't understand why authors would want to do this.

Some people have already mentioned some of these things on this thread.  I
was thinking about similar things. I'm not expecting Google to use it for
searching, but I can see it being used for things such as searching,
organization and classification of resources in controlled systems, content
management, XML processing (using XHTML), translation tool environments
(where your workflow process needs to know the language of the document as a
whole regardless of the language of the initial text in the document), and
other ways in which metadata is used - particularly when used without the
intermediary of a server.  This is the world beyond the display of content
by a browser.

> 
> Also, note that using http-equiv is not setting metadata. It's setting
> pragma directives for the user agent. If there is a solid use case here
> for document-wide metadata concerning languages, we can certainly handle
> it, but it would be best to handle it using the dedicated metadata
> mechanisms (<meta name>, microdata, RDFa, a dedicated attribute like
> lang="", or some other such mechanism.)

> 
> 
> > And if you are using this to specify metadata, you must allow for
> > multiple values.  What's more, changing the syntax of the pragma to
> > accept only one language is likely to only further confuse people, in
> > the opinion of the i18n WG, since it now appears to be more like the
> > lang attribute, and in addition, the behaviour is different to previous
> > versions of HTML, which further complicates explanations about how to
> > handle language in HTML.
> 
> Previous versions of HTML did not match reality. As such, I don't think
> they're really relevant here.
> 
> Reality is that the http-equiv="Content-Language" value is handled more or
> less as defined in HTML5. It does not provide metadata; it can't handle
> multiple values. When supported at all, it just sets the default for the
> lang="" attribute.

Do you have evidence that it doesn't provide metadata (the fact that people
have misused it to declare the default language of the content is not
evidence that it cannot be used as metadata)?  I don't understand why you
say it doesn't handle multiple values in terms of use as metadata.  Certainly
multiple values don't make much sense for describing ranges of content at
the element level, but they do for describing the document as an object, and
I'm not aware of any browsers that fall over if you use multiple values for
the pragma.

When talking about metadata we're not talking about support within the
browser or editor for setting things like fonts, voice, etc.  Those
declarations should indeed be done using the lang attribute. We're
talking about describing the resource object on a server or a CD or in a
workflow or some other system.

> 
> 
> > In addition, we are worried about the effect on legacy data of changing
> > the number of allowed language values for this meta element.  There may
> > not be much out there, but there may also be some, and we felt that this
> > is inconsistent with the efforts of the html folks to maintain backwards
> > compatibility in other areas.
> 
> The goal of maintaining backwards compatibility in this case is exactly
> why multiple languages were dropped and why the meaning of this pragma was
> changed from the previous definition to the definition that matched actual
> usage and implementation.

That's backwards compatibility with incorrect usage to describe the language
of a range of content - not backwards compatibility with use as metadata.
The proposals in the Change Proposal were formulated to support the legacy
of incorrect usage, while not prejudicing other types of legacy. 


> 
> 
> > This was why we wanted to talk with you at TPAC and go through the
> > proposals in
> > http://lists.w3.org/Archives/Public/public-html/2009Oct/1086.html (on
> > which the Change Proposal is based), and we left that meeting
> > understanding that you had agreed to the proposals.
> 
> I thought I'd made the changes we agreed to -- apparently we didn't
> understand each other at that meeting!
> 
> 
> > > I recommend going through the normal process for these, by the way
> > > (using bugs and so forth) rather than jumping straight to the Change
> > > Proposal stage. It will help ensure that we keep issues focused.
> >
> > Actually we have been following the process.  Here is the original bug
> > report http://www.w3.org/Bugs/Public/show_bug.cgi?id=8088 which you
> > rejected.
> 
> That bug has a much narrower focus than some of the changes you have
> proposed, as far as I can tell.

I think the core is the same.

RI

Received on Friday, 12 March 2010 17:35:37 UTC