Re: Getting beyond the ping pong match (was RE: Cleaning House) from Mark Birbeck on 2007-05-07 (www-html@w3.org from May 2007)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 7 May 2007 14:22:07 +0100
To: public-html@w3.org
Cc: www-html@w3.org
Message-ID: <640dd5060705070622v49f58068x712d5c6b75d154f@mail.gmail.com>

Henri,

> The point is exploring *if* they can be interpreted using existing
> practice as a guide. This is subtly different from just using a
> dictionary: if research showed that a non-word string was
> consistently used to denote something useful, a dictionary would not
> have to be involved. Since it is improbable that the string
> "copyright" would appear accidentally without someone thinking of the
> concept of copyright while writing the string, it is reasonable to
> assume that the string is motivated by the concept.

But the logical error the proposal makes is to conclude that you can
_infer_ one author's intent from that of others. Since there is nothing
in the HTML spec that says that @class="copyright" means _anything_ at
a global level, even if 100 authors use it in the way that is being
suggested, you *cannot* infer anything from that about author 101.

(And that is putting aside Bjoern's perfectly correct points that the
spec doesn't actually say what @class="copyright" means anyway.)



USE CASES AND REQUIREMENTS

Don't get me wrong though--the fact that 'copyright' is an oft-used
class name is still a very useful piece of information. It tells us
that copyright information is something that authors (or at least
their publishing software) are quite prepared to add to their
documents, and so it gives us a clue that if we could come up with a
standard way of allowing this, we'd end up with some very useful
metadata to make use of.

But I seriously worry about what kind of standard will emerge when
this very useful use case is used not to tell us that there is a
requirement, but to tell us how to meet that requirement, i.e., to
justify declaring that @class="copyright" now has _global_ semantics
when before it didn't.

If we were talking about @rel="copyright", that would be somewhat
different, since @rel indicates a relationship between two documents,
and authors would be consciously using a globally valid technique to
add metadata. But the definition of @class tells us only that it can
be used for semantics; it tells us nothing about the values.
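To make the @rel/@class distinction concrete, here is a minimal Python sketch using only the standard library's html.parser. It assumes a processor that trusts rel="copyright" (a link type predefined in HTML 4.01) as metadata, but can only record class="copyright" as an author-local hint whose meaning it cannot know. The class name CopyrightScanner and the sample document are hypothetical.

```python
from html.parser import HTMLParser


class CopyrightScanner(HTMLParser):
    """Hypothetical scanner: separates rel="copyright" links from class hints."""

    def __init__(self):
        super().__init__()
        self.copyright_links = []  # hrefs from rel="copyright": meaning fixed by the spec
        self.class_hints = []      # tags with class="copyright": meaning is author-local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Both attributes hold space-separated tokens; valueless attributes give None.
        rel_tokens = (attrs.get("rel") or "").lower().split()  # link types: case-insensitive
        class_tokens = (attrs.get("class") or "").split()      # class names: case-sensitive
        if "copyright" in rel_tokens and attrs.get("href"):
            self.copyright_links.append(attrs["href"])
        if "copyright" in class_tokens:
            self.class_hints.append(tag)


doc = """
<head><link rel="copyright" href="/legal.html"></head>
<body><p class="copyright">(c) 2007 Example Ltd.</p>
<span class="copyright">Photo: J. Smith</span></body>
"""

scanner = CopyrightScanner()
scanner.feed(doc)
scanner.close()
print(scanner.copyright_links)  # ['/legal.html']
print(scanner.class_hints)      # ['p', 'span']
```

Both elements in the sample carry the same class value, yet only the first reads as a copyright notice; nothing in the markup lets a processor tell them apart, whereas the rel value has a meaning defined by the spec.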



INFERRING MEANING

Of course, in some specific environment, it might be useful to infer
that @class="copyright" has meaning. Google, for example, could make
use of this by showing in their search results the text of any element
that has this class value. But in this situation, if they get it
wrong, it's not the end of the world. More than that, they could apply
other rules on their servers to the document and the element's
content, to work out whether some element really is a copyright
message.
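As an illustration of that kind of second-pass check, the sketch below treats a class="copyright" value only as a hint and additionally requires the element's text to look like a notice. The function name and the regular expression are hypothetical heuristics, not a description of anything Google actually does; an occasional wrong answer is tolerable here because it is a private processing decision rather than part of the language.

```python
import re

# Hypothetical content heuristic: a copyright marker followed somewhere by a year.
NOTICE_PATTERN = re.compile(r"(©|\(c\)|\bcopyright\b).*\b(19|20)\d{2}\b", re.IGNORECASE)


def looks_like_copyright_notice(class_tokens, text):
    """Return True only when the class hint AND the content heuristic agree."""
    if "copyright" not in class_tokens:
        return False  # no hint at all; this sketch does not scan unmarked elements
    return bool(NOTICE_PATTERN.search(text))


print(looks_like_copyright_notice(["copyright"], "(c) 2007 Example Ltd."))  # True
print(looks_like_copyright_notice(["copyright"], "Photo: J. Smith"))        # False
print(looks_like_copyright_notice(["footer"], "Copyright 2007 Example"))    # False
```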

But 'inferring' meaning from documents in this way to aid processing
is a far cry from building that meaning into the syntax itself. And
this is because, as others have said, @class cannot *by definition* be
deemed to be unambiguous.



XHTML 2

It's frustrating to see the very discussions that we've had over the
years in the XHTML 2 work now happening all over again, but I guess
that is almost inevitable when politics is such a key part of
standards-making--so there's no point in complaining. :) In this
particular area we also considered tweaking things so that existing
@class values had universal meaning, but in the end we concluded that
all we could say about _existing_ @class values was that they were
'locally defined', i.e., they were private to the author. Since there
was no mechanism for an author to indicate that they had chosen some
'global' value, it would have been incorrect of us to assume that they
had.

However, that is not the case with @rel and @rev, since there are some
predefined values. And it is also not the case with @role, since that
says that non-prefixed values are reserved (to answer Geoffrey's
question). And finally, we felt it was also not the case with values
of @class that look like XML QNames, i.e., are 'qualified'; we felt
that an author using 'dc:creator' as a class value almost certainly
knew what they were writing (since it is currently not a very common
practice), and so it was legitimate to 'infer' something more from
this than a hook for CSS styling rules.
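The rules described above can be summarised as a small lookup, sketched here in Python. It is a simplification under stated assumptions: the rel/rev list is the set of link types predefined in HTML 4.01, any other rel/rev value is simply treated as local, the QName test is a rough regular expression rather than the full XML Namespaces grammar, and the function name classify and its return labels are hypothetical.

```python
import re

# Link types predefined in HTML 4.01, compared case-insensitively.
HTML401_LINK_TYPES = {
    "alternate", "stylesheet", "start", "next", "prev", "contents", "index",
    "glossary", "copyright", "chapter", "section", "subsection", "appendix",
    "help", "bookmark",
}

# Rough approximation of an XML QName: prefix ":" local-part, both NCName-like.
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*$")


def classify(attribute, value):
    """Say how much a processor may infer from a single attribute token."""
    if attribute in ("rel", "rev"):
        return "predefined" if value.lower() in HTML401_LINK_TYPES else "local"
    if attribute == "role":
        # Unprefixed role values are reserved; prefixed ones are qualified.
        return "qualified" if QNAME.match(value) else "reserved"
    if attribute == "class":
        # Plain class names are private to the author; QName-like ones are qualified.
        return "qualified" if QNAME.match(value) else "local"
    raise ValueError(f"unhandled attribute: {attribute}")


for attr, val in [("class", "copyright"), ("class", "dc:creator"),
                  ("rel", "copyright"), ("role", "navigation")]:
    print(attr, val, "->", classify(attr, val))
```

Here 'navigation' is just an arbitrary unprefixed role value; the sketch does not check it against any particular role vocabulary.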



CONCLUSIONS

So firstly, I would say that there is not necessarily anything wrong
with deriving 'universal' meaning from @class values, but unfortunately
we can't say anything about pre-existing values. Going forward, any
values that want to be part of some 'global' dictionary need to be
'qualified' in some way. This has been discussed in this thread, with
one option being to find a unique prefix. That's not a bad solution,
but as every language designer knows, no matter what you are doing,
you very quickly come up against the problem of namespacing. We
therefore took an approach modelled on XML namespaces, and proposed
the CURIE syntax to support it:

  <http://www.w3.org/TR/curie/>
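As a rough illustration of how a CURIE is resolved, the sketch below splits the value at the first colon and concatenates the namespace IRI bound to the prefix with the remaining reference; it also strips the square brackets of the 'safe CURIE' form defined in CURIE Syntax 1.0. The PREFIXES map stands in for xmlns declarations on the document and is an assumption, the function name expand_curie is hypothetical, and default-prefix and unprefixed forms are not handled; the specification linked above has the full rules.

```python
# Assumed prefix bindings, as if declared with xmlns:dc="..." on the document.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie, prefixes):
    """Expand 'dc:creator' (or the safe form '[dc:creator]') to a full IRI."""
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]  # strip safe-CURIE brackets
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"no prefix in {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    # Expansion is plain concatenation of the namespace IRI and the reference.
    return prefixes[prefix] + reference


print(expand_curie("dc:creator", PREFIXES))    # http://purl.org/dc/elements/1.1/creator
print(expand_curie("[dc:creator]", PREFIXES))  # http://purl.org/dc/elements/1.1/creator
```

Because the prefix is bound explicitly by the author, a processor that meets 'dc:creator' knows which vocabulary was intended, which is exactly what a bare 'copyright' class value cannot tell it.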

But secondly, I would say that using @class rather than the role
attribute to carry values that describe the structure of a document
could appear to be an example of the 'not invented here' mindset. I'm
sure it's not, but I would urge people to consider @role for this
task, since:

  * @role was created specifically to allow authors to say what an
    element's purpose is;

  * it was further motivated by trying to provide an 'unpolluted'
    value space so that there would be no ambiguities;

  * it is available as a standalone module that can be used in
    different mark-up languages;

  * it has been added to Firefox already.

Regards,

Mark

-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.
Received on Monday, 7 May 2007 13:22:27 UTC