Re: ISSUE-55: Re-enable @profile in HTML5 (draft 1) from Smylers on 2009-10-12 (public-html@w3.org from October 2009)

From: Smylers <Smylers@stripey.com>
Date: Mon, 12 Oct 2009 08:39:42 +0100
To: Manu Sporny <msporny@digitalbazaar.com>, HTMLWG WG <public-html@w3.org>
Message-ID: <20091012073942.GF31583@stripey.com>

Manu Sporny writes:

> Based on the responses to the e-mails I sent yesterday, it seems as if
> we're using slightly different definitions of "backwards
> compatibility".

Ah.  That's unfortunate.  Thanks for getting to the bottom of this.
Let's use the terms 'behaviour backwards compatibility' and 'validity
backwards compatibility' for each type.

(But beware that when folks in the HTML5 community refer to "backwards
compatibility" they usually mean behaviour backwards compatibility.
'Don't break the web' is an HTML5 motto, meaning 'Don't cause mainstream
browsers when operating on existing webpages to behave differently for
users than they do now'.)

> Jonas responded to this thread by stating:
> 
> "I would personally recommend that RDFa follow the strategy that HTML
> uses" ... "To never break backwards compatibility with existing
> content."

Jonas was advising not to break behaviour backwards compatibility.
Which type of backwards compatibility does RDFa 1.1 wish to break versus
1.0?

> 1. HTML5 doesn't change the behavior of anything in a
>    backwards-incompatible way.
> 2. HTML5 doesn't change the behavior of anything, except for parts of
>    the HTML4 specification that don't match widely-deployed
>    implementations.
> 
> I believe #1 is false, #2 is more accurate

Using a definition of behaviour backwards compatibility, #1 is true but
omits to mention what the baseline for comparison is; it's the behaviour
of existing mainstream browsers.  (#1 is not true if your baseline is
the HTML4 spec, but for HTML5 that isn't the baseline.)

#2 is less accurate because it fails to account for all the behaviours
which mainstream browsers have but which simply aren't covered by HTML4;
HTML5 needs to be compatible with these as well as the portions of HTML4
which are widely deployed.

> > Whether HTML has depended on the version attribute is a matter for
> > browser developers, not spec writers.  Jonas is a Mozilla developer.
> > If he says that Mozilla doesn't look at version numbers in order to
> > process 'HTML4 pages' differently from 'HTML3 pages' (etc) then he
> > knows what he's talking about.
> 
> I didn't mean to imply that Jonas doesn't know how Firefox treats the
> @version attribute. I'm sure he does and I'd, of course, defer any
> Firefox question to him in this discussion. However, I don't know if
> Jonas was around for the design of HTML 2.0 or HTML 3.2, or HTML
> 4.01...  but I do know that Dan Connolly was around during that time
> frame.

So far as behaviour backwards compatibility is concerned it's what
browsers do which matters; anything which a standards committee intended
but wasn't implemented in practice doesn't affect behaviour backwards
compatibility.

> > >  * There is currently no way for an author to specify that they
> > >    would like their documents to be processed as HTML5 instead of
> > >    HTML6.
> > 
> > That's true, but then HTML6 doesn't exist yet.  HTML6 _may_ be developed
> > with the same goals as HTML5, and as such retain backwards compatibility
> > such that processing an HTML5 document with an HTML6 user agent will
> > yield exactly the same behaviour and output as doing so with an HTML5
> > user agent.  In which case no version specifier is needed.
> > 
> > Or HTML6 may make so many incompatible changes, along XHTML2 lines, that
> > it fails to gain significant market share on the web or in mainstream
> > browsers, in which case no version specifier is needed.
> 
> If you were intending to enumerate all the possibilities, you missed one:
> 
> Or HTML6 may want to make several backwards-incompatible spec changes
> based on authoring use in the field such that default parsing behavior
> matches real-world usage, in which case a version specifier is needed
> to ensure that previously conforming documents continue to be
> conforming after the change.

If a behaviour change is introduced then that would only be behaviour
backwards compatible if no existing webpages had content which relied on
the previous behaviour.  Introducing a new element would be an example
of this.  In HTML5 <rainbow> is a validity error.  Suppose HTML6
introduces a <rainbow> element; this would only be possible because
there aren't existing pages using a <rainbow> tag (and relying on it
having no special behaviour), so it isn't behaviour backwards
incompatible.

That would be validity backwards incompatible, but in practical terms
the validity of existing pages would not be changed: there are not a
bunch of HTML5 pages incorrectly using <rainbow> which would suddenly
start validating.

However if HTML6 merely wants to introduce a validity change (but no
change in behaviour) this could be in one of two directions:

* HTML6 could allow some mark-up which HTML5 deems invalid, for example
  re-introducing <link rev="...">.  In this case it would be because
  it's decided the mark-up is something that authors should be allowed
  to do.

  This change would mean that HTML5 documents which are otherwise valid
  except for a banned use of <link rev> would suddenly change to being
  valid.  This is a good thing.  If the HTML6 designers decide that
  <link rev> is OK for authors to use there is no point in validators
  objecting to it and telling authors not to use it.  In which case
  there's no point in validators giving this advice to _any_ authors.

  It does not make sense for a validator on encountering <link rev> to
  effectively be saying 'This mark-up is compatible with all relevant
  user agents, and has been deemed acceptable by those currently
  designing HTML, but because you haven't made an arbitrary change to
  indicate that this is an HTML6 document I'm going to give you
  erroneous advice and tell you not to use that attribute'.

  So no version indicator is needed.

* Or HTML6 could disallow some mark-up which HTML5 currently considers
  valid.  Perhaps microdata turns out in practice to be a miserable
  failure, causing widespread confusion and being poorly supported, so
  it's decided to remove it from the (conforming) language in HTML6.  Or
  perhaps <b> is deemed bad for accessibility and a host of more
  'semantic' elements are introduced to replace it.

  Again, the best advice a validator can give an author if it encounters
  some microdata mark-up or <b> or whatever is not to use that mark-up.
  It doesn't make sense for the validator to fail to pass on that advice
  simply because the document is labelled as HTML5 -- the author should
  still change the mark-up to have a useful HTML document.

  So again no version indicator is needed.

So the only circumstance requiring a version indicator is where there is
a genuine change in behaviour -- such that there is existing content
which requires the old behaviour, and there will be new content
requiring the new behaviour, and user agents will have to implement both
behaviours
and pick the appropriate one.  Note that in such circumstances:

> > ... retaining backwards compatibility with the current web will
> > require that documents without that attribute (or whatever) be
> > processed according to HTML5 rules.

However much we desire pages to be labelled with version numbers, there
will be some pages which are unlabelled.  And we cannot spec that
browsers should decline to render such pages; that breaks the web.  So
unlabelled pages have to be rendered according to 'current'
expectations.  Those expectations are codified as HTML5.

For evermore user agents will have to interpret unversioned content as
HTML5 (or a future version which has behaviour backwards compatibility
with HTML5).  It is already too late to change this.

So any future behaviour change which requires a version indicator will
have to make the new behaviour depend on that version indicator; it will
not be possible for that indicator to be optional.

As such:

> > ... it'll be as easy to add the attribute then as now, meaning doing
> > it now has no advantage for HTML6.
> 
> No, it won't be easy to add the attribute then because the problem
> will be non-deterministic at that point.
> 
> How will you know if somebody omits the @version tag on an HTML6
> document if you should parse as HTML5 or parse as HTML6?

You will have to parse it as HTML5 [if the two are incompatible].  That
is completely deterministic.
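
As a rough sketch of that rule, here is how a user agent's choice might
look in Python.  The function name and the version strings are made up
for illustration; nothing here comes from a spec.  The point is only
that the absence of an indicator always maps to the existing behaviour,
so there is never any guessing:

```python
from typing import Optional

# Hypothetical labels for the two rule sets; neither name comes from a spec.
LEGACY_RULES = "html5"
NEW_RULES = "html6"


def select_rules(version: Optional[str]) -> str:
    """Pick processing rules for a document.

    `version` is the value of a hypothetical version indicator, or None
    when the document carries no indicator at all.
    """
    # Only an explicit opt-in selects the incompatible new behaviour.
    if version is not None and version.strip().lower() == NEW_RULES:
        return NEW_RULES
    # Unlabelled (or unrecognised) content keeps today's behaviour.
    return LEGACY_RULES
```

Every input has exactly one answer, and a document with no indicator
never reaches the new rules, which is why the indicator has to be
compulsory for any content that wants them.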

> If we create a rule now that says anything that doesn't have a
> @version attribute will be parsed using the latest rules known to the
> User Agent, we're covered.

We already cannot create such a rule, because it means any future
changes would break existing content which doesn't have a version
identifier.

(Alternatively such a rule would constrain future versions of HTML to
retain compatibility with current behaviour.  In which case a version
identifier is unnecessary, because everything's always compatible.)

> When you stop versioning something - you create a non-deterministic
> problem that can only be solved by /requiring/ a version specifier at a
> later date.

Yup.  If HTML makes a behaviour incompatible change at a later date then
it will also need to introduce a compulsory version specifier at that
date.

Introducing an optional version specifier right now doesn't change that
either way.

> > >  * There is currently no way for an author to specify that their
> > >    document should be processed via extended processing behavior
> > >    using FeatureX version 1.0 instead of FeatureX version 2.0.
> > 
> > True.  But possibly the FeatureX 2.0 spec could define that, rather than
> > there needing to be a general HTML mechanism for it.  Given how
> > undesirable backwards incompatibility is, HTML5 should not be
> > encouraging it or making it easy.
> 
> That's certainly one way to look at it - another is that we don't want
> to create a situation where we have no choice but to continue to live
> with the mistakes we will inevitably make in authoring the HTML5
> specification.

It's possible that we will have no choice but to live with them -- a
large part of the purpose of HTML5 is to codify the mistakes made in
creating previous HTML specifications.

> So, I'm curious - Smylers, Maciej, Jonas - assuming that the RDFa
> Community wants to change the default behavior for XMLLiteral
> processing to match authoring usage behavior... how should we make
> that change in RDFa 1.1 that ensures that the RDFa 1.0 documents
> continue to be processed as RDFa 1.0, but documents not marked with a
> version automatically use the latest processing rules (RDFa 1.1)?

I don't understand how there can be a processing rule change to match
existing author behaviour:

* A validity change to allow something authors are already doing (but
  which is processed in the same way as now) would make sense.  In which
  case, make the validity change and don't worry that that mark-up used
  to be invalid.

* A spec change to match the behaviour of current processors makes
  sense, if authors are currently writing mark-up which isn't in the
  current spec and processors are behaving as authors intended.  But
  this is more like a spec correction, since it is bringing the spec in
  line with reality; there's no need to retain backwards compatibility
  (either sort) with behaviour that never existed in practice.

* A behaviour change to do something which authors wish to do but
  currently can't sounds possible.  But in this case the authors
  wouldn't already be doing it, since that mark-up would be giving the
  wrong behaviour with current processors.

But if you have authors currently writing mark-up which is not valid and
which requires different behaviour from that of current processors and
you wish to enable that, and you also require mark-up which matches
the current spec and processors to retain its current behaviour, and
those two forms of mark-up conflict such that processors need to know
which of the two behaviours is required, then processors cannot support
the new behaviour without an explicit indicator to do so.  So if the
existing author behaviour we're trying to enable doesn't include such a
distinguishing version indicator then we're already scuppered.

But going forwards you could enable the new behaviour in newly written
content by any flag which explicitly denotes 'this mark-up requires the
new behaviour'.  To avoid the 'syndication' problem this flag should be
inextricably entwined with the mark-up in question, for example by using
different attribute names from the 'old' behaviour.  Inserting a "2"
into the attribute names would be one way of doing this.
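
To illustrate, a processor following that approach could decide the
behaviour per attribute rather than per document.  This is only a
sketch in Python: "property" and "property2" are stand-in names I've
picked for the example, not a proposal for what RDFa 1.1 should
actually call anything.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names: "property" stands for the existing
# attribute, "property2" for a renamed one that requests new behaviour.
OLD_ATTR = "property"
NEW_ATTR = "property2"


def classify_attributes(attrs: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (behaviour, attribute name, value) for each recognised attribute."""
    results = []
    for name, value in attrs.items():
        if name == NEW_ATTR:
            results.append(("new", name, value))
        elif name == OLD_ATTR:
            results.append(("old", name, value))
        # Anything else is not this processor's concern and is skipped.
    return results
```

Because the flag is the attribute name itself, a fragment copied into
another page takes its behaviour with it; there is no separate
document-level declaration to get lost along the way.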

Smylers
Received on Monday, 12 October 2009 07:40:32 UTC