ISSUE-41 versioning comments

During the HTML working group meeting, I reported on the state of
TAG work on Versioning, and there was some discussion. In
addition, after the end of the HTML working group meeting,
discussion continued.

I took the transcript and tried to turn it into sentences that
I believed, shamelessly using wording from others without
credit.

I'm mainly posting this to get it in the archives and just in
case I don't have time to edit it down more -- what I'd like
to do is update the "Versioning and HTML" document though.

Larry


===========

On "Language":  We need to be more explicit about what *we*
mean by a language, and take into account the distinction which we are
all familiar with but weren't explicit about -- between a language
defined by a specification, a language "defined" by a single
implementation, and a language which is defined as an agreement among
a community.


Comments from WG meeting:


Extensibility is intertwined with versioning.
* Creating a 'version' of a language is a way of combining a
  (potentially large) set of extensions.
* You could think of language + extensions as another 'version' of the
  language.
* Policies like "ignoring unknown extensions" are one way of handling
  extensibility.
* Namespaces are relevant as ways of extending the 'version indicator'
  for sub-features.
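
As a rough illustration of the "ignore unknown extensions" policy in the
list above, here is a minimal Python sketch. The KNOWN_ELEMENTS set, the
<sparkle> element, and the render function are all hypothetical and not
taken from any spec. The point is only that a consumer written against an
older vocabulary can still process a document that uses a newer element.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary understood by an "old" consumer (an assumption
# for illustration, not any real HTML version).
KNOWN_ELEMENTS = {"p", "em", "strong"}


class MustIgnoreRenderer(HTMLParser):
    """Keeps all text content, but passes through only the tags it knows."""

    def __init__(self):
        super().__init__()
        self.output = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_ELEMENTS:
            self.output.append(f"<{tag}>")
        else:
            # Unknown extension: drop the tag itself, keep processing its content.
            self.ignored.append(tag)

    def handle_endtag(self, tag):
        if tag in KNOWN_ELEMENTS:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        self.output.append(data)


def render(document):
    """Return (rendered markup, list of ignored element names)."""
    renderer = MustIgnoreRenderer()
    renderer.feed(document)
    renderer.close()
    return "".join(renderer.output), renderer.ignored


text, ignored = render("<p>Hello <sparkle>new</sparkle> world</p>")
print(text)     # <p>Hello new world</p>
print(ignored)  # ['sparkle']
```

Whether dropping the tag but keeping its content is the right fallback
depends on the extension; that choice is exactly what a versioning
policy has to pin down.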


Languages and Platforms
* HTML is a language. The DOM APIs are a language. JavaScript and CSS are
  languages.  The "browser platform" is a set of languages, along with
  expectations for which versions of which languages are needed.


One challenge in HTML5 is that there is no uniform mechanism for
extending the language in a way that guarantees there won't be
conflicts.

For example, namespace URIs are a good way to ensure that there won't
be conflict.
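
A small sketch of why namespace URIs avoid conflict, using Python's
standard XML parser. Both namespace URIs and the <gauge> element are
made up for illustration; the assumption is two vendors that
independently pick the same local name.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vendors both invent an element called "gauge".
# The namespace URIs below are placeholders, not real vocabularies.
DOC = """
<root xmlns:a="http://vendor-a.example/ns" xmlns:b="http://vendor-b.example/ns">
  <a:gauge>70</a:gauge>
  <b:gauge>low</b:gauge>
</root>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}local-name",
# so the two elements stay distinct even though both are "gauge".
names = [child.tag for child in root]
print(names)
```

The two expanded names differ, so a consumer can support one vendor's
<gauge> without misreading the other's.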

Of course, using prefixes or coordinating with other implementations
is possible.  But not everyone, in practice, is going to standardize
before shipping.

MS has stringent backwards compatibility requirements.

We don't want to open the door to vendors shipping
proprietary extensions and worrying about the consequences later. Think
about how the OpenSVG effort is using the language. (?)

Border corners or transform are examples of versioning issues with
<canvas> -- vendors may maintain multiple implementations in a single
browser to deal with the compatibility issues?


MIME types:
Documents don't "have" a MIME type, they're "served" or "sent" with a
MIME type.

Subsets are kinds of versions, e.g., languages which aren't completely
implemented.

(XML5 is a superset of XML 1.x fwiw, and deals with entities)


Versioning and extensibility aren't just "related" but fundamentally
combined.  Versioning can be fuzzy, and the goal of bringing it up in
the TAG is to try to be less fuzzy about it.


Often when versioning is mentioned it's not clear what the
implications are for the various actors (authors, user agents, tools),
and I'm not sure people always consider that being fuzzy about it
can work if the language evolves incrementally rather than in
versions.

Most stuff on the Web is incrementally evolved and not in huge steps.

Browsers implement HTML, CSS, and DOM APIs piecemeal, and likewise
authors adopt them piecemeal.

We haven't carefully distinguished between implementations, instances,
and specifications, and I agree the document on versioning should.

Examples: Opera supports <canvas>, but not <aside>; IE8 supports
onhashchange, but not <canvas>, etc. Authors use <canvas> and for IE8
they use a JavaScript workaround.

You can define a 'language' by an implementation, or not just the
implementation, but also the installation of the implementation.

But if you tie language to the implementation how do you get interop
on it?

It's best to talk about a 'language' as an agreement between multiple
implementations and instantiations. A community agrees on a common
language, even if there are subsets of the community which also have
additional terms or changes or restrictions or modifications.

Over time, the "community" you consider important evolves, as well as
the needs of different communities.

Example: Sometimes people tie versioning to the implementation,
navigator.userAgent or HTTP user agent string.

But using the name of the software as an indicator for its
capabilities has enormous drawbacks. It's why (Mozilla Compatible)
ruled in the HTTP user agent string: the desire was to determine
capabilities, not implementation name, and servers handled unknown
agents by assuming -- incorrectly -- that if they didn't know the
name of the agent, the agent didn't know about common
extensions. The difficulty of identifying capabilities by identifying
software has been seen in lots of other standards.
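
To make the failure mode concrete, here is a hedged Python sketch
contrasting name-based lookup with capability-based lookup. The agent
names ("AgentA", "AgentB", "AgentC"), the feature names, and the lookup
table are all hypothetical; this models the logic only, not any real
server or browser.

```python
# Hypothetical server-side table mapping agent names to features.
KNOWN_AGENTS = {
    "AgentA": {"frames", "tables"},
    "AgentB": {"tables"},
}
BASELINE = set()  # what a name-sniffing server assumes for unknown agents


def features_by_name(agent_name):
    """Name sniffing: unknown agents fall back to the bare baseline."""
    return KNOWN_AGENTS.get(agent_name, BASELINE)


def features_by_capability(declared):
    """Capability-based: use what the agent says it supports."""
    return set(declared)


def choose_page(features):
    """Serve the rich page only when the needed feature is present."""
    return "rich" if "frames" in features else "plain"


# A new agent that supports frames but is missing from the server's table.
new_agent_name = "AgentC"
new_agent_capabilities = ["frames", "tables"]

by_name = choose_page(features_by_name(new_agent_name))
by_capability = choose_page(features_by_capability(new_agent_capabilities))
masquerading = choose_page(features_by_name("AgentA"))

print(by_name)        # plain
print(by_capability)  # rich
print(masquerading)   # rich
```

The last line is the "(Mozilla Compatible)" effect in miniature: the
only way for the new agent to get the rich page from a name-sniffing
server is to borrow a name the server already knows.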

It might seem like the community of web authors doesn't agree on a
common language, e.g. some people write <br> and some write <br/> and
they each have different views of what language they think they're
writing, and then all those things get indistinguishably published as
text/html.

But the common language they agree on allows <br> and <br/> just as
the language we are speaking allows 'yes' and 'yeah', and some people
only use one or the other.  text/html is a language indicator, which
may or may not indicate anything to anyone, given content-type
sniffing.
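
A quick way to see the "common language" point is to feed both
spellings to a lenient parser and compare what comes out. This sketch
uses Python's html.parser as a stand-in; it is not an HTML5 parser and
not what browsers run, so treat it as an analogy. TagCollector and
start_tags are names invented for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser's default handling of <br/> also reports a start tag,
        # so both spellings end up here.
        self.tags.append(tag)


def start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


html4_style = start_tags("<p>one<br>two</p>")
xhtml_style = start_tags("<p>one<br/>two</p>")

print(html4_style)                 # ['p', 'br']
print(html4_style == xhtml_style)  # True
```

The two authors may believe they are writing different languages, but
this receiver cannot tell them apart.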

Many may not believe they're using a language which allows both -
people will think they're writing XHTML and so <br/> is the only thing
that's allowed, and other people will think they're writing HTML4 and
so <br> is the only thing that's allowed, and everyone else won't
think about it at all and will just copy-and-paste whatever works.

In that sense, the 'speakers' of a language don't define the
language as completely as the 'receivers'.

People can communicate even though they have different vocabularies, because
the community of agreement is broader than the set of people
talking, and includes those who provide dictionaries.

There isn't really any agreement in intentions. I don't have to know
what your intentions are in order to understand what you're saying. In
any case, the agreement in communication is between the parties
involved in the communication.


I think the notion is that people believe they are speaking "HTML",
and the working group is trying to create a language which is the
basis for future agreements about what constitutes the "HTML"
language.

Some languages have a 'formal' definition (Académie française)
as well as a 'vernacular' -- what is actually spoken. English
has no formal definition; the English Dictionary follows usage.


I meant something like: Authors aren't all intending to write in some
common language (some intend HTML4, others intend XHTML), though it
turns out that they all happen to be writing something that can be
parsed as a single common language (like what HTML5 defines) and
implementors intentionally implement that common language.

HTML authors are intending to write in a language that they believe
browsers will understand and interpret, and certainly there
is a general understanding that the language has dialects: for
example, during Browser Wars 1.0, there were explicit attempts to
create proprietary dialects, through support of "best viewed by"
marketing campaigns, in which browser vendors intentionally introduced
dialects.

In Browser Wars 2.0, the players are different, but it's not clear the
economic forces that caused Browser Wars 1.0 have gone away. HTML5 is
a dialect being introduced with the hopes of gathering enough
consensus as to eliminate other dialects, but the issues around
extensibility remain.

Neither Mozilla Foundation nor Microsoft has promised, or could
reasonably be expected to promise, not to introduce new features that
aren't implemented by the other, so... there will be
dialects. Dialects are inevitable.

What is a 'language' -- in the HTML context there's a complex muddle
of what various people write (and what they wrote ten years ago and
haven't updated) and what various browsers understand (sometimes in
conflicting ways), and there's not a perfect overlap between any of
those things, and specifications all define something different
again.

Traditionally, computer science texts define a language as e.g. a
set of strings (usually defined by a grammar), sometimes with a
definition of its semantics, and that doesn't seem like a useful
definition in the context of HTML.

The syntax of a language is one of its important components. Certainly
versioning of programming languages and compatibility of compilers are
serious issues for any experienced software engineer.

The HTML language allows all strings (because all strings can be
parsed as HTML), which doesn't seem very useful.

Languages have syntax and semantics.  The syntax defines the structure
of the language, not just the set of admissible strings. Certainly
there is a set of strings that are allowable and have syntax and
semantics. If HTML wants to allow all strings to be admissible, that's
unusual but not really a problem... text/plain also allows all
strings.
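
The difference between "a set of admissible strings" and "a structure
assigned to every string" can be sketched by running the same inputs
through a strict parser and a lenient one. Python's XML parser stands in
for the strict case and html.parser for the lenient case; neither is an
HTML5 parser, and the sample strings are my own, so this shows the two
attitudes rather than proving anything about HTML itself.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def accepted_by_xml(markup):
    """Strict: the string is either in the language or it is an error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def accepted_by_lenient_html(markup):
    """Lenient: returns True as long as the parser raises no exception."""
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()
    return True


samples = [
    "<p>one<br/>two</p>",   # well-formed XML
    "<p>one<br>two</p>",    # unclosed <br>
    "<p>unclosed",          # missing end tag
]

results = [(s, accepted_by_xml(s), accepted_by_lenient_html(s)) for s in samples]
for markup, strict_ok, lenient_ok in results:
    print(f"{markup!r}: strict={strict_ok} lenient={lenient_ok}")
```

With the strict parser, "admissible" carries information; with the
lenient one, all the information is in what structure the parser
assigns, which is why the syntax still matters even when nothing is
rejected.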

Programming languages are different (and even with ECMAScript they try
very hard to stay clear of versioning and succeed reasonably).

Compatibility is important, but 'backward' assumes a linear evolution
which is often not accurate. A single implementation with a controlled
linear evolution can talk about 'backward' compatibility.

Whether a language is a 'programming' language or a set of APIs is
somewhat irrelevant.

Maybe it is if you forget about compatibility between implementations,
and look at compatibility between an implementation and the documents
that comprise the web (which is what really matters for compatibility),
because then there's a clear linear time ordering, assuming the web
doesn't fragment.

But there have been many cases of disjoint evolution of the web and
Browser Wars 1.0 was only the most visible and egregious and
intentional one. Any language which has multiple implementations also
has to deal with distributed extensions. The HTML "distributed
extensibility" question just comes down to whether there is anything
that can be done to manage the extensibility to reduce chaos and
future incompatibility problems.

============================

Discussion about <svg> and other features:

Claim: for the Open Web platform to work, there should be feature
parity across browsers... if it can be done by a plugin, great, but
this isn't a theoretical exercise, this is a matter of pragmatics.

Does this mean no browser can ever implement any feature that some
other browser doesn't implement, and otherwise the Open Web platform
cannot work?

But how many browsers count? The Amazon Kindle doesn't implement all
the browser features that Mozilla and Chrome implement, so does the
Amazon Kindle hinder the platform?  If Microsoft implements something
other browsers don't, does that hinder the platform?

If MS drops important features, it does hinder it... authors can't
rely on their content working on it (assuming enough people are using
the Kindle as a browsing device).

The idea that <canvas> is fast and <svg> isn't -- is that really true?
And are those intrinsic issues with <svg> or just the accident of how
much effort has gone into optimizing the <canvas> implementations?

It's unreasonable to believe that all desirable web graphics can be
supported by canvas. Certainly Google Earth or Lively couldn't be. So
there has to be some choice about which use cases are important to
build in and which ones aren't, and what it means to "mandate" a
feature.

What's the extensibility story: it harms the platform if 3 out of 4
browsers decide they want it, and the 4th holds out?

Mozilla proposes animation extensions to PNG, writes a specification,
implements it; someone at Opera thinks it's a good idea and implements
it too.

Is it that 2 out of 4 is a minority, but 3 out of 4 is a majority?

With APNG, the format is designed to degrade (to a single static
image) in browsers that don't support it, so the idea is people will
start using it with the static fallback in IE (and most WebKit-based
browsers).
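
The degradation works because APNG puts its animation data in extra PNG
chunks (acTL, fcTL, fdAT) whose names mark them as ancillary, so a
decoder that doesn't know them can skip them and show the ordinary
image. Here is a minimal Python sketch of that mechanism. The chunk
payloads are zero-filled placeholders, not a decodable image, and the
helper names are invented for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, payload=b""):
    """Build one PNG chunk: length, type, payload, CRC over type+payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return (struct.pack(">I", len(payload)) + chunk_type + payload
            + struct.pack(">I", crc))


def iter_chunks(data):
    """Yield (type, payload) for each chunk after the 8-byte signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC


def is_ancillary(chunk_type):
    """PNG marks ancillary chunks with a lowercase first letter."""
    return chunk_type[:1].islower()


def chunks_seen_by_plain_decoder(data, known=(b"IHDR", b"IDAT", b"IEND")):
    """Model a decoder that skips ancillary chunks it does not recognise."""
    kept = []
    for chunk_type, _ in iter_chunks(data):
        if chunk_type in known:
            kept.append(chunk_type)
        elif not is_ancillary(chunk_type):
            raise ValueError(f"unknown critical chunk {chunk_type!r}")
    return kept


# Synthetic APNG-shaped stream with placeholder payloads.
stream = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", bytes(13))
    + make_chunk(b"acTL", struct.pack(">II", 2, 0))  # 2 frames, loop forever
    + make_chunk(b"fcTL", bytes(26))
    + make_chunk(b"IDAT")
    + make_chunk(b"fcTL", bytes(26))
    + make_chunk(b"fdAT", bytes(4))
    + make_chunk(b"IEND")
)

is_animated = any(t == b"acTL" for t, _ in iter_chunks(stream))
plain_view = chunks_seen_by_plain_decoder(stream)

print(is_animated)  # True
print(plain_view)   # [b'IHDR', b'IDAT', b'IEND']
```

This is the same "ignore what you don't understand" rule discussed for
HTML, applied at the level of an image format.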

If it gets sufficiently widely used then the other browser developers
will decide it's worth implementing.

Is HTML with APNG a different 'version' than HTML without APNG?

Is it a 'standard' feature that's needed for any browser
that wants to browse the web?

Is APNG an extension?


It might be noted that none of those browser developers will care what
an HTML spec says about the feature (they'll just implement it if it
seems worth implementing).  But at some point that kind of thinking
leads you to say "close down the standards group"... the standards
group is a place where implementors get together and agree what
they're going to implement. If nobody's committed to do that, then
what's the point of talking? The "spec" isn't an exercise in prose
writing, it's supposed to document the agreement of the concerned
parties.

People talk about "what browser implementors will do" as if they
weren't in the room.

With APNG, it's just an image format (like PNG or GIF or animated-GIF
or JPEG2000-with-stereo-3D), so it doesn't seem like a different
'version' of HTML at all - it's just a feature of the
widely-implemented web platform, and it becomes such a feature by
having implementors and authors use it widely.


But is there a clear boundary between things-like-APNG and
things-like-SVG?

SVG might be the best exemplar we have at hand, since it is supported
in all the major desktop browsers save IE... it's a perfect case to
serve as an example for consideration.

They're different, but I don't know how to draw the line.  Why mandate
SVG but not mandate APNG?  Is the distinction non-technical, but
rather how many browsers implement it?

But what about MathML? It's of limited use, so you wouldn't mandate it?


I think we need to get a handle on "how extensions become mandated".
That seems key to the versioning issues.

Specifications can be irrelevant if they don't represent the (rough)
consensus of those who are intended to implement the specification.
The mission of W3C is to "lead the web" to its "full
potential". Leadership means getting people to agree to follow.

APNG vs MNG is perhaps an example of how specs are largely irrelevant,
and features get (relatively) widely implemented based on technical
merits as determined by browser developers.

Received on Friday, 19 June 2009 00:30:07 UTC