- From: Larry Masinter <masinter@adobe.com>
- Date: Thu, 18 Jun 2009 17:28:52 -0700
- To: Anne van Kesteren <annevk@opera.com>, Doug Schepers <schepers@w3.org>, Philip Taylor <pjt47@cam.ac.uk>
- CC: "www-tag@w3.org" <www-tag@w3.org>
During the HTML working group meeting, I reported on the state of TAG work on Versioning, and there was some discussion. In addition, after the end of the HTML working group meeting, discussion continued. I took the transcript and tried to turn it into sentences that I believed, shamelessly using wording from others without credit.

I'm mainly posting this to get it in the archives and just in case I don't have time to edit it down more -- what I'd like to do is update the "Versioning and HTML" document though.

Larry

===========

On "Language":

We need to be more explicit about what *we* mean by a language, and take into account the distinction which we are all familiar with but weren't explicit about -- between a language defined by a specification, a language "defined" by a single implementation, and a language which is defined as an agreement among a community.

Comments from WG meeting:

Extensibility is intertwined with versioning.

* Creating a 'version' of a language is a way of combining a (potentially large) set of extensions.
* You could think of language + extensions as another 'version' of the language.
* "Ignoring unknown extensions" is one way of handling extensibility.
* Namespaces are relevant as ways of extending the 'version indicator' for sub-features.

Languages and Platforms

* HTML is a language. The DOM APIs are a language. JavaScript and CSS are languages. The "browser platform" is a set of languages, along with expectations for which versions of which languages are needed.

One challenge in HTML5 is that nothing gives you a uniform way to extend the language that guarantees there won't be conflict. For example, namespace URIs are a good way to ensure that there won't be conflict. Of course, using prefixes or coordinating with other implementations is possible. But not everyone, in practice, is going to standardize before shipping.

MS has stringent backwards compatibility requirements.
We don't want to open the door to vendors shipping proprietary extensions and worrying about the consequences later.

Think about how the OpenSVG effort is using the language. (?)

Border corners or transform are examples of versioning issues with <canvas> -- vendors may maintain multiple implementations in a single browser to deal with the compatibility issues?

MIME types: Documents don't "have" a MIME type, they're "served" or "sent" with a MIME type.

Subsets are kinds of versions, e.g., languages which aren't completely implemented. (XML5 is a superset of XML 1.x fwiw, and deals with entities.)

Versioning and extensibility aren't just "related" but fundamentally combined.

Versioning can be fuzzy, and the goal of bringing it up in the TAG is to try to be less fuzzy about it. Often when versioning is mentioned it's not clear what the implications are for the various actors (authors, user agents, tools), and I'm not sure it's always considered that being fuzzy about it can work if the language evolves incrementally rather than in versions.

Most stuff on the Web evolves incrementally and not in huge steps. Browsers implement HTML, CSS, and DOM APIs piecemeal, and likewise authors adopt them piecemeal.

We haven't carefully distinguished between implementations, instances, and specifications, and I agree the document on versioning should.

Examples: Opera supports <canvas>, but not <aside>; IE8 supports onhashchange, but not <canvas>, etc. Authors use <canvas> and for IE8 they use a JavaScript workaround.

You can define a 'language' by an implementation, or not just the implementation, but also the installation of the implementation. But if you tie language to the implementation how do you get interop on it? It's best to talk about a 'language' as an agreement between multiple implementations and instantiations.
A community agrees on a common language, even if there are subsets of the community which also have additional terms or changes or restrictions or modifications. Over time, the "community" you consider important evolves, as well as the needs of different communities.

Example: Sometimes people tie versioning to the implementation, navigator.userAgent or HTTP user agent string. But using the name of the software as an indicator for its capabilities has enormous drawbacks. It's why (Mozilla Compatible) ruled in HTTP user agent: the desire was to determine capabilities, not implementation name, and the default handling of unknown agents was to assume -- incorrectly -- that if you didn't know the name of the agent, the agent didn't know about common extensions. The difficulty of identifying capabilities by identifying software has been seen in lots of other standards.

It might seem like the community of web authors doesn't agree on a common language, e.g. some people write <br> and some write <br/> and they each have different views of what language they think they're writing, and then all those things get indistinguishably published as text/html. But the common language they agree on allows <br> and <br/> just as the language we are speaking allows 'yes' and 'yeah', and some people only use one or the other.

text/html is a language indicator, which may or may not indicate anything to anyone, given content-type sniffing.

Many may not believe they're using a language which allows both - people will think they're writing XHTML and so <br/> is the only thing that's allowed, and other people will think they're writing HTML4 and so <br> is the only thing that's allowed, and everyone else won't think about it at all and will just copy-and-paste whatever works. In that sense, the 'speakers' of a language don't define the language as completely as the 'receivers'.
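
To make the user-agent point above concrete, here is a minimal Python sketch; it is an illustration added for the archive, not something presented at the meeting. It contrasts guessing features from a product name with intersecting against features an agent actually declares. The table KNOWN_AGENTS, both function names, and the "NewBrowser/1.0" string are hypothetical; the Opera and IE8 entries only echo the examples given earlier in this message.

```python
# Hypothetical lookup table: product token -> features we believe it supports.
# The two entries mirror the examples in the message, nothing more.
KNOWN_AGENTS = {
    "Opera": frozenset({"canvas"}),
    "MSIE 8.0": frozenset({"onhashchange"}),
}


def features_by_name(user_agent):
    """Name-based sniffing: infer features from the product token.

    An agent whose name is not in the table is assumed to support
    nothing, which is exactly the incorrect default described above.
    """
    for product, features in KNOWN_AGENTS.items():
        if product in user_agent:
            return features
    return frozenset()


def features_by_declaration(declared, wanted):
    """Capability-based check: use what the agent says it supports."""
    return frozenset(declared) & frozenset(wanted)


# A new agent that does support <canvas> but has an unfamiliar name.
new_agent = "NewBrowser/1.0"
print(features_by_name(new_agent))                               # sniffing
print(features_by_declaration({"canvas", "aside"}, {"canvas"}))  # capability
```

Under these assumptions the name-based lookup reports no features for the unfamiliar agent, which is the pressure that led agents to claim to be "Mozilla compatible" whatever they actually were.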
People can communicate even though they have different vocabularies, because the community of agreement is broader than the set of people talking, and includes those who provide dictionaries.

There isn't really any agreement in intentions. I don't have to know what your intentions are in order to understand what you're saying. In any case, the agreement in communication is between the parties involved in the communication.

I think the notion is that people believe they are speaking "HTML", and the working group is trying to create a language which is the basis for future agreements about what constitutes the "HTML" language.

Some languages have a 'formal' definition (Académie française) as well as a 'vernacular' -- what is actually spoken. English has no formal definition; the English Dictionary follows usage.

I meant something like: Authors aren't all intending to write in some common language (some intend HTML4, others intend XHTML), though it turns out that they all happen to be writing something that can be parsed as a single common language (like what HTML5 defines) and implementors intentionally implement that common language.

HTML authors are intending to write in a language that they believe browsers will understand and interpret, and certainly there is a general understanding that the language has dialects: for example, during Browser Wars 1.0, there were explicit attempts to create proprietary dialects, supported by "best viewed by" marketing campaigns, in which browser vendors intentionally introduced dialects. In Browser Wars 2.0, the players are different, but it's not clear the economic forces that caused Browser Wars 1.0 have gone away.

HTML5 is a dialect being introduced with the hopes of gathering enough consensus as to eliminate other dialects, but the issues around extensibility remain.
Neither Mozilla Foundation nor Microsoft has promised, or could reasonably be expected to promise, not to introduce new features that aren't implemented by the other, so... there will be dialects. Dialects are inevitable.

What is a 'language' -- in the HTML context there's a complex muddle of what various people write (and what they wrote ten years ago and haven't updated) and what various browsers understand (sometimes in conflicting ways), and there's not a perfect overlap between any of those things, and specifications all define something different again.

Traditionally, computer science texts define a language as e.g. a set of strings (usually defined by a grammar), sometimes with a definition of its semantics, and that doesn't seem like a useful definition in the context of HTML.

The syntax of a language is one of its important components. Certainly versioning of programming languages and compatibility of compilers are serious issues for any experienced software engineer.

The HTML language allows all strings (because all strings can be parsed as HTML), which doesn't seem very useful.

Languages have syntax and semantics. The syntax defines the structure of the language, not just the set of admissible strings. Certainly there is a set of strings that are allowable and have syntax and semantics. If HTML wants to allow all strings to be admissible, that's unusual but not really a problem... text/plain also allows all strings.

Programming languages are different (and even with ECMAScript they try very hard to stay clear of versioning and succeed reasonably).

Compatibility is important, but 'backward' assumes a linear evolution which is often not accurate. A single implementation with a controlled linear evolution can talk about 'backward' compatibility. Whether a language is a 'programming' language or a set of APIs is somewhat irrelevant.
Maybe it is if you forget about compatibility between implementations, and look at compatibility between an implementation and the documents that comprise the web (which is what really matters for compatibility). Because then there's a clear linear time ordering, assuming the web doesn't fragment.

But there have been many cases of disjoint evolution of the web, and Browser Wars 1.0 was only the most visible and egregious and intentional one. Any language which has multiple implementations also has to deal with distributed extensions. The HTML "distributed extensibility" question just comes down to whether there is anything that can be done to manage the extensibility to reduce chaos and future incompatibility problems.

============================

Discussion about <SVG> and other features;

Claim: for the Open Web platform to work, there should be feature parity across browsers... if it can be done by a plugin, great, but this isn't a theoretical exercise, this is a matter of pragmatics.

Does this mean no browser can ever implement any feature that some other browser doesn't implement, and otherwise the Open Web platform cannot work? But how many browsers count? The Amazon Kindle doesn't implement all the browser features that Mozilla and Chrome implement, so does the Amazon Kindle hinder the platform? If Microsoft implements something other browsers don't, does that hinder the platform?

If MS drops important features, it does hinder it... authors can't rely on their content working on it (assuming enough people are using the Kindle as a browsing device).

The idea that <canvas> is fast and <svg> isn't -- is that really true? And are those intrinsic issues with <svg> or just the accident of how much effort has gone into optimizing the <canvas> implementations?

It's unreasonable to believe that all desirable web graphics can be supported by canvas. Certainly Google Earth or Lively couldn't be.
So there has to be some choice about which use cases are important to build in and which ones aren't, and what it means to "mandate" a feature.

What's the extensibility story: it harms the platform if 3 out of 4 browsers decide they want it, and the 4th holds out? Mozilla proposes animation extensions to PNG, writes a specification, implements it; someone at Opera thinks it's a good idea and implements it too. Is it that 2 out of 4 is a minority, 3 out of 4 a majority?

With APNG, the format is designed to degrade (to a single static image) in browsers that don't support it, so the idea is people will start using it with the static fallback in IE (and most WebKit-based browsers). If it gets sufficiently widely used then the other browser developers will decide it's worth implementing.

Is HTML with APNG a different 'version' than HTML without APNG? Is it a 'standard' feature that's needed for any browser that wants to browse the web? Is APNG an extension?

It might be noted that none of those browser developers will care what an HTML spec says about the feature (they'll just implement it if it seems worth implementing).

But at some point that kind of thinking leads you to say "close down the standards group"... the standards group is a place where implementors get together and agree what they're going to implement. If nobody's committed to do that, then what's the point of talking? The "spec" isn't an exercise in prose writing, it's supposed to document the agreement of the concerned parties. People talk about "what browser implementors will do" as if they weren't in the room.

With APNG, it's just an image format (like PNG or GIF or animated-GIF or JPEG2000-with-stereo-3D), so it doesn't seem like a different 'version' of HTML at all - it's just a feature of the widely-implemented web platform, and it becomes such a feature by having implementors and authors use it widely. But is there a clear boundary between things-like-APNG and things-like-SVG?
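
As an aside on how APNG's "degrade to a static image" design works in practice, here is a rough Python sketch added for illustration; it is not a decoder and was not part of the discussion. It relies on two facts I am confident of: PNG chunk types whose first letter is lowercase are ancillary and may be skipped by a decoder that does not recognise them, and APNG carries its animation data in such chunks (acTL, fcTL, fdAT). The helper names and the sample file are hypothetical, and the sample is only structurally shaped like a PNG (its IHDR is all zeros), so it would not render.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types a hypothetical pre-APNG decoder knows how to handle.
LEGACY_KNOWN = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def chunk(kind, data=b""):
    """Serialise one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for each chunk after the 8-byte signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def is_ancillary(kind):
    """Bit 5 of the first type byte set (lowercase letter) means ancillary."""
    return bool(kind[0] & 0x20)


def legacy_view(png):
    """Chunk types a decoder without APNG support would act on."""
    seen = []
    for kind, _data in iter_chunks(png):
        if kind in LEGACY_KNOWN:
            seen.append(kind)
        elif not is_ancillary(kind):
            raise ValueError("unknown critical chunk")
        # Unknown ancillary chunks are silently skipped.
    return seen


def is_animated(png):
    """An APNG-aware reader looks for the animation control chunk."""
    return any(kind == b"acTL" for kind, _data in iter_chunks(png))


# Structural sample only: one frame whose image data is the default IDAT.
sample = (
    PNG_SIGNATURE
    + chunk(b"IHDR", bytes(13))
    + chunk(b"acTL", struct.pack(">II", 1, 0))  # num_frames, num_plays
    + chunk(b"fcTL", bytes(26))
    + chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + chunk(b"IEND")
)
print(legacy_view(sample))
print(is_animated(sample))
```

The older reader sees an ordinary static PNG while the newer one sees an animation, and neither has to consult a version indicator; the extension rides on the "ignore unknown ancillary chunks" rule that the base format already had.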
SVG might be the best exemplar we have at hand, since it is supported in all the major desktop browsers save IE... it's a perfect case to serve as an example for consideration.

They're different, but I don't know how to draw the line. Why mandate SVG but not mandate APNG? Is the distinction non-technical, but rather how many browsers implement it?

But what about MathML? It's of limited use so you wouldn't mandate it?

I think we need to get a handle on "how extensions become mandated". That seems key to the versioning issues.

Specifications can be irrelevant if they don't represent the (rough) consensus of those who are intended to implement the specification. The mission of W3C is to "lead the web" to its "full potential". Leadership means getting people to agree to follow.

APNG vs MNG is perhaps an example of how specs are largely irrelevant, and features get (relatively) widely implemented based on technical merits as determined by browser developers.
Received on Friday, 19 June 2009 00:30:07 UTC