Re: ZIP-based packages and URI references into them ODF proposal from Ian Hickson on 2008-12-30 (www-tag@w3.org from December 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 30 Dec 2008 10:12:19 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <Pine.LNX.4.62.0812300924340.12643@hixie.dreamhostps.com>
Extensibility is but one reason why detailed specifications that don't 
leave things undefined are essential to the successful development of a 
multi-vendor technology stack; there are many others, as discussed in 
earlier e-mails. For the purposes of this e-mail, though, I only look at 
the extensibility aspect.

On Tue, 30 Dec 2008, Julian Reschke wrote:
> Ian Hickson wrote:
> >
> > Application statements don't limit innovation. In fact, having 
> > detailed specifications that define the precise rules for parsing and 
> > that give precise rules for the processing models and so forth 
> > dramatically increase the ease with which the respective protocols can 
> > be extended, because there is no guesswork about exactly how various 
> > implementations are going to handle the new syntax.
> 
> They can help with extensibility; but they can also ruin it.

Well certainly it is possible to design the language in a bad way.

With a well-written spec, i.e. an "application statement" or an 
"implementation functional specification" as Larry called them, if the 
language is designed right, innovation is possible. CSS2 is a great 
example of this.

With a poorly-written spec, i.e. one that doesn't fully detail the entire 
processing model but instead leaves things undefined, innovation is 
extremely hard. HTML4 is a good example of this.

The difference is that in the first case, there is a conscious decision 
made about what the processing model should be (including, but not limited 
to, things like error handling). Thus there is the possibility of making 
the right choice and getting an extensible language. In the second case, 
there is no decision to make, and thus there is no chance of ensuring that 
the language is practically extensible.


> For instance, if a specification requires recipients to accept *any* 
> kind of broken input (by specifying how to parse it anyway), it 
> essentially takes away all future extensibility with respect to syntax.

It's not the specification that allows or disallows extensibility, it's 
the behavior of the down-level clients. If the down-level clients all 
handle the desired new syntax in the same way, then extensibility is 
possible. If they don't, then extensibility is hard to impossible. We can 
get consistency by having a spec that defines all the ways of handling 
input, including content that doesn't conform to the current syntax. 
Whether that is "accept any kind of broken input" or "have a fatal error 
as soon as the content is unexpected" is a separate issue from the one I 
am discussing here.

(In practice there are, roughly speaking, three ways to handle unexpected 
content: fatal error, ignore it, or correct it. A fatal error makes it 
extremely hard to extend the language, because it means all extensions 
violate backwards compatibility; this is why, for instance, upgrading XML 
from 1.0 to 1.1 has been so difficult. Similarly, if the error handling 
consists of correcting for the author's intent and handling the content in 
some special way, it is hard to extend the language because extensions 
have to be designed around the legacy behavior. The better solution, and 
the one picked by CSS, is to use a "must-ignore" model for all unknown 
syntax.)
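A toy Python sketch can make the three strategies concrete. Everything in it is a hypothetical illustration, not a real CSS implementation: a client that only knows `display: block | inline | none`, a newer stylesheet that adds `display: grid`, and a "correct" policy that resets unknown values to the initial value `inline`. The stylesheet uses the CSS-style fallback idiom of declaring a value old clients know before the new one.

```python
"""Toy model of three ways a down-level client can handle unexpected input."""

KNOWN_DISPLAY = {"block", "inline", "none"}  # values this hypothetical client knows
INITIAL_DISPLAY = "inline"


class FatalParseError(Exception):
    """Raised by the fatal-error policy on the first unknown value."""


def parse(decls, policy):
    """Parse 'display: value' declarations separated by ';'.

    Later valid declarations override earlier ones, as in CSS.
    policy is "fatal", "ignore" (must-ignore) or "correct".
    """
    result = {"display": INITIAL_DISPLAY}
    for decl in decls.split(";"):
        prop, _, value = decl.partition(":")
        prop, value = prop.strip(), value.strip()
        if prop != "display":
            continue  # this toy model only understands one property
        if value in KNOWN_DISPLAY:
            result["display"] = value
        elif policy == "fatal":
            raise FatalParseError(f"unknown display value {value!r}")
        elif policy == "ignore":
            continue  # must-ignore: drop this declaration only
        elif policy == "correct":
            result["display"] = INITIAL_DISPLAY  # hypothetical "fix-up" rule
        else:
            raise ValueError(f"unknown policy {policy!r}")
    return result


# Fallback idiom: a value old clients know, then the new one.
sheet = "display: block; display: grid"
for policy in ("fatal", "ignore", "correct"):
    try:
        print(policy, parse(sheet, policy))
    except FatalParseError as err:
        print(policy, "error:", err)
```

Under the fatal policy the whole sheet is rejected. Under the hypothetical correction the fallback is overwritten. Only must-ignore leaves the down-level client with `block` while a newer client would use `grid`.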


> > For instance, adding a new property or new syntax to CSS is easy, 
> > because CSS defines forward-compatible parsing rules, so there is no 
> > ambiguity about how a down-level browser is going to process new 
> > features. However, adding a new element to HTML is incredibly 
> > difficult, because every browser differs in how it handles unknown 
> > syntax, because the specs never covered this case.
> 
> But HTML5 to some degree has the same problem: as the set of void 
> elements is hard-wired into the spec, no new void elements can be 
> introduced. That by itself would be fine if it actually guaranteed that 
> future versions of the language won't introduce new void elements, which 
> it doesn't.

Yes, HTML has a terrible forward-compatibility story. We have ended up 
forced into this situation mostly because the earlier versions of the spec 
didn't define the full processing model, and thus user agents varied 
greatly in their behavior. Thus, instead of a coherent, well-thought-out 
extension model, we have a de-facto extension model derived from a long 
series of accidental decisions by a wide variety of independent people.

This is another example of what happens when specifications don't have 
clear and fully defined processing models.
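The void-element problem Julian raises can be shown with a deliberately simplified tree builder. This is not the HTML5 parsing algorithm. It hard-wires its set of void elements, and `widget` is a hypothetical element that a later version might want to make void. A parser that predates the change nests the following content inside `widget`, while an updated one makes it a sibling, so the same markup yields two different trees.

```python
"""Deliberately simplified tree builder (NOT the HTML5 parsing algorithm)."""
import re

# Matches a start tag, an end tag, or a run of text.
TOKEN = re.compile(r"<(/?)([a-z]+)>|([^<]+)")
OLD_VOID = {"br", "hr", "img"}
NEW_VOID = OLD_VOID | {"widget"}  # hypothetical later version adds a void element


def build(markup, void):
    """Return a (name, children) tree; text nodes are plain strings."""
    root = ("#root", [])
    stack = [root]
    for m in TOKEN.finditer(markup):
        closing, name, text = m.groups()
        if text is not None:
            stack[-1][1].append(text)
        elif closing:
            # Close the nearest open element with this name, and anything inside it.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
        else:
            node = (name, [])
            stack[-1][1].append(node)
            if name not in void:
                stack.append(node)  # void elements never hold content
    return root


markup = "<p>a<widget>b</p>"
print(build(markup, OLD_VOID))  # down-level parser: "b" ends up inside widget
print(build(markup, NEW_VOID))  # updated parser: "b" is a sibling of widget
```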


> > Similarly, if we were to extend XML's xml:preserve attribute to have a 
> 
> xml:space?

Yes, my apologies.


> > third value, we couldn't do so without checking how all the different XML
> > processors would handle the new value, because XML doesn't define how to
> > handle unknown values. ...
> 
> Yes, it does:
> 
> "The value "default" signals that applications' default white-space processing
> modes are acceptable for this element; the value "preserve" indicates the
> intent that applications preserve all the white space. This declared intent is
> considered to apply to all elements within the content of the element where it
> is specified, unless overridden with another instance of the xml:space
> attribute. This specification does not give meaning to any value of xml:space
> other than "default" and "preserve". It is an error for other values to be
> specified; the XML processor MAY report the error or MAY recover by ignoring
> the attribute specification or by reporting the (erroneous) value to the
> application. Applications may ignore or reject erroneous values." --
> <http://www.w3.org/TR/xml/#sec-white-space>
> 
> So conforming processors either report an error or ignore the attribute; 
> thus xml:space can't be extended *because* it defines error handling, 
> not because it doesn't.

The key part is "the XML processor MAY report the error or MAY recover by 
ignoring the attribute specification or by reporting the (erroneous) value 
to the application". These are the three options I gave above -- report an 
error, ignore the error, or report the value to the next level and let it 
deal with it however it wants, i.e. interpret and "correct" it. That isn't 
going to lead to interoperability. Some UAs will have fatal error 
handling, some will ignore it, some will "correct" it and treat it as 
'preserve', some as 'default', some maybe as yet something else -- and all 
would be conforming! Some will ignore the attribute and have the parent's 
value inherit down as if the attribute wasn't there, others will not 
ignore the attribute and will thus have the value not be inherited...

If you consider this "defining how to handle unknown values", then we have 
very different ideas of what is meant by "define".
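The divergence described above can be modelled in a few lines. This is a hypothetical sketch, not any real XML processor. `collapse` stands in for an invented third value of xml:space, and each policy mirrors one of the behaviours listed above: report an error, ignore the attribute so the parent's value is inherited, or pass the value to an application that treats it as 'preserve' or as 'default'.

```python
"""Hypothetical model: how different processors might treat xml:space="collapse"."""


class XmlSpaceError(Exception):
    """Raised when a processor chooses to report the erroneous value as an error."""


POLICIES = (
    "report-error",            # processor reports the error
    "ignore-attribute",        # processor ignores the attribute; parent's mode inherits
    "app-treats-as-preserve",  # value passed to an application that "corrects" it
    "app-treats-as-default",   # ...or to one that corrects it the other way
)


def effective_space(attr_value, inherited, policy):
    """Return the whitespace mode in effect for one element.

    attr_value is the element's xml:space value (None if absent);
    inherited is the mode in effect on its parent.
    """
    if attr_value is None:
        return inherited
    if attr_value in ("default", "preserve"):
        return attr_value
    # Unknown value: the result now depends entirely on the policy.
    if policy == "report-error":
        raise XmlSpaceError(f"unknown xml:space value {attr_value!r}")
    if policy == "ignore-attribute":
        return inherited
    if policy == "app-treats-as-preserve":
        return "preserve"
    if policy == "app-treats-as-default":
        return "default"
    raise ValueError(f"unknown policy {policy!r}")


def outcome(policy, inherited="default"):
    """Effective mode for a child with xml:space="collapse", or "error"."""
    try:
        return effective_space("collapse", inherited, policy)
    except XmlSpaceError:
        return "error"


for policy in POLICIES:
    print(f"{policy:24} {outcome(policy)}")
```

With a parent in 'default' mode, the four policies produce three different outcomes for the same document. With a parent in 'preserve' mode, the ignore-attribute policy changes its answer as well.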

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 30 December 2008 10:13:01 UTC