RE: ZIP-based packages and URI references into them ODF proposal

On Fri, 28 Nov 2008, Larry Masinter wrote:
> 
> If you are creating implementations of a particular class of software, 
> then the functional specification of that implementation will, of 
> necessity, need to document the interplay between the components, and it 
> may indeed be useful to specify how to robustly interact with existing 
> legacy components and content that are already widely deployed. However, 
> confusing "implementation functional specification" vs "definition of 
> protocol, format, language" seems like a bad idea.

One of the big problems faced by users of Internet-related software today 
is the poor level of interoperability between different software components 
that are, in principle, interacting using common protocols, formats, and 
languages. We, the Internet standards community, hear complaints about this 
all the time, from users and from each other, for example during 
presentations on W3C plenary days.

For example, we hear about Web authors finding that different browsers 
render their pages differently. We see validators interpreting Web pages 
in different ways than search engines. We see chat clients mishighlighting 
URLs pasted from other applications. We see HTML pages that work fine in a 
Web browser render in unexpected ways in e-mail clients. We see search 
engines interpreting URLs in HTML pages differently than Web browsers.

Certainly, some problems can be traced to straightforward bugs, errors in 
the implementations of the software relative to the protocols, formats or 
languages they implement. However, a large number of these problems can in 
fact be traced back to ambiguities _in the specifications_, in particular 
in the definitions of how different layers interact.

I like to think of this as a tiling problem -- different working groups 
working on different specifications, all attempting to tile the conceptual 
architectural space of the whole system.

Here is what it might look like today:

    ## CSS #########################
    ################################

      +++++++++++++++++++++++++++++++++++++
      +++ HTML ++++++++
      +++++++++++++++++  --DOM--  ((JS))
                     :::::::::::
    /// TLS ///      ::: FTP :::  <<< data: >>>
    \\\\ HTTP \\\\   :::::::::::

    -------- IRI -------------------------
    ======== URI =========================

...and so forth. Conceptually, there are gaps between the tiles, through 
which lack of interoperability can pass -- areas where the specifications 
do not fully define behavior. To lack ambiguity, it should look like this:

    ## CSS ################################
    #######################################
    #######################################
    +++++++++++++++++++++++++++++++++++++++
    +++++ HTML ++++++++++++++++++++++++++++
    ++++++++++++++++++++---DOM---(((JS)))))
    /////////////////:::::::::::<<<<<<<<>>>
    /// TLS /////////::: FTP :::<< data: >>
    \\\\ HTTP \\\\\\\:::::::::::<<<<<<<<>>>
    \\\\\\\\\\\\\\\\\:::::::::::<<<<<<<<>>>
    -------- IRI --------------------------
    ======== URI ==========================

I believe that the difference between these two states is the difference 
between looking at specifications as "definitions of protocol, format, 
language" vs "implementation functional specifications". The former gives 
us a neat set of orthogonal specifications that seem quite simple, but in 
practice, what we need for quality software is the latter.

Now, implementation functional specifications are significantly harder to 
write. One has to worry about interactions between specifications from 
working groups that may never have spoken to each other, about rare 
edge cases that are of little interest, about what happens when the rules 
of other layers aren't followed. But that doesn't mean we shouldn't do it.


Confusing "implementation functional specification" vs "definition of 
protocol, format, language" does indeed seem like a bad idea.

We need implementation functional specifications.



> You're not seriously arguing that one should specify HTTP with the 
> assumption that TCP might be broken, and that sometimes content is 
> mangled, and put all of the ways of dealing with that into the HTTP 
> specification?

Personally, if TCP is wrong (broken), I think the problem should be 
addressed in the TCP specs. Similarly, if HTTP doesn't define processing 
requirements in detail, that is a problem the HTTP working group should 
solve. If the URL specifications don't define how to handle errors in 
URLs, that is something the URL specifications should define.

The problem that we have seen with HTML5 in particular is that sometimes, 
the people working on the specifications with the problems don't recognise 
the importance (or existence) of the problems. People on the URL mailing 
list were quite clear, for instance, that they were of the opinion that 
they should not take responsibility for defining how software should 
process mistyped URLs. The gap between the tiles was left unfilled by the 
URL specifications. (In this particular case, the HTML "tile" was extended 
instead, to at least solve this problem for URLs in HTML. It doesn't solve 
the problem for SVG, or MathML, or anything else, unfortunately.)

Note that I'm not talking about Web browsers here; I'm talking about _any_ 
software. If a user is to have a consistent experience with his software, 
a link checker, a search engine, a Web browser, and a validator all have 
to process the user's data, such as a mistyped URL, in the same way. It's 
not an issue that we can leave up to a single implementation's functional 
specification, or even the functional specification of a single 
conformance class. It's a problem that affects all software.
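
To make the mistyped-URL case concrete, here is a minimal Python sketch. 
Both functions are hypothetical stand-ins, not the behaviour of any real 
link checker or browser, and the example URL is made up. The only point is 
that when no specification says what to do with the stray whitespace, two 
reasonable tools can disagree about the same input.

```python
from urllib.parse import quote

# Hypothetical mistyped link, as an author might write it in a page:
# whitespace at both ends and an unescaped space in the path.
MISTYPED = "  http://example.com/my file.html "


def strict_tool(url):
    """Hypothetical tool A (say, a link checker): reject any whitespace."""
    if any(ch.isspace() for ch in url):
        return None  # treated as "not a URL", so the link is reported broken
    return url


def lenient_tool(url):
    """Hypothetical tool B (say, a browser): trim the ends, escape the rest."""
    trimmed = url.strip()
    # Leave URL delimiters and existing escapes alone; encode everything else.
    return quote(trimmed, safe=":/?#[]@!$&'()*+,;=%")


print(strict_tool(MISTYPED))   # None
print(lenient_tool(MISTYPED))  # http://example.com/my%20file.html
```

Neither result is obviously wrong, which is the problem: the user sees a 
working link in one tool and a broken one in the other.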

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Saturday, 29 November 2008 11:18:26 UTC