Re: Feedback on "Offline Web Applications" (Editor's Draft 17 November 2007) from Ian Hickson on 2008-05-13 (public-html@w3.org from May 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 13 May 2008 00:35:36 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0805130017490.22257@hixie.dreamhostps.com>
On Sun, 18 Nov 2007, Julian Reschke wrote:
> 
> below is some feedback on "Offline Web Applications" (Editor's Draft 17 
> November 2007) (<http://dev.w3.org/html5/offline-webapps/>).

Thanks! Some of your comments were relevant to the spec, so I've taken 
those into account in editing the spec, and have responded to them below.


> 3. Offline Application Caching APIs -- seems the spec defines a new text 
> format for defining the application caching. [...] Not sure why this 
> isn't simply an XML format; instead of defining yet another special text 
> format with (IMHO) quite obscure parsing rules

The main reason not to use XML is that defining the error handling for how 
to process XML is orders of magnitude more complicated than desired here, 
and, more importantly, it is frequently the case that UAs get it wrong. 
For example, many UAs of XML-based vocabularies check the namespace of the 
root element and then ignore the namespace of other nodes, so that things 
like:

   <manifest xmlns="http://www.w3.org/ns/manifest">
     <file xmlns="http://bogus.example.com/" href="..."/>
   </manifest>

...are treated the same as:

   <manifest xmlns="http://www.w3.org/ns/manifest">
     <file href="..."/>
   </manifest>

...and it is hard work to get UAs to handle this correctly.
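To make the namespace point concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The namespace URL and the manifest/file element names come from the hypothetical example above, not from any real vocabulary, and the function names are made up for illustration. The naive version checks only the root's namespace and then matches children by local name, which is the mistake described; the stricter version compares the full expanded name of every element.

```python
import xml.etree.ElementTree as ET

# Namespace from the hypothetical example above; not a real vocabulary.
MANIFEST_NS = "http://www.w3.org/ns/manifest"
ROOT = f"{{{MANIFEST_NS}}}manifest"  # ElementTree's "{namespace}local" form
FILE = f"{{{MANIFEST_NS}}}file"


def naive_manifest_files(xml_text):
    """Buggy pattern: check the root's namespace, then match children by local name."""
    root = ET.fromstring(xml_text)
    if root.tag != ROOT:
        raise ValueError("not a manifest")
    return [c.get("href") for c in root if c.tag.rsplit("}", 1)[-1] == "file"]


def manifest_files(xml_text):
    """Stricter pattern: every element must match its full expanded name."""
    root = ET.fromstring(xml_text)
    if root.tag != ROOT:
        raise ValueError("not a manifest")
    return [c.get("href") for c in root if c.tag == FILE]


bogus = """<manifest xmlns="http://www.w3.org/ns/manifest">
  <file xmlns="http://bogus.example.com/" href="a.html"/>
</manifest>"""

print(naive_manifest_files(bogus))  # ['a.html']  (foreign element accepted)
print(manifest_files(bogus))        # []
```

With the bogus document only the strict version rejects the foreign <file> element; every consumer of an XML-based manifest would have to take that extra care.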

Also, XML is really overkill for this. After all, we only want a list of 
URLs and URL pairs; having a syntax that allows arbitrary nesting, 
arbitrary name/value pairs, namespaces, PIs, multiple ways to escape 
characters, multiple encodings, etc., is unnecessary.

Finally, there is the draconian error handling problem. We don't want to 
require that UAs parse the whole manifest before starting to process the 
manifest, and having UAs fail half-way when they hit a well-formedness 
error seems suboptimal.
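A line-based format can be processed as data trickles in: each line is handled as soon as its terminator arrives, with no well-formedness check of the whole file first. The Python sketch below assumes the data has already been decoded into str chunks (the chunking and the iter_lines name are illustrative assumptions) and treats CR, LF, and CRLF as terminators. The only subtle case is a CR at the end of a chunk, which has to be held back until the next chunk shows whether an LF follows it.

```python
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each line as soon as its terminator (CR, LF, or CRLF) has arrived.

    Sketch only: assumes the input is already decoded text delivered in
    arbitrary str chunks. The function name is illustrative.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        start = i = 0
        while i < len(buf):
            ch = buf[i]
            if ch == "\n":
                yield buf[start:i]
                start = i = i + 1
            elif ch == "\r":
                if i + 1 == len(buf):
                    break  # CR at chunk end: might be the first half of CRLF, so wait
                yield buf[start:i]
                i += 2 if buf[i + 1] == "\n" else 1
                start = i
            else:
                i += 1
        buf = buf[start:]  # keep the unfinished line (and any held-back CR)
    # End of input.
    if buf.endswith("\r"):
        yield buf[:-1]  # last line was terminated by a lone CR
    elif buf:
        yield buf  # last line had no terminator


print(list(iter_lines(["CACHE MANIFEST\r", "\nindex.html\rstyle.css", "\n"])))
# ['CACHE MANIFEST', 'index.html', 'style.css']
```

A consumer built on something like this could skip a line it does not understand and carry on with the next one, rather than failing half-way through the file.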


> (CR only as line delimiter???)

The line terminator is now CR, LF, or CRLF, as desired by the author.


> However, *what* is defined over there ("Note: This is a willful double 
> violation of RFC2046.") makes me nervous.

As I understand it, RFC2046 requires text/* line breaks to be CRLF, which 
rules out LF only and is incompatible with typical workflows on the Web, 
and requires US-ASCII rather than UTF-8 as the default charset, which is 
somewhat silly in this day and age.


On Sun, 18 Nov 2007, Julian Reschke wrote:
> Henri Sivonen wrote:
> > 
> > RFC 2046 was created with email legacy considerations in mind. The 
> > encoding rules there are not only unhelpful but downright harmful in 
> > the contemporary HTTP context with UTF-8 decoding readily available.
> > 
> > The Web needs a text/5 spec.
> 
> That may be true, but then take that to the relevant standards body, 
> instead of simply violating a spec on purpose. This seems to follow a 
> pattern of "we ignore what the specs do, we can do better" with which I 
> strongly disagree.

Is there any chance I can ask you to help us here? You're probably in a 
better position to take it to the relevant standards body than I am. Any 
help you could provide here would be great.


> If you don't like the defaults for a text/* format, use application/*.

It would be sad to deprecate the text/* type just because of an outdated 
spec.


> > Quoting the draft:
> > "Newlines must be represented by U+000A LINE FEED (LF) characters, U+000D
> > CARRIAGE RETURN (CR) characters, or U+000D CARRIAGE RETURN (CR) U+000A LINE
> > FEED (LF) pairs."
> 
> Yep, that's what I meant. Why invent a new text format that has line 
> ending rules other than others? Did anybody consider how well this works 
> with existing language libraries for reading text streams?

These line ending rules are in fact exactly the same as HTML's.


On Mon, 19 Nov 2007, Julian Reschke wrote:
> 
> I don't see how this is relevant here as the spec defines a new MIME 
> type (well, actually it doesn't; it just gives it a name; defining a 
> MIME type is a bit more work).

Yeah, at some point we need to define text/cache-manifest, text/html, and 
text/event-stream (the three MIME types defined by this specification). 
They aren't totally stable yet, so I'd rather wait a bit more before 
writing the appendices to do that.

> Henri wrote:
> > LF, CR and CRLF being all valid line breaks is consistent with XML, 
> > HTML, CSS and the way text/plain is actually implemented in browsers. 
> > Demos: http://hsivonen.iki.fi/test/moz/linebreak/
> 
> That seems to contradict <http://www.w3.org/TR/REC-xml/#sec-line-ends>:
> 
> "To simplify the tasks of applications, the XML processor MUST behave as 
> if it normalized all line breaks in external parsed entities (including 
> the document entity) on input, before parsing, by translating both the 
> two-character sequence #xD #xA and any #xD that is not followed by #xA 
> to a single #xA character."

Consistency with HTML is a higher priority here (what with it being the 
HTML spec and all), but in any case, as Maciej notes, the above is in fact 
just another way of saying that CR, LF, and CRLF must all be treated as 
valid ways of terminating lines.
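The equivalence is easy to check mechanically. In this Python sketch, split_lines treats CR, LF, and CRLF as terminators directly, while xml_normalize applies the XML 1.0 end-of-line rule quoted above (CRLF to LF, then any remaining lone CR to LF) before splitting on LF. Both function names are illustrative, not taken from any spec.

```python
import re


def split_lines(text: str) -> list[str]:
    """Split on CRLF, CR, or LF; CRLF is listed first so it counts as one terminator."""
    return re.split(r"\r\n|\r|\n", text)


def xml_normalize(text: str) -> str:
    """XML 1.0 end-of-line handling: CRLF -> LF, then any lone CR -> LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "one\r\ntwo\rthree\nfour"
print(split_lines(sample))                # ['one', 'two', 'three', 'four']
print(xml_normalize(sample).split("\n"))  # ['one', 'two', 'three', 'four']
```

Mixed and awkward sequences such as "\r\r\n" also split the same way under both approaches, since any CR left after the CRLF pass was never followed by an LF in the first place.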


On Mon, 19 Nov 2007, Julian Reschke wrote:
> 
> I also have to note that while the format allows single-CR as line 
> terminator (which I think is obscure), it doesn't allow a leading BOM 
> (something that you'll get when you edit a UTF-8 encoded text file with 
> Notepad).

The spec now allows a leading BOM.
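For illustration, decoding along these lines is short in Python: decode as UTF-8 rather than falling back to RFC 2046's US-ASCII default, and drop one leading BOM if present. The standard "utf-8-sig" codec does both. The decode_manifest name is hypothetical, and replacing invalid byte sequences with U+FFFD is an assumption of this sketch, not something stated in this thread.

```python
def decode_manifest(data: bytes) -> str:
    """Hypothetical helper: decode manifest bytes as UTF-8, skipping one leading BOM.

    "utf-8-sig" strips an initial EF BB BF sequence and otherwise behaves
    like "utf-8". errors="replace" (U+FFFD for bad bytes) is an assumption.
    """
    return data.decode("utf-8-sig", errors="replace")


notepad_style = b"\xef\xbb\xbfCACHE MANIFEST\r\nindex.html\r\n"
print(decode_manifest(notepad_style).startswith("CACHE MANIFEST"))  # True
```

Files saved without a BOM decode the same way, so authors do not have to care which editor produced the file.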


On Mon, 19 Nov 2007, Julian Reschke wrote:
> 
> Looking at the spec...
> 
> "The first line of an application cache manifest must consist of the 
> string "CACHE", a single U+0020 SPACE character, the string "MANIFEST", 
> and zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) 
> characters. If any other text is found on the first line, the user agent 
> will ignore the entire file. The first line may optionally be preceded 
> by a U+FEFF BYTE ORDER MARK (BOM) character."
> 
> ...the two last sentences seem to contradict each other.

Fixed, thanks.
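A minimal Python sketch of the first-line rule as quoted above, assuming the optional BOM is skipped before the check rather than counted as other text on the line (the exact wording of the fix is not quoted here). The function and pattern names are hypothetical.

```python
import re

# "CACHE", one U+0020 SPACE, "MANIFEST", then only spaces or tabs to end of line.
_SIGNATURE = re.compile(r"CACHE MANIFEST[ \t]*")


def is_manifest_first_line(line: str) -> bool:
    """Hypothetical check of a manifest's first line (terminator already removed).

    Assumption: a single leading U+FEFF BOM is tolerated and not treated
    as "other text" on the line.
    """
    if line.startswith("\ufeff"):
        line = line[1:]
    return _SIGNATURE.fullmatch(line) is not None
```

For example, "CACHE MANIFEST \t" passes, while "CACHE  MANIFEST" (two spaces) and "CACHE MANIFEST v1" would cause the whole file to be ignored.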

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 13 May 2008 00:36:13 UTC