Re: several messages about serialising HTML and related subjects from Ian Hickson on 2008-02-29 (public-html@w3.org from February 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 29 Feb 2008 02:23:03 +0000 (UTC)
To: Simon Pieters <simonp@opera.com>, "Michael A. Puls II" <shadow2531@gmail.com>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, Křištof Želechovski <giecrilj@stegny.2a.pl>, Alexey Proskuryakov <ap@webkit.org>, Boris Zbarsky <bzbarsky@mit.edu>
Cc: public-html <public-html@w3.org>, whatwg <whatwg@whatwg.org>
Message-ID: <Pine.LNX.4.62.0802290133010.6407@hixie.dreamhostps.com>

Executive summary: I made most of the changes suggested below.

On Wed, 15 Aug 2007, Simon Pieters wrote:
> 
> The spec says:
> 
>    Other node types (e.g. Attr) cannot occur as children of elements. If
>    they do, this algorithm must raise an INVALID_STATE_ERR exception.
> 
> s/elements/elements or documents/ as the algorithm can be used for documents
> as well.
> 
> What about PIs? They can occur as children of elements or documents. 

How?


On Wed, 15 Aug 2007, Simon Pieters wrote:
> 
> The serializing HTML fragments algorithm talks about "child node" to 
> refer to the current node being processed. This is a bit confusing, and 
> I think "current node" would be clearer.

Done.


On Thu, 16 Aug 2007, Lachlan Hunt wrote:
>
>   There is a possible issue with the serialising HTML fragments section 
> [1]. The algorithm seems fine for use with things like innerHTML, but 
> there are other issues that should be considered when serialising to a 
> file, database, network stream or something.
> 
> Such serialisers should consider the character encoding.  Although a 
> Unicode encoding should ideally be used, some serialisers may need to 
> serialise to a different encoding at the request of the user or 
> limitations of the environment.  In such cases, the serialisation should 
> output appropriate character references for characters that can't be 
> represented.
> 
> It should also handle outputting the appropriate <meta charset=""> 
> and/or BOM, especially in environments that can't declare it at the 
> transport level like HTTP can.
> 
> Perhaps the spec should say something about this issue somewhere.
> 
> [1] http://www.whatwg.org/specs/web-apps/current-work/#serialising

The section is specifically for serialising a subtree to a Unicode stream 
without mutation, not to a byte stream. What's the use case that isn't 
covered by "8.1 Writing HTML documents"?
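
To make the distinction concrete, here is a minimal sketch of the byte-stream step Lachlan describes, kept separate from the fragment serialisation algorithm. It assumes the markup has already been serialised to a Unicode string; the function name serialise_to_bytes is hypothetical, and "xmlcharrefreplace" is Python's standard codec error handler, not anything defined by the spec.

```python
def serialise_to_bytes(markup: str, encoding: str = "ascii") -> bytes:
    """Encode already-serialised markup, using numeric character
    references for characters the target encoding cannot represent.

    Hypothetical helper for illustration only.
    """
    # "xmlcharrefreplace" turns each unencodable character into &#NNNN;
    return markup.encode(encoding, errors="xmlcharrefreplace")


sample = "<p>caf\u00e9 \u263a</p>"
print(serialise_to_bytes(sample, "ascii"))       # both characters become references
print(serialise_to_bytes(sample, "iso-8859-1"))  # only U+263A becomes a reference
print(serialise_to_bytes(sample, "utf-8"))       # no references needed
```

This blanket approach is only safe where character references are actually parsed; it would not be correct for the contents of script or style elements, or for comments, so a real serialiser would need to handle those cases separately.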


On Mon, 27 Aug 2007, Simon Pieters wrote:
> 
> IE7 and Firefox serialize U+00A0 characters in data and attribute values 
> as "&nbsp;" when getting innerHTML. Safari and Opera don't. Should the 
> spec be aligned with IE7 and Firefox here?
>    
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cscript%3Ewindow.onload%3Dfunction%28%29%7Bw%28document.body.innerHTML%29%7D%3C/script%3E%3Cp%20title%3D%22x%A0x%22%3Ex%A0x

I don't see any great benefit to doing so; do any pages require this?


On Tue, 28 Aug 2007, Alexey Proskuryakov wrote:
> 
>   This has caused a compatibility issue for WebKit at least once. In 
> that case, we got away with evangelizing, but we still track this as a 
> bug that needs to be fixed eventually.
> 
>   http://bugs.webkit.org/show_bug.cgi?id=11947

Ah. Ok then. Done.
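
As a rough sketch of what the escaping step looks like with this change, assuming the usual rules for the other characters (ampersand always, double quote in attribute values, angle brackets in text); the function name escape_html and the attribute_mode parameter are hypothetical, and the exact spec wording is not reproduced here:

```python
def escape_html(text: str, attribute_mode: bool = False) -> str:
    """Escape a string for inclusion in serialised HTML (illustrative only)."""
    # Ampersands must be replaced first so later references are not re-escaped.
    text = text.replace("&", "&amp;")
    # U+00A0 NO-BREAK SPACE is written out as a named reference, matching
    # what IE7 and Firefox do when getting innerHTML.
    text = text.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        # Attribute values are assumed to be emitted in double quotes.
        text = text.replace('"', "&quot;")
    else:
        text = text.replace("<", "&lt;").replace(">", "&gt;")
    return text


print(escape_html("x\u00a0x"))                       # text node data
print(escape_html("x\u00a0x", attribute_mode=True))  # title attribute value
```

With Simon's test case, both the text node and the title attribute would come out as "x&nbsp;x".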


On Tue, 28 Aug 2007, Boris Zbarsky wrote:
> 
> For what it's worth, the relevant Mozilla bugs are 
> https://bugzilla.mozilla.org/show_bug.cgi?id=165686 and 
> https://bugzilla.mozilla.org/show_bug.cgi?id=169590

Cool, thanks.


On Tue, 11 Sep 2007, Simon Pieters wrote:
> 
> Consider the following document:
> 
>    <h:p xmlns:h="http://www.w3.org/1999/xhtml"><x/></h:p>
> 
> When getting innerHTML on the root element, should the serialization 
> declare the no namespace explicitly as in <x xmlns=""/>? (I think it 
> should because setting innerHTML will imply namespace declarations so it 
> might change meaning if you insert it somewhere else with innerHTML.)

I've added this:

| If any of the elements in the serialisation are in the null namespace,
| the default namespace in scope for those elements must be explicitly 
| declared as the empty string.

Is that ok?
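
To illustrate why the explicit declaration matters, here is a small sketch using Python's xml.etree.ElementTree as a stand-in for an XML parser. The wrapper div and the helper name child_tag are assumptions made for the example; they simulate setting innerHTML on an element whose default namespace is XHTML.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def child_tag(fragment: str) -> str:
    """Parse a fragment inside an element whose default namespace is XHTML
    and return the expanded name of the first child (illustrative only)."""
    wrapper = ET.fromstring(f'<div xmlns="{XHTML}">{fragment}</div>')
    return wrapper[0].tag


# Without the declaration, x silently moves into the XHTML namespace.
print(child_tag("<x/>"))
# With xmlns="", x stays in the null namespace wherever it is inserted.
print(child_tag('<x xmlns=""/>'))
```

ElementTree writes namespaced names as "{uri}local", so the first call reports x in the XHTML namespace and the second reports a bare "x".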


> Also, the spec says:
> 
>    In an XML context, the innerHTML DOM attribute on HTMLElements and
>    HTMLDocuments, on getting, must return a string in the form of an
>    internal general parsed entity [...]
> 
> ...and then goes on to say that some DocumentType nodes must raise an 
> exception, however internal general parsed entities can't have doctypes 
> in the first place.

Oops. Fixed. Only elements should return internal general parsed entities; 
documents should return document entities. Empty documents now raise an 
exception.


> Finally, the spec lists the following as something that throws:
> 
>    A Text node whose data contains characters that are not matched by the
>    XML Char production. [XML]
> 
> But Text data is not the only data that might fail to match the Char 
> production in XML. The same applies to Comment data, CDATASection data, 
> ProcessingInstruction targets, and, I think, Attr values.

Fixed.
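
For reference, a minimal sketch of the check involved, using the Char production from XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function name matches_xml_char is hypothetical; the same test would apply to each of the strings Simon lists.

```python
import re

# Any character outside the XML 1.0 Char production.
_NOT_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def matches_xml_char(data: str) -> bool:
    """Return True if every character in data matches the Char production."""
    return _NOT_XML_CHAR.search(data) is None


print(matches_xml_char("plain text\n"))  # tab, LF and CR are allowed
print(matches_xml_char("form\x0cfeed"))  # U+000C is not a Char
print(matches_xml_char("\ufffe"))        # U+FFFE is not a Char
```

This only covers the Char production. Other constraints, such as a ProcessingInstruction target having to be a valid Name, are stricter and would need their own checks.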

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 29 February 2008 02:23:16 UTC