Re: [DOM4] Mutation algorithm imposed order on document children from Aryeh Gregor on 2012-06-14 (www-dom@w3.org from April to June 2012)

From: Aryeh Gregor <ayg@aryeh.name>
Date: Thu, 14 Jun 2012 13:23:14 +0300
To: Elliott Sprehn <esprehn@gmail.com>
Cc: Ryosuke Niwa <rniwa@webkit.org>, Boris Zbarsky <bzbarsky@mit.edu>, Ojan Vafai <ojan@chromium.org>, www-dom <www-dom@w3.org>
Message-ID: <CAKA+Ax=OkN5X9a_OTPk1YJ=OuRzShTPWfHKJo708cDt5WKCoUQ@mail.gmail.com>

On Tue, Jun 12, 2012 at 11:55 PM, Elliott Sprehn <esprehn@gmail.com> wrote:
> I'd much rather the spec require serialization to respect the document mode
> and output the doctype required to ensure that reading in the serialized
> document produces the same document state.
>
> That is we should require Unserialize(Serialize(document)) == document

That's impossible in general, because some DOM trees can be built by
script but have no serialized representation.  For instance,

  data:text/html,<script>document.appendChild(document.createComment("-->"))</script>

There you have a DOM that will never be created by an HTML *or* XML
parser, for any input.  (Gecko refuses to create the comment in this
case, but that's a bug.  Anyway, you can come up with other examples
without much trouble, like consecutive or empty text nodes.)
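
The text-node cases can be sketched outside a browser.  This is only
an illustration under assumptions: it uses Python's xml.dom.minidom as
a stand-in DOM (an XML parser, not the HTML parser), and the element
name "p" and the strings are arbitrary.

```python
from xml.dom import minidom

# Build a tree by hand with two adjacent text nodes and one empty one.
# No parser would produce this shape from markup.
doc = minidom.Document()
root = doc.createElement("p")
doc.appendChild(root)
root.appendChild(doc.createTextNode("foo"))
root.appendChild(doc.createTextNode("bar"))
root.appendChild(doc.createTextNode(""))

# Serialize, then parse the result back into a fresh tree.
serialized = root.toxml()
reparsed = minidom.parseString(serialized).documentElement

original_count = len(root.childNodes)      # nodes built by "script"
reparsed_count = len(reparsed.childNodes)  # nodes the parser recreated

print(serialized, original_count, reparsed_count)
```

The serialized text is the same whether the tree held one text node or
three, so the node boundaries cannot survive the round trip.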

You're talking about something beyond the DOM level, though -- you want a
serialized standards-mode document to parse as standards-mode.  This
is an interesting thought, but you could also talk about all sorts of
other metadata that the parser infers and that doesn't later change
even if the DOM does.

For instance, would you also like to ensure that the charset that the
document is serialized in matches the charset that will be parsed?
What about the HTML vs. XML flag -- what if I have an HTML document
and serialize it as XML?  Do you want it to only become an HTML
document, not an XML document?  What happens if the root element has a
manifest attribute, and script removes or changes it later on?  Should
the serializer reinsert it?

What if there was a <base> that was in the markup, and then an <img>,
so the <img>'s src was resolved relative to the <base>, but the <base>
was later removed?  Should the <img>'s src be changed, or a new <base>
inserted, or what?  What if the <base> was removed by a <script> that
also removed itself?
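
To make the <base> case concrete, here is a rough sketch of the URL
arithmetic using Python's urllib.parse.urljoin.  The document URL, the
<base> href and the image path are all made up for illustration; real
browsers follow the HTML spec's URL resolution rules, which this only
approximates.

```python
from urllib.parse import urljoin

# Hypothetical values, not taken from any real page.
DOC_URL = "http://example.com/articles/page.html"
BASE_HREF = "http://cdn.example.net/assets/"
IMG_SRC = "logo.png"

# While <base> was in the document, src resolved against its href.
resolved_with_base = urljoin(BASE_HREF, IMG_SRC)

# After <base> is removed and the page is serialized and reparsed,
# the same src attribute resolves against the document's own URL.
resolved_after_reparse = urljoin(DOC_URL, IMG_SRC)

print(resolved_with_base)
print(resolved_after_reparse)
```

The attribute value is identical in both trees, but the resource it
names is not, so serializing the DOM faithfully still changes what the
reparsed page loads.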

Note that in all of these cases -- including your proposal to inject
doctypes -- the modifications would not only add complexity to the
serializer, but also make the reparsed DOM not match the existing DOM.
This could break all kinds of things: for example, a script that
relies on document.firstChild being <html> would now find the doctype
that you inserted.
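
The firstChild problem is easy to reproduce in miniature.  The sketch
below again assumes Python's xml.dom.minidom as a stand-in for the
browser DOM, and a deliberately tiny document; it is not how any
browser serializer works.

```python
from xml.dom import minidom
from xml.dom import Node

# The live DOM has no doctype, so <html> is the document's first child.
without_doctype = minidom.parseString("<html/>")

# What a reparse would see if the serializer injected a doctype.
with_doctype = minidom.parseString("<!DOCTYPE html><html/>")

first_before = without_doctype.firstChild.nodeType
first_after = with_doctype.firstChild.nodeType

# A script assuming firstChild is the root element holds only before.
assumption_before = first_before == Node.ELEMENT_NODE
assumption_after = first_after == Node.ELEMENT_NODE

print(assumption_before, assumption_after)
```

Code that reaches for documentElement instead is unaffected, but the
point stands that the two trees are observably different.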

HTML pages are dynamic, and serialized markup is never going to fully
or correctly encode the current state of the page if scripts have
messed with it.  All else being equal, it's better if we serialize a
more accurate reflection of the current page's state, but IMO this is
a weak reason by itself to add complexity to serializers.  This would
only be a good idea if authors are actually hitting problems in
practice.

Received on Thursday, 14 June 2012 10:24:08 UTC