[whatwg] [wa1] Status of tree construction section

"Ian Hickson" <ian at hixie.ch> wrote:

> On 7/7/06, Stewart Brodie <stewart.brodie at antplc.com> wrote:
> >
> > I thought I'd have a go at implementing the parsing algorithms,
> > specifically the tree construction algorithms, to see what effect it had
> > on the DOM trees that our parser creates.  Has anybody else here
> > actually implemented this tree construction algorithm?  I'm finding one
> > or two issues that I think may be (minor) mistakes, and I'd like to
> > compare notes to see whether I've just misunderstood it or whether it is
> > a mistake.
> 
> I've been implementing it (to test the spec); I'd be quite happy to
> compare notes (either on this list or off-list, as you wish).
> Note that I'd definitely not consider that part of the spec "done" yet.

I'm happy to post to the list.  The first few issues are quite trivial, I
think:

In the main phase, section 'If the insertion mode is "in row"', the last
option for 'anything else' says "process ... as if ... in table".  I think
that should say "as if ... in table body" instead.  The "in table body" mode
will pass the token on to "in table" anyway if it doesn't handle it.

The case immediately above that is "An end tag whose tag name is one of:
body, caption, col, colgroup, html, td, th, tr".  The /tr case is already
handled by the second case, so 'tr' should be removed from the list here.

In 'If the insertion mode is "in cell"', the absence of a case for an end
tag for CAPTION looks odd.  All the other table-related tags are handled
here explicitly, so why is CAPTION so different (that it should be handled
in the 'treat it as "in body"' way)?

I've come to the conclusion that you need pictures to accompany the
"adoption agency algorithm".  However, I'm not an artist.  Indeed, I'm so
bad at drawing pictures that, in the past, users often sent me replacement
bitmap graphics for my programs because they found my attempts so
distressing :-)

With reference to that algorithm, I think that the text in point 1 should be
re-organised somewhat after the second paragraph to make it a little
clearer.  I've re-organised it and I think it now says exactly the same
thing, but more simply and with less potential for misunderstanding:

  "If there is a _formatting element_, proceed immediately to step 2.

  Otherwise, there is no _formatting element_.  If there is an element in
  the _list of active formatting elements_ that:

  o  [same three steps, but with ", and" appended to the top one]

  then remove the last such element from the _list of active formatting
  elements_.

  In any case, abort these steps."


In the various places where a given operation has to be described multiple
times, you've macroed it (e.g. "insert an HTML element", "clear the list of
active formatting elements up to the last marker").  I suggest adding
another one that can be used during the Adoption Agency algorithm (I'm
sure that I found I needed to perform this search in other places too -
hence defining it separately - although I can't quite recall exactly where
for the time being, ho hum):

  "The _list of active formatting elements_ is said to *have an element in
   active formatting scope* when the following algorithm terminates in a
   match state:

  1. If the _list of active formatting elements_ is empty, terminate in a
     failure state.

  2. Initialise _entry_ to be the last (most recently added) entry in the
     _list of active formatting elements_.

  3. If _entry_ is a marker, terminate in a failure state.

  4. If _entry_ is an element with a tag name matching the target element
     name, terminate in a match state.

  5. If there are further elements in the _list of active formatting
     elements_, set _entry_ to the previous entry and return to step 3.

  6. Terminate in a failure state (there are no more entries)"
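
As a rough illustration, the six steps above could be sketched like this.
Python is used purely for convenience.  Representing each entry as a
tag-name string and a marker as None are assumptions made for the sketch,
and the function name is hypothetical; none of this comes from the spec.

```python
from typing import List, Optional

# Assumption for this sketch: a marker entry is represented by None and
# every other entry by its tag name.
MARKER = None


def has_element_in_active_formatting_scope(
    active_formatting: List[Optional[str]], target_name: str
) -> bool:
    """Search backwards from the most recently added entry."""
    for entry in reversed(active_formatting):
        if entry is MARKER:
            return False  # step 3: hit a marker before finding a match
        if entry == target_name:
            return True  # step 4: tag name matches the target
    # Steps 1 and 6: the list was empty, or there are no more entries.
    return False
```

Walking the list in reverse covers steps 2 and 5, and an empty list simply
falls through to the final failure return.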


Step 6 in the original 14-step algorithm: "relative position of the
formatting element".  Relative to what?

The "parsing quirks" box lists several issues that I think are important.
The <script> one in particular is so very common.  Unfortunately, I had to
cave in eventually and support that because it broke some customers' own
sites.  I have come across never-opened </br> and </p> too.  I've never
heard of <% ... %> before.  Sometimes, it's really quite depressing the
rubbish that people (and programs!) write out.

I spent a long time trying to work out what I needed to store for each entry
on both the stack of open elements and the list of active formatting
elements.  I think it should be stated up front because this is often an
area of confusion, in my experience.  I frequently get upset with co-workers
over misuse of the terms "element", "tag" and "node", for example :-)

Finally (for now ;-), right at the beginning of the tree construction
section, it says that DOM Mutation events must not fire for changes caused
by the UA parsing the document.  I cannot decide whether or not I agree with
that statement.  My experimentation appears to show that this is indeed what
happens in Firefox, at least. I put a script in the head of my document that
attaches a listener for DOMNodeInserted on the document.documentElement node
(i.e. the HTML element) and it never gets called due to nodes being added by
the parser.  Internally, for me, it's a PITA though, because my node tree
construction code and DOM implementation code use the same internal APIs -
and these automatically trigger the DOM events, which, in turn, get
dispatched to the various internal default event handlers to deal with the
special types of node that require additional behaviour (like IMG, LINK,
META etc.).
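
One way the two concerns could be separated is sketched below, in Python
purely for illustration.  This is not my actual code: the Tree class, the
from_parser flag and the list of specially handled names are all
hypothetical.  The idea is that a shared insertion routine always runs the
internal default handling, but only dispatches script-visible mutation
events when the change did not come from the parser.

```python
from typing import Callable, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


class Tree:
    """Hypothetical shared insertion API used by both parser and DOM code."""

    def __init__(self) -> None:
        # Listeners registered by scripts (e.g. for DOMNodeInserted).
        self.script_listeners: List[Callable[[Node], None]] = []
        # Record of nodes that received internal default handling.
        self.default_handled: List[str] = []

    def insert(self, parent: Node, child: Node, from_parser: bool = False) -> None:
        parent.children.append(child)
        # Internal default handling always runs, whoever caused the insertion.
        if child.name in ("img", "link", "meta"):
            self.default_handled.append(child.name)
        # Script-visible mutation events are skipped for parser insertions.
        if not from_parser:
            for listener in self.script_listeners:
                listener(child)
```

With this shape the parser would call insert(..., from_parser=True), while
DOM methods invoked from script would use the default.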



> > [http://svn.whatwg.org]  Neither web browsers nor svn itself can talk to
> > that URI.  Am I doing something wrong or is it broken?
> 
> Subversion should be able to talk to that URI... It's the URI I use to
> check in! :-)

I have now tracked this down to an over-zealous company firewall!  I have
been able to use command-line svn on an external machine to check things out
and get diffs and histories for the time being.


-- 
Stewart Brodie
Software Engineer
ANT Software Limited

Received on Monday, 10 July 2006 09:19:14 UTC