Re: [whatwg] Parsing: how to deal with marker while reconstructing the active formatting elements?

On Wed, 08 Apr 2015 14:46:44 +0200, Mikko Rantalainen  
<mikko.rantalainen@peda.net> wrote:

> Simon Pieters (2015-04-08 11:07 Europe/Helsinki):
>> On Wed, 08 Apr 2015 07:55:26 +0200, Mikko Rantalainen
>> <mikko.rantalainen@peda.net> wrote:
>>> The section 12.2.3.3 The list of active formatting elements
>>> (https://html.spec.whatwg.org/multipage/syntax.html#the-list-of-active-formatting-elements)
>>> has steps to "reconstruct the active formatting elements". The steps
>>> include
>>>
>>> [...]
>>> How to deal with the case where the `entry` points to a marker after
>>> step 7? Obviously one cannot create a marker as an HTML element.
>>>
>>> This case seems possible because only Step 6 checks for a marker, and
>>> then Step 7 blindly advances the list and may set `entry` to a marker.
>>>
>>> (I'm asking this question because I hit this case while parsing user
>>> input with the html5lib PHP implementation, and that implementation
>>> crashes while trying to create an HTML element from a marker.)
>>
>> What is the input that triggers this? I fail to come up with a list of
>> active formatting elements that makes the reconstruct algorithm have a
>> marker as entry in step 8.
>
> A minimal test case that reproduces the problem is
>
> <table><tr><td>
> <p><b>1<span><div><a>2</a></div></span></b></p>
> </td></tr></table>
>
> I'm not sure if some of that is not strictly required but at least this  
> test case causes a crash at  
> https://github.com/PedaNet/html5lib/blob/a11001bb9fd27d8a54228eb7851564cf27c25d6d/php/library/HTML5/TreeBuilder.php#L3307  
> where $entry->cloneNode() is called and $entry in fact contains the  
> self::MARKER instead of a DOMNode. Source code comments refer to "steps  
> to reconstruct the active formatting elements".
>
> If no other parser implementation has issues with this source, I guess
> it's some other bug in the html5lib PHP implementation which causes an
> extra marker in the list of active formatting elements.

I don't think that's the issue, since you have one marker and there should  
be one (for <td>). Skipping past the "advance" step could explain this  
situation. Looking at the code it appears $step_seven is not defined for  
the first iteration, so that step will be skipped. Adding $step_seven =  
true; at the top of the function might fix this.
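
To illustrate the failure mode I suspect, here is a rough Python sketch. It is
not the actual PHP code: `entry_to_create` and the `advance_first` flag are
made-up names, with `advance_first` playing the role that $step_seven seems
to play, and elements are plain strings for brevity.

```python
MARKER = "marker"  # stand-in for a marker entry in the list


def entry_to_create(active, open_elements, advance_first):
    """Return the entry that the "create" step would be run on.

    advance_first=False models skipping the "advance" step on the
    first pass (hypothetical flag, analogous to $step_seven).
    """
    i = len(active) - 1                  # step 3: start at the last entry
    while i > 0:                         # step 4: rewind while entries remain
        i -= 1                           # step 5: move one entry earlier
        if active[i] == MARKER or active[i] in open_elements:
            if advance_first:
                i += 1                   # step 7: advance past the marker
            break                        # step 6 did not send us back to rewind
    return active[i]


open_elements = ["html", "body", "table", "tbody", "tr", "td", "div"]
active = [MARKER, "b"]

print(entry_to_create(active, open_elements, advance_first=False))  # marker
print(entry_to_create(active, open_elements, advance_first=True))   # b
```

With the advance step skipped, "create" is handed the marker, which would
match the crash you describe.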

> Could somebody explain the intended contents of list of active  
> formatting elements? Should that list ever contain multiple markers by  
> design?

Sure, e.g. <object><object> will have two markers.

> In the case of crash, the list contains one marker followed by one DOM  
> node.

OK. So I think the crash happens when seeing the <a>, but it's not a bug  
in the spec AFAICT. It also doesn't crash in Blink/WebKit/Gecko/Presto.



<table><tr><td><p><b>1<span>

This is straightforward.
SoOE: html, body, table, tbody, tr, td, p, b, span
LoAFE: marker (td), b


<table><tr><td><p><b>1<span><div>

"If the stack of open elements has a p element in button scope, then close  
a p element."
->
"Pop elements from the stack of open elements until a p element has been  
popped from the stack."

SoOE: html, body, table, tbody, tr, td, div
LoAFE: marker (td), b
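
As a small Python sketch of that transition (a simplification: it only models
the popping and ignores the "generate implied end tags" part of closing a p
element; `close_p_element` is a made-up helper and elements are plain
strings):

```python
def close_p_element(open_elements):
    """Pop elements until a p element has been popped from the stack."""
    while open_elements:
        if open_elements.pop() == "p":
            break


stack = ["html", "body", "table", "tbody", "tr", "td", "p", "b", "span"]
active = ["marker (td)", "b"]

close_p_element(stack)   # pops span, b, then p
stack.append("div")      # insert the <div>

print(stack)   # ['html', 'body', 'table', 'tbody', 'tr', 'td', 'div']
print(active)  # ['marker (td)', 'b']
```

Note that the list of active formatting elements is not touched here, so b
stays in it even though it is no longer on the stack.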


<table><tr><td><p><b>1<span><div><a>

"Reconstruct the active formatting elements, if any."
->
"1. If there are no entries in the list of active formatting elements,  
then there is nothing to reconstruct; stop this algorithm."

There are two entries. Carry on.

"2. If the last (most recently added) entry in the list of active  
formatting elements is a marker, or if it is an element that is in the  
stack of open elements, then there is nothing to reconstruct; stop this  
algorithm."

It's not a marker, it's not in the SoOE. Carry on.

"3. Let entry be the last (most recently added) element in the list of  
active formatting elements."

entry = b

"4. Rewind: If there are no entries before entry in the list of active  
formatting elements, then jump to the step labeled create."

There is an entry before. Carry on.

"5. Let entry be the entry one earlier than entry in the list of active  
formatting elements."

entry = marker

"6. If entry is neither a marker nor an element that is also in the stack  
of open elements, go to the step labeled rewind."

entry is marker. Carry on.

"7. Advance: Let entry be the element one later than entry in the list of  
active formatting elements."

entry = b

"8. Create: Insert an HTML element for the token for which the element  
entry was created, to obtain new element."

This creates a <b> element.

"9. Replace the entry for entry in the list with an entry for new element."

Carry on.

"10. If the entry for new element in the list of active formatting  
elements is not the last entry in the list, return to the step labeled  
advance."

It is the last entry. The algorithm stops here.
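
The same walk-through as a minimal Python sketch, in case it helps when
comparing against the PHP code. This is my own illustration of the steps
quoted above, not code from any implementation: `Element`, `MARKER`,
`reconstruct` and `insert_element` are all made-up names, and "insert an HTML
element" is reduced to pushing a fresh object onto the stack.

```python
MARKER = object()  # sentinel standing in for a marker entry


class Element:
    """Toy element; compared by identity, like DOM nodes."""

    def __init__(self, name):
        self.name = name


def reconstruct(active, open_elements, insert_element):
    """Reconstruct the active formatting elements (steps 1-10)."""
    if not active:                                   # step 1
        return
    last = active[-1]
    if last is MARKER or last in open_elements:      # step 2
        return
    i = len(active) - 1                              # step 3
    while i > 0:                                     # step 4 (rewind)
        i -= 1                                       # step 5
        if active[i] is MARKER or active[i] in open_elements:
            i += 1                                   # step 7 (advance)
            break                                    # step 6 fell through
    while True:
        new_element = insert_element(active[i])      # step 8 (create)
        active[i] = new_element                      # step 9
        if i == len(active) - 1:                     # step 10
            return
        i += 1                                       # back to advance


def run_example():
    """State just before <a> is processed in the test case."""
    names = ("html", "body", "table", "tbody", "tr", "td", "div")
    open_elements = [Element(n) for n in names]
    old_b = Element("b")            # still in the list, no longer on the stack
    active = [MARKER, old_b]

    def insert_element(entry):
        new_element = Element(entry.name)
        open_elements.append(new_element)
        return new_element

    reconstruct(active, open_elements, insert_element)
    return open_elements, active, old_b


open_elements, active, old_b = run_example()
print([e.name for e in open_elements])
print(active[0] is MARKER, active[1].name, active[1] is not old_b)
```

The create step only ever sees the b entry; the marker is stepped over by the
advance in step 7.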

HTH,
-- 
Simon Pieters
Opera Software

Received on Thursday, 9 April 2015 07:14:26 UTC