Re: [whatwg] Parsing: how to deal with marker while reconstructing the active formatting elements?

Simon Pieters (2015-04-08 11:07 Europe/Helsinki):
> On Wed, 08 Apr 2015 07:55:26 +0200, Mikko Rantalainen
> <mikko.rantalainen@peda.net> wrote:
>> The section 12.2.3.3 The list of active formatting elements
>> (https://html.spec.whatwg.org/multipage/syntax.html#the-list-of-active-formatting-elements)
>> has steps to "reconstruct the active formatting elements". The steps
>> include
>>
>> [...]
>> How to deal with the case where the `entry` points to a marker after
>> step 7? Obviously one cannot create a marker as an HTML element.
>>
>> This case seems possible because only Step 6 checks for a marker, and
>> Step 7 then blindly advances through the list and may set `entry` to a
>> marker.
>>
>> (I'm asking this question because I hit this case while parsing user
>> input with the html5lib PHP implementation, and that implementation
>> crashes while trying to create an HTML element from a marker.)
>
> What is the input that triggers this? I fail to come up with a list of
> active formatting elements that makes the reconstruct algorithm have a
> marker as entry in step 8.

A minimal test case that reproduces the problem is:

<table><tr><td>
<p><b>1<span><div><a>2</a></div></span></b></p>
</td></tr></table>

I'm not sure whether all of that is strictly required, but at least this
test case causes a crash at
https://github.com/PedaNet/html5lib/blob/a11001bb9fd27d8a54228eb7851564cf27c25d6d/php/library/HTML5/TreeBuilder.php#L3307
where $entry->cloneNode() is called while $entry in fact contains
self::MARKER instead of a DOMNode. The source code comments there refer
to the "steps to reconstruct the active formatting elements".

If no other parser implementation has issues with this input, I guess
there is some other bug in the html5lib PHP implementation that causes an
extra marker in the list of active formatting elements.

Could somebody explain the intended contents of the list of active
formatting elements? Should that list ever contain multiple markers by
design? At the time of the crash, the list contains one marker followed
by one DOM node.
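
For reference, here is how I read the "reconstruct the active formatting
elements" steps, as a minimal Python sketch (this is not the html5lib PHP
code). The names MARKER, reconstruct and cloning_inserter are made up for
illustration, elements are modelled as plain strings, and the demo state
at the end is invented to match only the shape described above (one
marker followed by one element that is not on the stack of open
elements). If this reading is right, the rewind loop stops at the nearest
marker and the advance step moves past it, so the entry that reaches the
"create" step should never be a marker.

```python
MARKER = object()  # hypothetical sentinel standing in for a marker entry


def reconstruct(active, open_elements, insert_element):
    """Sketch of "reconstruct the active formatting elements".

    active: list of entries, oldest first (MARKER or an element)
    open_elements: the stack of open elements
    insert_element: hypothetical callback that creates and inserts a new
        element for the given entry and returns it
    """
    # Steps 1-2: nothing to do for an empty list, or when the last entry
    # is a marker or an element that is still open.
    if not active:
        return
    i = len(active) - 1
    if active[i] is MARKER or active[i] in open_elements:
        return

    # Rewind: walk backwards until a marker or an open element is found,
    # or until the start of the list is reached.
    while i > 0:
        i -= 1
        if active[i] is MARKER or active[i] in open_elements:
            i += 1  # Advance: step forward past the stopping entry.
            break

    # Create: re-create every entry from here to the end of the list.
    while True:
        assert active[i] is not MARKER, "a marker must never reach 'create'"
        active[i] = insert_element(active[i])
        if i == len(active) - 1:
            break
        i += 1  # Advance to the next entry.


def cloning_inserter(open_elements):
    """Return a toy insert_element that 'clones' by appending a quote."""
    def insert_element(entry):
        clone = entry + "'"
        open_elements.append(clone)
        return clone
    return insert_element


# Invented demo state: one marker followed by one element that is no
# longer on the stack of open elements.
open_elements = ["html", "body", "table", "tbody", "tr", "td", "p"]
active = [MARKER, "b"]
reconstruct(active, open_elements, cloning_inserter(open_elements))
```

In this sketch only the "b" entry is re-created and the marker is left
untouched, so a list of the form [marker, element] does not by itself
seem to require cloning a marker.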

-- 
Mikko

Received on Wednesday, 8 April 2015 12:47:10 UTC