
[Bug 14284] Need HTML parser algorithm options

From: <bugzilla@jessica.w3.org>
Date: Thu, 24 Nov 2011 15:06:08 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RTasK-0008Gb-5s@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14284

Henri Sivonen <hsivonen@iki.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|annevk@opera.com            |ian@hixie.ch
  Status Whiteboard|awaiting UA experience      |

--- Comment #8 from Henri Sivonen <hsivonen@iki.fi> 2011-11-24 15:06:06 UTC ---
Gecko does the following for HTML in XHR:
 * If there's an HTTP-level charset, use that.
 * Otherwise, if there's a BOM, use that.
 * Otherwise, run the prescan algorithm over the first 1024 bytes and use the
result if there is one.
 * Otherwise, use UTF-8.
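
The decision order above can be sketched roughly as follows. This is only an illustration, not Gecko's code: the function names are made up for this example, and simplified_prescan is a crude regex stand-in for the HTML spec's prescan algorithm, which is considerably more involved.

```python
import codecs
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # bytes examined by the prescan, per the list above

# ASSUMPTION: crude stand-in for the real prescan; it only looks for
# "charset=<label>" inside a <meta ...> tag and ignores comments, other
# tags, attribute parsing rules, and so on.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE
)


def sniff_bom(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, if any."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    return None


def simplified_prescan(body: bytes) -> Optional[str]:
    """Hypothetical helper: look for a meta charset in the first 1024 bytes."""
    match = _META_CHARSET.search(body[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


def choose_encoding(http_charset: Optional[str], body: bytes) -> str:
    """Apply the four steps in order; the first one that yields a result wins."""
    if http_charset:
        return http_charset.lower()
    return sniff_bom(body) or simplified_prescan(body) or "utf-8"
```

For instance, choose_encoding(None, b'<meta charset="windows-1252">') goes past the first two steps and takes the prescan result, while a meta that starts after byte 1024 is never seen and the function falls back to UTF-8.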

So the spec options that the XHR spec needs to flip in the HTML spec are:
 * Use the prescan up to exactly 1024 bytes without a timeout. (FWIW, I think
this should always be done.)
 * Turn off the honoring of tree builder-discovered metas.
 * Turn off heuristic detection.
 * Clamp the last resort encoding to UTF-8 instead of a user-defined encoding.
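
One way to picture these four switches is as a small options record that the XHR spec would pass to the HTML parser. This is a sketch only: the class, field, and function names are invented for this example and do not come from either spec or from Gecko.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserEncodingOptions:
    """Hypothetical bundle of the encoding-related parser switches."""
    prescan_byte_limit: int          # how many leading bytes the prescan reads
    prescan_uses_timeout: bool       # whether the prescan may stop on a timeout
    honor_tree_builder_metas: bool   # whether a meta found later can change the encoding
    heuristic_detection: bool        # whether content-based guessing is allowed
    last_resort_encoding: str        # encoding used when nothing else decides


def xhr_html_options() -> ParserEncodingOptions:
    """Settings matching the four bullet points above, in the same order."""
    return ParserEncodingOptions(
        prescan_byte_limit=1024,
        prescan_uses_timeout=False,
        honor_tree_builder_metas=False,
        heuristic_detection=False,
        last_resort_encoding="utf-8",
    )
```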

Received on Thursday, 24 November 2011 15:06:13 GMT