- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 2 Apr 2008 11:47:31 +0300
- To: Ian Hickson <ian@hixie.ch>
- Cc: Sam Ruby <rubys@us.ibm.com>, Neil Soiffer <Neils@dessci.com>, public-html@w3.org, www-math@w3.org
On Apr 2, 2008, at 04:57, Ian Hickson wrote:

> On Tue, 1 Apr 2008, Neil Soiffer wrote:
>>
>> I meant that content MathML doesn't need to be directly supported.
>> However, it should be accepted as part of <annotation-xml>, where it is
>> easily ignored.
>
> HTML5 today has about 110 elements. Presentational MathML has about 30.
> Content MathML has about 140.
>
> _Doubling_ the number of elements allowed in text/html just so that all
> those elements can be ignored seems like a fundamentally bad idea. (It
> also more than doubles the number of elements that the parser has to know
> about.)
[...]
> Should we really be dedicating _half of the language's vocabulary_ to such
> a small use case?

Devil's advocate mode on:

Doubling the number of elements indeed seems like a really bad idea on the face of it, especially if the browser isn't doing anything with those elements but passing them to the clipboard for export.

However, not having to do anything with those elements means that outside the parser, the per-element implementation cost within the browser is zero. For <msqrt>, you have to implement non-trivial glyph stretching in rendering. For <root>, you don't need to implement anything!

Now, within the parser, what is really needed is an efficient token interning function. That is, for each known element, there should be an object that has three fields: an interned local name, an interned namespace URI and a magic enumeration/integer representing a tree builder token treatment class for doing a switch on in the tree builder. When it is time to emit a token, the value of the current name buffer would be used to locate the corresponding interned token object.

Presumably, Content MathML wouldn't even need additional magic enumeration values but could use two magic values that would already be needed for SVG and Presentation MathML: GENERIC_CONTAINER and GENERIC_VOID.
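As a rough illustration of the three-field token object described above, here is a minimal Python sketch. Only GENERIC_CONTAINER and GENERIC_VOID come from the message; the names `Treatment`, `ElementName`, `make` and `start_tag`, and the assignment of <root> and <pi> to the two classes, are hypothetical and chosen only for the example.

```python
import sys
from dataclasses import dataclass
from enum import Enum, auto

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


class Treatment(Enum):
    """Tree builder treatment classes (the two named in the message)."""
    GENERIC_CONTAINER = auto()
    GENERIC_VOID = auto()


@dataclass(frozen=True)
class ElementName:
    """One object per known element: the three fields from the message."""
    local_name: str       # interned local name
    namespace_uri: str    # interned namespace URI
    treatment: Treatment  # value the tree builder switches on


def make(local_name, treatment):
    # sys.intern makes equal strings share one object, so names and
    # namespace URIs can be compared by identity later on.
    return ElementName(sys.intern(local_name), sys.intern(MATHML_NS), treatment)


# Hypothetical assignments for two Content MathML elements.
ROOT = make("root", Treatment.GENERIC_CONTAINER)
PI = make("pi", Treatment.GENERIC_VOID)


def start_tag(stack, name):
    """Toy tree builder step: dispatch on the treatment class only."""
    if name.treatment is Treatment.GENERIC_CONTAINER:
        stack.append(name)  # container stays open until its end tag
    elif name.treatment is Treatment.GENERIC_VOID:
        pass                # void element: nothing is pushed
    return stack
```

In this sketch the tree builder never looks at the element name itself, which is the point of the argument: adding more Content MathML elements adds entries to the table but no new branches to the dispatch.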
The most naive but still not totally silly implementation for the token interning function would be an array sorted by local name and doing a binary search on that array. With binary search, doubling the number of known elements adds only one string comparison per tag!

P.S. I'm not sure if a sorted array with binary search even makes the most efficient interning function here. I've observed that within HTML5, for a given element name length the last couple of characters in an element name are enough to prune the number of possible element candidates down to one. Therefore, a possible (generated) interning function which would use fewer virtual method invocations but would increase the code size would first do a switch on the name length, then do a switch on the last character and then on the second-last until there's one candidate and then inspect the remaining characters for a match against the single candidate. But I haven't really analyzed if this approach would beat binary search.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
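The sorted-array interning function from the message can be sketched in a few lines of Python. This is only an illustration under assumptions: `KNOWN`, `intern_name` and `max_probes` are hypothetical names, the element list is a tiny sample, and the element counts (140 and 280) are the approximate figures quoted in the thread.

```python
from bisect import bisect_left

# A tiny sample of known element names, kept sorted by local name.
KNOWN = sorted(["annotation-xml", "math", "mi", "msqrt", "root"])


def intern_name(sorted_names, buffer):
    """Return the stored (interned) name equal to buffer, or None."""
    i = bisect_left(sorted_names, buffer)  # binary search for the slot
    if i < len(sorted_names) and sorted_names[i] == buffer:
        return sorted_names[i]
    return None


def max_probes(n):
    """Worst-case midpoint probes for a textbook binary search over
    n sorted entries: floor(log2(n)) + 1, for n >= 1."""
    return n.bit_length()


# About 110 HTML5 + 30 Presentation MathML elements, then with about
# 140 Content MathML elements added (figures from the quoted text).
before = max_probes(140)
after = max_probes(280)
```

With those figures the worst case goes from 8 probes to 9, which matches the claim that doubling the table costs one extra string comparison per tag.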
Received on Wednesday, 2 April 2008 08:48:26 UTC