- From: Skrol29 <skrol29forum+whatwg@gmail.com>
- Date: Fri, 25 Jun 2010 11:46:12 +0200
On 24 Jun 2010, at 14:11, Benjamin M. Schwartz wrote:
>>> Why would it simplify parsing?
>> It greatly simplifies parsing when you just want to extract entire
>> tags, without immediately parsing the attributes.
> If you mean "parsing" with regular expressions, then I think that's a
> bad practice and shouldn't be encouraged.

I agree that disallowing ">" characters in attributes greatly simplifies parsing, not only with regular expressions but with any kind of parsing. If ">" is allowed, then in order to find the end of an element you have to read all of its attributes first. This is very costly.

This is just one example, but there are many others: let's imagine you'd like to convert an HTML document into flat text. To keep your algorithm simple, you've chosen to retrieve the content of the <body> element and then delete all the tags it contains. This is very fast if ">" is not allowed in attributes, because you can find the bounds of each tag just by searching for "<" and then ">". But if ">" is allowed, the operation gets much more complicated, and you spend much more time scanning all the elements.

In my opinion, the gain from allowing ">" is so small compared to the trouble it causes that it should be forbidden in both XML and HTML (any version).

> Also take into consideration that even if ">" was forbidden in the
> spec, it wouldn't mean it doesn't happen in the wild. Since it works in
> browsers, you'd still have to support it if you wanted to parse markup
> from the web.

Whether the spec allows it and how browsers should behave when it appears anyway are two different things.

Regards,
Skrol29
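The flat-text example above can be sketched as two tag strippers. This is a minimal illustration under simplifying assumptions, not how a conforming HTML parser works: every "<" is assumed to open a tag, quotes inside a tag are assumed to delimit attribute values only, and comments, CDATA and script/style raw text are ignored. The function names `strip_tags_naive` and `strip_tags_quote_aware` are hypothetical.

```python
def strip_tags_naive(html: str) -> str:
    """Drop everything from each '<' to the next '>'.

    Only correct if '>' never appears inside an attribute value
    (the rule argued for above). Assumes every '<' opens a tag.
    """
    out = []
    i = 0
    while i < len(html):
        start = html.find("<", i)
        if start == -1:
            out.append(html[i:])  # trailing text after the last tag
            break
        out.append(html[start - (start - i):start])  # text before the tag
        end = html.find(">", start + 1)  # jump straight to the next '>'
        if end == -1:
            break  # unterminated tag: drop the rest
        i = end + 1
    return "".join(out)


def strip_tags_quote_aware(html: str) -> str:
    """Like strip_tags_naive, but ignores '>' inside quoted attribute values.

    Instead of jumping to the next '>', it must examine every character
    inside a tag and track whether it is currently inside a quoted value.
    Simplification: any ' or " inside a tag is treated as a value delimiter.
    """
    out = []
    i = 0
    n = len(html)
    while i < n:
        start = html.find("<", i)
        if start == -1:
            out.append(html[i:])
            break
        out.append(html[i:start])
        j = start + 1
        quote = None  # the quote character we are inside, if any
        while j < n:
            ch = html[j]
            if quote:
                if ch == quote:
                    quote = None  # closing quote of the attribute value
            elif ch in ('"', "'"):
                quote = ch  # opening quote of an attribute value
            elif ch == ">":
                break  # real end of the tag
            j += 1
        i = j + 1
    return "".join(out)


doc = '<body><p title="a > b">Hello</p> world</body>'
print(repr(strip_tags_naive(doc)))        # ' b">Hello world'
print(repr(strip_tags_quote_aware(doc)))  # 'Hello world'
```

Both functions make a single left-to-right pass; the difference is that the naive one can skip from "<" to the next ">" with `str.find`, while the quote-aware one has to inspect and track state for every character inside each tag, which is the extra scanning work described in the message.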
Received on Friday, 25 June 2010 02:46:12 UTC