- From: Simon Pieters <simonp@opera.com>
- Date: Tue, 09 Dec 2014 21:58:29 +0100
- To: "Ian Hickson" <ian@hixie.ch>
- Cc: whatwg@whatwg.org, Sanjoy Pal <sanjoy.pal@samsung.com>
On Mon, 08 Dec 2014 21:50:56 +0100, Simon Pieters <simonp@opera.com> wrote:

> On Thu, 27 Nov 2014 01:15:20 +0100, Ian Hickson <ian@hixie.ch> wrote:
>
>> On Wed, 26 Nov 2014, Simon Pieters wrote:
>>>
>>> - Make the end tag optional and have <menuitem>, <menu> and <hr>
>>> generate implied </menuitem> end tags. (Maybe other tags like <li> and
>>> <p> can also imply </menuitem>.) The label attribute be honored if
>>> specified, otherwise use the textContent with leading and trailing
>>> whitespace trimmed.
>>>
>>> This would allow either syntax unless I'm missing something.
>>
>> That's another option, yeah. Probably the best so far if we can't just
>> power through and break the sites in question. It's not yet clear to me
>> how many sites we're talking about here and how possible it is to
>> evangelise them.
>
> In httparchive
> http://bigqueri.es/t/analyzing-html-css-and-javascript-response-bodies/442
> :

FTR, the numbers were slightly wrong. I didn't count top-level pages, I
counted resources (including e.g. iframes). Also there is a bug in the
data with duplicate entries for some pages
(https://twitter.com/zcorpan/status/542363458671747072 ).

> * 10101 pages use <menuitem>

8929 pages use <menuitem>

SELECT page, COUNT(*) as num
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
AND REGEXP_MATCH(LOWER(body), r'<menuitem\s')
GROUP BY page
ORDER BY num desc

> * 39 have no label attribute
> * 0 have non-whitespace content
> * 15 have no end tag
>
> Based on this, it seems possible to keep it as a void element and only
> use the label attribute.
>
>
> SELECT COUNT(*) as num,
> CASE
> WHEN REGEXP_MATCH(LOWER(body), r'<menuitem\s([^>]+\s)?label\s*=')
> THEN "label present"
> ELSE "no label"
> END as stat
> FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE mimeType CONTAINS "html"
> AND REGEXP_MATCH(LOWER(body), r'<menuitem')
> GROUP BY stat
> ORDER BY num desc
>
> Row num stat
> 1 10062 label present
> 2 39 no label

8900 have label present (so 29 no label).

SELECT page, COUNT(*) as num
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
AND REGEXP_MATCH(LOWER(body), r'<menuitem\s([^>]+\s)?label\s*=')
GROUP BY page
ORDER BY num desc

>
> SELECT COUNT(*) as num,
> CASE
> WHEN REGEXP_MATCH(LOWER(body),
> r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>') THEN "has content"
> ELSE "no content"
> END as stat
> FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE mimeType CONTAINS "html"
> AND REGEXP_MATCH(LOWER(body), r'<menuitem')
> GROUP BY stat
> ORDER BY num desc
>
> Row num stat
> 1 10101 no content
>
>
> SELECT COUNT(*) as num,
> CASE
> WHEN REGEXP_MATCH(LOWER(body), r'</menuitem>') THEN "end tag"
> ELSE "no end tag"
> END as stat
> FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE mimeType CONTAINS "html"
> AND REGEXP_MATCH(LOWER(body), r'<menuitem')
> GROUP BY stat
> ORDER BY num desc
>
> Row num stat
> 1 10086 end tag
> 2 15 no end tag
>

-- 
Simon Pieters
Opera Software
Received on Tuesday, 9 December 2014 20:59:31 UTC
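
For readers who want to see why counting resources and counting pages give different totals, here is a minimal Python sketch. It applies the regular expressions from the queries above to a handful of made-up rows. The sample rows, the `*.example` URLs, and the helper names `resources_matching` and `pages_matching` are assumptions for illustration only; they are not taken from the httparchive data. Python's `re` module is also a backtracking engine, so the nested quantifier in the content pattern may be slow on large real-world bodies.

```python
import re

# Patterns copied from the queries in the message. The SQL applies them to
# LOWER(body), so the helpers below lowercase each body before matching.
USES_MENUITEM = re.compile(r'<menuitem\s')
HAS_LABEL = re.compile(r'<menuitem\s([^>]+\s)?label\s*=')
HAS_CONTENT = re.compile(r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>')
HAS_END_TAG = re.compile(r'</menuitem>')

# Hypothetical (page, body) rows standing in for the requests_body table.
# One row per resource, so a page can appear several times.
ROWS = [
    ("http://a.example/", '<menu><menuitem label="Save"></menuitem></menu>'),
    # Duplicate entry for the same page (the data bug mentioned above).
    ("http://a.example/", '<menu><menuitem label="Save"></menuitem></menu>'),
    # An iframe loaded by the same top-level page.
    ("http://a.example/", '<menuitem label="Print"></menuitem>'),
    # No label attribute.
    ("http://b.example/", '<menuitem icon="x.png"></menuitem>'),
    # No end tag.
    ("http://c.example/", '<menuitem label="Copy">'),
    # Does not use <menuitem> at all.
    ("http://d.example/", '<p>no menu here</p>'),
]


def resources_matching(rows, pattern):
    """Count rows whose body matches, like COUNT(*) without GROUP BY page."""
    return sum(1 for _, body in rows if pattern.search(body.lower()))


def pages_matching(rows, pattern):
    """Return the distinct pages with a matching row, like GROUP BY page."""
    return {page for page, body in rows if pattern.search(body.lower())}


users = pages_matching(ROWS, USES_MENUITEM)
with_label = pages_matching(ROWS, HAS_LABEL)
with_end_tag = pages_matching(ROWS, HAS_END_TAG)
with_content = pages_matching(ROWS, HAS_CONTENT)

print("resources using <menuitem>:", resources_matching(ROWS, USES_MENUITEM))
print("pages using <menuitem>:", len(users))
print("pages with no label:", sorted(users - with_label))
print("pages with no end tag:", sorted(users - with_end_tag))
print("pages with content:", sorted(with_content))
```

In this toy data the resource count is inflated by the duplicate row and the iframe, while grouping by page collapses them into one entry, which is the same effect that separates the two sets of totals in the message.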