- From: Simon Pieters <simonp@opera.com>
- Date: Mon, 08 Dec 2014 23:48:17 +0100
- To: "Ian Hickson" <ian@hixie.ch>
- Cc: whatwg@whatwg.org, Sanjoy Pal <sanjoy.pal@samsung.com>
On Mon, 08 Dec 2014 21:50:56 +0100, Simon Pieters <simonp@opera.com> wrote:

> SELECT COUNT(*) as num,
>   CASE
>     WHEN REGEXP_MATCH(LOWER(body),
>          r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>') THEN "has content"
>     ELSE "no content"
>   END as stat
> FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE mimeType CONTAINS "html"
>   AND REGEXP_MATCH(LOWER(body), r'<menuitem')
> GROUP BY stat
> ORDER BY num desc
>
> Row  num    stat
> 1    10101  no content

Hixie pointed out that this doesn't catch menuitem elements whose content is child elements rather than plain text. So flipping it (matching empty menuitems instead) gives:

SELECT COUNT(*) as num,
  CASE
    WHEN REGEXP_MATCH(LOWER(body), r'<menuitem[^>]*>\s*</menuitem>') THEN "no content"
    ELSE "has content"
  END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc

Row  num    stat
1    10085  no content
2    16     has content

15 of these omit the end tag, as seen by the other query, so the empty-menuitem pattern can't match them. So what is the last one doing?

SELECT url, body
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND LOWER(body) CONTAINS '<menuitem'
  AND LOWER(body) CONTAINS '</menuitem'
  AND NOT REGEXP_MATCH(LOWER(body), r'<menuitem[^>]*>\s*</menuitem>')

Row  url                                      body
1    http://www.dod.gr/lib/menuData_v483.php  <menus>
                                              <!-- BOTTOM NAVIGATION MENU --->
                                              <menu id="BottomNavigationMenu" type="main" x="30" y="30">
                                              <menuitem x="120" y="120">
                                              <image>community.swf</image>
                                              <label>community</label>
                                              ...

Yep, it's XML mislabeled as HTML.

For completeness, the 15 pages with no end tags fall into two categories:

* for(i=0;i<menuitems.length;i++)
* <xml id=""SolpartMenuDI"" onreadystatechange=""if (this.readyState == 'complete') spm_initMyMenu(this, spm_getById('dnn_dnnMENU_ctldnnMENU'))""><root><menuitem id=""2533"" title=""صفحه اصلی"" url=""/Default.aspx?tabid=2533"" lefthtml=""<img alt="*" BORDER="0" src="/images/breadcrumb.gif">"" css="" "" />

Previous conclusion stands. :-)

-- 
Simon Pieters
Opera Software
Received on Monday, 8 December 2014 22:47:01 UTC