Re: [whatwg] <menuitem>: Issue reported by the web developers

On Mon, 08 Dec 2014 21:50:56 +0100, Simon Pieters <simonp@opera.com> wrote:

> SELECT COUNT(*) as num,
>   CASE
>    WHEN REGEXP_MATCH(LOWER(body),  
> r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>') THEN "has content"
>    ELSE "no content"
>   END as stat
>  FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE mimeType CONTAINS "html"
>    AND REGEXP_MATCH(LOWER(body), r'<menuitem')
> GROUP BY stat
> ORDER BY num desc
>
> Row num stat 
> 1 10101 no content 

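Hixie pointed out that this doesn't catch element children: the regex only
allows text between the tags, so a <menuitem> with a child element is
counted as "no content". Flipping it to test for empty elements instead
gives: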

SELECT COUNT(*) as num,
  CASE
   WHEN REGEXP_MATCH(LOWER(body), r'<menuitem[^>]*>\s*</menuitem>')
   THEN "no content"
   ELSE "has content"
  END as stat
 FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
   AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc

Row num stat 
1 10085 no content 
2 16 has content 
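To see why the two queries disagree, here is a minimal local sketch in
Python. It assumes that Python's re.search behaves like BigQuery's
REGEXP_MATCH for these two patterns (a substring match on the lowercased
body; BigQuery itself uses RE2). The sample bodies are made up for
illustration and are not taken from HTTP Archive.

```python
import re

# Original query: text-only content between the tags means "has content".
ORIGINAL = re.compile(r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>')
# Flipped query: an empty <menuitem></menuitem> pair means "no content".
FLIPPED = re.compile(r'<menuitem[^>]*>\s*</menuitem>')


def classify_original(body: str) -> str:
    return "has content" if ORIGINAL.search(body.lower()) else "no content"


def classify_flipped(body: str) -> str:
    return "no content" if FLIPPED.search(body.lower()) else "has content"


# Hypothetical sample bodies, one per case discussed in this thread.
samples = {
    "empty": '<menuitem label="Copy"></menuitem>',          # no content / no content
    "text child": "<menuitem>Copy</menuitem>",              # has content / has content
    "element child": '<menuitem><img src="copy.png"></menuitem>',  # no content / has content
    "no end tag": '<menuitem label="Copy">',                # no content / has content
}

for name, body in samples.items():
    print(name, "|", classify_original(body), "|", classify_flipped(body))
```

The element-child and missing-end-tag cases are the ones where the two
queries disagree. In the data above, all 16 rows that moved to "has
content" fall into those two cases. Note that Python's backtracking engine
can get slow on the original pattern's nested quantifier with long
non-matching input, whereas RE2 runs in linear time.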

15 of these omit the end tag, as the other query showed, so they can never
match the "empty" pattern. So what is the remaining one doing?

SELECT url,body
 FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
   AND LOWER(body) CONTAINS '<menuitem'
   AND LOWER(body) CONTAINS '</menuitem'
   AND NOT REGEXP_MATCH(LOWER(body), r'<menuitem[^>]*>\s*</menuitem>')

Row url body 
1 http://www.dod.gr/lib/menuData_v483.php <menus> <!-- BOTTOM NAVIGATION  
MENU ---> <menu id="BottomNavigationMenu" type="main" x="30" y="30">  
<menuitem x="120" y="120"> <image>community.swf</image>  
<label>community</label> ...

Yep, it's XML mislabeled as HTML.

For completeness, the 15 pages with no end tags fall into two categories:

* script text, where "i<menuitems" contains "<menuitem":
for(i=0;i<menuitems.length;i++)
* a self-closing <menuitem ... /> inside an <xml> data island:
<xml id=""SolpartMenuDI"" onreadystatechange=""if (this.readyState ==  
'complete') spm_initMyMenu(this,  
spm_getById('dnn_dnnMENU_ctldnnMENU'))""><root><menuitem id=""2533""  
title=""صفحه اصلی"" url=""/Default.aspx?tabid=2533"" lefthtml=""&lt;img  
alt=&quot;*&quot; BORDER=&quot;0&quot;  
src=&quot;/images/breadcrumb.gif&quot;&gt;"" css="" "" />
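Both categories land in "has content" the same way: the '<menuitem'
filter is a plain substring match, and neither body contains an empty
<menuitem></menuitem> pair. A small Python sketch of this, again assuming
re.search stands in for REGEXP_MATCH; the two strings are simplified,
hypothetical versions of the snippets listed above.

```python
import re

# WHERE-clause filter and the flipped "no content" test from the queries.
FILTER = re.compile(r'<menuitem')
EMPTY = re.compile(r'<menuitem[^>]*>\s*</menuitem>')

# Simplified stand-ins for the two kinds of pages without end tags.
script_loop = "for(i=0;i<menuitems.length;i++)"
xml_island = '<xml id="SolpartMenuDI"><root><menuitem id="2533" css="" /></root></xml>'


def selected(body: str) -> bool:
    """True if the page passes the REGEXP_MATCH(LOWER(body), r'<menuitem') filter."""
    return FILTER.search(body.lower()) is not None


def flipped_label(body: str) -> str:
    """Label assigned by the flipped CASE expression."""
    return "no content" if EMPTY.search(body.lower()) else "has content"


for body in (script_loop, xml_island):
    print(selected(body), flipped_label(body))  # True has content, for both
```

Neither string has a </menuitem> end tag, so both are counted as "has
content" by elimination rather than because of any real content.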


Previous conclusion stands. :-)

-- 
Simon Pieters
Opera Software

Received on Monday, 8 December 2014 22:47:01 UTC