- From: Philip Jägenstedt <philipj@opera.com>
- Date: Wed, 03 Feb 2010 00:47:48 +0100
- To: "HTML WG" <public-html@w3.org>
http://dev.w3.org/html5/md/#the-properties-of-an-item

The recent changes to this definition go a bit overboard in throwing away properties in order to prevent itemref loops. Any kind of duplicate element when crawling properties on the current item or any of its subitems at any level causes all properties to be thrown away. This is much more than is needed to avoid loops and not very difficult to trigger by accident where it is quite harmless:

    <div itemscope itemref="x">
      <div id="x" itemprop="p">foo</div>
    </div>

(easy to get if you rearrange your markup a bit after adding microdata)

    <div itemscope itemref="x x"></div>
    <div id="x" itemprop="p">foo</div>

(much like duplicate class names, probably easy to get with machine-generated markup)

It's possible to implement this [1], but implementations following the spec strictly would be at a disadvantage to tools that don't do full checks. With cascading errors it also makes it risky to include subitems generated by third parties (code or people) without strict validation of these (compare XML and U+FFFE).

I suggest we do something closer to the bare minimum necessary to avoid loops:

To crawl the properties of an item:

input: top-level item, current item and memory. On first invocation, top-level item=current item and memory=[]

1. if memory is [] and current item is top-level item, it is self-referring, fail.
2. if current item is in memory, return (to stop recursion).
3. collect all itemprop'd elements in children nodes and itemref'd elements recursively into properties (stopping at itemscope)
4. remove any duplicates (these two steps can be optimized easily)
5. for each property which is an item, crawl the properties of that item with current item added to memory, top-level item unchanged, and current item=this property/item. if that fails, remove the property/item.
6. return properties.

This isn't exactly how I implemented it [2] and the algorithm may have bugs, but the general idea should be clear.
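
As a rough illustration only, the six steps could be sketched in Python along the following lines. This is not the microdatajs code from [1] or [2]. The Element class and the collect() and crawl() functions are hypothetical names, itemref is modeled as a list of already-resolved element references rather than ID strings, and step 1 is read here as "memory is not []", on the assumption that the literal wording (which would fail on the very first invocation) is one of the bugs the text allows for.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity-based comparison between elements
class Element:
    """Hypothetical stand-in for a DOM element carrying microdata attributes."""
    itemprop: Optional[str] = None
    itemscope: bool = False
    itemref: list = field(default_factory=list)   # resolved references, not ID strings
    children: list = field(default_factory=list)
    text: str = ""


class SelfReference(Exception):
    """Raised when the crawl arrives back at the top-level item (step 1)."""


def collect(item):
    """Steps 3-4: gather itemprop'd elements from children and itemref targets,
    not descending into nested items, and keeping each element only once."""
    found, seen = [], set()
    pending = list(item.children) + list(item.itemref)
    while pending:
        el = pending.pop(0)
        if id(el) in seen:          # step 4: silently drop duplicates
            continue
        seen.add(id(el))
        if el.itemprop is not None:
            found.append(el)
        if not el.itemscope:        # stop at itemscope boundaries
            pending = list(el.children) + pending
    return found


def crawl(top, current=None, memory=()):
    """Steps 1-6, under the assumptions stated in the lead-in."""
    if current is None:
        current = top
    # Step 1, read as "memory is non-empty" (assumption, see lead-in).
    if memory and current is top:
        raise SelfReference
    # Step 2: already being crawled further up the stack, stop recursing.
    if any(current is seen_item for seen_item in memory):
        return []
    kept = []
    for prop in collect(current):
        if prop.itemscope:
            # Step 5: check the subitem; drop only this property on failure.
            try:
                crawl(top, prop, memory + (current,))
            except SelfReference:
                continue
        kept.append(prop)
    return kept  # step 6


# The two harmless cases from the message: both keep the single property.
x = Element(itemprop="p", text="foo")
nested = Element(itemscope=True, children=[x], itemref=[x])
doubled = Element(itemscope=True, itemref=[x, x])
print([(p.itemprop, p.text) for p in crawl(nested)])
print([(p.itemprop, p.text) for p in crawl(doubled)])
```

In this sketch a duplicate element is simply skipped rather than treated as an error, and a failure in a subitem removes only that one property instead of cascading upward.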
You only need to consider elements with itemscope="" and itemprop="". If you think of these as creating a graph, remove any properties that are part of a loop (not those that just lead into a loop).

Another somewhat sane option is ignoring all properties that lead to infinite recursion, i.e. as above but also including properties that lead into a loop. However, I don't think this is a good idea, as it propagates the error further than necessary and, in my experience, isn't really easier to implement in practice.

It's possible that this will have to be tweaked for performance after we have feedback from native browser implementations and that we will end up throwing away slightly more properties, but for now I think my suggestion above will suffice.

[1] http://gitorious.org/microdatajs/microdatajs/blobs/ad56522/jquery.microdata.js#line192
[2] http://gitorious.org/microdatajs/microdatajs/blobs/d758c08/jquery.microdata.js#line192

-- 
Philip Jägenstedt
Received on Tuesday, 2 February 2010 23:49:07 UTC