- From: Philip Jägenstedt <philipj@opera.com>
- Date: Fri, 08 Jul 2011 13:37:30 +0200
On Fri, 08 Jul 2011 00:33:14 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:
>>
>> I've been looking into Microdata specification and it struck me, that
>> crawling algorithm is so complex, when it comes to expressing simple
>> ideas. I think that foremost the algorithm should be described in the
>> specification with explanation what it's supposed to do, before steps of
>> what exactly is to be done are written.
>
> Yeah. Turns out the algorithms involved here are quite badly broken.
>
> It was intended to expose the microdata graph as completely as possible
> while dropping anything that would introduce a loop, at the point where
> the first repetition would start (so A->B->C=>A would break at the =),
> in the API, in the JSON, and in the conformance rules. I didn't do a good
> job speccing that, though!
>
> I've fixed the algorithms to make sense (I hope).

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item

I had a look at this to verify that it is black-box-equivalent to what
Opera has implemented, and only discovered one issue:

<div itemprop=""> should not be added to the .properties collection,
because it has no properties. My bad for suggesting that the criteria
should be the presence of an itemprop attribute, it should be an itemprop
attribute containing at least one token. Can you update the spec to match?

(I implemented the spec'd algorithm pedantically in
<https://gitorious.org/microdatajs/microdatajs/commit/217cc34e7e679e2e4ea3e670a0dcdd155a7b9800>
for verification, it passes the unit tests with said modification.)

> On Wed, 29 Jun 2011, Philip Jägenstedt wrote:
>>
>> Note also that other algorithms defined in terms of items and their
>> properties need to handle loopiness in some way. That's currently RDF,
>> vCard and iCal conversion. Perhaps something like "loopy item" could be
>> defined and those algorithms could skip loopy items wherever they occur?
>> Simply failing is also an acceptable solution, IMO.
>
> I fixed vCard with a patch that just outputs "AGENT;TYPE=VCARD:ERROR" in
> the case of a loop. (Can only happen if the input is non-conforming, so it
> doesn't matter if the output is non-conforming.)

WFM

> The vEvent stuff was already loop-safe.
>
> The JSON algorithm now ends the crawl when it hits a loop, and replaces
> the offending duplicate item with the string "ERROR".

WFM

> The RDF algorithm preserves the loops, since doing so is possible with
> RDF. Turns out the algorithm almost did this already, looks like it was an
> oversight.

WFM, but note step 3: "Add a mapping from the item item to the subject
subject in memory, if there isn't one already." Step 1 guarantees that
there is no entry for item, so step 3 can be unconditional.

> On Wed, 29 Jun 2011, Philip Jägenstedt wrote:
>>
>> Indeed, multiple types doesn't work at all if you want to mix different
>> types. I was assuming that the use case was to extend types, kind of
>> like http://schema.org/Person/Governor. However, it doesn't work all
>> that well even in that case, since there's no way to know which type is
>> the extension of the other and which properties exist only on the
>> extended type.
>
> I don't really understand this use case. Can you elaborate on the problem
> that needs solving here?

It's whatever problem <http://schema.org/docs/extension.html> is trying to
solve, which is something like "allow people to geek out with more
specific vocabularies without interfering with search results". I whined a
bit in
<http://groups.google.com/group/schemaorg-discussion/browse_thread/thread/6de3a1761b115271>,
the short story being:

* extensibility encoded with a microsyntax in the URL, making it
  not-so-opaque
* such URLs make the DOM API less useful

Perhaps bending Microdata to accommodate for this is not the best idea.
If I were schema.org, I would just encourage people to do this:

<div itemscope itemtype="http://schema.org/Person">
  <div id="wrapper">
    <div itemprop="name">Arnold</div>
    <div itemscope itemtype="http://example.com/Governor" itemref="wrapper">
      <div itemprop="state">California</div>
    </div>
  </div>
</div>

Making extensions unsightly is probably a good thing, to discourage people
from going too crazy with it. This way it's also clear which properties
only apply to the extended type.

--
Philip Jägenstedt
Core Developer
Opera Software
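
Two behaviours discussed in the message above can be sketched in a few lines. This is only a minimal illustration in Python, not the spec's algorithm and not the MicrodataJS code linked above: `Item`, `has_property_names` and `to_json` are hypothetical names, items are modelled as plain objects rather than DOM elements, and the output shape is simplified. It shows (1) treating itemprop="" as contributing no property, and (2) ending the JSON crawl at a loop by substituting the string "ERROR" for the repeated item, so that A->B->C=>A breaks at the "=".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so cyclic graphs are safe to compare
class Item:
    """Hypothetical stand-in for a microdata item (not a DOM API)."""
    itemtype: str = ""
    # List of (raw itemprop attribute value, property value) pairs;
    # a property value is either a string or another Item.
    props: list = field(default_factory=list)


def has_property_names(itemprop):
    """True only if the itemprop value contains at least one token.

    str.split() approximates HTML's split-on-spaces here.
    """
    return len(itemprop.split()) > 0


def to_json(item, path=()):
    """Convert an item to a JSON-style dict (simplified shape).

    An item that already appears on the current path is replaced by the
    string "ERROR" and is not crawled again.
    """
    if any(item is seen for seen in path):
        return "ERROR"
    path = path + (item,)
    properties = {}
    for itemprop, value in item.props:
        if not has_property_names(itemprop):
            continue  # itemprop="" names no property, so nothing is added
        converted = to_json(value, path) if isinstance(value, Item) else value
        for name in itemprop.split():
            properties.setdefault(name, []).append(converted)
    result = {"properties": properties}
    if item.itemtype:
        result["type"] = item.itemtype
    return result


# Build the loop A -> B -> C => A, plus one empty itemprop on C.
a = Item("http://example.com/A")
b = Item("http://example.com/B")
c = Item("http://example.com/C")
a.props.append(("next", b))
b.props.append(("next", c))
c.props.append(("next", a))
c.props.append(("", "ignored"))

result = to_json(a)
```

Here the innermost item (C) ends up with `{"next": ["ERROR"]}` as its properties: the back-reference to A is cut at the point of first repetition, and the empty itemprop contributes nothing. Tracking the current path, rather than every item ever visited, is a choice made for this sketch so that an item legitimately reachable by two separate branches is not reported as a loop.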
Received on Friday, 8 July 2011 04:37:30 UTC