[whatwg] Microdata feedback

On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:
> 
> I've been looking into the Microdata specification and it struck me that 
> the crawling algorithm is so complex when it comes to expressing simple 
> ideas.  I think that, first and foremost, the algorithm should be 
> described in the specification with an explanation of what it's supposed 
> to do, before the steps of what exactly is to be done are written.

Yeah. Turns out the algorithms involved here are quite badly broken.

It was intended to expose the microdata graph as completely as possible 
while dropping anything that would introduce a loop, at the point where 
the first repetition would start (so A->B->C=>A would break at the =), 
in the API, in the JSON, and in the conformance rules. I didn't do a good 
job speccing that, though!

I've fixed the algorithms to make sense (I hope).


> Let's see, what are the properties of Microdata item from HTML element 
> with id=up from following HTML:
> 
> <div itemscope id=up itemprop=prop0>
>   <div itemscope id=down itemprop=prop1 itemref="up"></div>
> </div>

The element id=up has one property, prop1, whose value is an item on the 
element id=down. The element id=down has one property, prop0, whose value 
is the item on the element with id=up. If you crawl from id=up, my intent 
was to have the prop0 be dropped from the graph. If you crawl from 
id=down, my intent was to have prop1 be dropped from the graph. In 
addition, the document is intended to be non-conforming. If you serialise 
it for JSON, my intent was for the item on id=up to be the "top" one, and 
for it to have one property whose value is the item on id=down, which 
would itself have no values.

Note that the above would be non-conforming on its own because there are 
no top-level microdata items in the above snippet.
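A minimal Python sketch of that intent (not the spec's algorithm), assuming a hypothetical in-memory model in which an `Item` is just a list of (name, value) pairs: a property is dropped when its value is an item already on the path from where the crawl started, which is why the edge that disappears depends on the starting item.

```python
from dataclasses import dataclass, field


# Hypothetical model: eq=False keeps identity comparison and hashing, and
# repr=False avoids infinite recursion when printing a cyclic graph.
@dataclass(eq=False, repr=False)
class Item:
    props: list = field(default_factory=list)  # (name, value); value is str or Item


def crawl(item, path=()):
    """Expand item into nested dicts, dropping any property that would loop back."""
    path = path + (item,)
    out = {}
    for name, value in item.props:
        if isinstance(value, Item):
            if any(value is seen for seen in path):
                continue  # first repetition on this path: drop the property
            value = crawl(value, path)
        out.setdefault(name, []).append(value)
    return out


# The id=up / id=down example above.
up, down = Item(), Item()
up.props.append(("prop1", down))
down.props.append(("prop0", up))

print(crawl(up))    # {'prop1': [{}]}, prop0 dropped
print(crawl(down))  # {'prop0': [{}]}, prop1 dropped
```

Only items on the current path count as repeats, so an item reachable along two different routes without a cycle is expanded under both; A->B->C=>A breaks at C's property pointing back at A.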


> I can imagine good usages of loops of Microdata items, for example "John 
> knows Amy, Amy knows John":
> 
> <div itemscope id="john" itemprop>
>   <div itemprop="friends" itemref="fred1 jenny2 amy1"></div>
> </div>
>
> <div itemscope id="amy1" itemprop>
>   <div itemprop="friends" itemref="john"></div>
> </div>
> 
> There's a loop: john->amy1->john->...

itemref="" doesn't reference items for property values. It just references 
an element to get a list of properties for an item.

The example above is non-conforming because itemref="" can only be 
specified on an itemscope="" element, itemprop="" is not valid without a 
value, and there are no top-level items.

The right way to do what you describe above is (provided the vocabulary 
is defined in a way that supports this):

 <div itemscope itemid="http://example.com/john" itemtype="...">
   <meta itemprop="friends"
         content="http://example.com/fred1 http://example.com/jenny2 http://example.com/amy1">
 </div>

 <div itemscope itemid="http://example.com/amy1" itemtype="...">
   <meta itemprop="friends"
         content="http://example.com/john">
 </div>
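Because each friend is named by URL rather than nested as an item, the cycle is in the data rather than in the markup, and a consumer can follow it without recursing forever. A minimal sketch, assuming a hypothetical consumer that has already extracted each item into a dict keyed by its itemid, and assuming (as the vocabulary would have to define) that "friends" is a space-separated list of URLs:

```python
# Hypothetical extracted items, keyed by itemid (mirrors the markup above).
ITEMS = {
    "http://example.com/john": {
        "friends": "http://example.com/fred1 http://example.com/jenny2 "
                   "http://example.com/amy1",
    },
    "http://example.com/amy1": {"friends": "http://example.com/john"},
}


def friends_of(itemid):
    """Return the friend URLs of an item; unknown itemids simply have none."""
    return ITEMS.get(itemid, {}).get("friends", "").split()


def mutual_friends(itemid):
    """Friends who list this item back: a two-step cycle, found by lookup."""
    return [f for f in friends_of(itemid) if itemid in friends_of(f)]


print(mutual_friends("http://example.com/john"))  # ['http://example.com/amy1']
```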


> If the loop is to be excluded, and thus recursion, the same data could 
> be written as:
> 
> <div itemscope>
>   <div itemprop=addressbook_id>1</div>
>   <div itemprop=name>John</div>
>   <div itemprop=knows>2</div>
> </div>
>
> <div itemscope>
>   <div itemprop=addressbook_id>2</div>
>   <div itemprop=name>Amy</div>
>   <div itemprop=knows>1</div>
> </div>.

That's another way to do it, yes.


> maybe with some <meta> instead of <div> or more verbosely:
> 
> <p itemscope itemid="#john" id="#john">John knows <a 
> itemprop="http://xmlns.com/foaf/0.1/knows" href="#amy">Amy</a>.</p>
>
> <p itemscope itemid="#amy" id="#amy">Amy knows <a 
> itemprop="http://xmlns.com/foaf/0.1/knows" href="#john">John</a>.</p>

That works too.


> The problem I'm addressing revolves around meaning of link between 
> itemref and id attributes.  Is it meant to be a part of Microdata data 
> model?

No, it's just syntactic sugar to allow pages to use microdata without 
having to twist their markup into a pretzel to make it work.


> Or maybe it is introduced to cope with the fact that Microdata graph is 
> defined on top of existing data, which is something completely 
> different, and is meant to be rendered to the user (that is on top of 
> HTML tree)?

Right.


> So the meaning of itemref attribute should also hint interpretation of 
> it inside the specification.

Done.


On Fri, 10 Jun 2011, Philip Jägenstedt wrote:
> 
> I don't think the spec needs to be giving suggestions for efficient 
> implementation for live collections, because we inevitable won't 
> implement exactly that algorithm anyway.

The aim wasn't to give suggestions for efficient implementations. The aim 
was to give algorithms for which an efficient implementation existed, 
rather than requiring something nigh on impossible to implement 
efficiently. The aim wasn't reached, though, in that the algorithm in the 
spec was just completely bogus. Sorry about that.


On Tue, 28 Jun 2011, Tomasz Jamroszczak wrote:
> 
> For sure the itemRef attribute of Microdata has to stay, because it makes 
> possible separation of data (the Microdata item properties, the 
> semantics) and view (where contents of those properties should be laid 
> out for browser user). Without itemRef, Microdata becomes "Picodata".

That may not be all bad. :-)

You know something is done not when there's nothing new to add, but when 
there's nothing left to remove.


> But then, what to do when translating Microdata to other format, such as 
> stringification to JSON in Drag'n'drop?  The JSON itself is quite 
> primitive when it comes to stringification loops - it just throws an 
> exception.  We thought we'd be more flexible.  We'll make 
> stringification "as best as possible", and cutting only the last 
> offending link of a cycle.  See 
> http://people.opera.com/tjamroszczak/microdata/microdata-loops.png . 
> Unfortunately it means that items which belong to Microdata item loops 
> sometimes will lose properties, and it depends on from where the cycle 
> was reached (see point A1 and A2 in the image).

This was actually the intent of the spec originally, so it works out well 
that this is what you've opted for!

I've done the same in the spec.


On Wed, 29 Jun 2011, Philip Jägenstedt wrote:
> 
> Note also that other algorithms defined in terms of items and their 
> properties need to handle loopiness in some way. That's currently RDF, 
> vCard and iCal conversion. Perhaps something like "loopy item" could be 
> defined and those algorithms could skip loopy items wherever they occur? 
> Simply failing is also an acceptable solution, IMO.

I fixed vCard with a patch that just outputs "AGENT;TYPE=VCARD:ERROR" in 
the case of a loop. (Can only happen if the input is non-conforming, so it 
doesn't matter if the output is non-conforming.)

The vEvent stuff was already loop-safe.

The JSON algorithm now ends the crawl when it hits a loop, and replaces 
the offending duplicate item with the string "ERROR".
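A rough Python sketch of that behaviour, assuming a hypothetical in-memory `Item` model instead of DOM elements and a simplified output shape (`{"items": [{"properties": {...}}]}`, with no type or id fields): each nested item is serialised with a copy of the list of items already on the path, and a repeat becomes the string "ERROR".

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)  # identity semantics; safe with cycles
class Item:
    props: list = field(default_factory=list)  # (name, value); value is str or Item


def get_object(item, memory):
    """Serialise item; memory holds the items on the path from the top-level item."""
    properties = {}
    for name, value in item.props:
        if isinstance(value, Item):
            if any(value is seen for seen in memory):
                value = "ERROR"  # loop: replace the duplicate item
            else:
                value = get_object(value, memory + [value])
        properties.setdefault(name, []).append(value)
    return {"properties": properties}


def to_json(top_level_items):
    return json.dumps({"items": [get_object(i, [i]) for i in top_level_items]})


# The id=up / id=down pair from earlier in this thread, starting at id=up.
up, down = Item(), Item()
up.props.append(("prop1", down))
down.props.append(("prop0", up))
print(to_json([up]))
# {"items": [{"properties": {"prop1": [{"properties": {"prop0": ["ERROR"]}}]}}]}
```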

The RDF algorithm preserves the loops, since doing so is possible with 
RDF. Turns out the algorithm almost did this already, looks like it was an 
oversight.
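A minimal sketch of why RDF can keep the loop, assuming a hypothetical in-memory `Item` model and emitting bare (subject, predicate, object) tuples: each item gets a blank-node label the first time it is seen, so a repeat is just another reference to an existing node. This is not the spec's conversion, which also has to turn short property names into predicate URIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)  # hashable by identity; safe with cycles
class Item:
    props: list = field(default_factory=list)  # (name, value); value is str or Item


def to_triples(root):
    """Breadth-first walk that labels each item once and keeps every edge."""
    labels, queue, triples = {}, deque(), []

    def node(item):
        if item not in labels:  # first sighting: new blank node, visit later
            labels[item] = f"_:n{len(labels)}"
            queue.append(item)
        return labels[item]

    node(root)
    while queue:
        item = queue.popleft()
        for name, value in item.props:
            # Literals are quoted naively here, with no escaping.
            obj = node(value) if isinstance(value, Item) else f'"{value}"'
            triples.append((labels[item], name, obj))
    return triples


up, down = Item(), Item()
up.props.append(("prop1", down))
down.props.append(("prop0", up))
for triple in to_triples(up):
    print(*triple)
# _:n0 prop1 _:n1
# _:n1 prop0 _:n0
```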



On Wed, 8 Jun 2011, Dan Brickley wrote:
> 
> Section '5.2.3 Names: the itemprop attribute' states something important 
> about Microdata's data model,
> 
> "Within an item, the properties are unordered with respect to each 
> other, except for properties with the same name, which are ordered in 
> the order they are given by the algorithm that defines the properties of 
> an item."

(Which is tree order, currently, though it wasn't always so.)


> ... and gives an example "In the following example, the "a" property
> has the values "1" and "2", in that order,  ...
>
> <div itemscope itemref="x">
>  <p itemprop="b">test</p>
>  <p itemprop="a">2</p>
> </div>
>
> <div id="x">
>  <p itemprop="a">1</p>
> </div>"
> 
> However '5.2.1 The microdata model' does not mention anything of this 
> data model feature. If property values (for some specific property/item 
> context) are ordered, this should be mentioned when introducing the data 
> model, if only by copying or linking the above sentence ("Within an item, ...").

I've added a brief sentence mentioning that names aren't ordered but 
values of a name are.
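Put another way, two items are the same if, for every name, they have the same values in the same order, regardless of how the names interleave. A minimal sketch of that comparison, assuming items are given as hypothetical lists of (name, value) pairs:

```python
def group_by_name(props):
    """Collect the values of each name, preserving their relative order."""
    grouped = {}
    for name, value in props:
        grouped.setdefault(name, []).append(value)
    return grouped


def same_item(a, b):
    # Dict equality ignores key (name) order; list equality respects value order.
    return group_by_name(a) == group_by_name(b)


print(same_item([("b", "test"), ("a", "1"), ("a", "2")],
                [("a", "1"), ("b", "test"), ("a", "2")]))  # True
print(same_item([("a", "1"), ("a", "2")],
                [("a", "2"), ("a", "1")]))                 # False
```

A vocabulary that declares the order of some property's values meaningless could have its consumers compare sorted value lists for that property instead.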


> Is the expectation that Microdata vocabulary authors can decide whether 
> such ordering is meaningful, when they define / describe their 
> properties?

Yes.


> For example, in academic publishing where they care about being first 
> named author, the ordering of 'itemprop="author"' might seem to matter.

Right.


> 5.2.3 suggests that the ordering information is at least preserved in 
> Microdata's data model. If someone creates an 'author' property for 
> Microdata, should they state that property ordering is meaningful, or is 
> that not their decision?

They can state that the order is not meaningful.

It's similar to how the order of child element nodes in an XML element 
can be important or not, as defined by the vocabulary.


On Sat, 11 Jun 2011, Brett Zamir wrote:
> On 4/27/2011 9:06 PM, Benjamin Hawkes-Lewis wrote:
> > On Wed, Apr 27, 2011 at 3:54 AM, Brett Zamir<brettz9 at yahoo.com>  wrote:
> > > Thanks for the references. While this may be relevant for the likes 
> > > of blogs and other documents whose requirements for semantic density 
> > > are limited enough to allow such reshaping for practical effect and 
> > > whose content is reshapeable by the content creator (as opposed to 
> > > republishing of already completed books), for more semantically 
> > > dense content, such as the types of classical documents marked up by 
> > > TEI, it is simply not possible to expose text for each bit of 
> > > semantic information or to generate new text to meet that need. And 
> > > of course, even with microformats/microdata as it is now, the 
> > > semantic content itself is not necessarily exposed just because text 
> > > is visible on the page.
> > > 
> > > The issue of discoverability is I think more related to how it will 
> > > be consumed or may be consumed. And even if some pieces of 
> > > information are less discoverable, it does not mean that they have 
> > > no value. For such rich documents, a lot of attention is being paid 
> > > to these texts since they are themselves considered important enough 
> > > to be worth the time.
> > > 
> > > If the Declaration of Independence of the United States was marked 
> > > up with hidden information about prior emendations, their likely 
> > > reasons, etc., or about suspected authors of particular passages, or 
> > > the United Nations Declaration of Human Rights were marked up to 
> > > indicate which countries have expressed reservations 
> > > (qualifications) about which rights, while a browsing application or 
> > > query tool ought to be able to (optionally) expose this hidden 
> > > information, there is no automatic need for the markup to be 
> > > polluted with extra (hidden) (and especially URI-based or other 
> > > non-textual) tags when an attribute would suffice.
> > > 
> > > For things that are truly important, there may be a great deal of 
> > > care put into building up many layers which are meant to be peeled 
> > > away, and it is worth allowing some of that information 
> > > (particularly the non-textual information, e.g., the conditions of 
> > > authorship, publisher, etc.), especially that which the original 
> > > publication did not expose, to be still selectively revealed to 
> > > queries and deeper browsing.
> > > 
> > > If a site like Wikisource (the online library sister project of 
> > > Wikipedia's) would be able to offer such officially sanctioned 
> > > semantic attributes, classic texts could become enhanced in this way 
> > > over time, with the wiki exposing the hidden semantic information, 
> > > which indeed may not be as important as the visible text, but with 
> > > queries by interested users, any problems in encoding could be 
> > > discovered just as well.
> >
> > Your email challenges the principle of visible data on four different 
> > grounds:
> > 
> > 1. You note even proponents of visible data do not always show their 
> > data. But the microformats community only endorse hidden metadata for 
> > annotating human-friendly visible data (e.g. "mercredi prochain") with 
> > a machine-readable equivalent (e.g. an ISO 8601 formatted date). They 
> > do not endorse hidden metadata without visible equivalents against 
> > which it can be cross-checked.
> > 
> > 2. You imply editorial effort can offset the error-proneness of hidden 
> > metadata. But the same extraordinary editorial effort would yield even 
> > greater accuracy if it went towards creating visible data rather than 
> > hidden metadata.
> > 
> > 3. You claim tool-assisted queries by end-users against the hidden 
> > metadata will reveal errors at the same rate as visible data. But this 
> > is doubtful, in so far as many queries will obfuscate context whereas 
> > simply reading through the text encourages serendipitous error 
> > discovery. For example, I could issue a query asking what proportion 
> > of the Declaration of Independence is suspected to be authored by John 
> > Adams. A percentage answer would not reveal the odd misattributed 
> > passage. By contrast, if I'm a scholar of the Declaration and am 
> > reading through the text and I happen to see a suspiciously 
> > Jeffersonian passage visibly attributed to John Adams, I'm much more 
> > likely to notice the error.
>
> Of course a visible attribution is helpful, but one cannot possibly 
> visibly represent all information one might wish to add, especially if 
> one does not wish to clutter the view hopelessly. Meta-data can be 
> available to searching, and if search engines don't wish to take 
> advantage of it, at least individual document queries can do so.

The question is, will enough "individual document queries" do so to make 
it worth it?

Or to put it another way, is there sufficient compelling need for a way to 
represent all the information one _might_ wish to add in a non-visible yet 
generically processable manner?

It's not clear to me that there is. What are the concrete use cases? Is 
anyone actually interested in marking up the United Nations Declaration of 
Human Rights in HTML to indicate which countries have expressed 
reservations in a manner intended to be processable by tools not 
specifically designed for that use case?

One can definitely imagine that someone might want to create an 
application specifically for marking up and exposing such information, but 
if it's a self-contained app, the data-* feature is sufficient. The 
question is, is anyone actually trying to write a generic app of this 
nature right now? More, even: are enough people doing so that we need to 
solve this problem?


> > 4. You assert that it is not viable to make multiple layers of rich 
> > data visible in a single view. I'd make the counterargument that on 
> > the web, unlike in print, it is economical to dynamically construct 
> > different views and filters of a document and its various visible data 
> > streams on the server, on the client, or on some 
> > combination of the two. The HTML5 specification itself is a great 
> > example of this. The source text is kept in a repository that stores 
> > changes to the text, along with date and rationale. Multiple views of 
> > this source text are then generated serverside: the source text is 
> > carved up into multiple draft specs for W3C and a single mammoth 
> > specification for WHATWG. The HTML spec is provided in a 
> > browser-crashing single document view and in a multipage view. On top 
> > of this, there is clientside filtering in the form of an in-page 
> > control that can produce a web author view by hiding technical text 
> > aimed at browser vendors.
> 
> Sometimes projects simply wish to make the meta-data available and let 
> consumers determine how to display it.

Which projects? Are they unable to do it today? What are their concrete 
needs and desires? What level of user engagement are they getting?


> If someone has a good idea about how to manage the display (or editing) 
> of meta-data, all power to them, but this does not mean that the 
> original document creator should be forced to create every possible use 
> when their interest and responsibility may simply be properly defining 
> the semantics in use.

It is not a goal of this effort to address the desires of people who 
simply want to mark up data without a concrete use case. If anything, it 
has been our goal to _discourage_ such authors, so as to help them focus 
their efforts on work that has an actual immediate end-user benefit, 
rather than merely a theoretical future payoff.


> In any case, the specification has allowed in-body <meta/> as you point 
> out, so hidden meta-data is thankfully available to authors.

Indeed. Also, data-*="" for in-page applications, and <script> for in-page 
data blobs, and of course it's also possible to reference external data 
files for processors.

If these are sufficient for what people want to do, then our job is done.


> > If you're keen on using the TEI vocabulary to meet the Wikisource use 
> > case, there's no particular reason why you couldn't convert Wiki 
> > markup to TEI source text, serve TEI directly over the web, and also 
> > generate various HTML views of visible rich data from the TEI (for 
> > example, with XSLT). The Perseus project uses TEI and HTML in 
> > combination a bit like that:
> > 
> > http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3atext%3a1999.01.0199
>
> Thanks, but I'm not a fan of custom solutions, since, similar to the 
> "many eyes" view you are espousing for exposing meta-data visually, I 
> believe such solutions leave different semantic communities out of the 
> benefits of utilizing and contributing to general purpose solutions.

It isn't a "custom solution". As far as I can tell, it's using TEI for its 
intended purpose. It's a generic TEI solution.


> For example, I'd like TEI to be serialized such that it can take 
> advantage of tools exclusive to HTML such as WYSIWYG editors, wikis 
> which whitelist only certain elements and attributes, etc., and have the 
> TEI community engaged in enhancing the same Microdata schemas (such as 
> those detailed on http://schema.org) available to all on the web.

Why? Why not use TEI tools? Why not XML tools? Or JSON tools?

We can't possibly design HTML on the assumption that every other language 
is going to be mapped to it to use its tools. That is a completely 
unscalable solution. HTML is a generic document and application platform, 
it's not feasible for it to be everything for every purpose. It's hard 
enough to make it good for writing Web apps and docs, let alone making it 
good for everything! :-)


> > But let's say you were determined to serve up a single HTML document 
> > with lots of hidden metadata. None of microformats, microdata, and 
> > RDFa were designed to do this. But both microdata and RDFa allow you 
> > to do so in a conforming manner using the @content attribute. In 
> > WHATWG HTML, this is restricted to the "meta" element, but the "meta" 
> > element is now allowed amidst body text so it can apply to individual 
> > sections of the document, rather than just the whole document. In W3C 
> > HTML+RDFa, the @content attribute is allowed on any element.
> > 
> > In other words, where your examples currently abuse the skinning layer 
> > ("display: none") to preserve logical text flow, they should actually 
> > be using meta@content instead; there is no need for "ugly hacks" even 
> > if the markup becomes more verbose than you might like.
>
> I had not been aware of <meta/> being available in-body, thank you.
> 
> However, my item-* proposal, besides being more succinct in the case of
> attribute content, allows for targeted styling of elements which <meta/>
> currently would not.
> 
> For example, to take a water-damaged text (e.g., for the TEI element
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-damage.html ) which in
> TEI could be expressed as:
> 
> <damage agent="water" xmlns="http://www.tei-c.org/ns/1.0/">Some water damaged
> words</damage>
> 
> might be represented currently in Microdata as:
> 
> <span itemprop="damage" itemscope="" itemtype="http://www.tei-c.org/ns/1.0/">
> <meta itemprop="agent" content="water"/>
>     Some water damaged words
> </span>

It's not clear to me what the context of this is, but itemprop="damage" is 
probably wrong in the above unless this is intended to be in another item.


> But there is no "parent combinator" selector such that the following (also
> cumbersome) selector would work:
> 
> span[itemprop=damage] <  meta[itemprop=agent][content=water] {
>     text-shadow: 2px 2px 16px #2b2b2b;
> }

You really don't want to be using CSS on microdata, that way lies madness. 
Microdata doesn't define a mapping from its properties to the elements in 
the page. There is no semantic difference between these three lines at the 
microdata level:

   <p itemscope><span itemprop=a>foo</span></p>

   <p itemscope itemref=x>foo</p> <meta id=x itemprop=a content=foo>

   <meta id=x itemprop=a content=foo> <p itemscope itemref=x>foo</p>

So if you use CSS to style this, your styles are dependent on a 
non-semantic syntactic detail, which is very brittle.
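To make the point concrete, here is a minimal Python sketch of a microdata extractor built on the standard library's `html.parser`. It is a simplification, not the spec's algorithm: there is no HTML error recovery, no loop detection, and non-item values come only from `<meta content>` or text content. It collects an item's properties from its children and its itemref targets, does not descend into nested itemscope elements, sorts them in tree order, and yields the same item for all three lines above.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs):
        self.tag, self.attrs, self.children = tag, dict(attrs), []


class TreeBuilder(HTMLParser):
    """Bare-bones tree builder: no implied end tags or other error recovery."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:  # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def tree_order(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from tree_order(child)


def text_content(node):
    return "".join(c if isinstance(c, str) else text_content(c)
                   for c in node.children)


def extract(html):
    """Return the top-level items in html as {name: [values]} dicts."""
    parser = TreeBuilder()
    parser.feed(html)
    parser.close()
    nodes = list(tree_order(parser.root))
    position = {node: i for i, node in enumerate(nodes)}
    by_id = {}
    for node in nodes:
        if "id" in node.attrs:
            by_id.setdefault(node.attrs["id"], node)  # first id wins

    def properties(item):
        pending = [c for c in item.children if isinstance(c, Node)]
        refs = (item.attrs.get("itemref") or "").split()
        pending += [by_id[r] for r in refs if r in by_id]
        found, seen = [], set()
        while pending:
            el = pending.pop()
            if el in seen or el is item:
                continue
            seen.add(el)
            if "itemscope" not in el.attrs:  # nested items keep their own properties
                pending.extend(c for c in el.children if isinstance(c, Node))
            if (el.attrs.get("itemprop") or "").split():
                found.append(el)
        return sorted(found, key=position.__getitem__)  # tree order

    def value(el):
        if "itemscope" in el.attrs:
            return item_to_dict(el)  # no loop detection in this sketch
        if el.tag == "meta":
            return el.attrs.get("content") or ""
        return text_content(el)

    def item_to_dict(item):
        result = {}
        for el in properties(item):
            for name in el.attrs["itemprop"].split():
                result.setdefault(name, []).append(value(el))
        return result

    return [item_to_dict(n) for n in nodes
            if "itemscope" in n.attrs and "itemprop" not in n.attrs]


SNIPPETS = [
    "<p itemscope><span itemprop=a>foo</span></p>",
    "<p itemscope itemref=x>foo</p> <meta id=x itemprop=a content=foo>",
    "<meta id=x itemprop=a content=foo> <p itemscope itemref=x>foo</p>",
]
for snippet in SNIPPETS:
    print(extract(snippet))  # [{'a': ['foo']}] each time
```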


> In my item-* proposal, it would be nicely expressed as:
> 
> <span itemprop="damage" item-agent="water" itemscope=""
> itemtype="http://www.tei-c.org/ns/1.0/">
>     Some water damaged words
> </span>
> 
> which works fairly well in CSS too:
> 
> span[itemprop=damage][item-agent=water] {
>     text-shadow: 2px 2px 16px #2b2b2b;
> }
> 
> This offers a conveniently condensed syntax, while also ensuring 
> discoverability of the prefixed Microdata attributes.
> 
> Especially as more attributes are needed (kept simple for this example), 
> it becomes easier to handle (and cleaner), even if it admittedly adds a 
> little work to crawlers to detect this different approach.

I don't really understand what this is doing.


> > Note HTML also has other extension points that are available, 
> > including dumping data in script elements,
>
> Not a standard approach and not likely to work in restricted 
> whitelisting environments.

It is a standard approach.


> > dumping data in class attributes,
>
> Suffers, as schema.org implies at http://schema.org/docs/faq.html#14 , 
> from a lack of extensibility/namespacing.

If your problem is you want to style something, then the class="" 
attribute is exactly the way to do it. It's not clear why you need 
extensibility or namespacing here if your problem is styling.


> > and mixing XHTML and other XML vocabularies in a compound document.
> 
> Suffers from a lack of support in the HTML serialization

Sure. HTML suffers from a lack of support in the XML serialization, too. 
Why is that a problem?


> and from a lack of a uniform means of discoverability.

Not sure what this means. Namespaces are quite discoverable.


> > Beware that even where a conforming hidden metadata mechanism is 
> > provided, consumers of such documents may well distrust hidden 
> > metadata that is not a machine-readable equivalent to visible data. 
> > For example, Google say:
> > 
> > "In general, Google won't display content that is not visible to the 
> > user. In other words, don't show content to users in one way, and use 
> > hidden text to mark up information separately for search engines and 
> > web applications. You should mark up the text that actually appears to 
> > your users when they visit your web pages."
> > 
> > http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=146898
> 
> I'm skeptical that this would exclude (or need to exclude) 
> namespace-aware Microdata searches since the user is clearly seeking 
> this information explicitly.

Hidden information is often out of date or wrong. There's a strong 
incentive for search tools to ignore hidden information and rely only on 
visible data.

The user doesn't want bad data, even if in practice that's what it appears 
the user is asking for.


On Mon, 13 Jun 2011, Brett Zamir wrote:
> 
> With the likes of Google offering Microdata-aware searches, I think it 
> makes a whole lot of sense to allow rich documents such as TEI ones to 
> enter as regular document citizens of the web, whereby the limited 
> resources of such specialized semantic communities can leverage the 
> general purpose and better-supported services such as Google's Microdata 
> tool, while also having their documents editable within the likes of 
> WYSIWYG HTML text editors, and stored on sites such as discussion forums 
> or wikis where only HTML may be allowed and supported.
> 
> I think such a focus would also enable the TEI community to benefit from 
> reusing search-engine-recognized schemas where available, as well as 
> helping the web community build new schemas for the unique needs of 
> encoding academic texts.

I don't understand what you're proposing here. The schema.org microdata 
stuff isn't generic, it's specific vocabularies.


On Mon, 13 Jun 2011, Tab Atkins Jr. wrote:
> 
> Additionally, while we recognize that non-visible data is sometimes 
> necessary to embed, we'd like to discourage its use as much as possible 
> (in general, non-visible data rots much faster).  One way to do that is 
> to make the syntax slightly cumbersome or ugly - when you really need 
> it, you can use it, but your aesthetic sense will keep it from being the 
> first tool you reach for.  So, making it easier or prettier to embed 
> non-visible triples is actually something we'd like to avoid if we can.

Indeed.


On Wed, 22 Jun 2011, Brett Zamir wrote:
>
> HTML could have been created without attributes too--but if one is going 
> to use it frequently enough, concision is a big selling point (as is 
> non-redundant styleability).

HTML is hardly concise. :-)


> People who are going to go to the trouble of adding semantics which do 
> nothing for visual rendering are probably going to have some idea of 
> what they are doing.

Do you have data to support this?

From what I can tell, it's not an accurate statement.


> It could be useful to a search engine. If I remembered that some text 
> was water-damaged, I could specify that I only wanted to look for 
> water-damaged text (with the TEI itemtype).

Why not just search for the English text "water-damaged"?


On Tue, 14 Jun 2011, Philip Jägenstedt wrote:
>
> A question came up in the Schema.org discussion group today:
> 
> http://groups.google.com/group/schemaorg-discussion/browse_thread/thread/69b733066ae7aaaa?pli=1
> 
> The question was how to fix http://www.2gc.co.uk/a2gc-people to link 
> together properties that were in different parts of the document into a 
> single item. The answer is of course to use itemref, here simplified 
> even further to illustrate:
> 
> <div itemscope itemtype="http://schema.org/Organization">
>  <p itemprop="name">2GC Active Management</p>
>  <div class="photogrid">
>    <div class="photoitem" itemprop="employees" itemscope itemtype="http://schema.org/Person" itemref="GL">
>      <img itemprop="image" src="/images/GJGL.jpg" alt="Gavin Lawrie - Managing Director">
>      <div itemprop="name">Gavin Lawrie</div>
>      <div itemprop="jobTitle">Founder &amp; Managing Director</div>
>    </div>
>    <!-- more employees -->
>  </div>
>  <div id="bio-display" itemscope>
>    <div class="bio-text" id="GL"><dl>
>      <dt>Gavin Lawrie: Founder &amp; Managing Director</dt>
>      <dd itemprop="description">Gavin is ...</dd>
>    </dl></div>
>    <!-- more employees -->
>  </div>
> </div>
> 
> The ugly: <div id="bio-display" itemscope>. That itemscope is there only 
> to prevent the description property of the Person from applying to the 
> organization, and does so because the algorithm to crawl the properties 
> of an item stops at itemscope. This is a silly hack, because it is not 
> an item, and I don't expect many people would find this solution even if 
> they knew about the problem.

This solution is not really a solution, since it generates a new item.

I'd suggest instead putting the top-level itemscope onto a <meta> element, 
and using itemref to link to the first <p> and the class="photogrid" <div>.


> Should we have yet another property like "itemunscope" that stops the 
> crawl algorithm but does not create a new item?

It doesn't seem necessary. Just don't nest properties in items they don't 
apply to.


> Could we tweak the validity definitions so that this kind of thing would 
> cause validators to complain, or should we leave it completely to 
> vocabulary-specific validators to spot this kind of thing? (They can't 
> if they operate on the microdata level and not DOM level, which I think 
> they should.)

It's not clear exactly what we would make non-conforming.


On Sun, 26 Jun 2011, John Giannandrea wrote:
>
> In the user feedback from the schema.org proposal, which uses microdata 
> as its syntax, we have seen several use cases that would seem to require 
> multiple itemtypes per itemscope.

The type of an item is the vocabulary the item uses. It doesn't make sense 
to use more than one vocabulary, as far as I can tell.

If an item defined with a particular vocabulary belongs to several things 
(e.g. a person that is a lawyer and an engineer), then you just want a new 
property that lists the categories the item belongs to.


> We suggest that itemtype be changed to allow multiple space separated 
> types (just like itemprop), but only if the origin domain of the types 
> is the same.  This would allow a vocabulary provider to allow multiple 
> types and to take responsibility for what the property vocabulary 
> definition is in the context of more than one type.

I like this idea, but as others have pointed out, it seems bad to 
arbitrarily restrict this kind of extension to same-domain types. Better, 
IMHO, to not overload itemtype="" in this way, and to just use a new 
property, as here:

   <div itemscope itemtype="http://example.com/">
     <meta itemprop="kind" content="A B">
     ...
   </div>


On Tue, 28 Jun 2011, John Giannandrea wrote:
> 
> Cross-origin extensions can still be handled with full URIs for the 
> extension properties.

This would prevent extension vocabularies from using short names, which 
seems like a serious limitation. (Or it would require that vocabularies 
define both a short and long name for each property, which would be a 
really bad idea, IMHO. Implementations rarely handle such cases well, and 
the microdata model doesn't define the order of multiple values if they 
use different but equivalent names, for example. You should never have 
multiple names with the same semantic.)


> On Tue, Jun 28, 2011 at 6:53 PM, Ian Hickson <ian at hixie.ch> wrote:
> > 
> >   <div itemscope itemtype="http://example.com/">
> >     <meta itemprop="kind" content="A B">
> >     ...
> >   </div>
> 
> How does "kind" relate to "type"?  Would it be the case that kind 
> informs the short itemprop names also? Is this the same proposal as 
> having an itemprop="type" which means the same thing as itemtype?

The idea here is just to have one vocabulary, and then say what categories 
of things the item falls in.


On Tue, 28 Jun 2011, Lin Clark wrote:
>
> Itemtype in my example is like 'kind' in your example. You are right, it 
> is basically a privileged property.
> 
> Because the words type and kind are pretty synonymous (groups of things 
> that have common characteristics), I would think that specifying an 
> itemtype and then specifying a separate kind might be confusing to 
> users, just because of the wording.

"itemtype" gives the type of vocabulary. This is distinct from "kind" in 
this example, where it gives the categories that the item belongs to.

For example, "itemtype" could be "the schema.org vocabulary", while "kind" 
could be "person" or "organisation".


On Wed, 29 Jun 2011, Philip Jägenstedt wrote:
> 
> Indeed, multiple types doesn't work at all if you want to mix different 
> types. I was assuming that the use case was to extend types, kind of 
> like http://schema.org/Person/Governor. However, it doesn't work all 
> that well even in that case, since there's no way to know which type is 
> the extension of the other and which properties exist only on the 
> extended type.

I don't really understand this use case. Can you elaborate on the problem 
that needs solving here?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 7 July 2011 15:33:14 UTC