Re: Microdata Issues [was Microdata design philosophies] from Philip Jägenstedt on 2009-10-18 (public-html@w3.org from October 2009)

From: Philip Jägenstedt <philipj@opera.com>
Date: Sun, 18 Oct 2009 12:22:42 +0200
To: martin@weborganics.co.uk, "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: "Leif Halvard Silli" <xn--mlform-iua@xn--mlform-iua.no>, "Ian Hickson" <ian@hixie.ch>, public-html@w3.org
Message-ID: <op.u1zq34bjsr6mfa@worf>
On Sun, 18 Oct 2009 00:52:56 +0200, Martin McEvoy  
<martin@weborganics.co.uk> wrote:

> Martin McEvoy wrote:
>> Tab Atkins Jr. wrote:
>>> On Fri, Oct 16, 2009 at 4:19 PM, Martin McEvoy  
>>> <martin@weborganics.co.uk> wrote:
>>>
>>>> look at this example:
>>>>
>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-basic-syntax  
>>>> <div itemscope id="amanda"><itemref refid="a"><itemref  
>>>> refid="b"></div>
>>>> <p id="a">Name: <span itemprop="name">Amanda</span></p>
>>>> <div id="b" itemprop="band" itemscope id="jazzband"><itemref
>>>> refid="c"></div>
>>>> <div id="c">
>>>> <p>Band: <span itemprop="name">Jazz Band</span></p>
>>>> <p>Size: <span itemprop="size">12</span> players</p>
>>>> </div>
>>>>
>>>>
>>>> What is the above example trying to attempt?
>>>>
>>>
>>> It's marking up someone's participation in some band, apparently.
>>>
>>
>> Really if you say so....
>>>
>>>> What does itemscope mean?
>>>>
>>>
>>> Have you read the Microdata section?
>>
>> Of course I have...
>>
>>> @itemscope says "This chunk of
>>> html defines a chunk of microdata."  It scopes any children of the
>>> element to be part of that parent item (rather than being just random
>>> unconnected bits of data).
>>>
>>
>> And you want me to tell that to my students?  or anyone else for that  
>> matter.
>>
>>>
>>>> look at those funny little bits of mark up <itemref refid="a"><itemref
>>>> refid="b">, do itemref and refid confuse you? again what do they mean?
>>>>
>>>
>>> Again, have you read the Microdata section?
>>
>> Again yes I have...
>>> <itemref> allows you to
>>> include data from elements that aren't children of the @itemscope.
>>>
>>
>> kind of like the include pattern in microformats would you say?
>>
>>>> Look at every bit of content for example <span  
>>>> itemprop="size">12</span>,
>>>> what does size mean or band or any of the attribute contents?
>>>> How Is a newcomer to HTML or the semantic web going to make of all  
>>>> that?
>>>> Does the above seem a little much just to mark up around 18  
>>>> characters of
>>>> data?
>>>> Do you think a search engine will understand the above example,  
>>>> knowing that
>>>> they can't reason like humans?
>>>>
>>>
>>> It's some example vocabulary used to illustrate the principles.
>>>
>>
>> An example that may get copied and pasted around the internet...
>>> Assume, for a moment, that a similar vocabulary existed in RDF, and
>>> the example was instead marked up in RDFa.
>>>
>>> How is a newcomer to HTML or the semantic web going to make of all  
>>> that RDFa?
>>> Doesn't the RDFa seem a bit much just to mark up around 18 characters  
>>> of data?
>>> Do you think a search engine would understand the RDFa, knowing that
>>> they can't reason like humans?
>>>
>>
>> Well at least you have a chance with either microformats or RDFa.
>>
>> You still didn't answer my question...
>>> All of these concerns you have are *exactly* applicable to RDFa, or
>>> really *any* method of marking up metadata in a page (such as CRDF,
>>> GRDDL, etc.).
>>>
>>
>> Thank you for that last paragraph I'm glad you worked that one out,  
>> microdata doesn't actually solve any problems does it?
> Tab, After answering my questions with other questions you could have  
> made this point...
>
> "The examples in the previous section show how information could be  
> marked up on a page that doesn't expect its microdata to be re-used"
> http://dev.w3.org/html5/spec/Overview.html#typed-items
>
> Pardon?
>
> You mean I have to go through all that, (see the example at the top of  
> this email)  thinking I am embedding some real semantics using some  
> pretty fancy attributes and elements, that really has no semantic value  
> outside my own website?

Is there anything in the spec up to that point that leads you to think  
that the data would have any defined semantics outside of your own  
website? It takes actual effort to make data interoperable; just using a  
particular markup cannot automatically create semantic value.

> Why can't I just use good semantic class names?

If you happen to already have appropriate class names and the structure is  
simple enough, sure. However, for something more complex, the microdata  
DOM API is useful in that it allows scripts to read and modify the data of  
a document with a relatively simple syntax. Otherwise you'd have to write  
many helper functions or wrapper objects to hide the underlying DOM tree.  
In any event, there's no reason to require that all microdata markup use an  
external vocabulary when the same syntax is perfectly good for  
site-specific uses like this.

> Here is another example of microdata "falling over" quite badly from:  
> http://dev.w3.org/html5/spec/Overview.html#typed-items again...
>
> <section itemscope itemtype="http://example.org/animals#cat">
>  <h1 itemprop="name">Hedral</h1>
>  <p itemprop="desc">Hedral is a male american domestic shorthair, with a  
> fluffy black fur with white paws and belly.</p>
>   <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18  
> months">
> </section>
>
> Apart from the *obvious* indirection mechanism the spec says this about  
> the above...
>
> "In this example the "http://example.org/animals#cat" item has three  
> properties,  a "name" ("Hedral"),  a "desc" ("Hedral is..."),  and an  
> "img" ("hedral.jpeg")."
>
> ok we will take that as said, how about If I add some more microdata to  
> that, just some plain vanilla stuff that I want to use for my website  
> that I don't expect to be re-used elsewhere....
>
> <section itemscope itemtype="http://example.org/animals#cat">
>  <h1 itemprop="name title">Hedral</h1>
>  <p itemprop="desc">Hedral is a male american domestic shorthair, with a  
> fluffy black fur with white paws and belly.</p>
>   <img itemprop="img my-cat" src="hedral.jpeg" alt="" title="Hedral, age  
> 18 months">
> </section>
>
> Good eh? notice there is a "title" and my picture is called "my-cat"
>
> According to the description of the original markup my item now has  
> *five* properties, but only three of them are part of  the vocabulary  
> defined at http://example.org/animals#cat, how does a parser tell which  
> property is part of my cat vocabulary and which is not? It's not clear, is  
> it?

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#names:-the-itemprop-attribute

"If the item is a typed item" then each itemprop token must be "a defined  
property name allowed in this situation according to the specification  
that defines the relevant type for the item". In other words, the above  
markup is non-conforming. Validators of this vocabulary should give errors  
for properties that aren't part of it. Any sane parser of said vocabulary,  
however, would ignore properties it does not recognize, as doing otherwise  
is guaranteed to fail as soon as someone makes the mistake above or when  
the vocabulary is updated.
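
To make the validator/consumer distinction concrete, here is a minimal sketch in Python. The set CAT_VOCABULARY and the function names are hypothetical, made up for illustration; the only thing taken from the spec example is that the cat type has "name", "desc" and "img".

```python
# Hypothetical vocabulary: the three properties the spec example attributes
# to http://example.org/animals#cat. This set is an assumption for illustration.
CAT_VOCABULARY = {"name", "desc", "img"}


def validate(itemprop_values, vocabulary):
    """Validator behaviour: report every token the vocabulary does not define."""
    return [
        token
        for value in itemprop_values
        for token in value.split()  # itemprop holds space-separated tokens
        if token not in vocabulary
    ]


def consume(itemprop_values, vocabulary):
    """Consumer behaviour: keep known tokens and silently skip the rest."""
    return [
        token
        for value in itemprop_values
        for token in value.split()
        if token in vocabulary
    ]


# The itemprop attribute values from the modified cat markup quoted above.
values = ["name title", "desc", "img my-cat"]

print(validate(values, CAT_VOCABULARY))  # ['title', 'my-cat']
print(consume(values, CAT_VOCABULARY))   # ['name', 'desc', 'img']
```

The validator flags "title" and "my-cat" as errors, while the consumer still gets the three properties it understands and keeps working if the vocabulary later grows.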

For a generic microdata parser the processing is very clear:

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items

It simply extracts the properties; itemtype is not involved in any way.
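
As a rough illustration of what "simply extracts the properties" means, here is a deliberately simplified sketch in Python. It is not the algorithm from the spec: it assumes a single, non-nested item with well-formed markup, ignores itemref, and takes the raw src value for void elements instead of resolving a URL. The class name FlatMicrodataParser is made up, and the description text in the sample is shortened.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collects (name, value) pairs; never looks at itemtype."""

    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.properties = []  # (name, value) pairs in document order
        self._stack = []      # open elements as (tag, tokens, text_parts)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        tokens = (attrs.get("itemprop") or "").split()
        if tag in self.VOID:
            # Simplification: void elements take their value from src.
            for name in tokens:
                self.properties.append((name, attrs.get("src") or ""))
            return
        self._stack.append((tag, tokens, []))

    def handle_data(self, data):
        # Text counts towards every open element that carries itemprop.
        for _tag, tokens, parts in self._stack:
            if tokens:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._stack:
            return
        if self._stack[-1][0] != tag:
            return  # simplification: ignore mismatched end tags
        _tag, tokens, parts = self._stack.pop()
        for name in tokens:
            self.properties.append((name, "".join(parts)))


SAMPLE = """
<section itemscope itemtype="http://example.org/animals#cat">
 <h1 itemprop="name title">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic shorthair.</p>
 <img itemprop="img my-cat" src="hedral.jpeg" alt="">
</section>
"""

parser = FlatMicrodataParser()
parser.feed(SAMPLE)
parser.close()
names = [name for name, _value in parser.properties]
print(names)  # ['name', 'title', 'desc', 'img', 'my-cat']
```

All five names come out, regardless of what itemtype says. Deciding which of them belong to the cat vocabulary is left to whatever consumes the extracted data.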

-- 
Philip Jägenstedt
Opera Software
Received on Sunday, 18 October 2009 10:21:26 UTC