Re: Microdata Issues [was Microdata design philosophies] from Martin McEvoy on 2009-10-19 (public-html@w3.org from October 2009)

From: Martin McEvoy <martin@weborganics.co.uk>
Date: Mon, 19 Oct 2009 13:13:39 +0100
To: Philip Jägenstedt <philipj@opera.com>
CC: "Tab Atkins Jr." <jackalmage@gmail.com>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Ian Hickson <ian@hixie.ch>, public-html@w3.org
Message-ID: <4ADC57F3.9020600@weborganics.co.uk>
Philip Jägenstedt wrote:
> On Sun, 18 Oct 2009 00:52:56 +0200, Martin McEvoy 
> <martin@weborganics.co.uk> wrote:
>
>> Martin McEvoy wrote:
>>> Tab Atkins Jr. wrote:
>>>> On Fri, Oct 16, 2009 at 4:19 PM, Martin McEvoy 
>>>> <martin@weborganics.co.uk> wrote:
>>>>
>>>>> look at this example:
>>>>>
>>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-basic-syntax 
>>>>> <div itemscope id="amanda"><itemref refid="a"><itemref 
>>>>> refid="b"></div>
>>>>> <p id="a">Name: <span itemprop="name">Amanda</span></p>
>>>>> <div id="b" itemprop="band" itemscope id="jazzband"><itemref
>>>>> refid="c"></div>
>>>>> <div id="c">
>>>>> <p>Band: <span itemprop="name">Jazz Band</span></p>
>>>>> <p>Size: <span itemprop="size">12</span> players</p>
>>>>> </div>
>>>>>
>>>>>
>>>>> What is the above example trying to attempt?
>>>>>
>>>>
>>>> It's marking up someone's participation in some band, apparently.
>>>>
>>>
>>> Really if you say so....
>>>>
>>>>> What does itemscope mean?
>>>>>
>>>>
>>>> Have you read the Microdata section?
>>>
>>> Of course I have...
>>>
>>>> @itemscope says "This chunk of
>>>> html defines a chunk of microdata."  It scopes any children of the
>>>> element to be part of that parent item (rather than being just random
>>>> unconnected bits of data).
>>>>
>>>
>>> And you want me to tell that to my students?  or anyone else for 
>>> that matter.
>>>
>>>>
>>>>> look at those funny little bits of mark up <itemref 
>>>>> refid="a"><itemref
>>>>> refid="b">, do itemref and refid confuse you? again what do they 
>>>>> mean?
>>>>>
>>>>
>>>> Again, have you read the Microdata section?
>>>
>>> Again yes I have...
>>>> <itemref> allows you to
>>>> include data from elements that aren't children of the @itemscope.
>>>>
>>>
>>> kind of like the include pattern in microformats would you say?
>>>
>>>>> Look at every bit of content for example <span 
>>>>> itemprop="size">12</span>,
>>>>> what does size mean or band or any of the attribute contents?
>>>>> What is a newcomer to HTML or the semantic web going to make of all 
>>>>> that?
>>>>> Does the above seem a little much just to mark up around 18 
>>>>> characters of
>>>>> data?
>>>>> Do you think a search engine will understand the above example, 
>>>>> knowing that
>>>>> they can't reason like humans?
>>>>>
>>>>
>>>> It's some example vocabulary used to illustrate the principles.
>>>>
>>>
>>> An example that may get copied and pasted around the internet...
>>>> Assume, for a moment, that a similar vocabulary existed in RDF, and
>>>> the example was instead marked up in RDFa.
>>>>
>>>> What is a newcomer to HTML or the semantic web going to make of all 
>>>> that RDFa?
>>>> Doesn't the RDFa seem a bit much just to mark up around 18 
>>>> characters of data?
>>>> Do you think a search engine would understand the RDFa, knowing that
>>>> they can't reason like humans?
>>>>
>>>
>>> Well at least you have a chance with either microformats or RDFa.
>>>
>>> You still didn't answer my question...
>>>> All of these concerns you have are *exactly* applicable to RDFa, or
>>>> really *any* method of marking up metadata in a page (such as CRDF,
>>>> GRDDL, etc.).
>>>>
>>>
>>> Thank you for that last paragraph I'm glad you worked that one out, 
>>> microdata doesn't actually solve any problems does it?
>> Tab, After answering my questions with other questions you could have 
>> made this point...
>>
>> "The examples in the previous section show how information could be 
>> marked up on a page that doesn't expect its microdata to be re-used"
>> http://dev.w3.org/html5/spec/Overview.html#typed-items
>>
>> Pardon?
>>
>> You mean I have to go through all that, (see the example at the top 
>> of this email)  thinking I am embedding some real semantics using 
>> some pretty fancy attributes and elements, that really has no 
>> semantic value outside my own website?
>
> Is there anything in the spec up to that point that leads you to think 
> that the data would have any defined semantics outside of your own 
> website?

Well there was this...
http://blog.whatwg.org/this-summer-in-html-5-episode-33

"Microdata is designed to allow authors to include additional semantics 
in their pages for which there is no appropriate HTML element or attribute."

and ...

"There are a number of other technologies with goals similar to 
microdata, including microformats and RDFa."

... suggests that there may be *some* kind of common semantics 
involved somewhere. Why would Mark and Ian say those things if, in 
actual fact (going by your assertion), that's not true?

> It takes actual effort to make data inter-operable, just using a 
> particular markup cannot automatically create semantic value.

Indeed ....
>
>> Why can't I just use good semantic class names?
>
> If you happen to already have appropriate class names and the 
> structure is simple enough, sure. 

Thank you ...

> However, for something more complex,

Some examples of what you view as more complex would be useful here 
perhaps? ....
again from : http://blog.whatwg.org/this-summer-in-html-5-episode-33

"HTML is not expressive enough to mark up a contact in an address book 
(complete with individual fields for name, street address, email, and 
phone number) or an event on a calendar (complete with start date, end 
date, and location). Instead of creating new elements and attributes for 
every possible vocabulary, you can use the microdata attributes to 
enhance existing elements."

It's not a very good problem statement, is it? Everything described 
above was solved years ago by microformats, and later by RDFa.

> the microdata DOM API is useful in that it allows scripts to read and 
> modify the data of a document with a relatively simple syntax. 
> Otherwise you'd have to write many helper functions or wrapper objects 
> to hide the underlying DOM tree. In any event, there's no reason to 
> require that all microdata markup use an external vocabulary when the 
> same syntax is perfectly good for site-specific uses like this.
>

As I have said many times, microdata is only for machines and not at all 
people friendly; microdata is just a set of hooks for scripts.

>> Here is another example of microdata "falling over" quite badly from: 
>> http://dev.w3.org/html5/spec/Overview.html#typed-items again...
>>
>> <section itemscope itemtype="http://example.org/animals#cat">
>>  <h1 itemprop="name">Hedral</h1>
>>  <p itemprop="desc">Hedral is a male american domestic shorthair, 
>> with a fluffy black fur with white paws and belly.</p>
>>   <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 
>> months">
>> </section>
>>
>> Apart from the *obvious* indirection mechanism the spec says this 
>> about the above...
>>
>> "In this example the "http://example.org/animals#cat" item has three 
>> properties,  a "name" ("Hedral"),  a "desc" ("Hedral is..."),  and an 
>> "img" ("hedral.jpeg")."
>>
>> OK, we will take that as said. How about if I add some more microdata 
>> to that, just some plain vanilla stuff that I want to use for my 
>> website that I don't expect to be re-used elsewhere....
>>
>> <section itemscope itemtype="http://example.org/animals#cat">
>>  <h1 itemprop="name title">Hedral</h1>
>>  <p itemprop="desc">Hedral is a male american domestic shorthair, 
>> with a fluffy black fur with white paws and belly.</p>
>>   <img itemprop="img my-cat" src="hedral.jpeg" alt="" title="Hedral, 
>> age 18 months">
>> </section>
>>
>> Good eh? notice there is a "title" and my picture is called "my-cat"
>>
>> According to the description of the original markup my item now has 
>> *five* properties, but only three of them are part of the vocabulary 
>> defined at http://example.org/animals#cat. How does a parser tell 
>> which property is part of my cat vocabulary and which is not? It's not 
>> clear, is it?
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#names:-the-itemprop-attribute 
>
>
> "If the item is a typed item" then each itemprop token must be "a 
> defined property name allowed in this situation according to the 
> specification that defines the relevant type for the item" In other 
> words, the above markup is non-conforming. Validators of this 
> vocabulary should give errors for properties that aren't part of it. 
> Any sane parser of said vocabulary, however, would ignore properties 
> it does not recognize, as doing otherwise is guaranteed to fail as 
> soon as someone makes the mistake above or when the vocabulary is 
> updated.
>
> For a generic microdata parser the processing is very clear:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items 
>
>
> It simply extracts the properties, itemtype is not involved in any way.

You missed the point. How does my parser tell which attribute is part 
of the vocabulary defined at "itemtype" and which attribute is part of a 
custom vocab made just for my own use as script or CSS hooks?

My parser can't, of course. My parser would have to be re-built or 
modified every time someone wants to define a new vocab.
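
To make the question concrete, here is a minimal Python sketch (not taken 
from the spec or from this thread). It pulls the itemprop tokens out of 
the modified Hedral markup and splits them into "part of the itemtype 
vocabulary" and "custom". The names KNOWN_VOCABS, ItemPropCollector and 
split_by_vocab are hypothetical, and the table of property names for 
http://example.org/animals#cat is an assumption based on the three 
properties the spec text lists. It only handles a single, non-nested item.

```python
from html.parser import HTMLParser

# Assumed vocabulary table: the property names that the (made-up)
# http://example.org/animals#cat type is taken to define.
KNOWN_VOCABS = {
    "http://example.org/animals#cat": {"name", "desc", "img"},
}


class ItemPropCollector(HTMLParser):
    """Collects itemprop tokens for a single, non-nested item."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        # itemprop holds a space-separated list of property names.
        self.tokens.extend((attrs.get("itemprop") or "").split())


def split_by_vocab(markup):
    """Return (tokens in the known vocabulary, all other tokens)."""
    collector = ItemPropCollector()
    collector.feed(markup)
    collector.close()
    # Without a table entry for the itemtype, every token counts as custom.
    known = KNOWN_VOCABS.get(collector.itemtype, set())
    in_vocab = [t for t in collector.tokens if t in known]
    custom = [t for t in collector.tokens if t not in known]
    return in_vocab, custom


MARKUP = """
<section itemscope itemtype="http://example.org/animals#cat">
 <h1 itemprop="name title">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic shorthair.</p>
 <img itemprop="img my-cat" src="hedral.jpeg" alt="">
</section>
"""

in_vocab, custom = split_by_vocab(MARKUP)
print(in_vocab)  # ['name', 'desc', 'img']
print(custom)    # ['title', 'my-cat']
```

Nothing in the markup itself marks "title" or "my-cat" as custom; the 
split only works because the consumer carries its own table, and that 
table needs a new entry for every new vocab.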

Microdata gets around all this by saying this....

"An item can only have one type. The type gives the context for the 
properties."
http://dev.w3.org/html5/spec/Overview.html#typed-items

The above suggests that I can only define *one* vocab? Or does it? Again, 
not very clear.

I will give you all some peace now; this conversation is too long and 
not very easy to keep track of because of all the "round the houses" 
answers. I don't like microdata. It's an ugly syntax that doesn't solve 
any problems for either the microformats community or the RDFa 
community. If browser vendors want to implement it, go ahead; I just 
think it is a little premature when you could add some real value to your 
browsers by implementing microformats or RDFa or both.

Microdata should drop the "micro" bit, as it is dishonest because of 
the immediate association with "microformats". It's not micro at all; 
maybe it should be called "machinedata" or "darkdata".

Thanks.

-- 
Martin McEvoy

http://weborganics.co.uk/

"You may find it hard to swallow the notion that anything as large and apparently inanimate as the Earth is alive."
Dr. James Lovelock, The Ages of Gaia
Received on Monday, 19 October 2009 12:14:09 UTC