Re: Comments on HTML Microdata, W3C Working Draft 24 June 2010 from Nathan on 2010-12-08 (public-html-comments@w3.org from December 2010)

From: Nathan <nathan@webr3.org>
Date: Wed, 08 Dec 2010 09:17:03 +0000
To: Ian Hickson <ian@hixie.ch>
CC: public-html-comments@w3.org, Thomas Baker <tbaker@tbaker.de>
Message-ID: <4CFF4D0F.1090006@webr3.org>
Ian Hickson wrote:
> On Tue, 7 Dec 2010, Nathan wrote:
>> Ian Hickson wrote:
>>> On Tue, 7 Dec 2010, Nathan wrote:
>>>> Ian Hickson wrote:
>>>>> I've used dce: and dct:, since now the example has both.
>>>> A general comment, microdata appears to be incredibly verbose for authors
>>>> when using multiple vocabularies to describe things, the example at
>>>> http://dev.w3.org/html5/md/#examples is almost painful to read, let alone
>>>> write.
>>>>
>>>> Is there no way to reduce the repetition of long URIs for properties and
>>>> types as illustrated by the Turtle equivalent in the referred to example?
>>>> Does HTML or Microdata cater for this in any way?
>>> When we did the usability studies for this we found that in practice (and
>>> much to my surprise) the verbosity had no impact on the usability of the
>>> language, so we didn't do anything to reduce it.
>> I'd love to see those results, any chance of a link to them?
> 
> I blogged about it here at the time:
> 
>    http://blog.whatwg.org/usability-testing-html5
> 
> For privacy reasons I'm not able to make the actual raw videos available, 
> but if you have any specific questions then I can try to answer them. In 
> general I would encourage people to try to reproduce these results as that 
> is the best way to check them.

I'm glad to see you did some usability testing, although a little 
surprised at the number of people and lack of variety in the tests. 
However, I'm here looking towards the future and genuinely concerned 
about data-in-html...

>>> Furthermore, in practice, most use cases for microdata don't involve 
>>> multiple vocabularies but a single vocabulary explicitly named using 
>>> itemtype="", for which the vocabulary's short names are used.
>> If I understand correctly, that's because microformats constrain 
>> vocabularies to only describing a single type of thing, and this has 
>> spilled through in to microdata thus constraining descriptions of things 
>> to only use a single vocabulary.
> 
> No, I'm talking about use cases here, not syntax. When designing 
> microdata, I collected a long list of use cases, for which it was 
> subsequently designed. The vast majority of those use cases only involve 
> one vocabulary at a time.
> 
> It may be that microdata is not designed for the same use cases that you 
> are interested in, in which case it would make sense that you would have a 
> different point of view on this.

Great, and hopefully my point of view and use-cases for using open, 
well-defined vocabularies, such as Dublin Core and the various vocabs on 
w3.org, will be just as valid as your own and those previously tested?

Also, as far as I can tell in your initial usability tests, it was never 
assessed whether using some form of URI compacting made Microdata more 
usable, so it would probably be wise to consider that too, especially 
since millions already use it in countless other web-centric technologies.
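
As a rough illustration of what "URI compacting" could mean here, the 
sketch below expands prefixed names into full URIs, in the style of the 
Turtle example. The dce: and dct: prefixes are the ones used in the spec 
example; the expand() helper and the PREFIXES table are hypothetical and 
are not part of Microdata or any existing library.

```python
# Hypothetical prefix table; dce/dct map to the Dublin Core namespaces.
PREFIXES = {
    "dce": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}


def expand(name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a compact name such as 'dct:title' into a full URI.

    Names whose prefix is not declared (including full URIs, whose
    'prefix' would be the scheme) are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


print(expand("dct:title"))             # http://purl.org/dc/terms/title
print(expand("http://example.org/x"))  # unchanged, 'http' is not declared
```

An author would then write the short form once per property and leave 
the expansion to tooling, rather than repeating the long URI each time.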

Furthermore, I'm quite concerned that:

  - Vocabularies are encouraged not to be dereferenceable, as opposed to 
being encouraged to dereference to a vocab which is both human and 
machine readable (for instance published with microdata annotations).

  - The process for creating URI identifiers for microformat properties 
is so complex (uri + "microdata#" + urlencode(itemtype + "#:" + 
property)), that this process is hidden in specs and not well known, and 
that the description of those properties is only available in the spec, 
in plain text, and has to be hard coded. For example:
   http://www.whatwg.org/specs/web-apps/current-work/#licensing-works

  - There's no clear path between microdata and full linkeddata 
annotations in, say, RDFa; indeed it uses entirely different properties 
in an entirely different way. If anything it should be a subset, or RDFa 
a superset: a single unified story on how to publish machine readable 
data in HTML.
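
The URI construction described in the second point above can be 
sketched as follows. This follows the formula as paraphrased in this 
message, not the exact algorithm text in the spec; the function name, 
the base_uri parameter, and the example.org values are assumptions made 
purely for illustration.

```python
from urllib.parse import quote


def microdata_property_uri(base_uri: str, itemtype: str, prop: str) -> str:
    """Build a property URI as: uri + "microdata#" + urlencode(itemtype + "#:" + property).

    base_uri is whatever the spec uses for "uri" (an assumption here);
    safe="" makes quote() percent-encode every reserved character,
    including ':', '/' and '#'.
    """
    return base_uri + "microdata#" + quote(itemtype + "#:" + prop, safe="")


# Illustrative values only; neither URI comes from a real vocabulary.
print(microdata_property_uri("http://example.org/",
                             "http://example.org/type",
                             "title"))
# http://example.org/microdata#http%3A%2F%2Fexample.org%2Ftype%23%3Atitle
```

The resulting identifier is opaque to a human reader, which is the 
concern raised above: the property's meaning cannot be recovered from 
the URI without consulting the spec text.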

I'm sure that there are countless people, including myself, who would be 
more than happy to look at the use cases and design requirements for 
microdata, and come up with a proposal that addressed all of these 
concerns, such that microdata+microformats complemented linkeddata+rdfa.

I feel it's very important to take the lessons learned within the 
general web development community, and the semantic web community, and 
apply them to data in HTML in order to best serve all potential 
audiences. Rather than competing, or one precluding the other, they 
should complement each other, whilst recognising that there are 
different use-cases and audiences, and also that audiences will need to 
transition between both depending on the use case, changing requirements 
and levels of understanding over time.

Best,

Nathan
Received on Wednesday, 8 December 2010 09:18:03 UTC