Re: Consumer guidance from Jeni Tennison on 2011-11-23 (public-html-data-tf@w3.org from November 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 23 Nov 2011 14:16:37 +0000
To: Ivan Herman <ivan@w3.org>
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <CCB118C6-25C5-4ED0-8915-9208194DF183@jenitennison.com>

Ivan,

On 23 Nov 2011, at 09:48, Ivan Herman wrote:
> On Nov 22, 2011, at 22:48 , Jeni Tennison wrote:
>> On 22 Nov 2011, at 10:05, Ivan Herman wrote:
>>> I miss some other factors that may have to be listed as part of the publishing/consuming decision.
>>> 
>>> - Are you bound to one vocabulary or more? If only one, I guess RDFa/md/mf provide a more or less equal environment in this respect; but if you are bound to several vocabularies (now or in the future) within the content, e.g., to combine it with Linked Data, RDFa is much more appropriate.
>> 
>> Are you talking about for publishing or consuming?
>> 
>> For publishing, the page
>> 
>> http://www.w3.org/wiki/Mixing_HTML_Data_Formats
> 
> Ah, my bad. It is just a bit unclear to me how these different wiki pages will end up in one? two? more? documents as a report…

I plan to consolidate them into a ReSpec document once it seems people are more or less happy with their thrust.

>> talks about the mechanics of using multiple vocabularies. I'm not sure what to add there? Perhaps something more in the part in the page
>> 
>> http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Publishing_in_Multiple_Formats
>> 
>> like:
>> 
>> If your target consumers will all accept the same syntax, it is usually
>> easiest to use that single syntax in your pages. However, microdata does
>> not support multiple types for a single entity, so if your target 
>> consumers expect different vocabularies to be used for the same entities 
>> you may find it easier to mix syntaxes or use RDFa or microformats, which
>> do support multiple vocabularies.
>> 
>> Would that address your concern?
> 
> Yes.

OK, I've made that change.

>> On the consuming side, perhaps it's worth adding something like:
>> 
>> While adopting existing vocabularies is generally a good idea, be aware
>> that it can be hard for publishers to use multiple vocabularies to
>> describe a single entity, particularly if they use microdata to do so.
>> It will generally work best to consume a single base vocabulary on top
>> of which you understand additional properties.
>> 
>> I don't know if that's along the lines you were thinking?
> 
> Hm. I am not 100% sure I understand what this means... I would rather say something along the lines that the consumer should be prepared for the fact that the published material may include references to other vocabularies (types, predicates, etc.) that the consumer does not necessarily know about. In such a case, the consumer should ignore those references, and they should by no means affect how it consumes the vocabulary items that it does understand. This is, for example, a very important aspect of schema.org that was not made clear at the initial announcement: schema.org and, say, GoodRelations terms may safely be mixed for the same resource; schema.org will just pick its own terms out of the structure and live happily with that.

OK, the document does say that within

  http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Good_Consumption_Practice

Perhaps that needs to be more prominent. Where would you recommend moving it to?
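
To make that consumption practice concrete, here is a minimal Python sketch of a consumer that keeps only the terms from vocabularies it understands and silently skips the rest. The item layout, the KNOWN_VOCABULARIES tuple and the filter_known function are assumptions made for illustration; they are not the output format or API of any particular parser.

```python
# Hypothetical sketch: a consumer that ignores terms from unknown vocabularies.
# The dict layout below is an assumption, not the output of any real parser.

KNOWN_VOCABULARIES = ("http://schema.org/",)


def is_known(term):
    """Return True if the term IRI belongs to a vocabulary we understand."""
    return term.startswith(KNOWN_VOCABULARIES)


def filter_known(item):
    """Keep only the types and properties from known vocabularies."""
    return {
        "types": [t for t in item.get("types", []) if is_known(t)],
        "properties": {
            name: values
            for name, values in item.get("properties", {}).items()
            if is_known(name)
        },
    }


# One entity described with both schema.org and GoodRelations terms.
item = {
    "types": [
        "http://schema.org/Product",
        "http://purl.org/goodrelations/v1#ProductOrServiceModel",
    ],
    "properties": {
        "http://schema.org/name": ["Example widget"],
        "http://purl.org/goodrelations/v1#hasEAN_UCC-13": ["0000000000000"],
    },
}

filtered = filter_known(item)
```

The unknown terms are dropped rather than treated as errors, so the presence of a second vocabulary does not change what the consumer extracts from the first.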

>>> - I am not sure you want to raise the datatype issue, but there are again differences there that may influence the publishing and consuming choices
>> 
>> OK, I think that probably comes under vocabulary design. As far as I can see, the only time the ability to annotate values with datatypes makes a difference is if the type of the value of a property cannot be inferred from the property and the syntax of the value. Personally, I've been convinced that vocabularies in which that's the case are hard to use and likely to lead to bad data.
> 
> If you refer to an automatic inference of type, I tend to agree with you. What it means for publishers is that if the data and its consumption are dependent on datatypes (or at least would be significantly better using them), then RDFa, which provides a clear typing facility, is a better choice. (The only exception may be the <time> element.)

Let me pull this out in a separate thread.
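
As a rough illustration of what inferring a value's type from the property and the syntax of the value could look like, here is a small Python sketch. The PROPERTY_PARSERS table, the infer_value function and the choice of rules are all assumptions for the sake of the example, not something defined by microdata, RDFa or any vocabulary.

```python
import re
from datetime import date

# Hypothetical table: properties whose expected value type is known
# from the vocabulary itself.
PROPERTY_PARSERS = {
    "http://schema.org/datePublished": date.fromisoformat,
}


def infer_value(prop, text):
    """Guess a typed value from the property and the lexical form of text."""
    text = text.strip()
    parser = PROPERTY_PARSERS.get(prop)
    if parser is not None:
        try:
            return parser(text)
        except ValueError:
            # Malformed value: fall back to the plain string.
            return text
    # No property-level hint: fall back to the syntax of the value.
    if re.fullmatch(r"[+-]?\d+", text):
        return int(text)
    if re.fullmatch(r"[+-]?\d+\.\d+", text):
        return float(text)
    return text
```

A value such as "2011-11-23" on a date-valued property becomes a date, while the same string on an unknown property stays a plain string. That is the kind of case where explicit datatype annotation would carry extra information.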

>> I think that's a good thing to mention in the vocabulary design page. I'll have a go at some wording...
>> 
>>> - If you rely on JavaScript together with the structured data, there are again differences: microformats, as far as I know (may be wrong!), do not have a dedicated API; microdata has one as part of its definition; RDFa has some drafts around, but they are not at the same level of maturity as their microdata counterpart. A somewhat similar issue is access to the data in JSON.
>> 
>> Yes. The section on Tooling Considerations at
>> 
>> http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Tooling_Considerations
>> 
>> is meant to cover that, but of course it's hard to give general advice there both because we can't list all available tools and because the tooling landscape changes so rapidly.
> 
> Sure. And listing explicit tools and libraries would not really be a good idea. 
> 
> But I think making it clear that, at present at least, only microdata has an API as part of its specification is worth mentioning; that is important if developers want to use, say, JavaScript (although, at this moment, I am not sure any of the browsers implement this API). We could/should also mention that similar work is being considered for the RDFa landscape, but its maturity (as of now) is not at the same level as microdata's.


OK, I added some pointers at

  http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Tooling_Considerations

Could you take a look and see if that's enough?

Thanks,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Wednesday, 23 November 2011 14:17:12 UTC