
Re: Datatypes (Was: Re: Consumer guidance)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 23 Nov 2011 17:15:11 +0100
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <13A1EE6A-1D92-4922-8BD2-FEF521BED422@w3.org>
To: Jeni Tennison <jeni@jenitennison.com>

On Nov 23, 2011, at 15:53 , Jeni Tennison wrote:

> 
> On 23 Nov 2011, at 09:48, Ivan Herman wrote:
>> On Nov 22, 2011, at 22:48 , Jeni Tennison wrote:
>>> As far as I can see, the only time the ability to annotate values with datatypes makes a difference is if the type of the value of a property cannot be inferred from the property and the syntax of the value. Personally, I've been convinced that vocabularies in which that's the case are hard to use and likely to lead to bad data.
>> 
>> If you refer to an automatic inference of type, I tend to agree with you.
> 
> No, I mean that vocabularies that have properties where the type of the value of the property cannot be automatically inferred by the syntax of the value are badly designed.

Ah. I certainly agree with that statement.
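
To illustrate what "inferred from the syntax of the value" could look like in practice, here is a minimal sketch. The pattern table, the type names, and the function infer_type are all hypothetical and not taken from any specification; a real consumer would also take the property into account.

```python
import re

# Hypothetical table of lexical patterns, tried in order.
# The names and patterns are illustrative assumptions only.
PATTERNS = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?\d*\.\d+")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def infer_type(value):
    """Guess a type name from the syntax of a value alone."""
    text = value.strip()
    for name, pattern in PATTERNS:
        # fullmatch: the whole value must fit the pattern, not just a prefix.
        if pattern.fullmatch(text):
            return name
    # Nothing matched, so treat the value as plain text.
    return "string"
```

A vocabulary in which a value such as "42" might be either a number or a code string is exactly the case where a guess like this breaks down.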


> 
> But you are right that if you do have a vocabulary like that then you need to have a syntax that enables you to label values with datatypes, which means using RDFa.
> 
>> What it means for publishers is that if the data and its consumption are dependent on datatypes (or at least would be significantly better using them) then RDFa is a better choice which provides a clear typing facility. (The only exception may be the <time> element.)
> 
> 
> I think that this is an area where there is some deep disagreement. 
> 

Wow! At last we deeply disagree on something!! Let us have a good fight! :-)

Wait... we may not disagree so deeply... Sigh...


> On one hand, the argument is that if publishers are given the ability to label the datatypes of data in their pages then consumers can do something useful with it even when they don't know the vocabulary. For example, items that have some property where all the values are numeric can be sorted numerically without a processor knowing whether the items are products and the numbers are prices or the items are people and the numbers are IQs or whatever.
> 

Right, that is what I meant.
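
As a rough sketch of such a generic consumer: the code below sorts items by a property purely from the datatype labels on the values, with no knowledge of the vocabulary. The item representation (a dict mapping a property to a (lexical value, datatype) pair), the set of datatypes treated as numeric, and the function name are assumptions made for illustration.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: these three XSD types are the ones treated as numeric here.
NUMERIC_TYPES = {XSD + "integer", XSD + "decimal", XSD + "double"}


def sort_numerically(items, prop):
    """Sort items by prop if every value is labelled with a numeric datatype.

    Each item maps a property to a (lexical value, datatype or None) pair.
    Returns None when any value lacks a numeric label, because a generic
    consumer has nothing else to go on.
    """
    values = [item[prop] for item in items]
    if not all(datatype in NUMERIC_TYPES for _, datatype in values):
        return None
    return sorted(items, key=lambda item: Decimal(item[prop][0]))
```

Whether the items are products with prices or people with IQs makes no difference to this function; only the labels matter.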

> On the other hand, the argument is that useful consumers always have built-in knowledge about the vocabulary that they understand, so they know what datatypes to expect for each property. Given that, relying on publishers to supply a datatype for each value is problematic because (a) they might get it wrong, by assigning an incorrect datatype or no datatype at all, so a consumer always has to fix up those mistakes anyway and (b) it gives publishers more work to do when we want to make their lives easy.
> 

Yeah... but that means you have to have vocabulary-aware processors all the way down. If I push the data down a datatype-aware inference engine, for example (say, Pellet), then somebody has to fill in the missing bits. I understand that in some cases that is possible, of course. And it may require a not-always-obvious interpretation of the data.
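
To make "fill in the missing bits" concrete, here is a minimal sketch of a vocabulary-aware fix-up step that could run before the data reaches a datatype-aware engine. The vocabulary URIs, the EXPECTED table, and the function fix_up are hypothetical; the policy of letting built-in knowledge override the publisher's label is one possible choice, not an agreed rule.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical built-in knowledge of one vocabulary:
# property URI -> datatype the consumer expects for its values.
EXPECTED = {
    "http://example.org/vocab#price": XSD + "decimal",
    "http://example.org/vocab#name": XSD + "string",
}


def fix_up(prop, lexical, declared=None):
    """Return (lexical, datatype) using built-in vocabulary knowledge.

    For a known property the expected datatype replaces a missing or
    incorrect label; for an unknown property the value passes through
    unchanged, since there is nothing to fill in.
    """
    expected = EXPECTED.get(prop)
    if expected is None:
        return lexical, declared
    return lexical, expected
```

The step only helps for properties listed in the table, which is the "vocabulary-aware all the way down" cost.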

> I think we can probably square this by saying quite near the beginning of the Publisher guidance something like:
> 
>  Most consumers of HTML data will only recognise particular vocabularies
>  that cover the information that they are interested in. Generic
>  consumers perform operations that don't require up-front knowledge of the
>  vocabulary, either by using only the information available in the page
>  (in particular datatype information) or by fetching a machine-readable 
>  representation of the vocabulary in order to display things like labels
>  or explanatory text. This second form of consumer is only supported by
>  RDFa -- both microdata and microformats assume that consumers will have
>  built-in knowledge of the vocabulary that they are consuming.
> 

Yes, I can agree with that.

> We can reiterate that in talking about syntax considerations.
> 
> We can also make a similar point on the consumption side: that generic consumers can only pick up information from RDFa.
> 
> And we can bring out the guidance on the vocabulary side about not making vocabularies where the datatype of a value can't be determined from the property and its syntax.
> 
> Does that sound right?

Yes. Well, we will have to find another topic to have a fight on...

Cheers

Ivan


> 
> Jeni
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 23 November 2011 16:12:13 GMT
