Re: HTML Data Guide draft from Ivan Herman on 2011-12-20 (public-html-data-tf@w3.org from December 2011)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 20 Dec 2011 09:18:14 +0100
To: Tantek Çelik <tantek@cs.stanford.edu>
Cc: Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <ADB19F02-0BE9-4B92-A1D2-2F69BB580F5E@w3.org>
On Dec 20, 2011, at 02:55 , Tantek Çelik wrote:

> On Mon, Dec 19, 2011 at 03:45, Ivan Herman <ivan@w3.org> wrote:
>> Jeni,
>> 
>> I am beginning to get out of my post-surgery torpor, although only very slowly.
> 
> Best wishes for a speedy recovery Ivan!

Thanks!

> 
> 
>> But I did go through the document today, and have noted a bunch of comments, see them below. There is no priority order, I just made the notes as I read. I am sure there are more issues coming up, but this may be a good start.
>> 
>> And thanks...
>> 
>> Ivan
>> 
>> In the introduction: I wonder whether it is worth emphasizing that adding structured data to HTML was also pioneered by the Semantic Web community in an attempt to bridge the world of documents and of data on the web. Many data publishers use their HTML pages as an alternative syntax to publish their data (using RDFa); examples include the authorities' pages of the Library of Congress, or DBPedia. (Yes, I know, I am biased!)
> 
> Ivan, I'm not sure what you mean by "pioneered" in this context.

Tantek, you are right, it was not the right choice of words. In a sense, my reaction was more to the fact that the current text has a very strong bias towards the search engine usage. And the usage of HTML as a vehicle to bind to Linked Data has been a strong motivation in the past few years. That is all...

[snip]

>> 
>> Actually, if we go there, it may be worth emphasizing that microformats are language agnostic in the sense that they are bound to the usage of CSS only (ie, microformats could be used in SVG, too).
> 
> To be clear, microformats make use of the *HTML* class attribute, but
> can work directly in any XML language with a class attribute (like
> SVG, MathML), or XML without class attributes by embedding XHTML.
> 

Just out of curiosity (nothing to do with the current issues): does the microformat definition strictly entail that it must be a class attribute of HTML? I always considered it to be more liberal and workable, as you say, with MathML and SVG out of the box...
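
A minimal sketch of that more liberal reading, in Python: class tokens are looked up on any XML element, regardless of namespace. The SVG fragment and the helper name elements_with_class are made up for illustration, and this is not a microformats parser, only a lookup by class token.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG fragment; the hCard-style class names are only illustrative.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg">
  <g class="vcard">
    <text class="fn highlight" x="10" y="20">Jane Doe</text>
    <text class="org" x="10" y="40">Example Corp</text>
  </g>
</svg>"""


def elements_with_class(root, name):
    """Yield elements whose space-separated class attribute contains `name`."""
    for element in root.iter():
        tokens = element.get("class", "").split()
        if name in tokens:
            yield element


root = ET.fromstring(SVG_DOC)
# Text content of every element carrying the "fn" class token.
names = [el.text for el in elements_with_class(root, "fn")]
print(names)
```

Nothing in the lookup depends on the element being HTML; it only needs an attribute called class.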


> 
>> In the introduction: it may be worth emphasizing that microdata has been defined for HTML5 only. Ie, if authors care about validation, they should not use microdata with, say, XHTML 1.0.
>> 
>> I realise you come back to this issue in section 2, but the introduction really sets the tone...
>> 
>> -----
>> 
>> 1.1 Scope: is it correct that one can add RDF/XML into a script element? I must admit I have never seen that in practice, and it is fairly problematic from an XML point of view (unless CDATA is used). I think it would be safer not to refer to that.
> 
> Yeah this has been tried in the past (at least with XML in the defunct
> StructuredBlogging effort) and should be considered an obsolete
> technique not worthy of an intro.

I agree. 
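
The XML side of the problem can be seen in a small Python sketch. The two XHTML pages below are made up, and the RDF/XML payload is deliberately empty; the helper name script_children is hypothetical. An XML parser treats un-escaped RDF/XML inside script as ordinary child elements, whereas with CDATA it stays plain text, which is what an HTML parser would see in either case.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical XHTML page with RDF/XML placed directly inside <script>.
WITHOUT_CDATA = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script type="application/rdf+xml">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
</script></head><body/></html>"""

# Same page, but the payload is wrapped in a CDATA section.
WITH_CDATA = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script type="application/rdf+xml"><![CDATA[
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
]]></script></head><body/></html>"""


def script_children(document):
    """Return (number of child elements, stripped text) of the first script."""
    root = ET.fromstring(document)
    script = root.find(f".//{XHTML}script")
    return len(list(script)), (script.text or "").strip()


print(script_children(WITHOUT_CDATA))  # payload parsed as a child element
print(script_children(WITH_CDATA))     # payload kept as character data
```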

> 
> If you're looking to document dead-end web data efforts for historical
> purposes, there's plenty more to add to the list, e.g.
> StructuredBlogging, Google Base, and most recently CommonTag.
> 
> 
>> Section 2.1.1: I am not sure we should list it here, but maybe: a particular issue with microformats is that the vocabularies use the @class attribute value.
>> 
>> This may clash with the class attribute value as used in the CSS files that are, eg, part of a corporate publishing environment.
> 
> This is a known documented issue for microformats-1 style microformats:
> 
> http://microformats.org/wiki/microformats-issues#class-collisions
> 
> Is it the purpose of this document to list issues with all approaches?
> If so, there's plenty more.
> 
> 

Good question. I let Jeni decide on that, as editor of the document. You are right that we have to avoid giving the impression of arbitrary choices.

> 
>> Authors should carefully check this before they decide to use microformats, otherwise they are in for major surprises in the way their page will be rendered...
> 
> Ivan, do you know of specific examples where this has occurred?
> 
> Please provide them so I can add them to the documentation of the
> issue on our microformats wiki.
> 
> Absent such documentation, I don't see why this would be worthy of a
> warning in this document.

Honestly: not in my personal experience. I have not done any extensive search either. However, interestingly, this is one of the issues Google claims to have run into for rich snippets. Yes, this is what the US law practice calls 'hearsay', ie, you can object to it:-)
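
For what it is worth, the kind of check an author could run before adopting microformats is easy to sketch in Python. Everything below is an assumption for illustration: the stylesheet is invented, the set of class names is just a handful taken from hCard and hCalendar, and the regular expression is a rough approximation rather than a CSS parser.

```python
import re

# A few microformats class names (hCard and hCalendar); not a complete list.
MICROFORMAT_CLASSES = {"vcard", "fn", "org", "title", "summary", "url"}

# Made-up corporate stylesheet.
CORPORATE_CSS = """
.title { font-size: 2em; text-transform: uppercase; }
.summary, .teaser { color: #555; }
#nav .menu-item { float: left; }
"""


def css_class_names(css):
    """Return class names used in selectors (rough regex, not a CSS parser)."""
    # Drop declaration blocks so values such as '2.5em' are not read as classes.
    selectors = re.sub(r"\{[^}]*\}", " ", css)
    return set(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", selectors))


# Class names that the stylesheet already styles and a microformat would reuse.
collisions = sorted(MICROFORMAT_CLASSES & css_class_names(CORPORATE_CSS))
print(collisions)
```

A non-empty result only says that the names overlap; whether the rendering actually changes depends on the selectors and the markup involved.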

> 
> 
>> An additional issue is that microformats may require, say, the usage of the <abbr> element and this may clash with accessibility considerations of a particular publishing environment...
> 
> This issue has been long since (2+ years) addressed and resolved with
> the microformats value-class-pattern.
> 
> http://microformats.org/wiki/value-class-pattern
> 
> 

.... which may be worth noting then? (Not sure where the borderline is.)


>> Still on 2.1.1: If publishers want to make use of Linked Data, for example by making it easy to link the data in the page to other linked data vocabularies, then RDFa is probably a much better choice, simply because it is inherently bound to RDF.
> 
> I think this merits explanation, and I don't accept "inherently bound
> to RDF" as a good argument here.
> 
> As long as a syntax can produce properties and data bound in URLs
> (which I believe is possible with all the current syntaxes being
> discussed in this TF), isn't that all that's necessary for Linked
> Data?
> 

Well, I do not want to go into a long discussion on what Linked Data is; this is not the right place. Yes, URI-s are important. But, to be a little bit more orthodox at this point, the issue is whether the data in a given vocabulary can or cannot be mapped onto RDF in a clearly specified manner. While this is not an issue for RDFa, there are issues, essentially per vocabulary, both for microformats and for microdata. AFAIK, and you may correct me, not all microformats vocabularies give clear definitions on what URI-s to use, for example; and the microdata vocabulary specification requirements are even more convoluted in this respect. Ie, at least for some vocabularies, there might be issues that the authors should know about.

Also, there is a very dynamic scene in the linked data world in the definition of various vocabularies. Some of those (bibliography, music, dublin core, to take just three) have a clear usage within an HTML page, too. Those can be used directly from within RDFa; it is a bit more complicated to do so with other syntaxes (even if the vocabulary itself is simple, like DC) because that needs an extra mapping phase to those syntaxes.

> 
>> Still on 2.1.1: what about datatypes? If a vocabulary requires the usage of datatypes then... well, RDFa is the only one handling that. That being said, it may be a very special case that we may not want to address here (and I know it is addressed elsewhere.)
> 
> What do you mean for a vocabulary to "require the usage of datatypes"?

That the range of a property is defined to be of a given datatype.

> 
> In microformats experience, we've never found a need to "require" a datatype.

You are right that the word 'required' is wrong. What I meant is what is above.

I just picked one vocabulary that clearly has a possible role on a web page, namely the music ontology:

http://www.musicontology.com/

that refers to xsd:int and others for a number of its properties.

Of course, it is perfectly possible, from a SW point of view, to produce data that does _not_ explicitly define, say, a <span>10</span>, to be an integer; after all, a range specification is a license to infer and not a restriction. But in practice authors may want to reinforce this using the datatype. If that is intended then, well, we may have a problem.
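
To make the difference concrete, here is a rough Python sketch of what a consumer sees with and without an explicit datatype. It is not an RDFa processor (no prefix resolution, no nesting, no @content), the property name ex:trackCount is hypothetical, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TypedLiteralCollector(HTMLParser):
    """Collect (property, text, datatype) for flat elements carrying @property.

    Simplification: the literal is closed by the next end tag, so nested
    markup inside the element is not supported.
    """

    def __init__(self):
        super().__init__()
        self.literals = []
        self._open = None  # [property, datatype, text parts] while inside

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes:
            self._open = [attributes["property"], attributes.get("datatype"), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            prop, datatype, parts = self._open
            self.literals.append((prop, "".join(parts), datatype))
            self._open = None


def to_python(text, datatype):
    """Convert only when the author stated the datatype; otherwise keep text."""
    return int(text) if datatype == "xsd:int" else text


# Hypothetical markup: the same value, once typed and once untyped.
MARKUP = (
    '<p><span property="ex:trackCount" datatype="xsd:int">10</span> and '
    '<span property="ex:trackCount">10</span></p>'
)

collector = TypedLiteralCollector()
collector.feed(MARKUP)
collector.close()
values = [to_python(text, datatype) for _, text, datatype in collector.literals]
print(values)
```

Without the datatype attribute the consumer is left with a plain string and has to infer the integer from the vocabulary's range definition, if it knows that definition at all.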

> 
> On the contrary, the more such requirements were attempted, the higher
> the barrier, or the lower the data quality as authors get it wrong.
> 
> If anything, we should say something like:
> 
> "Avoid vocabularies that require the use of datatypes."

I think such a statement is too restrictive, and I would not agree with that. See my example above. 'Watch out' instead of 'Avoid' might be better... 

[snip]

> 
> 
>> On Dec 11, 2011, at 18:22 , Jeni Tennison wrote:
>> 
>>> Hi,
>>> 
>>> I've pulled together much of the documentation from our wiki into a single document, at:
>>> 
>>>  https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html
>>> 
>>> Please take the time to read this as it is the main product of this Task Force, and raise any comments here.
> 
> 
> Jeni, given the extensive feedback from Ivan (and the comments I
> myself have made), the one general item of feedback I'd say is:
> 
> Perhaps it's better to just keep this document on the wiki for now, to
> enable/encourage more collaborative iteration/updating - especially
> with minor fixes.
> 
> hg is just less accessible/usable than MediaWiki.

A bit. It does require hg knowledge, that is:-)

> 
> If you're looking to generate something that looks like a W3C note,
> perhaps it would be better to simply auto-generate such a note from a
> specific wiki page.

I am not sure we have tools to do that (would be good). Ie, I am a little bit afraid that this would mean an extra load on Jeni at the end...

Thanks!

Ivan



> After all, most of the semantics should directly
> map right?
> 
> Thanks,
> 

> Tantek
> 
> -- 
> http://tantek.com/ - I made an HTML5 tutorial! http://tantek.com/html5


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 20 December 2011 08:23:29 UTC