Re: HTML Data Guide draft

Jeni,

I am beginning to get out of my post-surgery torpor, although only very slowly. But I did go through the document today and noted a bunch of comments; see them below. They are in no particular order; I just made notes as I read. I am sure more issues will come up, but this may be a good start.

And thanks...

Ivan

In the introduction: I wonder whether it is worth emphasizing that adding structured data to HTML was also pioneered by the Semantic Web community, in an attempt to bridge the worlds of documents and data on the web. Many data publishers use their HTML pages as an alternative syntax for publishing their data (using RDFa); examples include the Library of Congress authorities pages and DBpedia. (Yes, I know, I am biased!)

-----

In the introduction: the bulleted item on RDFa. Unfortunately, one of the fallacies spread about RDFa is that it is bound to XHTML1. That was indeed the case for RDFa 1.0, but it is no longer true for RDFa 1.1. I think it would be better to make this clear right in the intro to avoid misunderstandings. The current wording is still open to that misreading and reinforces that view.

What about saying: 

[[[
RDFa reuses existing HTML attributes such as @href and @rel and adds a few of its own to enable data to be extracted from HTML pages as RDF. Though it was originally designed for XHTML 1.1, its latest version (RDFa 1.1) is also usable with HTML5 and other markup languages like SVG.
]]]

Actually, if we go there, it may be worth emphasizing that microformats are language agnostic in the sense that they rely only on the class attribute (i.e., microformats could be used in SVG, too).

-----

In the introduction: it may be worth emphasizing that microdata has been defined for HTML5 only. That is, if authors care about validation, they should not use microdata with, say, XHTML 1.0.

I realise you come back to this issue in section 2, but the introduction really sets the tone...

-----

1.1 Scope: is it correct that one can add RDF/XML into a script element? I must admit I have never seen that in practice, and it is fairly problematic from an XML point of view (unless CDATA is used). I think it would be safer not to refer to it.

-----

Section 2, item on HTML5: "how exacting your publishing guidelines are" -> "how exactly your publishing guidelines are"

-----

Section 2.1.1, Structured HTML values: "emphasised" -> "emphasized" :-( I know you hate it, and my English schooling was also UK English, but the rule at W3C is that documents should use American English :-( I have not checked the whole document for this, but this one just reminded me of the issue...

-----

Section 2.1.1: I am not sure we should list it here, but maybe: a particular issue with microformats is that the vocabularies use @class attribute values. These may clash with the class names used in the CSS files that are, e.g., part of a corporate publishing environment. Authors should check this carefully before deciding to use microformats; otherwise they are in for major surprises in the way their pages are rendered... An additional issue is that microformats may require, say, the use of the <abbr> element, which may clash with the accessibility considerations of a particular publishing environment...
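If we do mention it, a rough sketch of such a check could help. The Python below lists which class selectors in a stylesheet coincide with microformat class names. HCARD_CLASSES is only an illustrative subset of hCard property names, the regular expressions are a heuristic rather than a real CSS parser, and the stylesheet is made up.

```python
from __future__ import annotations

import re

# Illustrative subset of hCard class names; a real check would use the full
# vocabulary of whichever microformat(s) the pages rely on.
HCARD_CLASSES = {"vcard", "fn", "n", "org", "url", "email", "tel", "adr", "note"}

# ".name" class selectors; a rough heuristic, not a full CSS parser.
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")


def css_class_names(css_text: str) -> set[str]:
    """Return the class names used in the selectors of a stylesheet."""
    # Drop comments so that commented-out rules are not reported.
    text = re.sub(r"/\*.*?\*/", "", css_text, flags=re.DOTALL)
    # Drop innermost { ... } blocks (the declarations), so that values such
    # as url(logo.png) are not mistaken for selectors; selectors nested in
    # @media blocks survive because only the innermost braces are removed.
    text = re.sub(r"\{[^{}]*\}", "", text)
    return set(CLASS_SELECTOR.findall(text))


def microformat_clashes(css_text: str, mf_classes=HCARD_CLASSES) -> set[str]:
    """Class names styled by the stylesheet that also carry microformat meaning."""
    return css_class_names(css_text) & set(mf_classes)


# A made-up corporate stylesheet.
corporate_css = """
/* house style */
.org { font-weight: bold; background: url(logo.png); }
.note, .sidebar { color: grey; }
@media print { .tel { display: none; } }
"""
print(sorted(microformat_clashes(corporate_css)))  # ['note', 'org', 'tel']
```

Any name it reports is a place where the house style would also restyle (or hide, as with .tel in print) the microformat markup.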

-----

Still on 2.1.1: if publishers want to make use of Linked Data, for example by making it easy to link the data in the page to other Linked Data vocabularies, then RDFa is probably a much better choice, simply because it is inherently bound to RDF.

----

Still on 2.1.1: what about datatypes? If a vocabulary requires datatypes then... well, RDFa is the only one of the three that handles them. That being said, this may be a special case that we do not want to address here (and I know it is addressed elsewhere).

----

Section 2.2.1.2: the reference to the RDFa 1.1 Core initial context is wrong. It should say: http://www.w3.org/2011/rdfa-context/rdfa-1.1.html

----

Section 2.2.1.2: I would prefer not to mention @xmlns at all. In the HTML5 version it may not even be allowed...

(If you agree, then in the paragraph that follows: "last three" -> "last two")

----

Section 2.2.1.3: this may be a rathole... but... 2.2.1.2 refers to IRIs, whereas this section uses URL. Is this intentional? Or can I use mailto:ivan@w3.org as a property value in microdata (not that I would like to, but you take the point...)? I know there is a separate section on some of these issues later, but by the time readers get there they might already be confused...

-----

I am a little bit worried about the complexity of section 2.2.1.3. I realize that this is the nature of the beast, because mixing vocabularies in microdata is simply complicated, but it really breaks the flow of reading that the corresponding RDFa and microformats sections are 1-2 paragraphs each, whereas the microdata one goes on for pages. Authors reading this for the first time can easily get lost.

Maybe that part could go into an appendix, keeping the main text short: essentially saying that this is complex in microdata, outlining the main lines of the solution, and pointing to the appendix for the technical details?

Also, it may be worth adding similar examples for microformats and RDFa to the appendix, too. In general, some good comparative sections in an appendix might be very helpful for newcomers...

-----

For 2.2, it may be worth mentioning that a companion document on extracting RDF from microdata is also in the making; i.e., a processor following it would also interpret the microdata as Turtle (show example?). I would expect processors to come along at some point that 'distill' both RDFa and microdata from a document and merge the results in RDF; I am sure Gregg will do that, and I may do the same at some point when the microdata->RDF mapping gels...
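As for the example: something like the following Python sketch could illustrate the idea. It assumes a hypothetical dict form of an already-parsed microdata item and only covers the easy case where every property name is an absolute URL (which microdata permits); it does not reproduce the companion document's rules for short names like "name", which are still in flux. The example.org URI is made up; the FOAF and rdf:type URIs are the real ones.

```python
from __future__ import annotations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def turtle_literal(value: str) -> str:
    """Quote a plain string as a Turtle literal (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def item_to_turtle(item: dict, blank_label: str = "item0") -> str:
    """Serialize one microdata item (hypothetical dict form) as Turtle triples.

    Assumes every property name is already an absolute URL; items without
    an itemid get a blank node as subject.
    """
    subject = f"<{item['itemid']}>" if "itemid" in item else f"_:{blank_label}"
    lines = []
    # Each itemtype becomes an rdf:type triple.
    for itemtype in item.get("itemtype", []):
        lines.append(f"{subject} <{RDF_TYPE}> <{itemtype}> .")
    # Each property value becomes a plain literal object.
    for prop, values in item.get("properties", {}).items():
        for value in values:
            lines.append(f"{subject} <{prop}> {turtle_literal(value)} .")
    return "\n".join(lines)


person = {
    "itemid": "http://example.org/people/ivan",  # hypothetical URI
    "itemtype": ["http://xmlns.com/foaf/0.1/Person"],
    "properties": {"http://xmlns.com/foaf/0.1/name": ["Ivan Herman"]},
}
print(item_to_turtle(person))
```

A distiller that merges RDFa and microdata would essentially put triples like these into the same graph as the RDFa output.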

-----

2.2.4: "any property elements" -> "any property attribute"

-----

3.1.2.2.

If JSON-LD is listed (which is fine!) then I think we should also list Turtle. After all, both are syntaxes of the same data model...

That is it for now...



On Dec 11, 2011, at 18:22 , Jeni Tennison wrote:

> Hi,
> 
> I've pulled together much of the documentation from our wiki into a single document, at:
> 
>  https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html
> 
> Please take the time to read this as it is the main product of this Task Force, and raise any comments here.
> 
> Thanks,
> 
> Jeni
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Monday, 19 December 2011 11:45:20 UTC