Re: HTML Data Guide draft from Jeni Tennison on 2011-12-21 (public-html-data-tf@w3.org from December 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 21 Dec 2011 20:47:41 +0000
To: Ivan Herman <ivan@w3.org>
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <98A8DB26-2EB9-413C-9F5A-A1D8485AB0B0@jenitennison.com>

Ivan,

On 19 Dec 2011, at 11:45, Ivan Herman wrote:
> I am beginning to get out of my post-surgery torpor, although only very slowly.

Very glad to have you back with us, and thank you for the comments.

> In the introduction: I wonder whether it is worth emphasizing that adding structured data to HTML was also pioneered by the Semantic Web community in an attempt to bridge the world of documents and of data on the web. Many data publishers use their HTML pages as an alternative syntax to publish their data (using RDFa); examples include the authorities' pages of the Library of Congress, or DBPedia. (Yes, I know, I am biased!)

I added half a sentence [1] in the introduction: "... and by the open linked data community seeking to bridge the gap between documents and data on the web". OK?

> In the introduction: the bulleted item on RDFa. Unfortunately, one of the fallacies spread about RDFa is that it is bound to XHTML1. That was indeed the case for RDFa 1.0, but it is no longer true for RDFa 1.1. I think it would be better to make this clear right in the intro to avoid misunderstandings. The current formulation is still open to misunderstanding and reinforces that view.
> 
> What about saying: 
> 
> [[[
> RDFa reuses existing HTML attributes such as @href and @rel and adds a few of its own to enable data to be extracted from HTML pages as RDF. Though it was originally designed for XHTML 1.1, its latest version (RDFa 1.1) is also usable with HTML5 and other markup languages like SVG.
> ]]]
> 
> Actually, if we go there, it may be worth emphasizing that microformats are language agnostic in the sense that they are bound to the usage of CSS only (ie, microformats could be used in SVG, too).


See [2].

> In the introduction: it may be worth emphasizing that microdata has been defined for HTML5 only. Ie, if authors care about validation, they should not use microdata with, say, XHTML 1.0.

See [3].

> 1.1 Scope: is it correct that one can add RDF/XML into a script element? I must admit I have never seen that in practice, and it is fairly problematic from an XML point of view (unless CDATA is used). I think it would be safer not to refer to that.

Yes, you can add RDF/XML into a script element. I believe it doesn't need to be wrapped in a CDATA section unless you're using XHTML. I've removed the specific pointer to RDF/XML but kept the information about embedding XML-based data which is also mentioned in [4]. See [5].
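
For illustration, here is a minimal sketch of how a consumer might pull such an embedded data block back out of an HTML page. It is not taken from the guide: the sample page, the class name DataBlockExtractor and the helper extract_data_blocks are all hypothetical, and it assumes the block is identified by a type attribute of application/rdf+xml. Python's html.parser treats the content of a script element as raw text, which matches the point above that no CDATA wrapper is needed in the HTML syntax.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class DataBlockExtractor(HTMLParser):
    """Collect the raw text of <script> elements that carry a given type."""

    def __init__(self, wanted_type):
        super().__init__()
        self.wanted_type = wanted_type
        self.blocks = []
        self._buffer = None  # None while we are outside a matching <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == self.wanted_type:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer).strip())
            self._buffer = None


# Hypothetical page with RDF/XML embedded in a script element (no CDATA).
PAGE = """<!DOCTYPE html>
<html><head><title>Example</title>
<script type="application/rdf+xml">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example document</dc:title>
  </rdf:Description>
</rdf:RDF>
</script>
</head><body></body></html>"""


def extract_data_blocks(html, wanted_type="application/rdf+xml"):
    """Return the text of every script data block of the wanted type."""
    parser = DataBlockExtractor(wanted_type)
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    for block in extract_data_blocks(PAGE):
        # Hand the extracted text to an XML parser to confirm it is well-formed.
        root = ET.fromstring(block)
        print(root.tag)
```

In XHTML the same content would be parsed as markup inside the script element, which is where the CDATA question comes from; this sketch only covers the HTML syntax.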

> Section 2, item on HTML5: "how exacting your publishing guidelines are" -> "how exactly your publishing guidelines are"

No, "exacting" is what I meant here: "rigid or severe in demands or requirements" (see [6])

> Section 2.1.1, Structured HTML values: "emphasised" -> "emphasized" :-( I know you hate it and my schooling of English is also UK English, but the rule at W3C is that documents should use American English:-( I have not checked the whole document for that, but this one just reminded me of this issue…

Fixed that one. If you spot others let me know.

> Section 2.1.1: I am not sure we should list it here, but maybe: a particular issue with microformats is that the vocabularies use the @class attribute value. This may clash with the class attribute value as used in the CSS files that are, eg, part of a corporate publishing environment. Authors should carefully check this before they decide to use microformats, otherwise they are in for major surprises in the way their page will be rendered... An additional issue is that microformats may require, say, the usage of the <abbr> element and this may clash with accessibility considerations of a particular publishing environment…

I haven't done anything about this comment, partly because I think these issues are addressed by microformats-2 and by the use of the <time> element.

> Still on 2.1.1: If the publisher wants to make use of Linked Data, for example by making it easy to link the data in the page to other linked data vocabularies, then RDFa is probably a much better choice, simply because it is inherently bound to RDF.

I haven't done anything about this either, because the approach we are advocating for publishers is consumer-centric: to think "I want to support consumer X (eg a generic linked data consumer) and that consumer requires format Y (eg RDFa) so I will use RDFa". The reasons why consumers might choose to consume RDFa (and hence encourage publishers to publish using it) rather than other formats are already covered in detail in [7].

> Still on 2.1.1: what about datatypes? If a vocabulary requires the usage of datatypes then... well, RDFa is the only one handling that. That being said, it may be a very special case that we may not want to address here (and I know it is addressed elsewhere.)

Again, I think this is a choice on the consumer end rather than the publisher end. If the consumer is a generic consumer or adopts a vocabulary that requires publishers to explicitly indicate the datatypes of values then that consumer will require RDFa, and publishers that want to target that consumer will use RDFa.

I also think that we should not be encouraging vocabulary authors to create vocabularies that rely on values being annotated by datatypes in order to be understood, because this adds burden to publishers using that vocabulary. Recommending that people use RDFa to support something that is regarded as bad practice doesn't really make sense. :)

> Section 2.2.1.2: the reference to the RDFa 1.1 Core initial context is wrong. It should say: http://www.w3.org/2011/rdfa-context/rdfa-1.1.html

I have changed it according to Steph's recommendation to http://www.w3.org/2011/rdfa-context/rdfa-1.1. If I still have that wrong, let me know (see [8]).

> Section 2.2.1.2: I would prefer not to mention @xmlns at all. In the HTML5 version this may not even be allowed at all... 
> 
> (If you agree, then in the paragraph that follows: "last three" -> "last two")

OK, I've removed it [9]

> Section 2.2.1.3. This may be a rathole... but... 2.2.1.2 refers to IRI-s, whereas this section uses URL. Is this intentional? Or can I use mailto:ivan@w3.org as a property value in microdata (not that I would like to, but you take the point...)? I know that there is a separate section on some of the issues later, but the reader, getting to that point, might be confused…

I've tried to use the term "IRI" when things are really IRIs (as from RDFa) and URL when things are URLs as defined by HTML5. I've added a paragraph to this effect in the Terminology section [10]
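
To make the distinction concrete, here is a rough sketch (not from the guide) of one practical difference: an IRI may contain non-ASCII characters directly, whereas mapping it to a URI percent-encodes those characters as UTF-8 octets. The function name iri_to_uri is hypothetical, and the sketch is a simplification: it encodes every non-ASCII character the same way and does not treat internationalized host names specially.

```python
def iri_to_uri(iri):
    """Percent-encode each non-ASCII character of an IRI as UTF-8 octets.

    Simplified illustration only: ASCII characters are passed through
    untouched, and host names get no special (IDNA) treatment.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/caf\u00e9"))
```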

> I am a little bit worried about the complexity of the 2.2.1.3 section. I realize that this is the nature of the beast, because mixing vocabularies in microdata is simply complicated, but it really breaks the flow of the reading that the relevant rdfa and microformat sections are both 1-2 paragraphs, whereas the microdata one goes on for pages. If a potential author reads this for the first time, it is easy to be lost.
> 
> Maybe putting that part into an appendix, and keeping the main text short, essentially saying that this is complex in microdata, here are the main lines of solving that, and see the appendix for the technical details?

Good suggestion. See [11]

> Also, it may be worth (in the appendix) adding the similar examples for microformats and rdfa, too. In general, maybe some good comparative sections in the appendix may be very helpful for newcomers…

I'm sure they would but I don't think I have time to put them together. If there is a volunteer to put together some text I will gladly incorporate it.

> For 2.2, it may be worth mentioning that a companion document, on extracting RDF from microdata, is also in the making, ie, such a processor would interpret the microdata in Turtle, too (show example?). I would expect processors to come up at some point that would 'distill' both RDFa and microdata from a document and merge them in RDF; I am sure Gregg will do that, and I may do the same at some point when the microdata->RDF mapping gels...

I wasn't exactly sure where to add this, so I ended up adding the example of a microdata/RDF processor extracting RDF from microdata markup in [12]. If that doesn't work for you, please suggest some wording and where it should be placed.
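
As a rough idea of what such a processor does, here is a deliberately tiny sketch. It is not the example from the guide and not the mapping under discussion: the names SingleItemMicrodata and microdata_to_turtle are hypothetical, it handles only one top-level item with plain-text and href-valued properties, and it simply assumes that a property URI can be formed by appending the property name to the itemtype URL up to its last slash.

```python
from html.parser import HTMLParser


class SingleItemMicrodata(HTMLParser):
    """Read one flat microdata item: no nesting, no itemref, no itemid."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = []    # (name, value, is_url) in document order
        self._open_prop = None  # property whose text is being collected
        self._open_tag = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in ("a", "link") and "href" in attrs:
            # URL-valued property: the value comes from the href attribute.
            self.properties.append((name, attrs["href"], True))
        else:
            # Text-valued property: collect text until the element closes.
            self._open_prop, self._open_tag, self._text = name, tag, []

    def handle_data(self, data):
        if self._open_prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_prop is not None and tag == self._open_tag:
            value = "".join(self._text).strip()
            self.properties.append((self._open_prop, value, False))
            self._open_prop = None


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def microdata_to_turtle(html):
    parser = SingleItemMicrodata()
    parser.feed(html)
    parser.close()
    # Assumption: the vocabulary is the itemtype URL up to its last slash.
    vocab = parser.item_type.rsplit("/", 1)[0] + "/"
    lines = [f"[] a <{parser.item_type}>"]
    for name, value, is_url in parser.properties:
        obj = f"<{value}>" if is_url else turtle_literal(value)
        lines.append(f"   <{vocab}{name}> {obj}")
    return " ;\n".join(lines) + " ."


SAMPLE = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Example Person</span> '
    '<a itemprop="url" href="http://example.org/">home page</a>'
    "</div>"
)

if __name__ == "__main__":
    print(microdata_to_turtle(SAMPLE))
```

A processor that also distilled RDFa from the same page could merge the two sets of triples into one graph, which is the scenario described above.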

> 2.2.4: "any property elements" -> "any property attribute"

Fixed [13]

> If JSON-LD is listed (which is fine!) then I think we should also list Turtle. After all, both are syntaxes of the same data model...


I added RDF/XML as well, see [14].

Thanks for the review, Ivan. Let me know if you spot anything more or if you aren't happy with my actions above.

Cheers,

Jeni

[1] https://dvcs.w3.org/hg/htmldata/rev/f59eca060106
[2] https://dvcs.w3.org/hg/htmldata/rev/ba4b375cb21f 
[3] https://dvcs.w3.org/hg/htmldata/rev/b187767fc1b8
[4] http://www.w3.org/2010/html-xml/snapshot/#uc04
[5] https://dvcs.w3.org/hg/htmldata/rev/b90ce8a5d037
[6] http://dictionary.reference.com/browse/exacting
[7] https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html#choosing-a-syntax-to-consume
[8] https://dvcs.w3.org/hg/htmldata/rev/063f74830375
[9] https://dvcs.w3.org/hg/htmldata/rev/ffddea8db417
[10] https://dvcs.w3.org/hg/htmldata/rev/41c5fce1c6b9
[11] https://dvcs.w3.org/hg/htmldata/rev/8ca69129dd76
[12] https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html#mixing-syntaxes
[13] https://dvcs.w3.org/hg/htmldata/rev/915fd629432f
[14] https://dvcs.w3.org/hg/htmldata/rev/c6c5c47accc2
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Wednesday, 21 December 2011 20:48:08 UTC