Re: HTML Data Guide draft

From: Tantek Çelik <tantek@cs.stanford.edu>
Date: Mon, 19 Dec 2011 17:55:14 -0800
Message-ID: <CAOACb=LwWdU7LSgiJWn_g=uQEv-w10szZwpV7wdRbCKz2NuW-A@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@jenitennison.com>
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
On Mon, Dec 19, 2011 at 03:45, Ivan Herman <ivan@w3.org> wrote:
> Jeni,
> I am beginning to get out of my post-surgery torpor, although only very slowly.

Best wishes for a speedy recovery Ivan!

> But I did go through the document today, and have noted a bunch of comments, see them below. There is no priority order, I just made the notes as I read. I am sure there are more issues coming up, but this may be a good start.
> And thanks...
> Ivan
> In the introduction: I wonder whether it is worth emphasizing that adding structured data to HTML was also pioneered by the Semantic Web community in an attempt to bridge the world of documents and of data on the web. Many data publishers use their HTML pages as an alternative syntax to publish their data (using RDFa); examples include the authorities' pages of the Library of Congress, or DBPedia. (Yes, I know, I am biased!)

Ivan, I'm not sure what you mean by "pioneered" in this context.

If you're referring to the use of visible HTML markup to publish
structured data to the web, that practice was pioneered by modern web
designers in the early 2000s which produced the general practice of
representing the information architecture (IA) of a site as class
names in their (X)HTML.

If you mean establishing conventions for said practice, AFAIK, XFN was
the first to do so with the rel attribute in 2003, and microformats
was the first proposal to formalize conventions for a general approach
of using visible HTML markup to publish structured data to the web. I
know because I presented it in 2004 at O'Reilly's ETech conference,
followed up with proposing hCard and hCalendar later that year.

There were many previous attempts to formalize conventions for using
*invisible* HTML (whether meta tags, SGML comments etc.), but those go
back to much earlier days of the web that predate any of the current
HTML data efforts.

Citations and specific dates available here:


And to be clear, I'm definitely interested in any citations about data
in visible HTML from 2004 or earlier. The history of all this stuff is
not well documented in popular sources (like Wikipedia), and the more
citations we can gather, the better chance that Wikipedia editors will
improve their historical documentation.

> In the introduction: the bulleted item on RDFa. Unfortunately, one of the fallacies spread about RDFa is that it is bound to XHTML1. That was indeed the case for RDFa 1.0, but it is no longer true for RDFa 1.1. I think it would be better to make this clear right in the intro to avoid misunderstandings. The current formulation is still open to misunderstanding and reinforces that view.
> What about saying:
> [[[
> RDFa reuses existing HTML attributes such as @href and @rel and adds a few of its own to enable data to be extracted from HTML pages as RDF. Though it was originally designed for XHTML 1.1, its latest version (RDFa 1.1) is also usable with HTML5 and other markup languages like SVG.
> ]]]
> Actually, if we go there, it may be worth emphasizing that microformats are language agnostic in the sense that they are bound to the usage of CSS only (ie, microformats could be used in SVG, too).

To be clear, microformats make use of the *HTML* class attribute, but
can work directly in any XML language with a class attribute (like
SVG, MathML), or XML without class attributes by embedding XHTML.
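
A rough sketch of that point, using only Python's standard library: the
same class-based lookup runs unchanged over an HTML fragment and an SVG
fragment. The property set, the sample markup and the function names are
assumptions made up for illustration; this is not a conforming
microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical subset of property class names, for illustration only.
PROPERTY_CLASSES = {"fn", "org"}

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ClassPropertyParser(HTMLParser):
    """Collect the text of elements whose class attribute names a property."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        # One entry per open element: the property names that element carries.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in PROPERTY_CLASSES])

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to every enclosing element that carries a property.
        for names in self._stack:
            for name in names:
                self.properties.setdefault(name, []).append(text)


def extract_properties(markup):
    """Return a dict mapping property name to the list of text values found."""
    parser = ClassPropertyParser()
    parser.feed(markup)
    parser.close()
    return parser.properties


html_sample = ('<div class="vcard"><span class="fn">Jane Doe</span> '
               '<span class="org">Example Org</span></div>')
svg_sample = ('<svg xmlns="http://www.w3.org/2000/svg">'
              '<text class="fn">Jane Doe</text></svg>')

html_result = extract_properties(html_sample)
svg_result = extract_properties(svg_sample)
```

The lookup only ever consults the class attribute, never the element
name, which is why the host language matters so little here.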

> In the introduction: it may be worth emphasizing that microdata has been defined for HTML5 only. Ie, if authors care about validation, they should not use microdata with, say, XHTML 1.0.
> I realise you come back to this issue in section 2, but the introduction really sets the tone...
> -----
> 1.1 Scope: is it correct that one can add RDF/XML into a script element? I must admit I have never seen that in practice, and it is fairly problematic from an XML point of view (unless CDATA is used). I think it would be safer not to refer to that.

Yeah, this has been tried in the past (at least with XML in the defunct
StructuredBlogging effort) and should be considered an obsolete
technique not worthy of an intro.

If you're looking to document dead-end web data efforts for historical
purposes, there's plenty more to add to the list, e.g.
StructuredBlogging, Google Base, and most recently CommonTag.

> Section 2.1.1: I am not sure we should list it here, but maybe: a particular issue with microformats is that the vocabularies use the @class attribute value.
> This may clash with the class attribute value as used in the CSS files that are, eg, part of a corporate publishing environment.

This is a known documented issue for microformats-1 style microformats:


Is it the purpose of this document to list issues with all approaches?
If so, there's plenty more.

> Authors should carefully check this before they decide to use microformats, otherwise they are in for major surprises in the way their page will be rendered...

Ivan, do you know of specific examples where this has occurred?

Please provide them so I can add them to the documentation of the
issue on our microformats wiki.

Absent such documentation, I don't see why this would be worthy of a
warning in this document.

> An additional issue is that microformats may require, say, the usage of the <abbr> element and this may clash with accessibility considerations of a particular publishing environment...

This issue has been long since (2+ years) addressed and resolved with
the microformats value-class-pattern.


> Still on 2.1.1: If a publisher wants to make use of Linked Data, for example, by making it easy to link the data in the page to other linked data vocabularies, then RDFa is probably a much better choice, simply because it is inherently bound to RDF.

I think this merits explanation, and I don't accept "inherently bound
to RDF" as a good argument here.

As long as a syntax can produce properties and data bound in URLs
(which I believe is possible with all the current syntaxes being
discussed in this TF), isn't that all that's necessary for Linked Data?
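
To make the question concrete, here is a minimal sketch that assumes a
parser has already produced short property names and values from some
syntax. The vocabulary base URL, the subject URL and the to_triples
helper are all hypothetical, and this is not the normative mapping of
any of the syntaxes under discussion.

```python
# Hypothetical vocabulary base URL, used only for illustration.
VOCAB_BASE = "http://example.org/vocab#"


def to_triples(subject_url, properties, vocab_base=VOCAB_BASE):
    """Turn short property names into URL-identified triples.

    `properties` maps a short property name to a list of values, as a
    parser for any of the syntaxes might produce. Each result is a
    (subject URL, property URL, value) tuple.
    """
    triples = []
    for name, values in sorted(properties.items()):
        property_url = vocab_base + name  # the property is now named by a URL
        for value in values:
            triples.append((subject_url, property_url, value))
    return triples


triples = to_triples(
    "http://example.org/people/jane",
    {"fn": ["Jane Doe"], "org": ["Example Org"]},
)
```

Once both the subject and the property are named by URLs, the output
looks the same whichever syntax the names originally came from.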

> Still on 2.1.1: what about datatypes? If a vocabulary requires the usage of datatypes then... well, RDFa is the only one handling that. That being said, it may be a very special case that we may not want to address here (and I know it is addressed elsewhere.)

What do you mean for a vocabulary to "require the usage of datatypes"?

In microformats experience, we've never found a need to "require" a datatype.

On the contrary, the more such requirements were attempted, the higher
the barrier, or the lower the data quality as authors get it wrong.

If anything, we should say something like:

"Avoid vocabularies that require the use of datatypes."

In order to prefer easier to use vocabularies, and to increase data quality.

> I am a little bit worried about the complexity of the section. I realize that this is the nature of the beast, because mixing vocabularies in microdata is simply complicated, but it really breaks the flow of the reading that the relevant rdfa and microformat sections are both 1-2 paragraphs, whereas the microdata one goes on for pages. If a potential author reads this for the first time, it is easy to be lost.
> Maybe putting that part into an appendix, and keeping the main text short, essentially saying that this is complex in microdata, here are the main lines of solving that, and see the appendix for the technical details?
> Also, it may be worth (in the appendix) adding the similar examples for microformats and rdfa, too. In general, maybe some good comparative sections in the appendix may be very helpful for newcomers...

Indeed, that could be quite useful.

> On Dec 11, 2011, at 18:22 , Jeni Tennison wrote:
>> Hi,
>> I've pulled together much of the documentation from our wiki into a single document, at:
>>  https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html
>> Please take the time to read this as it is the main product of this Task Force, and raise any comments here.

Jeni, given the extensive feedback from Ivan (and the comments I have
made myself), the one general item of feedback I'd give is:

Perhaps it's better to just keep this document on the wiki for now, to
enable/encourage more collaborative iteration/updating - especially
with minor fixes.

hg is just less accessible/usable than MediaWiki.

If you're looking to generate something that looks like a W3C note,
perhaps it would be better to simply auto-generate such a note from a
specific wiki page. After all, most of the semantics should directly
map right?



http://tantek.com/ - I made an HTML5 tutorial! http://tantek.com/html5
Received on Tuesday, 20 December 2011 01:56:24 UTC
