Re: HTML Data Guide draft from Ivan Herman on 2013-11-22 (public-html-data-tf@w3.org from November 2013)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 22 Nov 2013 17:51:33 +0100
To: Tantek Çelik <tantek@cs.stanford.edu>
Cc: Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <D4731F98-3CDB-4EA2-B90F-08E6F6B19A9D@w3.org>
This was something very strange, and I do not know what happened: I moved to a new Mac this morning, using a backup to transfer my old data, and started my mailer, which seems to have picked up this very old mail... I simply have no idea why. My surgery was several years ago, and there is no new surgery in sight:-)

Sorry about the noise.

Ivan

On 22 Nov 2013, at 17:41 , Tantek Çelik <tantek@cs.stanford.edu> wrote:

> Ivan,
> 
> Did you just send this reply or did it just get copied to this list?
> 
> http://lists.w3.org/Archives/Public/public-html-data-tf/2013Nov/0001.html
> 
> Did you post it because there are thoughts around updating this
> document with errata?
> 
> Thanks,
> 
> Tantek
> 
> 
> On Tue, Dec 20, 2011 at 12:23 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> On Dec 20, 2011, at 02:55 , Tantek Çelik wrote:
>> 
>> On Mon, Dec 19, 2011 at 03:45, Ivan Herman <ivan@w3.org> wrote:
>> 
>> Jeni,
>> 
>> 
>> I am beginning to get out of my post-surgery torpor, although only very
>> slowly.
>> 
>> 
>> Best wishes for a speedy recovery Ivan!
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> But I did go through the document today, and have noted a bunch of comments,
>> see them below. There is no priority order, I just made the notes as I read.
>> I am sure there are more issues coming up, but this may be a good start.
>> 
>> 
>> And thanks...
>> 
>> 
>> Ivan
>> 
>> 
>> In the introduction: I wonder whether it is worth emphasizing that adding
>> structured data to HTML was also pioneered by the Semantic Web community in
>> an attempt to bridge the world of documents and of data on the web. Many
>> data publishers use their HTML pages as an alternative syntax to publish
>> their data (using RDFa); examples include the authorities' pages of the
>> Library of Congress, or DBPedia. (Yes, I know, I am biased!)
>> 
>> 
>> Ivan, I'm not sure what you mean by "pioneered" in this context.
>> 
>> 
>> Tantek, you are right, it was not the right choice of words. In a sense, my
>> reaction was more to the fact that the current text has a very strong bias
>> towards the search engine usage. And the usage of HTML as a vehicle to bind
>> to Linked Data has been a strong motivation in the past few years. That is
>> all...
>> 
>> [snip]
>> 
>> 
>> Actually, if we go there, it may be worth emphasizing that microformats are
>> language agnostic in the sense that they are bound to the usage of CSS only
>> (ie, microformats could be used in SVG, too).
>> 
>> 
>> To be clear, microformats make use of the *HTML* class attribute, but
>> can work directly in any XML language with a class attribute (like
>> SVG, MathML), or XML without class attributes by embedding XHTML.
>> 
>> 
>> 
>> Just out of curiosity (nothing to do with the current issues): does the
>> microformat definition strictly entail that it must be a class attribute of
>> HTML? I always considered it to be more liberal and workable, as you say, with
>> MathML and SVG out of the box...
>> 
>> 
>> 
>> In the introduction: it may be worth emphasizing that microdata has been
>> defined for HTML5 only. Ie, if authors care about validation, they should
>> not use microdata with, say, XHTML 1.0.
>> 
>> 
>> I realise you come back to this issue in section 2, but the introduction
>> really sets the tone...
>> 
>> 
>> -----
>> 
>> 
>> 1.1 Scope: is it correct that one can add RDF/XML into a script element? I
>> must admit I have never seen that in practice, and it is fairly problematic
>> from an XML point of view (unless CDATA is used). I think it would be safer
>> not to refer to that.
>> 
>> 
>> Yeah this has been tried in the past (at least with XML in the defunct
>> StructuredBlogging effort) and it should be considered an obsolete
>> technique not worthy of an intro.
>> 
>> 
>> I agree.
>> 
>> 
>> If you're looking to document dead-end web data efforts for historical
>> purposes, there's plenty more to add to the list, e.g.
>> StructuredBlogging, Google Base, and most recently CommonTag.
>> 
>> 
>> 
>> Section 2.1.1: I am not sure we should list it here, but maybe: a particular
>> issue with microformats is that the vocabularies use the @class attribute
>> value.
>> 
>> 
>> This may clash with the class attribute value as used in the CSS files that
>> are, eg, part of a corporate publishing environment.
>> 
>> 
>> This is a known documented issue for microformats-1 style microformats:
>> 
>> 
>> http://microformats.org/wiki/microformats-issues#class-collisions
>> 
>> 
>> Is it the purpose of this document to list issues with all approaches?
>> 
>> If so, there's plenty more.
>> 
>> 
>> 
>> 
>> Good question. I let Jeni decide on that, as editor of the document. You are
>> right that we have to avoid giving the impression of arbitrary choices.
>> 
>> 
>> Authors should carefully check this before they decide to use microformats,
>> otherwise they are in for major surprises in the way their page will be
>> rendered...
>> 
>> 
>> Ivan, do you know of specific examples where this has occurred?
>> 
>> 
>> Please provide them so I can add them to the documentation of the
>> issue on our microformats wiki.
>> 
>> 
>> Absent such documentation, I don't see why this would be worthy of a
>> warning in this document.
>> 
>> 
>> Honestly: not in my personal experience. I have not done any extensive
>> search either. However, interestingly, this is one of the issues Google
>> claims to have run into for rich snippets. Yes, this is what the US law
>> practice calls 'hearsay', ie, you can object to it:-)
>> 
>> 
>> 
>> An additional issue is that microformats may require, say, the usage of the
>> <abbr> element and this may clash with accessibility considerations of a
>> particular publishing environment...
>> 
>> 
>> This issue has been long since (2+ years) addressed and resolved with
>> the microformats value-class-pattern.
>> 
>> 
>> http://microformats.org/wiki/value-class-pattern
>> 
>> 
>> 
>> 
>> .... which may be worth noting then? (Not sure where the borderline is.)
>> 
>> 
>> Still on 2.1.1: If the publisher wants to make use of Linked Data, for
>> example, by making it easy to link the data in the page to other linked
>> data vocabularies, then RDFa is probably a much better choice, simply
>> because it is inherently bound to RDF.
>> 
>> 
>> I think this merits explanation, and I don't accept "inherently bound
>> to RDF" as a good argument here.
>> 
>> 
>> As long as a syntax can produce properties and data bound in URLs
>> (which I believe is possible with all the current syntaxes being
>> discussed in this TF), isn't that all that's necessary for Linked
>> Data?
>> 
>> 
>> 
>> Well, I do not want to go into a long discussion on what Linked Data is;
>> this is not the right place. Yes, URI-s are important. But, to be a little
>> bit more orthodox at this point, the issue is whether the data in a given
>> vocabulary can or cannot be mapped onto RDF through a clearly specified
>> manner. While this is not an issue for RDFa, there are issues, essentially
>> per vocabulary, both for microformats and for microdata. AFAIK, and you may
>> correct me, not all microformats vocabularies give clear definitions on what
>> URI-s to use, for example; and the microdata vocabulary specification
>> requirements are even more convoluted in this respect. Ie, at least for some
>> vocabularies, there might be issues that the authors should know about.
>> 
>> Also, there is a very dynamic scene in the linked data world in the
>> definition of various vocabularies. Some of those (bibliography, music,
>> Dublin Core, to take just three) have a clear usage within an HTML page,
>> too. Those can be used directly from within RDFa; it is a bit more
>> complicated to do it with other syntaxes (even if the vocabulary itself is
>> simple, like DC) because it needs an extra mapping phase to those syntaxes.
>> 
>> 
>> Still on 2.1.1: what about datatypes? If a vocabulary requires the usage of
>> datatypes then... well, RDFa is the only one handling that. That being said,
>> it may be a very special case that we may not want to address here (and I
>> know it is addressed elsewhere.)
>> 
>> 
>> What do you mean for a vocabulary to "require the usage of datatypes"?
>> 
>> 
>> That the range of a property is defined to be of a given datatype.
>> 
>> 
>> In microformats experience, we've never found a need to "require" a
>> datatype.
>> 
>> 
>> You are right that the word 'required' is wrong. What I meant is what is
>> above.
>> 
>> I just picked one vocabulary that clearly has a possible role on a web
>> page, namely the music ontology:
>> 
>> http://www.musicontology.com/
>> 
>> that refers to xsd:int and others for a number of its properties.
>> 
>> Of course, it is perfectly possible, from a SW point of view, to produce
>> data that does _not_ explicitly define, say, a <span>10</span>, to be an
>> integer; after all, a range specification is a license to infer and not a
>> restriction. But practice is that authors may want to reinforce this using
>> the datatype. If that is intended then, well, we may have a problem.
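[Editorial illustration, not part of the original message.] The point above, that a range declaration is a license to infer rather than a restriction, can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the property URI `http://example.org/vocab#trackCount`, the `RANGES` table and the `effective_datatype` helper are hypothetical names invented for the illustration and are not taken from the Music Ontology or from any RDFa, microdata or microformats specification.

```python
# Datatype URIs from XML Schema (these two URIs are real; everything else
# below is a hypothetical illustration).
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical vocabulary definition: property URI -> declared range.
RANGES = {
    "http://example.org/vocab#trackCount": XSD_INT,
}


def effective_datatype(prop, explicit_datatype=None):
    """Return the datatype a consumer may assume for a literal value.

    If the author attached a datatype in the markup, that explicit choice
    is used as is. Otherwise the consumer *may* infer one from the
    declared range of the property; the range never rejects the value.
    Returns None when nothing is known about the property.
    """
    if explicit_datatype is not None:
        return explicit_datatype
    return RANGES.get(prop)


# <span>10</span> published without a datatype: inferred from the range.
inferred = effective_datatype("http://example.org/vocab#trackCount")

# The same value with an author-supplied datatype: the author's choice wins.
reinforced = effective_datatype(
    "http://example.org/vocab#trackCount", explicit_datatype=XSD_STRING
)
```

In this sketch a syntax that cannot carry `explicit_datatype` at all leaves the consumer with the inference path only, which is the situation the paragraph above describes for authors who want to reinforce the datatype themselves.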
>> 
>> 
>> On the contrary, the more such requirements were attempted, the higher
>> the barrier, or the lower the data quality as authors get it wrong.
>> 
>> 
>> If anything, we should say something like:
>> 
>> 
>> "Avoid vocabularies that require the use of datatypes."
>> 
>> 
>> I think such a statement is too restrictive, and I would not agree with
>> that. See my example above. 'Watch out' instead of 'Avoid' might be
>> better...
>> 
>> [snip]
>> 
>> 
>> 
>> On Dec 11, 2011, at 18:22 , Jeni Tennison wrote:
>> 
>> 
>> Hi,
>> 
>> 
>> I've pulled together much of the documentation from our wiki into a single
>> document, at:
>> 
>> 
>> https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html
>> 
>> 
>> Please take the time to read this as it is the main product of this Task
>> Force, and raise any comments here.
>> 
>> 
>> 
>> Jeni, given the extensive feedback from Ivan (and the comments I
>> myself have made), the one general item of feedback I'd say is:
>> 
>> 
>> Perhaps it's better to just keep this document on the wiki for now, to
>> enable/encourage more collaborative iteration/updating - especially
>> with minor fixes.
>> 
>> 
>> hg is just less accessible/usable than MediaWiki.
>> 
>> 
>> A bit. It does require hg knowledge, that is:-)
>> 
>> 
>> If you're looking to generate something that looks like a W3C note,
>> perhaps it would be better to simply auto-generate such a note from a
>> specific wiki page.
>> 
>> 
>> I am not sure we have tools to do that (would be good). Ie, I am a little
>> bit afraid that this would mean an extra load on Jeni at the end...
>> 
>> Thanks!
>> 
>> Ivan
>> 
>> 
>> 
>> After all, most of the semantics should directly map right?
>> 
>> 
>> Thanks,
>> 
>> 
>> 
>> Tantek
>> 
>> 
>> --
>> 
>> http://tantek.com/ - I made an HTML5 tutorial! http://tantek.com/html5
>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> FOAF: http://www.ivan-herman.net/foaf.rdf


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Friday, 22 November 2013 16:52:10 UTC