Re: HTML Data Guide draft

From: Tantek Çelik <tantek@cs.stanford.edu>
Date: Fri, 22 Nov 2013 08:41:46 -0800
Message-ID: <CAEV2_WaN_uEfgCHqt7OQi=jvt99YtLORHhdfVEGOBBedW6+hmg@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: Tantek Çelik <tantek@cs.stanford.edu>, Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Ivan,

Did you just send this reply or did it just get copied to this list?

http://lists.w3.org/Archives/Public/public-html-data-tf/2013Nov/0001.html

Did you post it because there are thoughts around updating this
document with errata?

Thanks,

Tantek


On Tue, Dec 20, 2011 at 12:23 AM, Ivan Herman <ivan@w3.org> wrote:
>
> On Dec 20, 2011, at 02:55 , Tantek Çelik wrote:
>
> On Mon, Dec 19, 2011 at 03:45, Ivan Herman <ivan@w3.org> wrote:
>
> Jeni,
>
>
> I am beginning to get out of my post-surgery torpor, although only very
> slowly.
>
>
> Best wishes for a speedy recovery Ivan!
>
>
> Thanks!
>
>
>
> But I did go through the document today, and have noted a bunch of comments,
> see them below. There is no priority order, I just made the notes as I read.
> I am sure there are more issues coming up, but this may be a good start.
>
>
> And thanks...
>
>
> Ivan
>
>
> In the introduction: I wonder whether it is worth emphasizing that adding
> structured data to HTML was also pioneered by the Semantic Web community in
> an attempt to bridge the world of documents and of data on the web. Many
> data publishers use their HTML pages as an alternative syntax to publish
> their data (using RDFa); examples include the authorities' pages of the
> Library of Congress, or DBPedia. (Yes, I know, I am biased!)
>
>
> Ivan, I'm not sure what you mean by "pioneered" in this context.
>
>
> Tantek, you are right, it was not the right choice of words. In a sense, my
> reaction was more to the fact that the current text has a very strong bias
> towards the search engine usage. And the usage of HTML as a vehicle to bind
> to Linked Data has been a strong motivation in the past few years. That is
> all...
>
> [snip]
>
>
> Actually, if we go there, it may be worth emphasizing that microformats are
> language agnostic in the sense that they are bound to the usage of CSS only
> (ie, microformats could be used in SVG, too).
>
>
> To be clear, microformats make use of the *HTML* class attribute, but
> can work directly in any XML language with a class attribute (like
> SVG, MathML), or XML without class attributes by embedding XHTML.
>
>
>
> Just out of curiosity (nothing to do with the current issues): does the
> microformat definition strictly entail that it must be a class attribute of
> HTML? I always considered it to be more liberal and workable, as you say,
> with MathML and SVG out of the box...
>
>
>
> In the introduction: it may be worth emphasizing that microdata has been
> defined for HTML5 only. Ie, if authors care about validation, they should
> not use microdata with, say, XHTML 1.0.
>
>
> I realise you come back to this issue in section 2, but the introduction
> really sets the tone...
>
>
> -----
>
>
> 1.1 Scope: is it correct that one can add RDF/XML into a script element? I
> must admit I have never seen that in practice, and it is fairly problematic
> from an XML point of view (unless CDATA is used). I think it would be safer
> not to refer to that.
>
>
> Yeah this has been tried in the past (at least with XML in the defunct
> StructuredBlogging effort) and it should be considered an obsolete
> technique not worthy of an intro.
>
>
> I agree.
>
>
> If you're looking to document dead-end web data efforts for historical
>
> purposes, there's plenty more to add to the list, e.g.
>
> StructuredBlogging, Google Base, and most recently CommonTag.
>
>
>
> Section 2.1.1: I am not sure we should list it here, but maybe: a particular
> issue with microformats is that the vocabularies use the @class attribute
> value.
>
>
> This may clash with the class attribute value as used in the CSS files that
> are, eg, part of a corporate publishing environment.
>
>
> This is a known documented issue for microformats-1 style microformats:
>
>
> http://microformats.org/wiki/microformats-issues#class-collisions
>
>
> Is it the purpose of this document to list issues with all approaches?
>
> If so, there's plenty more.
>
>
>
>
> Good question. I let Jeni decide on that, as editor of the document. You are
> right that we have to avoid giving the impression of arbitrary choices.
>
>
> Authors should carefully check this before they decide to use microformats,
> otherwise they are in for major surprises in the way their page will be
> rendered...
>
>
> Ivan, do you know of specific examples where this has occurred?
>
>
> Please provide them so I can add them to the documentation of the
>
> issue on our microformats wiki.
>
>
> Absent such documentation, I don't see why this would be worthy of a
>
> warning in this document.
>
>
> Honestly: not in my personal experience. I have not done any extensive
> search either. However, interestingly, this is one of the issues Google
> claims to have run into for rich snippets. Yes, this is what the US law
> practice calls 'hearsay', ie, you can object to it:-)
>
>
>
> An additional issue is that microformats may require, say, the usage of the
> <abbr> element and this may clash with accessibility considerations of a
> particular publishing environment...
>
>
> This issue has been long since (2+ years) addressed and resolved with
>
> the microformats value-class-pattern.
>
>
> http://microformats.org/wiki/value-class-pattern
>
>
>
>
> .... which may be worth noting then? (Not sure where the borderline is.)
>
>
> Still on 2.1.1: If the publisher wants to make use of Linked Data, for
> example, by making it easy to link the data in the page to other linked data
> vocabularies, then RDFa is probably a much better choice, simply because it
> is inherently bound to RDF.
>
>
> I think this merits explanation, and I don't accept "inherently bound
>
> to RDF" as a good argument here.
>
>
> As long as a syntax can produce properties and data bound in URLs
>
> (which I believe is possible with all the current syntaxes being
>
> discussed in this TF), isn't that all that's necessary for Linked
>
> Data?
>
>
>
> Well, I do not want to go into a long discussion on what Linked Data is;
> this is not the right place. Yes, URI-s are important. But, to be a little
> bit more orthodox at this point, the issue is whether the data in a given
> vocabulary can or cannot be mapped onto RDF through a clearly specified
> manner. While this is not an issue for RDFa, there are issues, essentially
> per vocabulary, both for microformats and for microdata. AFAIK, and you may
> correct me, not all microformats vocabularies give clear definitions on what
> URI-s to use, for example; and the microdata vocabulary specification
> requirements are even more convoluted in this respect. Ie, at least for some
> vocabularies, there might be issues that the authors should know about.
>
> Also, there is a very dynamic scene in the linked data world in the
> definition of various vocabularies. Some of those (bibliography, music,
> dublin core, to take just three) have a clear usage within an HTML page,
> too. Those can be used directly from within RDFa; it is a bit more
> complicated to do it with other syntaxes (even if the vocabulary itself is
> simple, like DC) because it needs an extra mapping phase to those syntaxes.
>
>
> Still on 2.1.1: what about datatypes? If a vocabulary requires the usage of
> datatypes then... well, RDFa is the only one handling that. That being said,
> it may be a very special case that we may not want to address here (and I
> know it is addressed elsewhere.)
>
>
> What do you mean for a vocabulary to "require the usage of datatypes"?
>
>
> That the range of a property is defined to be of a given datatype.
>
>
> In microformats experience, we've never found a need to "require" a
> datatype.
>
>
> You are right that the word 'required' is wrong. What I meant is what is
> above.
>
> I just picked one vocabulary that clearly has a possible role on a web
> page, namely the music ontology:
>
> http://www.musicontology.com/
>
> that refers to xsd:int and others for a number of its properties.
>
> Of course, it is perfectly possible, from a SW point of view, to produce
> data that does _not_ explicitly define, say, a <span>10</span> to be an
> integer; after all, a range specification is a license to infer and not a
> restriction. But practice is that authors may want to reinforce this using
> the datatype. If that is intended then, well, we may have a problem.
>
>
> On the contrary, the more such requirements were attempted, the higher
>
> the barrier, or the lower the data quality as authors get it wrong.
>
>
> If anything, we should say something like:
>
>
> "Avoid vocabularies that require the use of datatypes."
>
>
> I think such a statement is too restrictive, and I would not agree with
> that. See my example above. 'Watch out' instead of 'Avoid' might be
> better...
>
> [snip]
>
>
>
> On Dec 11, 2011, at 18:22 , Jeni Tennison wrote:
>
>
> Hi,
>
>
> I've pulled together much of the documentation from our wiki into a single
> document, at:
>
>
> https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html
>
>
> Please take the time to read this as it is the main product of this Task
> Force, and raise any comments here.
>
>
>
> Jeni, given the extensive feedback from Ivan (and the comments I
> myself have made), the one general item of feedback I'd say is:
>
>
> Perhaps it's better to just keep this document on the wiki for now, to
>
> enable/encourage more collaborative iteration/updating - especially
>
> with minor fixes.
>
>
> hg is just less accessible/usable than MediaWiki.
>
>
> A bit. It does require hg knowledge, that is:-)
>
>
> If you're looking to generate something that looks like a W3C note,
>
> perhaps it would be better to simply auto-generate such a note from a
>
> specific wiki page.
>
>
> I am not sure we have tools to do that (would be good). Ie, I am a little
> bit afraid that this would mean an extra load on Jeni at the end...
>
> Thanks!
>
> Ivan
>
>
>
> After all, most of the semantics should directly
>
> map right?
>
>
> Thanks,
>
>
>
> Tantek
>
>
> --
>
> http://tantek.com/ - I made an HTML5 tutorial! http://tantek.com/html5
>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
Received on Friday, 22 November 2013 16:42:54 UTC
