Re: microdata use cases and Getting data out of poorly written Web pages from Henri Sivonen on 2009-05-14 (public-html@w3.org from May 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 14 May 2009 09:36:05 +0300
To: Ben Adida <ben@adida.net>
Cc: public-html@w3.org
Message-Id: <535547D0-CA88-46FA-AF1B-728426D468D1@iki.fi>

On May 13, 2009, at 20:18, Ben Adida wrote:

> Yes, it's important to validate theories. Henri's theory that
> xmlns:foo would be impossible/difficult to parse correctly in
> text/html proved to be a fairly weak argument in practice (Google,
> Yahoo, and my Firefox bookmarklet do just fine.)

Please don't mischaracterize what I have said in order to dismiss it.

I have said:

1) If xmlns:foo is parsed the way it's currently specced and
implemented in Gecko, WebKit and Opera, a namespace-aware API
representation of it is different for text/html and
application/xhtml+xml DOMs. (This is not theory. It is a testable
statement of fact.) If the namespace-aware representation is different
for text/html and application/xhtml+xml DOMs, applications that
support both need divergent code paths on the application layer to
paper over the difference. Having divergent code paths is bad.
(Browser-internal APIs in Gecko are in the namespace-aware category,
as are the Level 2 parts of DOM.)

2) Some namespace-aware representations don't allow an attribute with
local name "xmlns:foo" in no namespace to be represented at all. XOM
is such a representation. (It throws if you try to set such an
attribute.)

3) Finding out whether it is feasible to change text/html parsing to
make the data model representation of xmlns:foo match the
representation parsed from application/xhtml+xml would involve
shipping such an implementation, which means finding out entails
non-trivial cost. (Note that I have not claimed that it would be
impossible, only that the experiment has such cost characteristics
that I think a microdata solution should avoid the need to perform
that experiment.)

Your bookmarklet doesn't refute any of the above points because it
uses a namespace-unaware API (DOM Level 1).
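
To make the distinction concrete, here is a minimal sketch in Python
rather than in a browser. It is an illustration under stated
assumptions, not a test of Gecko, WebKit or Opera: html.parser only
reports attribute names as flat strings, so the text/html side is
modelled by hand as "no namespace, local name xmlns:foo", as described
in point 2. The sample markup and the function names are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Hypothetical sample markup; the namespace URI is a placeholder.
MARKUP = '<p xmlns:foo="http://example.org/ns#">text</p>'


class AttrCollector(HTMLParser):
    """Collects (name, value) attribute pairs as flat strings."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_attr_names(markup):
    # Namespace-unaware view of the text/html side: just the attribute names.
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return [name for name, _ in collector.attrs]


def html_namespace_view(markup):
    # ASSUMPTION: models the text/html DOM as described in this message,
    # i.e. the attribute is in no namespace and the whole string
    # "xmlns:foo" is its local name. Not produced by a real HTML DOM.
    return [(None, name) for name in html_attr_names(markup)]


def xml_namespace_view(markup):
    # Namespace-aware XML parse: the declaration shows up as an attribute
    # in the xmlns namespace with local name "foo".
    element = minidom.parseString(markup).documentElement
    attr = element.getAttributeNode("xmlns:foo")
    return [(attr.namespaceURI, attr.localName)]


def xml_attr_names(markup):
    # Namespace-unaware (DOM Level 1 style) view of the XML side.
    element = minidom.parseString(markup).documentElement
    return [element.getAttributeNode("xmlns:foo").nodeName]


if __name__ == "__main__":
    print("qualified names:", html_attr_names(MARKUP), xml_attr_names(MARKUP))
    print("text/html (modelled):", html_namespace_view(MARKUP))
    print("XML:", xml_namespace_view(MARKUP))
```

Code that only looks up the qualified name sees the same string on
both sides, which is why a DOM Level 1 consumer does not notice the
problem. Code that compares (namespace, local name) pairs sees two
different attributes and needs a special case for one of them.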

It is unclear whether the services recently unveiled by Google and
Yahoo! refute any of my points. Does either of them support both
text/html and application/xhtml+xml? If yes, is it known whether they
have divergent code paths internally? Is it known whether they use
conforming HTML and XML parsers?

Furthermore, according to Hixie, Yahoo! treats prefix as meaningful
(http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Mar/0100.html),
which is evidence *against* prefix-based indirection doing just fine.

So far, I haven't seen evidence showing whether Google implements
CURIEs per spec (to the extent to which there is a spec; RDFa has not
been specced for text/html) or whether they, too, give meaning to the
prefix.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 14 May 2009 06:36:46 UTC