
Re: microdata use cases and Getting data out of poorly written Web pages

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 14 May 2009 09:36:05 +0300
Cc: public-html@w3.org
Message-Id: <535547D0-CA88-46FA-AF1B-728426D468D1@iki.fi>
To: Ben Adida <ben@adida.net>

On May 13, 2009, at 20:18, Ben Adida wrote:

> Yes, it's important to validate theories. Henri's theory that
> xmlns:foo would be impossible/difficult to parse correctly in
> text/html proved to be a fairly weak argument in practice (Google,
> Yahoo, and my Firefox bookmarklet do just fine.)


Please don't mischaracterize what I have said in order to dismiss it.

I have said:

1) If xmlns:foo is parsed the way it's currently specced and
implemented in Gecko, WebKit and Opera, a namespace-aware API
representation of it is different for text/html and
application/xhtml+xml DOMs. (This is not theory. It is a testable
statement of fact.) If the namespace-aware representation is different
for text/html and application/xhtml+xml DOMs, applications that support
both need divergent code paths on the application layer to paper over
the difference. Having divergent code paths is bad. (Browser-internal
APIs in Gecko are in the namespace-aware category as are the Level 2
parts of DOM.)

2) Some namespace-aware representations don't allow an attribute with  
local name "xmlns:foo" in no namespace to be represented at all. XOM  
is such a representation. (It throws if you try to set such an  
attribute.)

3) Finding out whether it is feasible to change text/html parsing to
make the data model representation of xmlns:foo match the
representation parsed from application/xhtml+xml would involve
shipping such an implementation, which means finding out entails
non-trivial cost. (Note that I have not claimed that it would be
impossible--just that the experiment has such cost characteristics
that I think a microdata solution should avoid the need to perform
that experiment.)
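
A minimal sketch of point 1, assuming Python's standard library as a
stand-in for the parsers under discussion. xml.dom.minidom plays the
application/xhtml+xml side. html.parser is not a conforming HTML parser
and builds no DOM, so the text/html side is modelled by hand according
to the behaviour described in point 1: the attribute ends up in no
namespace with the whole string "xmlns:foo" as its local name. The
function names, the sample markup and the example.org URI are
hypothetical.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Hypothetical sample markup; the URI is a placeholder.
MARKUP = '<p xmlns:foo="http://example.org/ns#">x</p>'
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def xml_view(markup):
    """(namespace URI, local name) pairs as a namespace-aware XML parse sees them."""
    root = minidom.parseString(markup).documentElement
    attrs = root.attributes
    return [
        (attrs.item(i).namespaceURI, attrs.item(i).localName)
        for i in range(attrs.length)
    ]


class _AttrCollector(HTMLParser):
    """Collects attribute names from start tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Modelling assumption (not something html.parser reports itself):
        # in a text/html DOM the attribute is in no namespace and its
        # local name is the full string, colon included.
        self.pairs.extend((None, name) for name, _value in attrs)


def html_view(markup):
    """(namespace URI, local name) pairs as modelled for a text/html parse."""
    collector = _AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.pairs


def declares_foo(pairs):
    """The divergent code path: one check per representation."""
    return (XMLNS_NS, "foo") in pairs or (None, "xmlns:foo") in pairs


if __name__ == "__main__":
    print(xml_view(MARKUP))
    print(html_view(MARKUP))
    print(declares_foo(xml_view(MARKUP)), declares_foo(html_view(MARKUP)))
```

The two views disagree on both the namespace and the local name, so
declares_foo has to test for each form separately. The second form, an
attribute with local name "xmlns:foo" in no namespace, is the one that
point 2 says a representation such as XOM refuses to hold.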

Your bookmarklet doesn't refute any of the above points because it  
uses a namespace-unaware API (DOM Level 1).
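
A rough sketch of why a DOM Level 1 lookup cannot reveal the
difference, again using Python's xml.dom.minidom. Assumptions: the
text/html DOM is only imitated by creating the attribute through the
namespace-unaware setAttribute call on a minidom element, and the
sample markup, URI and helper names are hypothetical. This is not the
bookmarklet's code.

```python
from xml.dom import minidom

# Hypothetical placeholder URI and markup.
URI = "http://example.org/ns#"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"
MARKUP = '<p xmlns:foo="%s">x</p>' % URI


def xml_element():
    """Element produced by a namespace-aware XML parse."""
    return minidom.parseString(MARKUP).documentElement


def html_like_element():
    """Stand-in for a text/html DOM element (assumption: the attribute is
    created without any namespace information, as a Level 1 call does)."""
    doc = minidom.Document()
    element = doc.createElement("p")
    element.setAttribute("xmlns:foo", URI)
    return element


def level1_lookup(element):
    """Namespace-unaware lookup by qualified name (DOM Level 1)."""
    return element.getAttribute("xmlns:foo")


def level2_lookup(element):
    """Namespace-aware lookup by namespace URI and local name (DOM Level 2)."""
    return element.getAttributeNS(XMLNS_NS, "foo")


if __name__ == "__main__":
    for make in (xml_element, html_like_element):
        element = make()
        print(make.__name__, repr(level1_lookup(element)), repr(level2_lookup(element)))
```

Under these assumptions the Level 1 lookup returns the same URI for
both elements, while the Level 2 lookup finds the declaration only on
the XML-parsed element. Code that stays on the Level 1 API therefore
never exercises the case described in point 1.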

It is unclear whether the services recently unveiled by Google and
Yahoo! refute any of my points. Does either of them support both
text/html and application/xhtml+xml? If yes, is it known whether they
have divergent code paths internally? Is it known whether they use
conforming HTML and XML parsers?

Furthermore, according to Hixie, Yahoo! treats the prefix as meaningful
(http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Mar/0100.html),
which is evidence *against* prefix-based indirection doing just fine.

So far, I haven't seen evidence showing whether Google implements  
CURIEs per spec (to the extent to which there is a spec; RDFa has not  
been specced for text/html) or whether they, too, give meaning to the  
prefix.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 14 May 2009 06:36:46 UTC
