Re: Is this something for the primer? from Gregg Kellogg on 2011-04-20 (public-rdfa-wg@w3.org from April 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Wed, 20 Apr 2011 13:15:47 -0400
To: Ivan Herman <ivan@w3.org>
CC: W3C RDFWA WG <public-rdfa-wg@w3.org>
Message-ID: <89AB908B-31F4-42AF-BA4A-1E6F3B0A49B7@kellogg-assoc.com>

It would be useful to have pathological test cases like this one in the RDFa Test Suite; could you add a test or two?

Also, regarding the use of @profile vs. @prefix: my understanding was that best practice is for a serialization not to rely on the content of a @profile, which could change in the future and thereby change the semantics of the document. My serializations always set @prefix for every prefix they explicitly use. If that is the case, the sniffing problem can still occur. Perhaps the primer should encourage declaring a minimal @prefix on the <head> and <body> elements instead of on the <html> element, so that the <meta> is more likely to fall within the portion of the document the sniffer examines.
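The sniffing problem, and the effect of moving the prefix declarations, can be illustrated with a minimal Python sketch. It rests on stated assumptions: the sniffer is modelled as reading only the first 1024 bytes (the limit HTML sets for the charset declaration; the thread does not say which figure the parser actually used), the real prescan algorithm is approximated by a regular expression, and `sniff_encoding`, `make_page` and the `ex0:`, `ex1:`, ... prefixes are hypothetical, used only to pad a start tag.

```python
import re

# Assumption: a 1024-byte window, matching the HTML requirement that the
# charset declaration appear within the first 1024 bytes. The parser in the
# original report may have used a different figure.
PRESCAN_WINDOW = 1024
DEFAULT_ENCODING = "windows-1252"

# Simplified stand-in for the HTML prescan: a regex that finds a charset in a
# <meta> tag. The real algorithm tokenizes attributes and handles more cases.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_encoding(data: bytes, window: int = PRESCAN_WINDOW) -> str:
    """Return the charset of a <meta> found within the window, else the default."""
    match = META_CHARSET.search(data[:window])
    if match:
        return match.group(1).decode("ascii").lower()
    return DEFAULT_ENCODING


def make_page(prefixes_on_html: bool, n_prefixes: int = 40) -> bytes:
    """Build a UTF-8 page whose @prefix sits either on <html> or on <body>."""
    # Hypothetical prefixes; they only exist to make the start tag long.
    decl = " ".join(
        f"ex{i}: http://example.org/vocab/{i}#" for i in range(n_prefixes)
    )
    html_attr = f' prefix="{decl}"' if prefixes_on_html else ""
    body_attr = "" if prefixes_on_html else f' prefix="{decl}"'
    page = (
        f"<!DOCTYPE html>\n<html{html_attr}>\n<head>\n"
        '<meta charset="utf-8">\n<title>caf\u00e9</title>\n</head>\n'
        f"<body{body_attr}>\n<p>\u00e9t\u00e9</p>\n</body>\n</html>\n"
    )
    return page.encode("utf-8")


if __name__ == "__main__":
    for on_html in (True, False):
        page = make_page(on_html)
        enc = sniff_encoding(page)
        # Decode with the sniffed encoding; a wrong guess garbles the title.
        title = page.decode(enc).split("<title>")[1].split("</title>")[0]
        where = "<html>" if on_html else "<body>"
        print(f"prefixes on {where}: {enc}, title {title!a}")
```

With the defaults, the page carrying its prefixes on <html> has its <meta> pushed past the window, so the sketch falls back to windows-1252 and the UTF-8 title decodes as mojibake; the same page with the prefixes on <body> is detected as UTF-8. A pair of pages like this could also serve as the kind of pathological test case mentioned above.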

Gregg

On Apr 20, 2011, at 4:45 AM, Ivan Herman wrote:

> I recently ran into a nasty bug in my implementation; I wonder whether its cause should be described in the primer. Not sure...
> 
> The manifestation of the bug: in some cases the encoding (i.e., the UTF-8 encoding) of literals came out wrong.
> 
> The cause was none of the obvious bugs (a missing encoding on output, that sort of thing), though of course that is where I started looking. I then realized that it goes wrong only when I use the HTML5 parser, and at first it looked somewhat random... To make a long story short, it is related to content sniffing.
> 
> What happens (I guess) is the following: the HTML5 parser ignores a possible <?xml ...?> declaration of the encoding (which is, of course, fine); instead, it looks into the <head> to see whether a <meta> declaring the encoding is present. If it finds one, it uses the encoding specified there; otherwise it falls back to the default, which is the Windows encoding (windows-1252). That said, the parser can also be started with an explicit encoding parameter, in which case that encoding is used.
> 
> Why does it go wrong? If an RDFa source has lots of prefix definitions on the <html> element, sniffing may fail, because the sniffer looks only at the first ??? bytes (I do not remember the number off the top of my head), and a long <html> start tag can push the <meta> beyond that point. That was the cause of my bug.
> 
> To work around the bug, I now look at the HTTP response headers (which I was already doing), and if they specify an encoding, I pass it to the parser as an explicit encoding. But that, of course, presupposes that the actual content encoding and the HTTP response agree (which should be the case, but, well...). And, of course, this does not work for local files (which I simply treat as UTF-8).
> 
> So... there is potential for practical problems here. This is alleviated by the default profile mechanism, which makes many of the prefix definitions unnecessary (foaf, rdf, etc.), and by the profile mechanism in general. But there is a potential problem nevertheless.
> 
> What the primer could do is draw attention to this problem and advise concentrating the prefix definitions on the <body> element instead of the <html> element. If they are placed there, the problem does not occur.
> 
> Opinions?
> 
> Ivan
> 
> P.S. Yes, it sounds simple once solved, but it took me an hour or more to realize that this was the problem! It was my way of fighting jet lag; I figured out the problem in my hotel room in Hyderabad...
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
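Ivan's workaround, taking the encoding from the HTTP response when it is present and treating local files as UTF-8, might look roughly like the Python sketch below. The helper names are hypothetical; the standard library's `email.message.Message` is used only to parse the `charset` parameter of a `Content-Type` value; and returning `None` for a remote response without a charset (so that the parser sniffs as before) is an assumption, not something the message states.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when there is no charset parameter.
    return msg.get_content_charset()


def choose_encoding(content_type: Optional[str], is_local_file: bool) -> Optional[str]:
    """Pick an explicit encoding for the HTML parser, or None to let it sniff."""
    charset = charset_from_content_type(content_type)
    if charset:
        # Trust the HTTP header; this assumes it matches the actual bytes.
        return charset
    if is_local_file:
        # No HTTP header for local files: treat them as UTF-8.
        return "utf-8"
    # Remote resource without a charset: leave it to the parser's sniffing.
    return None


if __name__ == "__main__":
    print(choose_encoding("text/html; charset=UTF-8", is_local_file=False))
    print(choose_encoding(None, is_local_file=True))
    print(choose_encoding("text/html", is_local_file=False))
```

If the parser is html5lib 1.x, the returned value could be handed to `html5lib.parse` through its `transport_encoding` keyword; older releases used a different keyword, so the version in use should be checked. As noted above, this only helps when the server's header is actually correct.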
Received on Wednesday, 20 April 2011 17:16:33 UTC