Re: 2 RDFa SPARQL Test Harness Issues

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sun, 18 May 2008 13:55:50 -0400
Message-ID: <48306DA6.6020503@digitalbazaar.com>
To: "Seaborne, Andy" <andy.seaborne@hp.com>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Benjamin Nowack <bnowack@semsol.com>, Dave Beckett <dave@dajobe.org>

Seaborne, Andy wrote:
>> We could remove it - but it's valid[1][2] UTF-8, isn't it? Technically,
>> we should be able to feed that to SPARQL and the engine should deal with
>> it, right?
> 
> I am not an expert on Unicode - but not by my reading of the Unicode 
> - it's in the middle of the URL string.

I found exact wording in RFC 3629 that supports your interpretation:

"It is important to understand that the character U+FEFF appearing at
any position other than the beginning of a stream MUST be interpreted
with the semantics for the zero-width non-breaking space, and MUST
NOT be interpreted as a signature."[1]

So it is valid Unicode, but it is prepended to ASK, which makes the
query illegal SPARQL under your implementation, since you don't treat
the "zero-width non-breaking space" as valid whitespace.
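
To make the point concrete, here is a minimal sketch of why a leading
U+FEFF trips up a tokenizer. It assumes whitespace is limited to the
four characters in the SPARQL grammar's WS production (space, tab, CR,
LF). The helper name `leading_non_ws` and the sample query are
hypothetical and are not taken from the test harness or from any
SPARQL engine.

```python
# Whitespace per the SPARQL grammar's WS production:
# space, tab, carriage return, line feed.
SPARQL_WS = {"\u0020", "\u0009", "\u000D", "\u000A"}

BOM = "\ufeff"  # U+FEFF; its UTF-8 encoding is the bytes EF BB BF


def leading_non_ws(query: str) -> str:
    """Return the first character that is not SPARQL whitespace.

    This is the character a tokenizer would see as the start of the
    first token. Returns an empty string if the query is all whitespace.
    """
    for ch in query:
        if ch not in SPARQL_WS:
            return ch
    return ""


# Hypothetical query with a BOM in front of the ASK keyword.
query = BOM + "ASK { ?s ?p ?o }"
first = leading_non_ws(query)
print(repr(first))  # the BOM, not "A", starts the first token
```

Because U+FEFF is not in that whitespace set, the first token begins
with the BOM rather than with the "A" of ASK.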

> So, to the parser it looks much like: "xASK ..." for some
> character x and xASK is not legal at this point.

Right. Thanks Andy - we'll change TCs #60 and #108 to remove the BOM.
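
One way the BOM could be removed, sketched under the assumption that
the query text is available as UTF-8 bytes. The function name
`strip_leading_bom` is hypothetical; this is not the harness's actual
code. Only a BOM at the very start is dropped. Per the RFC text quoted
above, a U+FEFF anywhere else is a zero-width non-breaking space and
is left alone.

```python
import codecs


def strip_leading_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if present."""
    if data.startswith(codecs.BOM_UTF8):
        # Remove only the signature at the start of the stream.
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


# Hypothetical test-case content: a BOM followed by an ASK query.
raw = codecs.BOM_UTF8 + b"ASK { ?s ?p ?o }"
cleaned = strip_leading_bom(raw)
print(cleaned)
```

Python's built-in "utf-8-sig" codec has the same effect on decode; the
explicit version is shown only to make the single-leading-BOM rule
visible.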

-- manu

[1] http://www.rfc-editor.org/rfc/rfc3629.txt

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: DB Launches Medical Record Sales Service with Shepherd Medical
http://blog.digitalbazaar.com/2008/02/24/health2trade/
