- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Sun, 18 May 2008 13:55:50 -0400
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Benjamin Nowack <bnowack@semsol.com>, Dave Beckett <dave@dajobe.org>
Seaborne, Andy wrote:
>> We could remove it - but it's valid[1][2] UTF-8, isn't it? Technically,
>> we should be able to feed that to SPARQL and the engine should deal with
>> it, right?
>
> I am not an expert on Unicode - but not by my reading of the Unicode
> - it's in the middle of the URL string.

I found some exact wording in RFC3629 to support your interpretation:

"It is important to understand that the character U+FEFF appearing at
any position other than the beginning of a stream MUST be interpreted
with the semantics for the zero-width non-breaking space, and MUST NOT
be interpreted as a signature."[1]

So, it is valid Unicode, but it's pre-pended to ASK - which is an
illegal SPARQL command per your implementation as you don't treat the
"zero-width non-breaking space" as valid whitespace.

> So, the parser it looks much like: "xASK ..." for some
> character x and xASK is not legal at this point.

Right. Thanks Andy - we'll change TCs #60 and #108 to remove the BOM.

-- manu

[1] http://www.rfc-editor.org/rfc/rfc3629.txt

--
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: DB Launches Medical Record Sales Service with Shepherd Medical
http://blog.digitalbazaar.com/2008/02/24/health2trade/
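
For anyone who wants to reproduce the effect described above, here is a minimal Python sketch. The query text, the surrounding wrapper string, and the `strip_bom` helper are assumptions made up for illustration; they are not taken from TCs #60 or #108. It shows that a U+FEFF directly before ASK is not whitespace and so fuses with the keyword, that a decoder only treats a leading U+FEFF as a signature, and that removing the character restores the keyword.

```python
BOM = "\ufeff"  # U+FEFF: byte order mark / zero-width no-break space


def strip_bom(text: str) -> str:
    """Remove every U+FEFF from text, wherever it appears (hypothetical helper)."""
    return text.replace(BOM, "")


# Illustrative query only: a BOM sits directly in front of the ASK keyword.
query = BOM + "ASK { ?s ?p ?o }"

# U+FEFF is not a whitespace character, so a tokenizer that skips
# whitespace would still see one unknown character glued to "ASK".
bom_is_space = BOM.isspace()
starts_with_ask = query.startswith("ASK")

# The "utf-8-sig" codec drops a BOM only at the very start of the stream.
# Here the BOM is mid-stream (after a hypothetical wrapper), so it survives.
embedded = ("query=" + query).encode("utf-8")
decoded = embedded.decode("utf-8-sig")
bom_survives_decoding = BOM in decoded

# Stripping the character explicitly gives the parser a clean keyword.
cleaned = strip_bom(query)

print(bom_is_space, starts_with_ask, bom_survives_decoding)
print(cleaned)
```

Stripping the BOM from the test input, as proposed for the two test cases, avoids relying on any particular SPARQL engine's handling of U+FEFF.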
Received on Sunday, 18 May 2008 17:56:31 UTC