- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sun, 13 Mar 2005 22:18:07 -0500
- To: Arjohn Kampman <arjohn.kampman@aduna.biz>
- Cc: public-rdf-dawg-comments@w3.org
- Message-ID: <20050314031807.GA10717@w3.org>
Clarification and notes -- this response was not considered by the DAWG:

On Thu, Feb 03, 2005 at 04:10:58PM +0100, Arjohn Kampman wrote:
>
> Dear all,
>
> The SPARQL Protocol as described at [1] suggests that SPARQL queries are
> going to be sent over the line as simple www-urlencoded strings. I would
> like to point out that we have tried this approach in Sesame and that it
> fails to handle multi-byte characters properly [2]. Main reason for this
> is that the used %xx patterns cannot encode any byte values larger than
> 255.
>
> In Sesame, we "solved" this issue by switching to multipart/form-data
> encoded POST requests.

I presume you are using the charset parameter
[[ [2388]
Each part of a multipart/form-data is supposed to have a content-type.
In the case where a field element is text, the charset parameter for
the text indicates the character encoding used.
]]
and that the clients tend to encode the characters in charsets that
the servers tend to understand. I phrase it this way because I'm
looking at the trade-offs between:
- transaction-specified encoding.
- transaction-specified encoding with mandatory support for at least
  one common encoding.
- fixed encoding (e.g. UTF-8), the only one used by the protocol.

What encodings do your RDQL servers support?

Noting related RFCs ('cause I need to write it down somewhere):
[2045] MIME Part One: Format of Internet Message Bodies:
       transfer encodings interacting with character encodings.
[2046] MIME Part Two: Media Types
       4.1.2. Charset Parameter
       5.1. Multipart Media Type
[2388] Returning Values from Forms: multipart/form-data
       4.5 Charset of text in form data

> Main drawback of this solution is that we use
> POST-requests all the time, even when GET-requests would be more
> natural.

The DAWG's Use Cases and Requirements [UC&R] has Addressable Query
Results as a design objective. This was motivated by a TAG finding [GET].
[[
"Use GET if:
  * The interaction is more like a question (i.e., it is a safe
    operation such as a query, read operation, or lookup)."
]]

> Another option would be to enforce an UTF-8 characters-to-
> octets mapping to the query before adding it as a parameter value.

We could also include the charset in the GET, but I'm hoping that the
simplest approach (which I take to be fixed encoding) will suffice.

> Hope you can use this feedback to improve the protocol.
>
> Regards,
>
> Arjohn Kampman
>
>
> [1] http://www.w3.org/TR/rdf-sparql-protocol/
> [2] http://www.openrdf.org/issues/secure/ViewIssue.jspa?key=SES-84

[2045] http://www.faqs.org/rfcs/rfc2045.html
[2046] http://www.faqs.org/rfcs/rfc2046.html
[2388] http://www.faqs.org/rfcs/rfc2388.html
[UC&R] http://www.w3.org/TR/2004/WD-rdf-dawg-uc-20041012/
[GET] http://www.w3.org/2001/tag/doc/whenToUseGet.html

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
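For illustration only, the fixed-encoding option discussed in the message (map the query's characters to UTF-8 octets, then escape each octet as %xx) might look like the sketch below. It is not part of the protocol draft or of Sesame: the endpoint URL, the `query` parameter name, and the helper names are all assumptions made for the example.

```python
from urllib.parse import quote, unquote


def encode_query_param(query: str) -> str:
    """Map characters to UTF-8 octets, then escape each octet as %xx."""
    # Encoding to bytes first makes the character-to-octet step explicit;
    # quote() then percent-escapes every octet outside the unreserved set.
    return quote(query.encode("utf-8"), safe="")


def decode_query_param(value: str) -> str:
    """Server side: undo the %xx escapes and decode the octets as UTF-8."""
    return unquote(value, encoding="utf-8", errors="strict")


def build_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL; 'query' as the parameter name is an assumption."""
    return f"{endpoint}?query={encode_query_param(query)}"


if __name__ == "__main__":
    # Hypothetical endpoint and query containing a multi-byte character.
    sparql = 'SELECT ?s WHERE { ?s ?p "caf\u00e9" }'
    url = build_get_url("http://example.org/sparql", sparql)
    print(url)
    # The encoded parameter survives a round trip unchanged.
    assert decode_query_param(encode_query_param(sparql)) == sparql
```

Because both sides agree on UTF-8 in advance, no charset parameter has to travel with the request, and a multi-byte character simply becomes several %xx escapes in a row.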
Received on Monday, 14 March 2005 03:18:07 UTC