Re: Semantic meaning of double quotation marks delimiting quoted-string from Geoffrey Sneddon on 2007-10-28 (ietf-http-wg@w3.org from October to December 2007)

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Sun, 28 Oct 2007 21:39:53 +0000
To: Julian Reschke <julian.reschke@gmx.de>
Cc: ietf-http-wg@w3.org
Message-Id: <00A04396-B61F-48EE-96D5-94958E09D6CD@googlemail.com>

On 28 Oct 2007, at 20:53, Julian Reschke wrote:

> Geoffrey Sneddon wrote:
>>> The simple answer is: the double quotes are part of the entity  
>>> tag. So a response header such as
>>>
>>>     ETag: x
>>>
>>> would simply be invalid and should be ignored.
>> I am aware, but how is the receiving end meant to deal with them?  
>> Is it meant to keep the quotation marks around any quoted-string,  
>> even when that results in non-existent things like a  
>> character set called "UTF-8" (with quotes)? Or does the behaviour  
>> need to be defined separately for each and every use of  
>> quoted-string?
>
> That may be the case.

I was referring to my original example, i.e.,

> If you don't [parse the quotation marks out], you can end up with  
> character sets such as "UTF-8" (i.e., including the quotation  
> marks) in headers like Content-Type: text/plain;charset="UTF-8".

Which really is the question: what are we meant to do with the  
delimiting quotation marks in quoted-string?

If we take UTF-8 as a string, we can escape this as a quoted-string  
in several ways, including:

- "UTF-8"
- "\U\T\F\-\8"

Now, are we meant to unescape every quoted-string we come across  
(therefore including entity-tag), or only some? If only some, which?  
I think we can all agree that "\U\T\F\-\8" is not, in itself, a valid  
character set. As it stands now, it is not clear whether you should  
ever unescape them.
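
To make "unescape" concrete, here is a minimal sketch in Python of what a
receiver might do with a quoted-string parameter value. The function name
unquote is hypothetical, and the behaviour (strip the delimiting quotation
marks, then drop the backslash of each quoted-pair) is only one possible
reading of the grammar, not something the spec currently mandates:

```python
def unquote(value):
    """Strip the delimiting quotation marks and undo quoted-pairs.

    Illustrative only: assumes `value` is a single, already-isolated
    parameter value such as the part after `charset=`.
    """
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        # Not a quoted-string (e.g. a bare token): leave it untouched.
        return value
    out = []
    chars = iter(value[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: drop the backslash, keep the next character.
            # A trailing lone backslash is kept as-is (lenient choice).
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)


for raw in ['"UTF-8"', r'"\U\T\F\-\8"', 'UTF-8']:
    print(raw, "->", unquote(raw))
```

Under this reading both forms listed above, and the bare token, come out
as UTF-8. Whether the same treatment should ever be applied to an
entity-tag is exactly the open question.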

- Geoffrey.

Received on Sunday, 28 October 2007 21:40:13 UTC