Re: [CSS21] Escaping characters (comment, editorial) from Alan Gresley on 2011-04-06 (www-style@w3.org from April 2011)

From: Alan Gresley <alan@css-class.com>
Date: Thu, 07 Apr 2011 04:48:03 +1000
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: timeless <timeless@gmail.com>, www-style@w3.org
Message-ID: <4D9CB563.1020802@css-class.com>
On 6/04/2011 10:58 PM, Leif Halvard Silli wrote:
> Alan Gresley, Tue, 05 Apr 2011 02:21:16 +1000:
>> On 4/04/2011 4:18 AM, Leif Halvard Silli wrote:
>>> timeless, Sun, 3 Apr 2011 15:51:22 +0300:
>>>> On Sun, Feb 20, 2011 at 7:14 PM, Leif Halvard Silli wrote:
>
>>>>>    Insert the wording "backslash newline escape" etc:
>>>>>      "First, inside a string, a plain backslash newline escape (backslash
>>>>>       followed by newline) cancels the meaning of the newline so that
>>>>>       the string is deemed to not contain whether backslash or newline."
>    [variant b]
>>>       "is deemed to not contain whether backslash nor newline."
>>> It is meant to match sentences such as this one: "we do not know
>>> whether it will cost nor how much", see
>    [variant c]
>>>       "deemed not to contain the backslash nor the newline."
>
>> I don't think that is what the spec is saying in 4.1.3.
>
> Maybe because you didn't get the fine points of what was said? See below.
>
> [ snip ]
>
>> If white-space was added, then this,
>>
>>      a[title="a not s\
>>   o very long title"] {background:green}
>> becomes.
>>      a[title="a not s o very long title"] {background:green}
>>
>> I see this above as different wording.
>
> Note the words "so that": "cancels the meaning of the newline _so that_
> the string is deemed to not contain whether backslash or newline".


You do mean 'neither' instead of 'whether'?


>>>>> First, inside a string, a plain backslash newline escape
>>>>> (backslash followed by newline) cancels the meaning of the
>>>>> newline so that  the string is deemed to not contain whether
>>>>> backslash or newline."
>>
>> An escape (backslash) does not cancel the meaning of a newline since
>> an escape at the end of the current line causes a newline to appear.
>
> Again, note that "so that" begins a phrase which describes what
> "cancels" means. The "special meaning" that newline has inside a string
> is that it is illegal. But with \ before it becomes legal as well as
> ignored.


This is not true. There are two definitions of a newline. One is the 
escape newline combo. The other definition of a newline ('U+0085') is 
'that which is outside a string'. Each line below is a newline 'U+0085' 
(which is outside a string).

     <p title="long title">...</p>


l\
o\
n\
g\
\
  \
\
t\
i\
t\
l\
e\
"
] { background:green }


Note how I can only add extra escape newline combos before and after 
white-space.


You state this:

> The "special meaning" that newline has inside a string
> is that it is illegal. But with \ before it becomes legal as well as
> ignored.


If this were true, then the above string could appear like this.


     a[title="\
l\
\
o\
\
n\
\
g\
  \
t\
\
i\
\
t\
\
l\
\
e\
"
] { background:green }


Take this example.


/* escape below */
\

/* newline above */



How can both the escape ending one line and a newline at the beginning 
of a new line cancel the meaning of the literal newline 'U+0085' that is 
outside of a string?


>> You cannot have "backslash followed by newline" either, since it is
>> the backslash that causes a newline to appear.
>
> Inside "a string", no newline appears.


It does since I can press ENTER on a keyboard for a newline 'U+0085' 
(which is outside a string).


> It doesn't disappear either. It
> is simply "deemed" to not be there.


Yes, this is true, but the same pseudo-disappearance applies equally to 
the escape. It's as if neither existed in a string.


     i.e. between '"..."' or ''...''.


>> I believe the
>> wording below is more to the point.
>>
>>    >  First, inside a string, an escape (backslash) followed
>>    >  by a newline cancels the meaning of both the escape and
>>    >  the newline (i.e., the string is deemed not to contain
>>    >  either the backslash or the newline).
>
> An escape cannot cancel the meaning of itself. ;-) The special escape
> character meaning of \ is not cancelled  ...


The special meanings of both tokens are canceled, not just one. Please 
see above.


>> but only Firefox 3.6~4b parses both of these.
>>       \
>> div { background: green }
>    [ snip ]
>> If a backslash followed by a newline stands for itself outside a
>> string, then the above statements which Firefox 3.6~4b parses would
>> appear like this I think.
>>
>>       \div { background: green }
>
> No. Outside a string, the newline becomes a space:
>
>         \ div{}


No, it doesn't. Now you are saying that an escape can create white-space. 
Please try this test.


<!DOCTYPE html>
<style type="text/css">
div::after {content: '\
div' }
</style>
<div>X</div>


There is no white-space, so all we have is just this '\div', and the 
generated content is 'Xdiv'. Now try this.


<!DOCTYPE html>
<style type="text/css">
div::after {content: '\000A
div' }
</style>
<div>X</div>


The generated content now has white-space: 'X div'.
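
The two tests above can be approximated outside a browser. What follows is a minimal sketch, assuming the escape rules of CSS 2.1 section 4.1.3 as they apply between the quotes of a string: a backslash followed by a newline removes both, and a hex escape swallows at most one trailing white-space character. The helper name unescape_css_string is hypothetical and does not come from any library or from this thread.

```python
import re

# Assumed rules for the text between the quotes of a CSS 2.1 string:
#   backslash + newline     -> both are removed
#   backslash + 1-6 hex     -> that code point; one trailing white-space
#                              character (CR/LF counts as one) is swallowed
#   backslash + other char  -> the character itself
_ESCAPE = re.compile(
    r"\\(?:"
    r"(?P<nl>\r\n|[\n\r\f])"
    r"|(?P<hex>[0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?"
    r"|(?P<lit>.)"
    r")",
    re.DOTALL,
)


def _decode(match):
    if match.group("nl") is not None:
        return ""  # escaped newline: neither character survives
    if match.group("hex") is not None:
        code_point = int(match.group("hex"), 16)
        # Out-of-range values are replaced here; that choice is an assumption.
        return chr(code_point) if code_point <= 0x10FFFF else "\ufffd"
    return match.group("lit")


def unescape_css_string(body: str) -> str:
    """Decode the text between the quotes of a CSS string (hypothetical helper)."""
    return _ESCAPE.sub(_decode, body)


first = unescape_css_string("\\\ndiv")       # content: '\<newline>div'
second = unescape_css_string("\\000A\ndiv")  # content: '\000A<newline>div'
print(repr(first), repr(second))
```

Under these assumptions the first value contains no white-space at all, while the second begins with U+000A, which is consistent with 'Xdiv' and 'X div' in the two tests.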


> The Firefox bug is that it does not discern between inside and outside
> a string. There ought to be a test case to catch that.


That's what my test cases are showing. Here are some more. These are 
newlines outside of strings.


http://css-class.com/test/css21testsuite/newline-001.xht

http://css-class.com/test/css21testsuite/newline-002.xht



>> With '*', we get something different.
>>
>> \* { background: green } /* WebKit parses */
>
> Yes. I have asked for a test case for that thing.
>
> http://lists.w3.org/Archives/Public/public-css-testsuite/2011Feb/0089.html


I saw that message. That is when I started testing. Did you see this 
message on this list?


http://lists.w3.org/Archives/Public/www-style/2011Mar/0555.html


>> For any other browser, the remainder of the style sheet is thrown out
>> unless something like 'p' is pulled in behind, as happens
>> with '\p { background: green }'.
>>
>> The nesting of identifiers is important it seems.
>>      body \0064\0069\0076 { background: lime } /* parsed */
>>      \0062\006F\0064\0079 div { background: lime } /* dropped */
>
> There needs to be one more space before the div. Read the spec - I'll
> just quote one bit:
>
> ]]  Note that this means that a "real" space after the escape sequence
> must be doubled. [[
>
> Thus:
>       \0062\006F\0064\0079  div { background: lime } /* parsed */


Can you point me to this spec, please? I cannot find those words. I will 
need to re-test these examples.
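
For re-testing, here is a minimal sketch of the doubled-space rule quoted above. It assumes the CSS 2.1 definition of a hex escape: 1 to 6 hex digits followed by at most one white-space character, which is swallowed as a terminator. The helper name decode_hex_escapes is hypothetical, and only hex escapes are handled.

```python
import re

# Assumed shape of a hex escape: 1-6 hex digits, then at most one swallowed
# white-space character (CR/LF counts as one).
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def _to_char(match):
    code_point = int(match.group(1), 16)
    # Out-of-range values are replaced here; that choice is an assumption.
    return chr(code_point) if code_point <= 0x10FFFF else "\ufffd"


def decode_hex_escapes(selector: str) -> str:
    """Replace hex escapes in a selector with the characters they name."""
    return HEX_ESCAPE.sub(_to_char, selector)


one_space = decode_hex_escapes(r"\0062\006F\0064\0079 div")
two_spaces = decode_hex_escapes(r"\0062\006F\0064\0079  div")
six_digits = decode_hex_escapes(r"\000062\00006F\000064\000079 div")
escaped_div = decode_hex_escapes(r"body \0064\0069\0076 ")

for result in (one_space, two_spaces, six_digits, escaped_div):
    print(repr(result))
```

Under this reading a single space after the last escape is consumed, so the selector becomes one identifier, 'bodydiv', and only the doubled space leaves a descendant combinator. The six-digit form behaves the same way in this sketch, since the optional white-space is swallowed whenever it is present.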



> In theory, this should have worked, though, but doesn't:
>   \000062\00006F\000064\000079 div { background: lime } /*dropped*/
> Above there are 6 digits in each escape, and thus the double space
> ought not to have been necessary. But, alas, it seems to be needed.
>
> Some place you also had these examples:
>
>>      \p { background: green } /* parsed */
>>      \ p { background: green } /* dropped */
>>
>>      \div { background: green } /* dropped */
>>      \ div { background: green } /* dropped */
>
> The difference between \p and \div is related to the fact that the
> Unicode escapes use the characters in the 0-9 and A-F range.


I am very aware of this.
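
The \p versus \div difference can be sketched as follows, assuming that a backslash greedily takes up to six following hex digits and otherwise escapes the single next character. The helper name first_escape is hypothetical, and the optional white-space after a hex escape is not handled.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def first_escape(text: str):
    """Return (code point, rest) for text that starts with a backslash escape.

    Hypothetical helper; the optional white space after a hex escape is
    not handled here.
    """
    if len(text) < 2 or text[0] != "\\":
        raise ValueError("expected a backslash followed by at least one character")
    body = text[1:]
    n = 0
    while n < min(6, len(body)) and body[n] in HEX_DIGITS:
        n += 1
    if n == 0:
        # Not a hex digit: the backslash escapes this one character literally.
        return ord(body[0]), body[1:]
    # Hex escape: the digits name a code point, the remainder follows it.
    return int(body[:n], 16), body[n:]


for source in (r"\p", r"\div", r"\f", r"\g"):
    code_point, rest = first_escape(source)
    print(f"{source!r}: U+{code_point:04X} followed by {rest!r}")
```

In this sketch '\p' and '\g' yield the letters themselves, while '\div' yields U+000D followed by 'iv' and '\f' yields U+000F, neither of which is the name of the intended element.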


http://archivist.incutio.com/viewlist/css-discuss/115350


> Thus, in
> case of \div, then it looks to the user agent as an incorrect/illegal
> escape sequence. I don't know if the spec talks about that problem, but
> at least it says this:
>
>   ]] If a character in the range [0-9a-fA-F] follows the hexadecimal
> number, the end of the number needs to be made clear. [[
>
> You can also try this document, to check how <f> and <g> are treated
> differently:
>
> <!DOCTYPE html><style>\f,\g{background: lime;}</style><f
> class=f>f</f><g class=g>g</g>
>
>
>   [...]
>
>> I would like to answer the rest of the email but I have already spent
>> about 10 hours on this one and testing. I will reply more in part
>> later.
>
> Bear in mind that this letter is meant to be an *editorial* comment.


And that is what I am making. Here again is one of the examples that 
only Firefox 3.6~4b parses.

\0064\
\0069\0076 { \0062\
\0061\0063\006B\0067\0072\006F\0075\006E\0064: \0067\
\0072\0065\0065\006E }


Firefox 3.6~4b is treating this,


\0064\
\0069\


as an escape inside a string. This is wrong since '\0064\0069\0076' 
('div') is an identifier.




-- 
Alan http://css-class.com/

Armies Cannot Stop An Idea Whose Time Has Come. - Victor Hugo
Received on Wednesday, 6 April 2011 18:48:39 UTC