Re: Multi-line meta attributes

> In-Reply-To: <20011030113041.EE974C@maiana.inrialpes.fr> 
>  
> I've come up with two related fixes for the expat parser, following your
> suggestion, Irene, to trace the function EndOfAttributeValue () in the 
> module Amaya/amaya/Xml2thot.c. One issue appears to be a bug in expat, 
> so I'll present it first. The symptom is that multiple spaces are 
> incorrectly preserved in attribute values. 
>  
> This line of code is clearly intended to suppress multiple spaces, in 
> libwww\modules\expat\xmlparse\xmlparse.c line 2815, in function 
> appendAttributeValue(). The logic gets short-circuited as currently 
> written: 
>  
> if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20)) 
>  
> The second test above is skipped due to the expansion of the poolLength() 
> and poolLastChar() preprocessor #defines. So two additional sets of 
> parentheses are needed, as follows: 
>  
> if ((!isCdata && (poolLength(pool) == 0) || (poolLastChar(pool) == 0x20))) 
>   
> By adding the parentheses, multiple spaces are suppressed, as intended, when
> parsing the following sample input: 
>  
>   <meta name="description" content="Software design &amp; 
>    consulting for workstations, servers, &amp; embedded firmware. 
>    Systems programming, quality-crafted applications and 
>    enhancements for Internet, Linux, Windows." /> 
>  
> If you would please test and verify this fix, then I'll post the related 
> fix for preserving linefeeds. Thank you. 
>  
> Regards, 
> Marc 

Hello Marc,
Thanks for your contribution.

I think that Expat is right according to the XML specification. Section 
3.3.3 (Attribute-Value Normalization) says:
"
Before the value of an attribute is passed to the application or checked for 
validity, the XML processor must normalize the attribute value by applying the 
algorithm below, or by using some other method such that the value passed to 
the application is the same as that produced by the algorithm.

.... (algorithm)...

If the attribute type is not CDATA, then the XML processor must further 
process the normalized attribute value by discarding any leading and trailing 
space (#x20) characters, and by replacing sequences of space (#x20) characters 
by a single space (#x20) character.
All attributes for which no declaration has been read should be treated by a 
non-validating processor as if declared CDATA.
"
As Expat is a non-validating processor, it treats attributes for which it has 
read no declaration as if they were declared CDATA, and so preserves multiple 
spaces.
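
To illustrate why the extra spaces survive, here is a minimal sketch of the 
whitespace step of that algorithm for an attribute treated as CDATA. The 
function name normalize_as_cdata is hypothetical and this is not Expat's 
code; it ignores entity and character references and assumes line ends have 
already been normalized to a single linefeed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of Expat.
   Mimics only the whitespace step of XML 1.0 section 3.3.3 for an
   attribute treated as CDATA: each tab, linefeed or carriage return
   becomes one space (#x20). Runs of spaces are NOT collapsed, and
   leading/trailing spaces are kept. */
void normalize_as_cdata(char *value)
{
    assert(value != NULL);
    for (char *p = value; *p != '\0'; ++p) {
        if (*p == '\t' || *p == '\n' || *p == '\r')
            *p = ' ';
    }
}
```

With the multi-line meta content from your sample, the linefeed and the 
three indentation spaces that follow it come out as four consecutive spaces, 
which matches the symptom you observed.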

On the other hand, the XHTML 1.0 specification says (section 4.7, Whitespace 
handling in attribute values):
"
In attribute values, user agents will strip leading and trailing whitespace 
from attribute values and map sequences of one or more whitespace characters 
(including line breaks) to a single inter-word space (an ASCII space character 
for western scripts). See Section 3.3.3 of [XML].
"
That means that the user agent (Amaya) itself has to suppress multiple 
whitespace characters in attribute values, which it does not do today. I will 
modify the Amaya code to handle this as soon as possible (after the next 
release, which is planned for next week).
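
As a rough idea of the user-agent behaviour described in section 4.7, here 
is a minimal sketch. It is not the Amaya code and the function name 
collapse_attr_whitespace is hypothetical; it assumes the value is a 
NUL-terminated string and treats only ASCII space, tab, linefeed and carriage 
return as whitespace.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not Amaya code.
   Rewrites the attribute value in place: leading and trailing
   whitespace is dropped, and every run of whitespace between words
   (including line breaks) becomes a single ASCII space. */
void collapse_attr_whitespace(char *value)
{
    assert(value != NULL);
    char *out = value;       /* next position to write */
    int pending_space = 0;   /* a separator is owed before the next word */

    for (const char *in = value; *in != '\0'; ++in) {
        if (*in == ' ' || *in == '\t' || *in == '\n' || *in == '\r') {
            /* Owe a separator only if something was already written,
               so leading whitespace is discarded. */
            pending_space = (out != value);
        } else {
            if (pending_space) {
                *out++ = ' ';
                pending_space = 0;
            }
            *out++ = *in;
        }
    }
    /* A separator still pending here was trailing whitespace: drop it. */
    *out = '\0';
}
```

Applied to the value that Expat hands over for the sample meta element, this 
would leave exactly one space between "&amp;" and "consulting".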

Regards 

Laurent Carcone
Amaya team


  

Received on Monday, 10 December 2001 10:54:08 UTC