[whatwg] [wf2] Note about XML attribute value handling

Anne van Kesteren wrote:
> Mikko Rantalainen wrote:
>><input type="hidden"> works just fine with existing UAs even when one 
>>uses application/xhtml+xml provided that all meaningful whitespace has 
>>been converted to entities. &#10; for LF, &#13; for CR and so on. PHP, 
>>for example, provides function htmlentities() exactly for this purpose.
> 
> Really? Why would 'htmlentities' be useful in an XML environment? Also, 
> getting HTML *entities* in your XML document doesn't seem like a good thing.

Yes, you're right that in some cases that results in problems. That's 
because htmlentities() returns entity references like "&quot;"; if it 
*only* returned numeric character references like "&#34;" it would work 
just fine despite the fact that it's designed for HTML. Those numbers 
refer to Unicode code points. Also note that the XML attribute value 
normalization rules state "For a character reference, append the 
referenced character to the normalized value" but "For an entity 
reference, recursively apply step 3 of this algorithm to the replacement 
text of the entity." [1] Real-world user agent behavior might differ, 
and that's what I'm really interested in.
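
To make the normalization difference concrete, here is a small sketch in 
Python (my choice for illustration; it says nothing about what browsers 
do). It assumes the expat-based parser behind the standard 
xml.etree.ElementTree module; attr_value() is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def attr_value(markup: str) -> str:
    """Parse a single element and return its 'value' attribute."""
    return ET.fromstring(markup).get("value")


# A literal line feed inside the attribute value is normalized to a space.
literal = attr_value('<input type="hidden" value="a\nb"/>')

# The same character written as a numeric character reference survives.
referenced = attr_value('<input type="hidden" value="a&#10;b"/>')

print(repr(literal))     # 'a b'
print(repr(referenced))  # 'a\nb'
```

That is what the spec calls for; other parsers and real user agents may 
still behave differently.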

That said, I'm using my own function to correctly encode special 
characters as numeric character references, but the point remains the 
same. It doesn't matter whether you put the string in the contents of an 
element or inside an attribute value; you *have to* encode it to escape 
special characters. If you put the string inside an attribute, the 
encoding must also cover all whitespace and quotation characters in 
addition to characters like "<", "&" and ">".
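
A minimal sketch of such an encoder, again in Python. This is not the 
function I actually use; encode_attr() is a hypothetical name, and it 
assumes the input contains only characters that XML 1.0 allows at all 
(most control characters cannot be represented, even as references). 
Plain spaces are left alone because normalization maps a space to itself.

```python
import xml.etree.ElementTree as ET

# Characters that must not appear literally in an attribute value if the
# original string is to survive parsing: markup characters, both quote
# characters, and the whitespace characters that normalization rewrites.
SPECIAL = "<>&\"'\t\n\r"


def encode_attr(text: str) -> str:
    """Replace special characters with numeric character references."""
    return "".join(f"&#{ord(ch)};" if ch in SPECIAL else ch for ch in text)


original = 'line 1\r\nline 2\t"quoted" <&>'
markup = f'<input type="hidden" value="{encode_attr(original)}"/>'

# Round trip: the parsed attribute value equals the original string.
assert ET.fromstring(markup).get("value") == original
```

The round trip at the end only checks one XML parser, not any 
particular user agent.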

You cannot put a random string between <textarea> and </textarea> tags 
either and expect to get a valid XML fragment as a result.
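
The same point for element content, as a short Python sketch. It assumes 
the standard library's xml.sax.saxutils.escape(), which replaces "&", 
"<" and ">"; the sample string is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && c > d"

# Dropping the raw string between the tags is not well-formed XML.
try:
    ET.fromstring(f"<textarea>{raw}</textarea>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After escaping, the fragment parses and the text comes back unchanged.
element = ET.fromstring(f"<textarea>{escape(raw)}</textarea>")

print(parsed_raw)           # False
print(element.text == raw)  # True
```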


[1] http://www.w3.org/TR/REC-xml/#AVNormalize

-- 
Mikko

Received on Tuesday, 10 May 2005 22:48:32 UTC