UTF-8 vs CDATA

From: Magnus Henoch <magnus@erlang-consulting.com>
Date: Mon, 01 Jun 2009 15:47:05 +0100
To: www-talk@w3.org
Message-ID: <84prdoj9jq.fsf@linux-b2a3.site>
Xingdong and I just had an interesting battle with Erlang Web that
deserves to be documented... :)

We have a record, one of whose fields uses a custom wtype that outputs a
piece of JavaScript to render its control.  It worked as long as we only
used ASCII text; with Unicode text it would fail, because a list of
Unicode code points (loosely speaking, UTF-32) was passed to
erlang:iolist_size inside e_mod_inets:controller_exec, which only
accepts bytes and binaries.  Obviously something was failing to convert
Xmerl's code point representation to the UTF-8 that the external world
uses.
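
For the record, here is a minimal sketch of that failure outside Erlang
Web.  The module name utf8_demo and the sample text are made up for
illustration, and it assumes a release that ships the unicode module
(R13 or later).  It is not Erlang Web's actual code, just the same
mismatch in miniature:

```erlang
-module(utf8_demo).
-export([demo/0]).

%% Hypothetical demo: a code point above 255 is not a valid iolist
%% element, so erlang:iolist_size/1 raises badarg until the text has
%% been encoded as UTF-8.
demo() ->
    Text = [955, 120],                          % lambda (U+03BB), then "x"
    Failed = try erlang:iolist_size(Text) of
                 _Size -> false
             catch
                 error:badarg -> true
             end,
    Utf8 = unicode:characters_to_binary(Text),  % encodes as UTF-8 by default
    {Failed, Utf8, erlang:iolist_size(Utf8)}.
```

Calling utf8_demo:demo() should show the badarg on the raw list and a
three-byte binary after conversion.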

After an hour of debugging, we realized that the problem was caused by
the <script> tag contents being wrapped in a CDATA section:

<script>
//<![CDATA[
...lots of javascript
//]]>
</script>

Because of this, Xmerl's parser returned an xmlText record with
type = cdata, which triggers Erlang Web's special (non)treatment of
the text: it is passed through as-is, so it was never converted to
UTF-8.  Removing the CDATA marker fixed the problem.
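
If you want to see what the parser hands back, something along these
lines should do.  This is only a sketch: the module and function names
are invented, and it looks at the direct children of the root element
only.  It relies on xmerl_scan:string/1 and the #xmlText record from
xmerl.hrl:

```erlang
-module(cdata_demo).
-export([text_types/1]).
-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical helper: parse an XML string and list the type and value
%% of each text node directly under the root element.
text_types(Xml) ->
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(Xml),
    [{Type, Value} || #xmlText{type = Type, value = Value} <- Content].
```

For example, cdata_demo:text_types("<script>//<![CDATA[\nalert(1);\n//]]></script>")
should include an entry tagged cdata for the wrapped JavaScript.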

(Incidentally, this is OK since Erlang Web serves XHTML files as HTML,
where the contents of <script> tags are implicitly CDATA.)

So consider this a vote to change 'type = cdata' to 'type =
erlang_web_passthrough' for the special meaning.  That way it stands
less chance of interfering with proper usage of XML.  :)

-- 
Magnus Henoch, magnus@erlang-consulting.com
Erlang Training and Consulting
http://www.erlang-consulting.com/
Received on Monday, 1 June 2009 14:47:58 GMT
