UTF-8 vs CDATA

Xingdong and I just had an interesting battle with Erlang Web that
deserves to be documented... :)

We have a record, one of whose fields uses a custom wtype that outputs a
piece of JavaScript to render its control.  It worked as long as we only
used ASCII text; with Unicode text it would fail, passing UTF-32 (a list
of raw Unicode code points) to erlang:iolist_size inside
e_mod_inets:controller_exec.  Obviously something was failing to convert
Xmerl's UTF-32 representation to the UTF-8 that the external world uses.
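To illustrate the failure mode, here is a minimal Erlang sketch.  The
module utf8_demo and its functions are made up for this example; it only
assumes that text arrives the way Xmerl returns it, as a list of Unicode
code points.  erlang:iolist_size/1 accepts only bytes (0..255) and
binaries, so a code point such as 8364 (the euro sign) makes it raise
badarg, while unicode:characters_to_binary/1 produces the UTF-8 bytes the
outside world expects.

```erlang
-module(utf8_demo).
-export([to_utf8/1, demo/1]).

%% Hypothetical helper: convert a list of Unicode code points (Xmerl's
%% text representation) into a UTF-8 encoded binary that is safe to pass
%% to iolist_size/1 or to write to a socket.
to_utf8(CodePoints) when is_list(CodePoints) ->
    case unicode:characters_to_binary(CodePoints) of
        Bin when is_binary(Bin) -> Bin;
        Other -> erlang:error({invalid_unicode, Other})
    end.

%% Show what happens to the raw code point list versus the converted one.
demo(Text) ->
    Raw = try erlang:iolist_size(Text) of
              N -> {ok, N}
          catch
              %% Any element above 255 is not a valid byte in an iolist.
              error:badarg -> badarg
          end,
    Utf8 = to_utf8(Text),
    {Raw, Utf8, erlang:iolist_size(Utf8)}.
```

For example, utf8_demo:demo([104, 233, 8364]) ("hé€") returns
{badarg, <<104,195,169,226,130,172>>, 6}.  Note that code points between
128 and 255 pass through iolist_size/1 without error but would be emitted
as Latin-1 rather than UTF-8, so ASCII-only testing hides the bug
entirely.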

After an hour of debugging, we realized that the problem was caused by
the <script> tag contents being wrapped in a CDATA section:

<script>
//<![CDATA[
...lots of javascript
//]]>
</script>

Because of this, Xmerl's parser returned an xmlText record with
type = cdata, which triggers Erlang Web's special (non)treatment of
such text: it is passed through as-is and never converted to UTF-8.
Removing the CDATA markers fixed the problem.

(Incidentally, this is OK since Erlang Web serves XHTML files as HTML,
where the contents of <script> tags are implicitly CDATA.)

So consider this a vote to change 'type = cdata' to 'type =
erlang_web_passthrough' as the marker for this special meaning.  That
way it stands less chance of interfering with proper usage of XML.  :)
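A hypothetical sketch of what the proposal could look like.  This is not
Erlang Web's actual code; the module text_render and function render/1
are invented for illustration.  It uses the xmlText record from Xmerl's
xmerl.hrl, whose type field is normally text or cdata, and reserves the
made-up value erlang_web_passthrough for output that is already encoded.
Both ordinary text and genuine CDATA sections are converted to UTF-8;
only the explicit marker skips conversion.  XML escaping of text nodes is
omitted to keep the sketch short.

```erlang
-module(text_render).
-export([render/1]).
%% Provides the #xmlText{} record (fields include value and type).
-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical renderer for Xmerl text nodes; returns iodata in UTF-8.

%% Proposed marker: the value is assumed to be iodata that is already
%% encoded for output, so it is emitted untouched.
render(#xmlText{type = erlang_web_passthrough, value = V}) ->
    V;
%% A genuine CDATA section from a template: convert the code points to
%% UTF-8 and keep the section markers around them.
render(#xmlText{type = cdata, value = V}) ->
    [<<"<![CDATA[">>, utf8(V), <<"]]>">>];
%% Ordinary text (escaping of < and & omitted in this sketch).
render(#xmlText{type = text, value = V}) ->
    utf8(V).

%% Convert a list of code points to a UTF-8 binary, failing loudly on
%% invalid input instead of returning an error tuple as output.
utf8(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) -> Bin;
        Other -> erlang:error({invalid_unicode, Other})
    end.
```

With this split, a template that legitimately contains a CDATA section
still gets its non-ASCII characters encoded, which is exactly what the
CDATA-wrapped <script> above needed.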

-- 
Magnus Henoch, magnus@erlang-consulting.com
Erlang Training and Consulting
http://www.erlang-consulting.com/

Received on Monday, 1 June 2009 14:47:58 UTC