Re: xsl transformation and special characters

-------- Original Message --------
Subject: Re: xsl transformation and special characters
Date: Fri, 04 Jul 2003 14:25:00 +0200
From: Paul Libbrecht <paul@activemath.org>
To: Morten Andersen <mortena@mip.sdu.dk>
References: <5.2.1.1.0.20030630150751.01c89c18@mailhost.mip.sdu.dk>

Morten,

Your bug seems similar to the bug we have encountered. It goes as follows:

The HTML specifications do NOT give a reliable way to specify the encoding
of the data being sent to the script/servlet/... The reason: the encoding
attribute of the form element (enctype) only describes the transfer format,
and its default value is application/x-www-form-urlencoded.
This "encoding" is just a low-level way of transmitting byte streams: it
says, for instance, that the byte C3 should become %C3, and so on.

However, it is not clear which text-encoding should be used to prepare
this bytestream.
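
To illustrate, here is a small Java sketch (the class and method names are made up for this example) that uses java.net.URLEncoder to produce the x-www-form-urlencoded form of the same Danish letters under two different text encodings. The bytes on the wire differ, and nothing in the encoded data says which text encoding was used:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingDemo {
    // Percent-encode a form value as application/x-www-form-urlencoded,
    // first turning its characters into bytes with the given text encoding.
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // U+00E6, U+00F8, U+00E5: the Danish letters ae, o-slash, a-ring
        String word = "\u00E6\u00F8\u00E5";
        System.out.println("UTF-8:      " + encode(word, "UTF-8"));      // %C3%A6%C3%B8%C3%A5
        System.out.println("ISO-8859-1: " + encode(word, "ISO-8859-1")); // %E6%F8%E5
    }
}
```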

The current practice we have observed is that the browser sends the text
using the same encoding as the page the form comes from (some browsers
even have a setting for this).

What you are seeing seems to be the UTF-8 encoding of the given
letters. You are now left with decoding this stream.
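
For reference, a short Java sketch (hypothetical class name) that prints the UTF-8 bytes of the three letters. Each one becomes two bytes: C3 A6, C3 B8 and C3 A5. If those bytes are then read one by one as ISO-8859-1 characters, æ shows up as "Ã¦", ø as "Ã¸" and å as "Ã¥":

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Return the UTF-8 bytes of a string as space-separated hex, e.g. "C3 A6".
    public static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E6 (ae), U+00F8 (o-slash), U+00E5 (a-ring), kept as escapes
        for (String letter : new String[] {"\u00E6", "\u00F8", "\u00E5"}) {
            System.out.println(utf8Hex(letter));
        }
    }
}
```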

On top of this comes a bug in Tomcat that we have run into: Tomcat has
also skipped that bit of the specification (which is needed on all other
sides anyway), and request.getParameter(paramName) returns a String whose
characters each hold one raw byte of the decoded x-www-form-urlencoded
data, rather than the actual text.
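
A small Java sketch of that effect (hypothetical class name): java.net.URLDecoder turns the same percent-encoded bytes into different text depending on the charset it is told to use. Decoding UTF-8 form data as ISO-8859-1, which is what "one raw byte per character" amounts to, yields two wrong characters per Danish letter:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ParamDecodingDemo {
    // Decode an x-www-form-urlencoded value with the given text encoding.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String fromBrowser = "%C3%A6%C3%B8%C3%A5";        // ae, o-slash, a-ring sent as UTF-8
        String wrong = decode(fromBrowser, "ISO-8859-1"); // one char per byte
        String right = decode(fromBrowser, "UTF-8");      // one char per letter
        System.out.println(wrong.length() + " vs " + right.length()); // prints "6 vs 3"
    }
}
```
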
We thus had to pipe the results of request.getParameter() through a UTF-8
java.io.InputStreamReader, and we are now able to accept Russian, math,
and just about anything in Unicode 3 in our forms (this is in the
ActiveMath project).
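
A minimal Java sketch of that workaround, assuming (as described above) that every char of the parameter value holds one raw byte. The class and method names are hypothetical; in a servlet the input would be the value returned by request.getParameter(...). The chars are turned back into bytes with ISO-8859-1, which maps U+0000..U+00FF one-to-one onto bytes, and those bytes are read through an InputStreamReader set to UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ParamFix {
    // Re-decode a parameter whose chars each carry one byte of UTF-8 data.
    // Chars above U+00FF cannot be raw bytes; getBytes replaces them with '?'.
    public static String redecodeAsUtf8(String raw) {
        // ISO-8859-1 maps chars U+0000..U+00FF to bytes 0x00..0xFF one-to-one
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory byte array
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What getParameter() might return for ae, o-slash, a-ring sent as UTF-8
        String garbled = "\u00C3\u00A6\u00C3\u00B8\u00C3\u00A5";
        String fixed = redecodeAsUtf8(garbled);
        System.out.println(fixed.equals("\u00E6\u00F8\u00E5")); // prints "true"
    }
}
```

The same result can be had in one line with new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8); the Reader version just mirrors the InputStreamReader approach described above.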

Hope that helps.

Paul


Morten Andersen wrote:
> Dear MathML experts
> 
> I'm trying to develop an application where text and mathematics can be
> edited online using a browser. I've run into a few problems doing that:
> 
> I've made an XHTML page where the end user should be able to edit
> a text containing special letters like the Danish æ, ø and å. This page
> is rendered just fine, but something translates the letters as the text
> is sent from the textarea to the server:
> 
>     * æ  > æ
>     * ø   > gø 
> 
> 
> This does not happen in an HTML page; the problem only appeared once I
> started using XML. Here is part of the XML page:
> 
> 
> <?xml version="1.0" ?>
> <?xml-stylesheet type="text/xsl" href="pmathml.xsl"?>
> <!--
>   pref:renderer="techexplorer-plugin"
>   pref:renderer="techexplorer"
>   pref:renderer="css"
>   pref:renderer="mathplayer"
>   pref:renderer="mathplayer-dl"
> -->
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="da">
> .....
> 
> 
> <TEXTAREA CLASS="editor">
> æøå
> </TEXTAREA>
> ...
> 
> 
> As I submit the form with the textarea, the letters æ, ø and å are
> translated, but only in IE6 and Opera, not in Netscape or Mozilla.
> 
> It seems that the XSL transformations are causing the problems with the
> special characters. So I tried a version like the one below:
> 
> <?xml version="1.0"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
>                "http://www.w3.org/Math/DTD/xhtml-math11-f.dtd">
> 
> <html xmlns="http://www.w3.org/1999/xhtml" 
> xmlns:m="http://www.w3.org/1998/Math/MathML">
>  <head>
>    <OBJECT ID="MathPlayer" 
> CLASSID="clsid:32F66A20-7614-11D4-BD11-00104BD3F987"></OBJECT>
>    <?IMPORT NAMESPACE="m" IMPLEMENTATION="#MathPlayer"?>
> 
> This solves the problem with special character input, but it causes
> the following problems:
> 
>     * I have to write the MathML like <m:math>... instead of <math
>       xmlns="...">; this is a problem if I want to make it possible to
>       copy MathML from programs like Mathematica.
>     * I can't use the WEB-EQ input control without writing a class-id
>       statement in the header.
>     * I'm not sure that I can use both MathPlayer and the input control
>       on the same page.
> 
> 1) If I use XSL transformations, user input from HTML forms in
> Internet Explorer and Opera has special characters like æ, ø and
> mathematical symbols like the integral sign translated before it
> reaches the server.
> 2) If I use the DOCTYPE approach, I have to write the MathML like
> <m:math>... instead of <math xmlns="...">, which is a problem if I want
> to make it possible to copy MathML from programs like Mathematica; plus,
> I can't use the WEB-EQ input control without writing a class-id
> statement in the header.
> 
> I think the solution could be to add some extra XSL files in the XSL
> approach, or to translate the MathML on the server, depending on
> whether it is to be edited or viewed.
> 
> How can this be solved?
> 
> Thanks
> 
> Morten Andersen
> Denmark

Received on Friday, 4 July 2003 08:32:09 UTC