Re: Mathml display with IE

From: Robert Miner <RobertM@dessci.com>
Date: Wed, 31 Mar 2004 10:53:05 -0600
Message-Id: <200403311653.i2VGr5s31673@wisdom.geomtech.com>
To: pzn04@yahoo.fr
Cc: davidc@nag.co.uk, thabing@uiuc.edu, www-math@w3.org


Hi.

You wrote:

> Your links show me the 03c code for lt. I need the E084 number to
> correct the Open Office output.
> 
> Here is a summary of my problem, it is a follow up of previous exchanges
> with Robert on that issue. You may check the messages on:
> http://lists.w3.org/Archives/Public/www-math/2004Mar/0020.html
> 
> I used to write my documents with MsWord.  I then converted them into
> xml/html.  I also implemented the mathml.  It worked.
> 
> I am willing to use Open Office. However, the formulas that are inserted
> with a math editor or with Open Office are extracted with codes
> generated in the Unicode Private Area!
> 
> My xml or html would display the ? marks instead of the expected
> characters.
> 
> My transformation is meant to be automatic, I cannot review each formula
> in order to correct its code in the mathml.  This is why I need to
> transform the characters into a code that IE can display.
> 
> In order to do this, I need the real UTF-8 code as explained by Robert.
> Robert gave me the number for the < sign, I tried the transformation and
> I got the appropriate display with IE.  I need to do the same thing for
> other characters.
> 
> Hope this is clearer.  Could you please tell me about the real UTF-8
> code?

I understand what you need.  The problem is that I don't know how to
get it, except by experimenting with Open Office, and that would be
very tedious and error-prone.

Does anyone on this list know how to contact the people who maintain
the MathML support in Open Office?  I just spent 15 minutes or so
searching OpenOffice.org, and couldn't find anyone to contact about
their math editor...

In any event, if you want to do the experiments on your own, I can
tell you how to decode the raw UTF-8 bytes in the Open Office output
to figure out which hex code point OO is putting out for a given
character.  First, look at the output in a binary editor that will
show you the actual byte values.  I typically use Visual C++ or emacs
(which shows raw bytes in octal).  Write out each byte of the
character in binary.  Then, the key to the UTF-8 encoding is:

Format of octets in a UTF-8 sequence

Octet usage     Format
1st of 1        0xxxxxxx
1st of 2        110xxxxx
1st of 3        1110xxxx
1st of 4        11110xxx
1st of 5        111110xx
1st of 6        1111110x
2nd-6th         10xxxxxx
    
Thus, for E084, emacs shows the octal bytes from the Open Office
output as \356 \202 \204.  In binary that becomes 11101110 10000010
10000100.  The first byte starts with 1110, so it is the first of
three.  Dropping that 1110 marker from the first byte and the 10
marker from each of the two following bytes leaves the significant
bits (the x positions in the table):

1110 000010 000100

Regrouping binary 1110000010000100 into groups of four bits
(1110 0000 1000 0100) and converting to hex gives E084.
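
If doing this by hand for many characters gets tedious, the same
steps can be scripted.  What follows is only a rough sketch in Python,
not anything Open Office or IE provides: the function name
decode_utf8_sequence is made up for illustration, and it assumes you
have already copied the byte values of a single character out of your
binary editor.  It handles sequences of one to four bytes only, which
is all that the current UTF-8 definition (RFC 3629) allows.

```python
def decode_utf8_sequence(octets):
    """Decode the bytes of ONE UTF-8 character into its code point.

    `octets` is a list of integers (0-255), e.g. the values read from
    a binary editor. Hypothetical helper, written for illustration.
    """
    first = octets[0]
    # The leading bits of the first byte give the sequence length;
    # the remaining bits are the start of the code point.
    if first < 0x80:                 # 0xxxxxxx
        length, value = 1, first
    elif first >> 5 == 0b110:        # 110xxxxx
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx
        length, value = 4, first & 0x07
    else:
        raise ValueError("not a valid first byte: %#04x" % first)

    if len(octets) != length:
        raise ValueError("expected %d bytes, got %d" % (length, len(octets)))

    # Each following byte must look like 10xxxxxx and adds six bits.
    for b in octets[1:]:
        if b >> 6 != 0b10:
            raise ValueError("not a continuation byte: %#04x" % b)
        value = (value << 6) | (b & 0x3F)
    return value


# The three bytes emacs showed as \356 \202 \204 (octal).
octets = [0o356, 0o202, 0o204]
print(format(decode_utf8_sequence(octets), "04X"))  # prints E084
```

Python's built-in decoder should give the same answer, e.g.
hex(ord(bytes([0o356, 0o202, 0o204]).decode("utf-8"))), but the
longhand version mirrors the table above step by step.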

Hope this helps.

--Robert


------------------------------------------------------------------
Dr. Robert Miner                                RobertM@dessci.com
MathML 2.0 Specification Co-editor                    651-223-2883
Design Science, Inc.   "How Science Communicates"   www.dessci.com
------------------------------------------------------------------
Received on Wednesday, 31 March 2004 12:04:41 GMT