- From: Robert Miner <RobertM@dessci.com>
- Date: Wed, 31 Mar 2004 10:53:05 -0600
- To: pzn04@yahoo.fr
- Cc: davidc@nag.co.uk, thabing@uiuc.edu, www-math@w3.org
Hi.

You wrote:

> Your links show me the 03c code for lt. I need the E084 number to
> correct the Open Office output.
>
> Here is a summary of my problem, it is a follow up of previous exchanges
> with Robert on that issue. You may check the messages on:
> http://lists.w3.org/Archives/Public/www-math/2004Mar/0020.html
>
> I used to write my documents with MsWord. I then converted them into
> xml/html. I also implemented the mathml. It worked.
>
> I am willing to use Open Office. However, the formulas that are inserted
> with a math editor or with Open Office are extracted with codes
> generated in the Unicode Private Area!
>
> My xml or html would display the ? marks instead of the expected
> characters.
>
> My transformation is meant to be automatic, I cannot review each formula
> in order to correct its code in the mathml. This is why I need to
> transform the characters into a code that IE can display.
>
> In order to do this, I need the real UTF-8 code as explained by Robert.
> Robert gave me the number for the < sign, I tried the transformation and
> I got the appropriate display with IE. I need to do the same thing for
> other characters.
>
> Hope this is clearer. Could you please tell me about the real UTF-8
> code?

I understand what you need. The problem is that I don't know how to
get it, except by experimenting with Open Office, and that would be
very tedious and error-prone.

Does anyone on this list know how to contact the people who maintain
the MathML support in Open Office? I just spent 15 minutes or so
searching OpenOffice.org, and couldn't find anyone to contact about
their math editor...

In any event, if you want to do the experiments on your own, I can
tell you how to decode the raw UTF-8 bytes in the Open Office output
to figure out what hex codepoint OO is putting out for a given
character.

First, look at the output in a binary editor that will show you the
actual byte values. I typically use Visual C++ or emacs (which shows
raw bytes in octal). Write out the bytes in binary. Then, the key to
the UTF-8 encoding is:

  Format of octets in a UTF-8 sequence

  Octet usage   Format
  1st of 1      0xxxxxxx
  1st of 2      110xxxxx
  1st of 3      1110xxxx
  1st of 4      11110xxx
  1st of 5      111110xx
  1st of 6      1111110x
  2nd-6th       10xxxxxx

Thus, for E084, emacs shows the octal bytes from the Open Office
output as

  \356 \202 \204

In binary that becomes

  11101110 10000010 10000100

Looking at the first byte, we see it is the first of three, so we
drop the marker bits (1110 from the first byte, 10 from each of the
other two) and keep the significant digits from the three bytes to
obtain

  1110 000010 000100

Converting binary 1110000010000100 to hex gives E084.

Hope this helps.

--Robert

------------------------------------------------------------------
Dr. Robert Miner                                RobertM@dessci.com
MathML 2.0 Specification Co-editor                    651-223-2883
Design Science, Inc.   "How Science Communicates"   www.dessci.com
------------------------------------------------------------------
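For anyone who would rather not do the bit extraction by hand, the decoding steps described in the message can be sketched in Python roughly as follows. This is only an illustration, not part of the original message: the function name decode_utf8_sequence is hypothetical, and the lead-byte patterns are taken directly from the octet table in the message.

```python
def decode_utf8_sequence(raw: bytes) -> int:
    """Return the code point encoded by the UTF-8 sequence at the start of raw.

    Hypothetical helper: it follows the octet table from the message and
    does no validation beyond checking the marker bits.
    """
    if not raw:
        raise ValueError("no bytes to decode")

    # (mask covering the marker bits, expected marker pattern, sequence length)
    lead_forms = [
        (0b10000000, 0b00000000, 1),  # 0xxxxxxx
        (0b11100000, 0b11000000, 2),  # 110xxxxx
        (0b11110000, 0b11100000, 3),  # 1110xxxx
        (0b11111000, 0b11110000, 4),  # 11110xxx
        (0b11111100, 0b11111000, 5),  # 111110xx
        (0b11111110, 0b11111100, 6),  # 1111110x
    ]

    first = raw[0]
    for mask, pattern, length in lead_forms:
        if (first & mask) == pattern:
            break
    else:
        raise ValueError(f"byte {first:08b} is not the first octet of a sequence")

    if len(raw) < length:
        raise ValueError(f"expected {length} bytes, got {len(raw)}")

    # Keep only the significant (x) bits of the first octet.
    value = first & (~mask & 0xFF)

    # Each following octet must look like 10xxxxxx and contributes six bits.
    for octet in raw[1:length]:
        if (octet & 0b11000000) != 0b10000000:
            raise ValueError(f"byte {octet:08b} is not a continuation octet")
        value = (value << 6) | (octet & 0b00111111)

    return value


# The three octal bytes from the example in the message: \356 \202 \204
example = bytes([0o356, 0o202, 0o204])
print(f"{decode_utf8_sequence(example):04X}")  # prints E084
```

For well-formed input, Python's built-in decoder gives the same answer: `ord(example.decode("utf-8"))` is also 0xE084. The hand-written version is only useful for seeing which bits come from which byte.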
Received on Wednesday, 31 March 2004 12:04:41 UTC