- From: Martin Wickman <martin.wickman@infohwy.se>
- Date: Sun, 29 Oct 2000 14:14:15 +0100
- To: html-tidy@w3.org
From: Frank Steuer <steuer@ece.orst.edu>

Thanks for your reply! (I have been working/debugging all night on this, so excuse any sleepiness :-)

> I try to do the same job - but to achieve a general solution.
> I also used jtidy and then xalan and xerces to transform the XML
> documents to wml (or cHTML, XHTML subsets or HTML subsets) via XSLT.

That's my idea as well. Feels good to know there are others out there with the same problems. Btw, do you use the DOM representation, or do you just "pipe" the tidied, pretty-printed output from jtidy to xalan etc.?

I have some reservations about the status of jtidy's DOM support. jtidy happily parses my html into a DOM document. But when I try to traverse the DOM tree to produce textual output, I don't get all the elements, and some other stuff is missing as well. This makes me a bit suspicious. I have tried using my own pretty-printer, jdom's DOM output and a few others, but none of them produces the correct output. Maybe I am doing something wrong. Here is a snippet:

    Tidy tidy = new Tidy();
    tidy.setXmlOut(true);
    tidy.setXmlPi(true);
    Document doc = tidy.parseDOM(in, null);
    tidy.pprint(doc, System.out);

pprint() produces what looks like a correct XML representation. But if I use the Document doc object with jDOM, or send it to the print() method in the example class (TestDOM from sourceforge), it prints nothing. The html test document is well-formed and as simple as possible.

My original idea was to use the DOM representation and then call xalan with an XSL stylesheet. If that won't work, I guess I have to parse the tidied XML string again using another XML parser.

> It works - more or less. The problems I have is that I try to
> transcode documents I do not have any control about. (lots of
> errors, headings used as layout tool and not to define the
> structure of an document etc....)

I know the feeling; unfortunately I cannot give you any helpful hints. But if you manage to get it to work and GPL it, it would be a huge donation to the open-source community, and http://www.kannel.org in particular :-)

> One of the problems I still have to solve is the splitting of big xml
> documents in several decks and cards. Here you should not have
> that big problem, because you said that you have a kind of control
> about how the html documents are written.

Sure enough, I will face that problem as well. But I don't think splitting documents into several cards will solve the low-memory issues (afaik, a deck is sent with all its cards at the same time?). I guess the files will have to be split up somehow anyway, inserting 'Next section...' and 'Previous section...' tags.

> I would try XSL(T). It is pretty easy and by changing the XSL
> stylesheets you can try to get the wanted output. You don't have to
> change the application, recompile it to java bytecode etc.

I have started writing some XSLT files for the XML/HTML to WML conversion.

> I will publish the results of my work pretty soon as GPLed source.
> Right now it does not make sense because it is to much under
> construction and not documented yet.

Great. I'll be watching this space.
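
PS. For the fallback mentioned above (parsing the tidied XML string again with another XML parser and then running a stylesheet over it), a rough sketch might look like the following. It is only an illustration under some assumptions: it uses the standard JAXP interfaces instead of calling jtidy, xalan or xerces directly, it assumes the tidied output is well-formed and carries no XML namespace, and the class name TidiedXmlToWml, the toWml helper and the tiny stylesheet are all made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: re-parse tidied XML text and apply an XSLT stylesheet.
class TidiedXmlToWml {

    // Illustrative stylesheet only: one card per document, one <p> per source <p>.
    // Assumes the source elements are in no namespace.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/html\">"
      + "<wml><card title=\"{head/title}\">"
      + "<xsl:apply-templates select=\"body/p\"/>"
      + "</card></wml>"
      + "</xsl:template>"
      + "<xsl:template match=\"p\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
      + "</xsl:stylesheet>";

    static String toWml(String tidiedXml, String xsl) throws Exception {
        // Parse the tidied text into a fresh DOM with a regular XML parser,
        // instead of reusing the DOM object that came out of the tidy step.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processors expect a namespace-aware DOM
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(tidiedXml)));

        // Compile the stylesheet and run the transformation into a string.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String tidied =
            "<html><head><title>Test</title></head>"
          + "<body><p>Hello</p></body></html>";
        System.out.println(toWml(tidied, XSL));
    }
}
```

The point of going through a string is that the second parser builds its own DOM, so whatever is odd about the first DOM never reaches the stylesheet. The price is parsing every document twice.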
Received on Sunday, 29 October 2000 08:12:10 UTC