
RE: HTML -> XML -> WML using jtidy

From: Rao, Rajesh <RRao@mportal.com>
Date: Sun, 29 Oct 2000 08:11:10 -0500
Message-ID: <B7BB4AC1259FD411844100D0B784ACB00543D0@MPORTAL_SRV2>
To: "'Martin Wickman'" <martin.wickman@infohwy.se>, html-tidy@w3.org

This might be of interest. It's probably not exactly what you are looking
for, but it is relevant.



-----Original Message-----
From: Martin Wickman [mailto:martin.wickman@infohwy.se]
Sent: Saturday, October 28, 2000 10:50 PM
To: html-tidy@w3.org
Subject: HTML -> XML -> WML using jtidy


I am trying to convert some HTML documents to WML. As we know, 
starting from HTML and then going to WML is not that easy. It 
would be more or less a piece of cake if I could start with a data 
representation (pure XML) and then create WML, but no... not this time.

I have tried using JTidy (Java) to get an XML representation of the HTML 
documents, and sure enough it seems to work. At least I can pretty-print 
the document and traverse it using the org.w3c.dom API.

In other words I can now get an XML version of the document, but I am 
not sure what to do next, and I am thus reaching out to you guys. Here 
are some ideas I have:

1. Manipulate the XML DOM representation -- removing and replacing tags 
left and right until I get the WML I want. Seemed easy enough, until I 
started looking at the documents... Lots and lots of "weird" HTML 
elements and attributes I need to transform. Things are nested as well, 
so I can't just 'delete BODY' to get rid of it :-)


2. Using XSL in some clever way to magically get my WML documents. I 
have looked into XSL, but am not really sure how to go about it. I know 
a bit about how XSL works, but I am not sure if I can use it to my 

Do you fine folks know what might be the best way to do this? Maybe 
provide some useful pointers or additional ideas to help me out?
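
For idea 2, here is a minimal sketch of what an XSLT pass over the tidied
document might look like, using only the javax.xml.transform API that ships
with the JDK. The stylesheet, the class name HtmlToWmlXslt, and the choice
of which tags to keep are all assumptions made for illustration; they are
not taken from JTidy or from the original message. It also assumes the
tidied output is well-formed XML with lowercase, un-namespaced element
names (XHTML output with a default namespace would need prefixed match
patterns).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToWmlXslt {

    // Hypothetical stylesheet: maps a small, agreed-upon HTML subset to WML.
    // Elements without a specific rule fall through to match='*', which
    // drops the tag itself but keeps processing its children.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/html'>"
      + "<wml><card id='main' title='{head/title}'>"
      + "<xsl:apply-templates select='body'/>"
      + "</card></wml>"
      + "</xsl:template>"
      + "<xsl:template match='p'><p><xsl:apply-templates/></p></xsl:template>"
      + "<xsl:template match='h1|h2|h3'>"
      + "<p><b><xsl:apply-templates/></b></p></xsl:template>"
      + "<xsl:template match='b|strong'>"
      + "<b><xsl:apply-templates/></b></xsl:template>"
      + "<xsl:template match='a[@href]'>"
      + "<a href='{@href}'><xsl:apply-templates/></a></xsl:template>"
      + "<xsl:template match='*'><xsl:apply-templates/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a well-formed (tidied) HTML string.
    static String toWml(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Hi</title></head><body>"
            + "<p><font color=\"red\">Hello</font> <b>world</b></p>"
            + "</body></html>";
        System.out.println(toWml(xhtml));
    }
}
```

The catch-all template is what deals with nesting: a wrapper such as BODY
or FONT disappears while its content is still visited, so each "weird"
element only needs a rule if it should survive in some form.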

I do have some control over how the HTML documents get created, so I 
can ask the people creating them not to use extremely weird stuff. 
This means I am NOT trying to create a 100% general solution that can 
handle any arbitrary HTML document.
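
With a restricted set of input tags, idea 1 (editing the DOM directly)
could be sketched roughly as follows. This is only an illustration under
stated assumptions: the allow-list KEEP, the class name DomUnwrap, and the
single-card output are hypothetical, and the input is parsed with the
JDK's own DocumentBuilder from an already tidied string rather than taken
from JTidy. The key step is "unwrapping": an unwanted element is replaced
by its children instead of being deleted along with them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomUnwrap {

    // Hypothetical allow-list of tags that are passed through unchanged.
    static final Set<String> KEEP = Set.of("p", "b", "i", "a", "br");

    // Depth-first walk: every element below 'node' that is not in KEEP
    // is replaced by its own children, so nested wrappers collapse.
    static void simplify(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before editing
            simplify(child);
            if (child instanceof Element && !KEEP.contains(child.getNodeName())) {
                while (child.getFirstChild() != null) {
                    node.insertBefore(child.getFirstChild(), child);
                }
                node.removeChild(child);
            }
            child = next;
        }
    }

    // Parses tidied HTML, simplifies the body, and wraps it in one WML card.
    static String bodyToWml(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            Element body = (Element) doc.getElementsByTagName("body").item(0);
            if (body == null) {
                throw new IllegalArgumentException("no <body> element found");
            }
            simplify(body);

            Element wml = doc.createElement("wml");
            Element card = doc.createElement("card");
            card.setAttribute("id", "main");
            wml.appendChild(card);
            while (body.getFirstChild() != null) {
                card.appendChild(body.getFirstChild()); // move, not copy
            }
            doc.replaceChild(wml, doc.getDocumentElement());

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                 | IOException | TransformerException e) {
            throw new IllegalStateException("conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyToWml(
            "<html><head><title>t</title></head><body bgcolor=\"white\">"
            + "<div><font size=\"2\"><p>Hello <b>world</b></p></font></div>"
            + "</body></html>"));
    }
}
```

This sketch leaves attributes on the kept elements untouched; a fuller
version would presumably also strip or rename attributes that WML does
not allow.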

Received on Sunday, 29 October 2000 08:15:31 UTC
