HTML -> XML -> WML using jtidy from Martin Wickman on 2000-10-28 (html-tidy@w3.org from October to December 2000)

From: Martin Wickman <martin.wickman@infohwy.se>
Date: Sat, 28 Oct 2000 23:50:17 +0200
To: html-tidy@w3.org
Message-ID: <73f5977d.977d73f5@infohwy.se>

Hi

I am trying to convert some HTML documents to WML. As we know, 
starting from HTML and going to WML is not that easy. It would be 
more or less a piece of cake if I could start with a data 
representation (pure XML) and then create the WML from that, but 
no... not this time anyway.

I have tried using JTidy (the Java port of HTML Tidy) to get an XML 
representation of the HTML documents, and sure enough it seems to 
work. At least I can pretty-print the document and traverse it using 
the org.w3c.dom interfaces.

In other words, I can now get an XML version of the document, but I 
am not sure what to do next, so I am reaching out to you guys. Here 
are some ideas I have:

1. Manipulate the XML DOM representation -- removing and replacing tags 
left and right until I get the WML I want. Seemed easy enough, until I 
started looking at the documents... Lots and lots of "weird" HTML 
elements and attributes I need to transform. Things are nested as well, 
so I can't just 'delete BODY' to get rid of it :-)

-or-

2. Use XSL in some clever way to magically get my WML documents. I 
have looked into XSL, but am not really sure how to go about it. I know 
a bit about how XSL works, but I am not sure whether it can do what I 
need here.
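
To make the two ideas a bit more concrete, here is a rough sketch of 
what I imagine each one could look like. It is only a guess at the 
shape of a solution, not something I have working against real pages. 
Assumptions: the class name HtmlToWmlSketch, the sample markup in 
TIDIED (a stand-in for what JTidy would output), and the tiny list of 
tags I keep are all made up for illustration; I also assume the tidied 
elements are lowercase and in no namespace. Only the standard 
javax.xml and org.w3c.dom classes are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HtmlToWmlSketch {
    // Hypothetical sample: stands in for the XML that JTidy would emit.
    // With JTidy the Document would come from Tidy.parseDOM(...) instead.
    static final String TIDIED =
        "<html><head><title>News</title></head><body bgcolor=\"#fff\">"
        + "<h1>Hello</h1><p>Some <font size=\"2\">text</font></p></body></html>";

    // Option 2 stylesheet: headings and paragraphs become <p>,
    // every other element is dropped but its content is kept.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\"><wml><card title=\"{html/head/title}\">"
        + "<xsl:apply-templates select=\"html/body\"/></card></wml></xsl:template>"
        + "<xsl:template match=\"p|h1|h2|h3\"><p><xsl:apply-templates/></p></xsl:template>"
        + "<xsl:template match=\"*\"><xsl:apply-templates/></xsl:template>"
        + "</xsl:stylesheet>";

    static DocumentBuilderFactory factory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f;
    }

    static Document parse(String xml) throws Exception {
        return factory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Option 1: build a fresh WML tree and copy over only what we want.
    static Document toWmlByDom(Document html) throws Exception {
        Document out = factory().newDocumentBuilder().newDocument();
        Element wml = out.createElement("wml");
        Element card = out.createElement("card");
        out.appendChild(wml);
        wml.appendChild(card);
        Node title = html.getElementsByTagName("title").item(0);
        if (title != null) card.setAttribute("title", title.getTextContent());
        Node body = html.getElementsByTagName("body").item(0);
        if (body != null) copy(body, card);
        return out;
    }

    // Recursively copy children of src into dst; attributes are not copied.
    static void copy(Node src, Element dst) {
        Document out = dst.getOwnerDocument();
        for (Node n = src.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                dst.appendChild(out.createTextNode(n.getNodeValue()));
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                String name = n.getNodeName().toLowerCase();
                if (name.matches("h[1-6]")) name = "p"; // headings become paragraphs
                if (name.equals("p") || name.equals("b") || name.equals("i")) {
                    Element e = out.createElement(name);
                    dst.appendChild(e);
                    copy(n, e);
                } else {
                    copy(n, dst); // unknown tag: drop the tag, keep what is inside
                }
            }
        }
    }

    // Option 2: let an XSLT stylesheet do the same mapping.
    static String toWmlByXslt(Document html) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(html), new StreamResult(w));
        return w.toString();
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        Document html = parse(TIDIED);
        System.out.println(serialize(toWmlByDom(html)));
        System.out.println(toWmlByXslt(html));
    }
}
```

In both variants the trick for the nesting problem is the same: an 
unwanted element such as FONT is not deleted, it is "unwrapped" so its 
children survive. The sketch ignores the WML DOCTYPE, links, images and 
WML's rules about what may nest inside what, all of which a real 
converter would have to handle.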

Do you fine folks know what might be the best way to do this? Could 
you provide some useful pointers or additional ideas to help me out?

I do have some control over how the HTML documents get created, so I 
can force the people creating them not to use extremely weird stuff. 
This means I am NOT trying to create a 100% general solution which can 
handle any arbitrary HTML document.

/Cheers!

Received on Saturday, 28 October 2000 17:48:10 UTC