- From: <mizuno@pacific.net.sg>
- Date: Sun, 27 Aug 2000 18:41:05 -0700
- To: charles@w3.org
- CC: w3c-wai-ig@w3.org
Thank you Charles, for such a fast response. Yes Charles, I'll very much appreciate that you could make the information available, hope it's not too much of trouble to you. Just curiosity, why is it that you think it's impossible in specifying the algorithm? For tools, is there any recommendations that you would like to have for me? As I mentioned I'm in doing final year project in the school (School of Information Technology in Nanyang Polytechnic), this is the scenario I've thought of and wish to deploy. The objective of my project scope is to have an automated scheduled job updates for an finance information portal by deploying agents to the Web to search for all relevent as well as the latest finance updates. I have the intention to convert their search results(inclusive of the HTML documents) they ought to bring back, into XML docs, first it's for easy reference, second, an easier way for me to ask another agent to help me put up the information into my portal. I don't wish the poor agent guy to be confused on issues like which is a heading and which aren't and in the end, cannot get the job done. Also, it's for easy updates into the database for SQL retrieval. Basically that's what my scope requires. If there's any comment on how I should go about attaining my objective, please share with me. I'm more than willingly to hear opinions from all angles. Once again, thanks for your precious comment. ---------------------------------------------------------- Pei Ling, Tan Final Year Project Student at NYP mailto: mizuno@pacific.net.sg --- Original Message --- Charles McCathieNevile <charles@w3.org> Wrote on Sun, 27 Aug 2000 21:16:04 -0400 (EDT) ------------------ Pei Ling, There are several tools that produce XHTML (HTML that is written in XML) either as authoring tools or as conversion tools. They cannot recognise the headings if they are not specified, but if they are already marked up in HTML then it is trivial to ensure that they remain marked up in XML. A tool that would be interesting is one which looked at formatting properties used in a document and tried to identify headings and lists (and perhaps other structures). Normally people who use visual formatting instead of semantic srtructure in documents (be they HTML, word processor documents, or even handwritten ones) do so in a fairly regular way, and I believe that in most cases it would not be terribly difficult to specify an algorithm for guessing at the main structures. I actually tried to do this once, and if you are intersted I will try to find that work and make it available (it was only very rough notes, but I don't think it is an impossible project at all, and believe it would be valuable). On a related note, I read in a press release on several WAI lists today that Adobe are trying to make a version of Acrobat software that can do some of these things. cheers Charles McCN On Mon, 28 Aug 2000, Pei Ling wrote: Is there any existing agents in the market that's able to function as a converter for HTML to XML documents? Is it possible for the agent then to identify the headings or sub-headings from the content and accordingly translate the HTML code into XML? Any reply/comment would be very much appreciated. ---------------------------------------------------------- Pei Ling, Tan Final Year Project Student at NYP mailto: mizuno@pacific.net.sg ---------------------------------------- PS: Sign up for a free www.earth9.com account, start a Habitat and GET ORGANISED. -- Charles McCathieNevile mailto:charles@w3.org phone: +61 (0) 409 134 136 W3C Web Accessibility Initiative http://www.w3.org/WAI Location: I-cubed, 110 Victoria Street, Carlton VIC 3053 Postal: GPO Box 2476V, Melbourne 3001, Australia ----- Sent using MailStart.com ( http://MailStart.Com/welcome.html ) The FREE way to access your mailbox via any web browser, anywhere!
Received on Sunday, 27 August 2000 21:41:05 UTC