RE: Converter Agent from mizuno@pacific.net.sg on 2000-08-28 (w3c-wai-ig@w3.org from July to September 2000)

From: <mizuno@pacific.net.sg>
Date: Sun, 27 Aug 2000 18:41:05 -0700
To: charles@w3.org
CC: w3c-wai-ig@w3.org
Message-Id: <270800240.67260@203.117.188.5>
Thank you Charles, for such a fast response.

Yes Charles, I'll very much appreciate that you could make the
information available, hope it's not too much of trouble to you.

Just curiosity, why is it that you think it's impossible in specifying
the algorithm?

For tools, is there any recommendations that you would like to
have for me?

As I mentioned I'm in doing final year project in the school
(School of Information Technology in Nanyang Polytechnic), this
is the scenario I've thought of and wish to deploy.

The objective of my project scope is to have an automated scheduled
job updates for an finance information portal by deploying agents
to the Web to search for all relevent as well as the latest finance
updates. 

I have the intention to convert their search results(inclusive
of the HTML documents) they ought to bring back, into XML docs,
first it's for easy reference, second, an easier way for me to
ask another agent to help me put up the information into my portal.
I don't wish the poor agent guy to be confused on issues like
which is a heading and which aren't and in the end, cannot get
the job done. Also, it's for easy updates into the database for
SQL retrieval.

Basically that's what my scope requires. If there's any comment
on how I should go about attaining my objective, please share
with me. I'm more than willingly to hear opinions from all angles.

Once again, thanks for your precious comment.

----------------------------------------------------------
  
  Pei Ling, Tan
  Final Year Project Student at NYP
  mailto: mizuno@pacific.net.sg



--- Original Message ---
Charles McCathieNevile <charles@w3.org> Wrote on 
Sun, 27 Aug 2000 21:16:04 -0400 (EDT)
 ------------------ 
Pei Ling,

There are several tools that produce XHTML (HTML that is written
in
XML) either as authoring tools or as conversion tools.

They cannot recognise the headings if they are not specified,
but if they are
already marked up in HTML then it is trivial to ensure that they
remain
marked up in XML.

A tool that would be interesting is one which looked at formatting
properties
used in a document and tried to identify headings and lists (and
perhaps
other structures). Normally people who use visual formatting
instead of
semantic srtructure in documents (be they HTML, word processor
documents, or
even handwritten ones) do so in a fairly regular way, and I believe
that in
most cases it would not be terribly difficult to specify an algorithm
for
guessing at the main structures. I actually tried to do this
once, and if you
are intersted I will try to find that work and make it available
(it was only
very rough notes, but I don't think it is an impossible project
at all, and
believe it would be valuable). On a related note, I read in a
press release
on several WAI lists today that Adobe are trying to make a version
of Acrobat
software that can do some of these things.

cheers

Charles McCN

On Mon, 28 Aug 2000, Pei Ling wrote:

  
  
  Is there any existing agents in the market that's able to function
as a converter for HTML to XML documents? Is it possible for
the agent then to identify the headings or sub-headings from
the content and accordingly translate the HTML code into XML?
  
  Any reply/comment would be very much appreciated.
  
  ----------------------------------------------------------
  
  Pei Ling, Tan
  Final Year Project Student at NYP
  mailto: mizuno@pacific.net.sg
  
  ----------------------------------------
  PS: Sign up for a free www.earth9.com account, start a Habitat
and GET ORGANISED.
  

-- 
Charles McCathieNevile    mailto:charles@w3.org    phone: +61
(0) 409 134 136
W3C Web Accessibility Initiative                      http://www.w3.org/WAI
Location: I-cubed, 110 Victoria Street, Carlton VIC 3053
Postal: GPO Box 2476V, Melbourne 3001,  Australia 



-----
Sent using MailStart.com ( http://MailStart.Com/welcome.html )
The FREE way to access your mailbox via any web browser, anywhere!
Received on Sunday, 27 August 2000 21:41:05 UTC