W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 1999

Re: Asp server side script translations

From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
Date: Thu, 10 Jun 1999 10:51:09 +0800
Message-ID: <000f01beb2ec$18c8ee80$dd066d8c@sinica.edu.tw>
To: <html-tidy@w3.org>
 From: Eric Blossom <eric@neomorphic.com>

> I like Rick's suggestion. Tidy has always been a practical tool. This
> suggestion looks pretty straight forward and would be helpful. Especially
> considering the recent posts by folks having trouble on this issue. Rick,
> you volunteering to code up this enhancement?

I'll check with my boss.


P.S. I have made a C interface header for DOM (level 1 XML) at
http://www.ascc.net/~ricko/src/dom_interface.h if anyone is interested.
It is just a big C struct, but I think it is useful to allow
access to DOM data. If anyone is wondering how DOM might fit in with their
all-C applications, it might be interesting.  The big issue for C is that
the DOM API uses 16 bit chars: there might be scope for a Son-of-DOM API for
C programs that uses char, to prevent spurious conversions.
I thought the issue that DOM uses objects and C is not an OOPL would be more
important than it was; the problem for C is that IDL to C mappings programs
all are written to generate networking stubs and that the CORBA IDL-to-C
mapping does not include conventions for creating or releasing objects: DOM
seems to assume that this will happen using the OOPLs normal create and
release mechanism, but C of course does not have them; so there will always
be scope for variant DOM APIs in C. Which is why I made the headers, to make
it easy for programmers to use a common API for DOM in C, if they want.

P.P.S. On the issue of writing parsers for XML and HTML, people may be
interested in XXX "eXperimental Xml leXer" at
http://www.ascc.net/xml/en/utf-8/xxxlex.html  It is an early attempt to make
a different parser architecture, the "notation processor", based on figuring
out what a generic tokenizer routine for SGML/XML/HTML is, highly
parameterized to cope with every kind of token: it is a state machine where
every state is the same and the transition carries all the specifics.
The advantage of this approach might be that the same lexer can be used to
also parse other embedded notations: PI contents, JavaScript, CSS, DTDs,
Basing the parse on that kind of tokenizer may then allow the rest of
well-formedness checking to be performed by a regular expression checker:
again implemented using a generic routine that also can perform the XML
content model and attribute validation: I am looking at whether it can
reduce code size and complexity a real lot.
There is discussion material at "XXX Notation Processors"
http://www.ascc.net/xml/en/utf-8/xxxmodel.html and  "Validate this! Content
Models on other Targets" http://www.ascc.net/xml/en/utf-8/OtherValid.html
Received on Wednesday, 9 June 1999 23:02:58 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:38:46 UTC