- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Thu, 10 Jun 1999 10:51:09 +0800
- To: <html-tidy@w3.org>
From: Eric Blossom <eric@neomorphic.com>

> I like Rick's suggestion. Tidy has always been a practical tool. This
> suggestion looks pretty straightforward and would be helpful. Especially
> considering the recent posts by folks having trouble on this issue. Rick, are
> you volunteering to code up this enhancement?

I'll check with my boss.

Rick

P.S. I have made a C interface header for DOM (level 1 XML) at http://www.ascc.net/~ricko/src/dom_interface.h if anyone is interested. It is just a big C struct, but I think it is useful to allow implementation-independent access to DOM data. If anyone is wondering how DOM might fit in with their all-C applications, it might be interesting.

The big issue for C is that the DOM API uses 16-bit chars: there might be scope for a Son-of-DOM API for C programs that uses char, to prevent spurious conversions. I thought the issue that DOM uses objects and C is not an OOPL would be more important than it turned out to be. The real problem for C is that IDL-to-C mapping programs are all written to generate networking stubs, and that the CORBA IDL-to-C mapping does not include conventions for creating or releasing objects. DOM seems to assume that this will happen using the OOPL's normal create and release mechanism, but C of course does not have one, so there will always be scope for variant DOM APIs in C. That is why I made the headers: to make it easy for programmers to use a common API for DOM in C, if they want.

P.P.S. On the issue of writing parsers for XML and HTML, people may be interested in XXX, the "eXperimental Xml leXer", at http://www.ascc.net/xml/en/utf-8/xxxlex.html It is an early attempt at a different parser architecture, the "notation processor". It is based on figuring out what a generic tokenizer routine for SGML/XML/HTML is, highly parameterized to cope with every kind of token: it is a state machine where every state is the same and the transition carries all the specifics.
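To make the "every state is the same" idea concrete, here is a minimal sketch in C. It is not the XXX code: the mode names, token kinds, table contents and the next_token routine are all hypothetical, chosen only to show one scanning routine driven entirely by a transition table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes and token kinds, for illustration only. */
enum mode { MODE_TEXT, MODE_TAG, MODE_COMMENT, MODE_COUNT };
enum token_kind { TOK_TEXT, TOK_TAG, TOK_COMMENT, TOK_END };

/* One transition: in mode `from`, when `delim` is seen, emit the text
   collected so far as `emit` and continue in mode `to`. */
struct transition {
    enum mode from;
    const char *delim;
    enum token_kind emit;
    enum mode to;
};

/* All notation-specific knowledge lives here. Longer delimiters come
   first so that "<!--" wins over "<" at the same position. */
static const struct transition table[] = {
    { MODE_TEXT,    "<!--", TOK_TEXT,    MODE_COMMENT },
    { MODE_TEXT,    "<",    TOK_TEXT,    MODE_TAG     },
    { MODE_COMMENT, "-->",  TOK_COMMENT, MODE_TEXT    },
    { MODE_TAG,     ">",    TOK_TAG,     MODE_TEXT    },
};

/* Token kind used when input ends before any delimiter is found. */
static const enum token_kind default_kind[MODE_COUNT] = {
    TOK_TEXT, TOK_TAG, TOK_COMMENT
};

struct lexer { const char *pos; enum mode mode; };
struct token { enum token_kind kind; const char *start; size_t len; };

/* The single generic routine: scan forward until some transition that
   leaves the current mode matches, then follow it. */
struct token next_token(struct lexer *lx)
{
    assert(lx != NULL && lx->pos != NULL);
    struct token tok = { TOK_END, lx->pos, 0 };
    if (*lx->pos == '\0')
        return tok;

    const char *p;
    for (p = lx->pos; *p != '\0'; p++) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            const struct transition *t = &table[i];
            size_t n = strlen(t->delim);
            if (t->from == lx->mode && strncmp(p, t->delim, n) == 0) {
                tok.kind = t->emit;
                tok.len = (size_t)(p - lx->pos);
                lx->pos = p + n;
                lx->mode = t->to;
                return tok;
            }
        }
    }

    /* No delimiter before end of input: return the remainder. */
    tok.kind = default_kind[lx->mode];
    tok.len = (size_t)(p - lx->pos);
    lx->pos = p;
    return tok;
}
```

In this sketch a caller would initialize a struct lexer with the input and MODE_TEXT and call next_token until it returns TOK_END. Text tokens of length zero can occur (for example when the input starts with a tag) and are left for the caller to skip. Handling another notation would mean adding rows to the table, not code to the routine.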
The advantage of this approach might be that the same lexer can also be used to parse other embedded notations: PI contents, JavaScript, CSS, DTDs, etc. Basing the parse on that kind of tokenizer may then allow the rest of well-formedness checking to be performed by a regular expression checker, again implemented using a generic routine that can also perform XML content model and attribute validation. I am looking at whether it can substantially reduce code size and complexity.

There is discussion material at "XXX Notation Processors" http://www.ascc.net/xml/en/utf-8/xxxmodel.html and "Validate this! Content Models on other Targets" http://www.ascc.net/xml/en/utf-8/OtherValid.html
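As a rough illustration of content-model validation treated as regular-expression matching, here is a small C sketch. It is an assumption-laden toy, not the checker discussed in the pages above: the particle struct and the content_matches function are hypothetical, and it handles only a flat sequence of element names with the occurrence indicators ?, * and + (no choice groups, no nesting, no backtracking).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One item of a flat sequence content model, e.g. (title, meta*, p+).
   `occurs` is '1' (exactly once), '?', '*' or '+'. Hypothetical type. */
struct particle {
    const char *name;
    char occurs;
};

/* Returns 1 if the list of child element names matches the model, else 0.
   Matching is greedy and never backtracks, so a model such as (a*, a)
   would be rejected here even when the input is valid. */
int content_matches(const struct particle *model, size_t nmodel,
                    const char *const *children, size_t nchildren)
{
    assert(model != NULL && children != NULL);
    size_t c = 0; /* index of the next unmatched child */

    for (size_t m = 0; m < nmodel; m++) {
        int repeat = model[m].occurs == '*' || model[m].occurs == '+';
        int optional = model[m].occurs == '*' || model[m].occurs == '?';
        size_t count = 0;

        /* Consume as many matching children as this particle allows. */
        while (c < nchildren && (repeat || count == 0) &&
               strcmp(children[c], model[m].name) == 0) {
            c++;
            count++;
        }
        if (count == 0 && !optional)
            return 0; /* a required element is missing */
    }
    return c == nchildren; /* leftover children are not allowed */
}
```

The same routine could in principle be pointed at other targets, such as a sequence of tokens instead of element names, since it only compares strings against a model passed in as data.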
Received on Wednesday, 9 June 1999 23:02:58 UTC