- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Fri, 16 May 1997 13:29:31 GMT
- To: w3c-sgml-wg@w3.org
This is a general response to the three proposals already in (and the US isn't fully awake yet :-)

In message <9374.199705161115@grogan.cogsci.ed.ac.uk> "Henry S. Thompson" writes:
>
> Constructing the full vanilla DTD is left as an exercise for the
> reader :-).
>
Constructing the full processing software is left as an exercise for the MCSGS...

[... rest of proposal omitted ...]

I read these proposals from the point of view of an *implementor*, albeit one with no CS training, and would like proposers to keep implementation very much in their minds. I still believe in the idea that *an* XML infrastructure can be built by individuals in the virtual XML community and, indeed, that this is critical in working out the language as we go along.

Looking at the proposals so far, I feel some way from being clear how they would be implemented in conjunction with what we have at present. That's not to say they aren't the right way forward, but simply that we must bear this in mind (I've been involved with other language developments that crashed because they were too expansive). There are semantics in XML-LINK (which I have implemented in JUMBO) which have not been fully tested and which I, at least, don't find completely trivial. It's worth remembering that apparently simple concepts such as parameter entities and whitespace take a LOT of care to define precisely. If they aren't so designed, it's highly probable that different implementations will produce different results.

<TERMINOLOGY>
Are the proposals (SD[1-5]) seen as part of XML-lang, or are they a new XML-name? If the former, we have a *lot* of work to do before XML-lang is finalised; if the latter, then documents written to these proposals may break XML-lang software and will in any case require additional processing.
</>

Let me suggest the architecture which will be necessary for a *generic* XML processing system (i.e. without a browser, without stylesheet, without any domain-specific stuff.)
[In JUMBO this is localised in *part* of a package called jumbo.sgml. It might be possible to separate it further].

Some of the solutions proposed appear to be incompatible with SGML and/or to produce documents which may break current XML (and SGML) parsers. An obvious example is namespace collisions between elementTypes in DTDs - I shall focus on this example.

At present I see three modules as being required in a generic vanilla XML system (e.g. w/o browser):

    pre-parser -> parser -> post-parser

<NOTE>
If this model is simplistic or inappropriate, please say so and suggest another :-)
</>

The parser is the simplest to start with and can be exemplified by (n)sgmls, Lark or NXP. It takes an XML-compliant document as described in the spec and may validate it. (Throughout this discussion I shall assume that validation at various levels is a requirement). It then may produce output which is completely undefined by the spec, but three examples are ESIS (NXP), abstract tree (Lark) and groves. Please correct me, but I believe that all three cover the same space in the pipeline above.

If a document is prepared to specifications like the later proposals, then it *may* not be XML- or even SGML-compliant. [This is something that needs elaborating.] If it is not XML-compliant, then either
(a) the parsers need redefining and may require a context-sensitive grammar or
(b) a pre-parser is required that converts the input to be XML-compliant.

Assuming the latter, it might take multiple DTDs and expand their GIs to be more fully qualified, e.g.

    <!-- part of CML DTD -->
    <!Element VAR (#PCDATA)>

gets preprocessed to

    <!Element cml.VAR (#PCDATA)>

before it is read in to the parser. This may be manageable, but it's not trivial to keep it all together.

The post-parser seems essential. We are seeing proposals for additional elements, PIs, multiple attributes etc, which have to be processed at a considerable level of complexity.
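
To make option (b) above a little more concrete, here is a minimal sketch of the GI-qualifying step, written in Python purely for illustration. The function name `qualify_element_decls` and the regular expression are my own assumptions, not part of any proposal or existing tool. It rewrites only the declared name in each element declaration; a real pre-parser would also have to rewrite content models, attribute-list declarations and the tags in the document instance, which is where keeping it all together becomes non-trivial.

```python
import re

# Matches the start of an element declaration and captures the GI.
# Assumption: declarations look like "<!ELEMENT name ...>"; matching is
# case-insensitive because the example in this message writes "<!Element".
ELEMENT_DECL = re.compile(r"(<!ELEMENT\s+)([A-Za-z_][\w.\-]*)", re.IGNORECASE)


def qualify_element_decls(dtd_text: str, prefix: str) -> str:
    """Prefix every declared element type name with '<prefix>.'.

    Hypothetical helper: it does NOT touch content models, ATTLIST
    declarations or instance markup, all of which a complete
    pre-parser would need to rewrite consistently.
    """
    return ELEMENT_DECL.sub(
        lambda m: f"{m.group(1)}{prefix}.{m.group(2)}", dtd_text
    )


print(qualify_element_decls("<!Element VAR (#PCDATA)>", "cml"))
# prints: <!Element cml.VAR (#PCDATA)>
```

Comments such as `<!-- part of CML DTD -->` pass through unchanged, since they do not match the declaration pattern.
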

XML-link defines an inheritance mechanism and it's critical that everyone does this the same way - at present I'm not aware of any implementations of XML-link other than JUMBO, so I can't check my interpretation. Proposals for multiple inheritance worry me ('Java in a Nutshell' says (p77) "Multiple inheritance opens up a can of worms") and I would probably give up if it were a requirement.

The post-parser has a great deal to do. XML-link defines a number of syntactic constructs that will not be checked by the DTD and yet any reputable processor should check (shouldn't it?). Then there are the transformations, and the new proposals are implicitly describing a complex set of these.

So, are we still agreed that it should be easy to implement XML-name? Or are we expecting that it needs teams of programmers and will be left to one or two major enterprises to do it?

If it's possible, could I suggest that proposals in this area try to answer the following questions and post the answers - it would certainly help me.

(A) Can parsers compliant with the 9704 XML spec parse the suggested documents (including all DTDs)? If not,
    (A1) is it proposed that the parsers be altered to allow this? or
    (A2) is some pre-parsing software proposed? If so, what basic operations must it carry out?

(B) Will the proposal require a post-parsing process? If so
    (B1) what mechanisms (e.g. inheritance, parsing of PIs, etc.) will be required?
    (B2) what validation will the post-parser be expected to carry out?

I appreciate that this may not be suitable for all proposals, but any help will be appreciated.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Friday, 16 May 1997 09:17:46 UTC