- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Sun, 18 May 1997 10:03:46 GMT
- To: w3c-sgml-wg@w3.org
The publication of the MathML DTD paper has catalysed my thinking on namespaces. As an implementer I'd like to ask some factual questions and also to present what I see as *possible* (not necessarily desirable) solutions. It may then be possible to see in future discussion whether we already have solutions, whether we have potential solutions but no software, or whether we have to invent new syntax and write software for it. I regard namespaces as *the* critical problem of the 5 questions - the others can all be managed in some way at present; namespaces can't. If this problem isn't addressed quickly, the next six months will see de facto namespaces in XML that will take years to recover from, if at all. This (long) posting builds on earlier postings on this thread and proposes a simple, trivially implementable, solution to the problem, at least for the interim. <VECTOR XML-TYPE="AXIOM"> Any solution must be SGML-compatible <SEP/> Any solution must work BOTH for documents with and without DOCTYPEs or validation <SEP/> Any solution must be reasonably cheap to implement </VECTOR> <Q>Can SGML, in all its glory, solve this problem *in practice* at present?</Q> I really have no idea of the answer to this. My understanding is that there are potentially two solutions: <SOLUTION> SUBDOC. From my reading this allows one to switch DTDs in mid-processing. I understand that not all parsers support it. Having been weaned on sgmls I find it supported there, but no explicit documentation on how to use it. I also have no idea what the output is. </SOLUTION> <SOLUTION> AFs and/or HyTime. I am repeatedly assured by the evangelists - and I believe them - that AFs and HyTime will solve all our problems. This is clearly a millenium solution for SGML, but not for July 1, 1997. Currently it ('it' refers to the universal solution) has the following drawbacks. (a) It's not easy to understand, and I don't. [Several people have both talked to me and posted explanations for which many thanks, but I'm a slow learner.] (b) There is no simple introdcution with examples. (c) There is no inexpensive working system that I can play around with. (d) I am frightened that the *output* of a HyTime engine (or whatever) is sufficiently complex that the post-parsing/application engines must take on a further level of sophistication. </SOLUTION> <API> I am concerned that we are still no further forward with an 'API' for parser 'output'. The three options seem to be: ESIS, ElementTree and groves. I assume that some of the proposed solutions may require other APIs and be incompatible with the ESIS approach, and possibly the ElementTree. (I take on faith that you can do anything with groves). If I'm right, then we need to know this up front, because otherwise we are beavering away writing software that is doomed. </API> <Q> If SUBDOC or AFs are required as the solution, how much work in MCSG-months (1997 revised level) is required? </Q> <A XML-TYPE="hypothesis"> A lot. And then some more. </A> If SUBDOC/AFs are going to be an essential requirement for managing XML namespaces we'd better hear the bad news now and not some months down the track. We don't have to *do* anything in V1.0 except: - tell people to be patient - stop them developing horrible kludges - work out how much effort it's going to be - try to create a timescale Here are some other approaches which seem to be possible. None are without problems. <PROBLEM> An earlier posting suggested that the problem was primarily one of semantics. It's not - it involves syntax as well. Consider (from MathML): <![CDATA[ <!Element PRODUCT (%MathExp;)*> ]]> and (hypothetically) from CML <![CDATA[ <!Element PRODUCT (MOL)*> ]]> The content models and ATTLISTs (not shown) of these are quite different and so any document combining both of these will be invalid. That's the unavoidable syntactic problem *if validation is required*. </PROBLEM> <SOLUTION> Accept that there will be WF documents that cannot be validated however well-intentioned the authors are. I do not like this (although I'll probably practise it, especially when evangelising XML). However of all the solutions, it's one that works today. </SOLUTION> <SOLUTION XML-TYPE="recommended"> Require that all documents which include DTD fragments use FullyQualifiedNames. As an example, 'Java in a Nutshell' (p.19) asserts that all O'Reilly Java hackers use the form: com.ora.writers.david.widgets.Barchart.display(). for their Java classes. (BTW - ORA denizens - does it work OK?) This is effectively a URL and therefore unique in the world. In XML terms this would mean that the MathML PRODUCT would become: org.w3.mathml.PRODUCT To me this is very attractive because: - it requires no new technology - is guaranteed to work - is automatically linked to semantics and other behaviours - gives newcomers to SGML the idea they need to be thoughtful about this. Its only drawbacks are: - it's long-winded (but we don't care about that, do we?). - it requires an agreed naming convention within XML (i.e. that any ElementType starting with (com, edu, country-name, org, etc.) and followed by '.' is taken as a fully qualified name - until URNs come along some FQNs may change as people move (but it can't be worse than *not* doing this). In summary, even if we merely *encourage* people to use this we are solving 99% of the potential problems. None of you will *complain* if I write: <uk.ac.nottingham.cml.MOL> I suspect that even if we adopt other solutions there is still a role for FQNs undet the surface. The parser would output an ESIS stream that looked something like: (uk.ac.nottingham.cml.CML (uk.ac.nottingham.cml.PRODUCT (uk.ac.nottingham.cml.MOL (uk.ac.nottingham.cml.ATOMS ... )uk.ac.nottingham.cml.ATOMS )uk.ac.nottingham.cml.MOL )uk.ac.nottingham.cml.PRODUCT (org.w3.mathml.PRODUCT (org.w3.mathml.EXPR ... )org.w3.mathml.EXPR )org.w3.mathml.PRODUCT )uk.ac.nottingham.cml.CML Similar things would be possible for ElementTrees. The application might decide that for presentation purposes it could strip off the FQN leaders without losing information, e.g. it could label tags with the short form and pretty icons linked to the FQNs. It would however preserve *all* of the namespace in the document using *tools that exist at present*. It should be attractive to HTML-hackers who are quite accustomed to filling their pages with 'http://www.w3.org/some/where' and don't complain. So the only real problem is how to make the documents authorable. </SOLUTION> <PROPOSAL TYPE="ancillary"> It's becoming clearer daily that there is a requirement for a pre-parser for manipulating documents before they go into the parser. MathML has already lifted the lid on this by suggesting that there should be macros to insert length boilerplate (the boilerplate being legal XML). (I suspect this cannot easily be done with entities since there is argument substitution, but I haven't looked in detail.) The MathML documents before macroprocessing are presumably not validatable but may be WF - I haven't checked. This is therefore a proposal for a pre-processor for XML documents which cannot be validated, but are WF and which *will be validatable after preprocessing*. It will require (minor) changes to the language and (minor) changes to current processing software which validates. It requires no changes for documents and tools dealing with WF documents. <PROPOSAL> There should be a PI which expands GIs within a scope to (more) fully qualified names. This PI may be ignored by WF processors, but they are advised to pass it to the application. For validation the parser is required to perform this expansion. </PROPOSAL> Example <![CDATA[ <!DOCTYPE cml SYSTEM "cml.dtd" [ <!-- unqualified, just as an example --> <!-- an additional version the math dtd *using* FQNames; i.e. long-winded --> <!entity % mathml SYSTEM "http://www.w3.org/some/where/fqmathml.dtd"> %mathml; ]> <CML> <MOL Title="benzene"> ... <XLIST TITLE="symmetry operators"> <?XML-DOCTYPE DOCTYPE="org.w3.mathml"?> <MATRIX> <MATRIXROW>1<SEP/>0<SEP/>0<SEP/></MATRIXROW> <MATRIXROW>0<SEP/>0<SEP/>1<SEP/></MATRIXROW> <MATRIXROW>0<SEP/>1<SEP/>0<SEP/></MATRIXROW> </MATRIX> <?XML-DOCTYPE?> <!-- exists from math scope --> ... </MOL> </CML> ]]> Note that this is *valid WF XML as of today* (assuming no typos). It's potentially invalid XML because MATRIX might collide with CML (actually it doesn't). However, after pre-processing to stick the 'org.w3.mathml' on the front it's guaranteed to be validatable (assuming we reserve the DomainName space for the FQNs). All that is required is for the XML-ERB to (a) reserve this syntax for GIs (b) reserve the namespace for *the first component* of XML GIs. The only implementors who need to worry are those who write validating parsers and this should be a trivial addition to their task. </IMPLEMENTATION> P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/
Received on Sunday, 18 May 1997 07:03:46 UTC