- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Thu, 30 Jan 1997 16:51:26 GMT
- To: w3c-sgml-wg@www10.w3.org
I hope this isn't going over solved ground, but I'd like to check up about entities and the interpretation of 4.3. The spec uses the word 'include' which I take to mean 'copy the entire contents of the external entity' (normally a file) 'into the space vacated after removing the &...; string'. The spec emphasises the use of modularity in authoring (which I strongly support). Therefore a simple example would be:

    <!DOCTYPE CML SYSTEM "cml.dtd" [
    <!ENTITY bib1 SYSTEM "bib1.cml">
    <!ENTITY bib2 SYSTEM "bib2.cml">
    <!ENTITY mol1 SYSTEM "mol1.cml">
    <!ENTITY mol2 SYSTEM "mol2.cml">
    ]>
    <CML>
    <XLIST TITLE="bibliography">
    &bib1;
    &bib2;
    </XLIST>
    <XLIST TITLE="molecules">
    &mol1;
    &mol2;
    </XLIST>
    </CML>

Assume that the files have structures like:

    <BIB> ... </BIB>

and

    <MOL> ... </MOL>

and are 'valid', then the whole document is a valid CML document. The subsidiary files are not valid CML (they have no DOCTYPE) but they are WF and in all other respects valid. They are therefore valuable reusable components (but see below). This seems to be the intention of the draft.

However it is also possible to create a valid document as (say)

    ...
    <!ENTITY molfrag1 SYSTEM "molfrag1.txt">
    ]>
    <CML>
    <MOL>
    &molfrag1;
    </CML>

where molfrag1.txt contains something like:

    <ATOMS>
    <!--* valid atom content here *-->
    </ATOMS>
    </MOL>

i.e. the starttag is in one file and the endtag in another. Whilst this is horrible, it is the sort of thing that a mindless text processor might do when sending chunks to a mailer with size restrictions. It would also be possible to have both the start and the endtags in the main document. I am not an expert on NOTATION but it seems that this is required if including a foreign file, e.g.

    <FIGURE NOTATION="gif"> &mygif; </FIGURE>

It therefore becomes difficult to say whether a document is or is not WF without looking at the entities.
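The split-tag case above can be made concrete with a small sketch. It assumes the purely textual reading of 'include' given in this message (copy the entity's contents in place of the &...; string); it is not what any particular XML processor does. The names expand_entities and is_well_formed, and the in-memory dictionary standing in for the file molfrag1.txt, are hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET


def expand_entities(text, entities):
    """Naively replace each &name; with the replacement text stored for name.

    Assumption: inclusion is plain textual substitution, as the message
    reads the draft. Names not in the mapping (e.g. &amp;) are left alone.
    """
    def substitute(match):
        return entities.get(match.group(1), match.group(0))

    return re.sub(r"&([A-Za-z_][\w.-]*);", substitute, text)


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Main document with the DOCTYPE left out; the start tag <MOL> is here,
# the matching end tag lives in the (hypothetical) entity text.
main_doc = "<CML><MOL>&molfrag1;</CML>"
entities = {"molfrag1": "<ATOMS></ATOMS></MOL>"}

expanded = expand_entities(main_doc, entities)

print(is_well_formed("<CML><MOL></CML>"))      # main document with the reference removed
print(is_well_formed(entities["molfrag1"]))    # the fragment on its own
print(is_well_formed(expanded))                # the two pieces combined
```

Under these assumptions only the combined text parses, which is the point made above: neither piece can be judged WF without looking at the other.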
My motivation is that such document fragments may be useful both as entities and as link-ends - 'the things the pointy bits point to' (if I have that correct). In other words I might also wish to write something like:

    <XLIST TITLE="Molecules">
    <-XML-LINK HREF="mol1.cml"></-XML-LINK>
    <-XML-LINK HREF="mol2.cml"></-XML-LINK>
    </XLIST>

to reference the molecules. However the semantics are different. The second (the link) assumes that the application will find something with a well defined structure of _some_ sort in the files. I'm still not clear how it knows precisely what that structure _is_. If mol1.cml is a complete valid CML file (i.e. has a DOCTYPE statement and an accessible DTD) then I know, otherwise I have to guess. I _hate_ using file suffixes, and my preference would be to include a MIME type somewhere in the LINK. The nearest I can get is the HRTYPE attribute - but this isn't clear in the spec. If it's allowed, I'd suggest:

    <-XML-LINK HREF="mol1.cml" HRTYPE="application/x-cml"></-XML-LINK>

However, if this is allowed then the use of entities fails (since the files all must have DOCTYPEs in them). So I really want to be able to omit the DOCTYPEs and use HRTYPE (or some other tool) to tell the application 'what is at the end of the pointy bit is a WF CML file and its nature is determined by its first element. Just assume there is a DOCTYPE at the front'. Of course I have to make sure that the context into which the file is imported is sensible, but that's my problem!

P.

[BTW I am not a supporter of punctuation in NAMEs if it can be avoided. For example, I create Java classes for most of my Elements directly from their names. -XML-LINK.java is illegal, and probably has to be contracted to XMLLINK.java. The (obvious?) underscore character doesn't seem to be used in XML names (?)]

Peter Murray-Rust, (domestic net connection)
Virtual School of Molecular Sciences, Nottingham University, UK
http://www.ccc.nottingham.ac.uk/~pazpmr/
Received on Thursday, 30 January 1997 12:41:17 UTC