Date: Mon, 8 Jun 92 00:17:48 -0500
From: connolly@pixel.convex.com (Dan Connolly)
Message-Id: <9206080517.AA20948@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Cc: enag@ifi.uio.no
Subject: Re: using NOTATIONs inline
In-Reply-To: <23177A@erik.naggum.no>

In article <23177A@erik.naggum.no> you write:
>Dan Connolly <connolly@convex.com> writes:
>|
>| The WWW group is attempting to define a multimedia interchange
>| format called HTML. . . .
>
>Why not use HyTime?
>

Erik:

Partly because of ignorance (we've heard of HyTime, but we don't know
the details). I'd expect a HyTime engine to be quite a bit of work to
implement. And partly because, as I understand it, HyTime doesn't go
so far as to prescribe a DTD. The WWW project needs one particular
language, not a whole architecture.

I'd certainly like to know more about HyTime's techniques for
addressing documents, especially elements of documents.

Now for the WWW gang:

>:
>| That is, is it possible to put an arbitrary 8-bit binary stream
>| _inside_ an SGML document? My guess is: no. But if we use
>| CDATA, can we include anything that doesn't contain the closing
>| tag in full?
>
>If by "the closing tag in full" you mean the entire end-tag, complete
>with etago, generic identifier, and tagc, as in "</image>", this is not
>the way SGML does it. CDATA and SDATA are terminated by an etago
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
>followed by a name start character, or a grpo (group open, "(")
>delimiter if concurrent document types are allowed. In the reference
>concrete syntax, this means that the regular expression "</[(a-z]"
>matches the end of CDATA and SDATA elements.
>
>You can also use marked sections, with a CDATA status keyword, in which
>case the CDATA is terminated by the mse delimiter (marked section end,
>"]]>").
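The delimiter-in-context rule can be illustrated with a minimal Python sketch, assuming the reference concrete syntax (etago "</", grpo "(") and matching Erik's pattern case-insensitively, since letters of either case are name start characters there. The function names are hypothetical. The sketch contrasts CDATA element content, which ends at the first etago delimiter-in-context, with a CDATA marked section, which ends only at "]]>".

```python
import re

# Assumption: reference concrete syntax. ETAGO is "</", GRPO is "(", and
# name start characters are letters, matched here in either case.
ETAGO_IN_CONTEXT = re.compile(r"</[(a-z]", re.IGNORECASE)
MSE = "]]>"  # marked section end delimiter


def cdata_element_content(text: str) -> str:
    """Return CDATA element content up to the first etago delimiter-in-context.

    The content ends there whether or not what follows is the element's own
    end-tag, so "</b" ends the content of an "image" element too.
    """
    match = ETAGO_IN_CONTEXT.search(text)
    return text if match is None else text[: match.start()]


def cdata_marked_section_content(text: str) -> str:
    """Return the content of a CDATA marked section, up to the first "]]>"."""
    end = text.find(MSE)
    return text if end == -1 else text[:end]


if __name__ == "__main__":
    # "</ " (etago followed by a space) does not end the content; "</b" does.
    print(repr(cdata_element_content("x < y, a </ b, then </b>rest</image>")))
    # Inside a CDATA marked section, "</image>" is just data.
    print(repr(cdata_marked_section_content("raw </image> data]]>after")))
```

Note that the element content stops at "</b" even though "</image>" is the end-tag the author presumably intended, which is exactly the trap Erik describes.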
>
>:
>| Someone made the point that an SGML document is only allowed to
>| include SGML characters as specified by the SGML declaration, and if
>| we're going to use the default SGML declaration, we have to stick to
>| the characters blessed by it.
>
>Blessed and blessed. The SGML declaration is supposed to reflect the
>reality of the document, not enforce arbitrary limits on it. So you
>write an SGML declaration which fits the document.
>
>| That's not my understanding. I thought that inside CDATA (or SDATA,
>| I think) you could put _anything_ but the closing tag in full.
>
>As said above, the etago delimiter-in-context terminates the data,
>regardless of whether it's a legal end-tag in that context.
>
>You should be aware that the SGML parser will parse the "binary"
>content, ignore record starts, and treat record ends differently from
>other characters. In addition, it's an error for an SGML entity to
>contain characters with any of the numbers listed in the SHUNCHAR part
>of the SYNTAX declaration. This is _not_ what you want with binary
>data.
>
>| What's the scoop? Do we have to use external entities for raw data?
>
>Yes. An external entity that is not an SGML text entity requires a
>notation identifier, so you only need to list the entities in the DTD,
>with their notation, and refer to them by name in the document instance.
>
>If this is not satisfactory, you should declare the objects to be CDATA,
>and use a binary-to-text-only transformation scheme. There are several
>such schemes. Among them, base64 is the preferred encoding in my view,
>since it's available as part of the new Multipurpose Internet Mail
>Extensions (MIME) RFC-to-be. (The latest draft is available for
>anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
>two weeks from today. Section 5.2, which concerns the base64 encoding,
>is also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.)
>Transformation back to the binary form from the text-only form may be
>done on the fly by the application before sending the data to the
>notation interpreter.
>

My idea is to use MIME encodings, but put these attachments _outside_
the SGML text, in an attached (or external) body part.

>In addition to being much easier to deal with in SGML, this also makes
>SGML documents containing such content robust with respect to file
>transfer, etc.
>
>Hope this helps,
></Erik>

Thanks. Mostly it confirms my suspicions, but it should also provide a
somewhat authoritative answer (no references to ISO 8879 here :-) to
the WWW project.

>--
>Erik Naggum       | +47-295-0313      | ISO 8879 SGML    | Memento,
>Naggum Software   | "fuzzface"        | ISO 10744 HyTime | terrigena.
>Boks 1570, Vika   | <erik@naggum.no>  | JTC 1/SC 18/WG 8 | Memento,
>0118 OSLO, NORWAY | <enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis.
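Erik's base64 route, where the application decodes the text before handing the data to the notation interpreter, might look roughly like this in Python. This is a minimal sketch, assuming the object is stored base64-encoded as CDATA content; the function names are hypothetical. Python's standard base64 module wraps its output at 76 characters, the line length MIME uses.

```python
import base64

# Assumption: binary objects are stored as base64 text in CDATA content and
# decoded by the application before reaching a notation handler.


def to_cdata_text(data: bytes) -> str:
    """Encode arbitrary bytes as base64 text, wrapped at 76 characters."""
    return base64.encodebytes(data).decode("ascii")


def from_cdata_text(text: str) -> bytes:
    """Decode base64 text back to the original bytes.

    The decoder skips line breaks, so it does not matter whether the SGML
    parser kept or dropped the record boundaries.
    """
    return base64.decodebytes(text.encode("ascii"))


if __name__ == "__main__":
    blob = bytes(range(256))  # every byte value, including "<" and controls
    text = to_cdata_text(blob)
    # The base64 alphabet has no "<" or "]", so "</" and "]]>" cannot occur.
    assert "<" not in text and "]" not in text
    assert from_cdata_text(text) == blob
    # Simulate a parser that has dropped the line breaks.
    assert from_cdata_text(text.replace("\n", "")) == blob
    print(len(blob), "bytes ->", len(text), "characters of text")
```

Because the base64 alphabet is limited to letters, digits, "+", "/" and "=", the encoded text is plain printable ASCII and can never contain an etago delimiter-in-context or a marked section end.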