Re: using NOTATIONs inline

Dan Connolly (connolly@pixel.convex.com)
Mon, 8 Jun 92 00:17:48 -0500


Date: Mon, 8 Jun 92 00:17:48 -0500
From: connolly@pixel.convex.com (Dan Connolly)
Message-Id: <9206080517.AA20948@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Cc: enag@ifi.uio.no
Subject: Re: using NOTATIONs inline
In-Reply-To: <23177A@erik.naggum.no>

In article <23177A@erik.naggum.no> you write:
>Dan Connolly <connolly@convex.com> writes:
>|
>|   The WWW group is attempting to define a multimedia interchange
>|   format called HTML.  . . .
>
>Why not use HyTime?
>
Erik:
Partly because of ignorance (we've heard of HyTime, but we don't
know the details). I'd expect a HyTime engine to be quite a bit
of work to implement. And partly because, as I understand it, HyTime
doesn't go as far as to prescribe a DTD. The WWW project needs
one particular language, not a whole architecture.

I'd certainly like to know more about HyTime's techniques for addressing
documents, esp. elements of documents.

Now for the WWW gang:
>:
>|   That is, is it possible to put an arbitrary 8 bit binary stream
>|   _inside_ an SGML document? My guess is: no. But if we use
>|   CDATA, can we include anything that doesn't contain the closing
>|   tag in full?
>
>If you by "the closing tag in full" mean the entire end-tag, complete
>with etago, generic identifier, and tagc, as in "</image>", this is not
>the way SGML does it.  CDATA and SDATA are terminated by an etago
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
>followed by a name start character, or a grpo (group open, "(")
>delimiter if concurrent document types are allowed.  In the reference
>concrete syntax, this means that the regular expression "</[(a-z]"
>matches the end of CDATA and SDATA elements.
>
>You can also use marked sections, with a CDATA status keyword, in which
>case the CDATA is terminated by the mse delimiter (marked section end,
>"]]>").
>
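
To make the termination rule above concrete, here is a rough sketch in
Python. It assumes the reference concrete syntax and assumes that letters
of either case count as name start characters, so the pattern is widened
from "</[(a-z]" to "</[(A-Za-z]". The function name cdata_end and the
sample string are made up for illustration; this is not a parser.

```python
import re

# etago ("</") followed by a name start character, or by grpo ("(").
# Assumption: letters of either case are name start characters.
ETAGO_IN_CONTEXT = re.compile(r"</[(A-Za-z]")

def cdata_end(text, start=0):
    """Return the offset where CDATA content beginning at `start` ends,
    or None if no etago delimiter-in-context follows."""
    match = ETAGO_IN_CONTEXT.search(text, start)
    return match.start() if match else None

# "</ " is not a delimiter-in-context, so it stays in the data.
# "</bogus>" ends the data even though it is not the element's end-tag.
content = "if (a </ b) x(); </bogus> more </image>"
end = cdata_end(content)
print(repr(content[:end]))
```

Note that the data stops at "</bogus>", well before "</image>", which is
why "anything but the closing tag in full" is too optimistic.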
>:
>|   Someone made the point that an SGML document is only allowed to
>|   include SGML characters as specified by the SGML declaration, and if
>|   we're going to use the default SGML declaration, we have to stick to
>|   the characters blessed by it.
>
>Blessed and blessed.  The SGML declaration is supposed to reflect the
>reality of the document, not enforce arbitrary limits on it.  So you
>write an SGML declaration which fits the document.
>
>|   That's not my understanding. I thought that inside CDATA (or SDATA,
>|   I think) you could put _anything_ but the closing tag in full.
>
>As said above, the etago delimiter-in-context terminates the data,
>regardless of whether it's a legal end-tag in that context.
>
>You should be aware that the SGML parser will parse the contents of the
>"binary" content, and ignore record start, and treat record ends
>differently from other characters.  In addition, it's an error for an SGML
>entity to contain characters with any of the numbers listed in the
>SHUNCHAR part of the SYNTAX declaration.  This is _not_ what you want
>with binary data.
>
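
A small Python sketch of why raw binary trips over the SHUNCHAR rule
mentioned above. The set of shunned numbers used here (control codes 0-31
plus 127 and 255) is an assumption for illustration only; the real list is
whatever the SHUNCHAR part of your SGML declaration says. The sketch also
ignores any special treatment of function characters such as record start
and record end. The name shunned_offsets and the sample bytes are made up.

```python
# Assumed for illustration: control codes 0-31 plus 127 and 255.
# The authoritative list is the SHUNCHAR part of the SGML declaration.
ASSUMED_SHUNNED = frozenset(range(32)) | {127, 255}

def shunned_offsets(data, shunned=ASSUMED_SHUNNED):
    """Return (offset, byte value) pairs for every shunned byte in `data`."""
    return [(i, b) for i, b in enumerate(data) if b in shunned]

# The first few bytes of a made-up image file already contain several
# shunned values, while ordinary text contains none.
sample = b"GIF87a\x10\x00\x10\x00\x80\x00\x00"
print(shunned_offsets(sample))
print(shunned_offsets(b"plain text"))
```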
>|   What's the scoop? Do we have to use external entities for raw data?
>
>Yes.  An external entity that is not an SGML text entity requires a
>notation identifier, so you only need to list the entities in the DTD,
>with notation, and refer to them by name in the document instance.
>
>If this is not satisfactory, you should declare the objects to be CDATA,
>and use a binary to text-only transformation scheme.  There are several
>such schemes.  Among them, base64 is the preferred encoding in my view,
>since it's available as part of the new Multipurpose Internet Mail
>Extensions (MIME) RFC-to-be.  (The latest draft is available for
>anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
>two weeks from today.  Section 5.2 which concerns the base64 encoding is
>also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.)  Transformation
>back to the binary form from the text-only form may be done on the fly
>by the application before sending the data to the notation interpreter.
>
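
For what the base64 round trip might look like, here is a minimal sketch
using Python's standard base64 module (not the code from the MIME draft).
The function names are made up. One property worth noticing: the base64
alphabet has no "<", so the encoded text can never contain an etago
delimiter and cannot end a CDATA element early.

```python
import base64

def to_text(raw):
    """Encode bytes as base64 text, broken into short lines."""
    return base64.encodebytes(raw).decode("ascii")

def from_text(text):
    """Decode base64 text (line breaks included) back to the original bytes."""
    return base64.b64decode(text)

raw = bytes(range(256))          # every possible byte value
text = to_text(raw)
print(text.splitlines()[0])      # first line of the text-only form
print(from_text(text) == raw)    # round trip restores the binary form
```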
My idea is to use MIME encodings, but put these attachments _outside_
the SGML text, in an attached (or external) body part.
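
Roughly what I have in mind, sketched with Python's standard email package
as a stand-in for whatever MIME software ends up being used. Everything
specific here is an assumption for illustration: the text/sgml content
type, the image/gif type, the entity name "logo", the file name, and the
function name bundle.

```python
from email.message import EmailMessage

def bundle(sgml_text, image_bytes, filename):
    """Build a multipart message: SGML text first, binary data attached."""
    msg = EmailMessage()
    # The SGML document instance travels as plain text (assumed type text/sgml).
    msg.set_content(sgml_text, subtype="sgml")
    # The raw data rides outside the SGML text, in its own body part;
    # the library applies a base64 transfer encoding to it.
    msg.add_attachment(image_bytes, maintype="image", subtype="gif",
                       filename=filename)
    return msg

sgml = "<p>See the logo: &logo;</p>\n"      # hypothetical markup
image = b"GIF87a\x10\x00\x10\x00\x80\x00\x00"
msg = bundle(sgml, image, "logo.gif")
for part in msg.iter_parts():
    print(part.get_content_type(), part.get("Content-Transfer-Encoding"))
```

The SGML part never sees the binary bytes, so neither the delimiter rules
nor the SHUNCHAR restriction apply to them.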

>In addition to being much easier to deal with in SGML, this also makes
>SGML documents containing such content robust with respect to file
>transfer, etc.
>
>Hope this helps,
></Erik>

Thanks. Mostly it confirms my suspicions, but it should also provide
a somewhat authoritative answer (no references to ISO 8879 here :-)
to the WWW project.

>--
>Erik Naggum       |  +47-295-0313     |  ISO 8879 SGML     |  Memento,
>Naggum Software   |   "fuzzface"      |  ISO 10744 HyTime  |  terrigena.
>Boks 1570, Vika   | <erik@naggum.no>  |  JTC 1/SC 18/WG 8  |  Memento,
>0118 OSLO, NORWAY | <enag@ifi.uio.no> |  SGML UG SIGhyper  |  vita brevis.