Re: RS/RE: basic questions

From: Paul Grosso <paul@arbortext.com>
Date: Thu, 3 Oct 96 09:57:25 CDT
Message-Id: <9610031457.AA10693@atiaus.arbortext.com>
To: w3c-sgml-wg@w3.org
> From: lee@sq.com
> Date: Wed, 2 Oct 96 22:44:32 EDT

> There's no point in practice in being compatible with 8879 -- instead, XML
> has to be compatible with actual tools.  Probably the most widespread
> tools that read SGML are HoTMetaL, Panorama, Adept, and A/E, with NSGMLS
> and SGMLS and Omnimark being the most widespread on the `next layer down'
> (conceptually, I don't mean to deprecate them!).  I don't list DynaText
> because the viewer is the widespread part, and it reads a compiled form.
> As far as I know, only the tools in the 2nd group I mentioned support
> shortref or datatag or making " a name start character (is that what's
> going on there with <">...</">??).

> Again, if there are any other software writers represented here whose
> current shipping software could not cope with making all whitespace
> significant, and could not supply a script or patch or upgrade or
> sidegrade or whatever within, say, six to twelve months, send me mail
> and I will post a summary early next week.

Adept handles shortrefs on input, though it expands ("normalizes")
them as it reads them and therefore never writes them out (though it
would be trivial to add an "outputfilter" that remapped certain tags
to some other string of characters).

Adept uses a conforming 8879 parser (except when it notices that the
file is one it has previously written out in which case it uses a
stripped down parser which, however, implements the same RE rules
as 8879), so the 8879-insignificant REs are already gone by the time
the application gets control.  Therefore, Adept as written would not
cope with making whitespace significant that isn't significant per 8879.

Adept also ensures when it writes out a file that it introduces line
breaks only where they are (1) insignificant per 8879 (and carefully
places two REs when necessary to ensure that one is significant) or
(2) in data content that is not indicated (by the style sheet) as a
verbatim element (and the RE is treated as a space by the application
when rereading the file).  

Adept introduces line breaks because most users want to be able to
(a) print or (b) "ascii" email or (c) process via something like sed or
(d) edit in something like vi, and (a) and (b) require lines generally
less than 80 characters long and (c) and (d) require lines less than
256 characters long.  However, the user can set the "target output
record length" in Adept to anything between 40 and a very large number,
and if they set it to, say, 30000, 8879-significant line breaks will
not be introduced.
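
To make case (2) above concrete, here is a minimal sketch in Python.  It
is not Adept's code; the names wrap_data, reread_data and target_len are
hypothetical, and it assumes non-verbatim data content that contains no
line breaks of its own.  A space is replaced by a line break only when
the line would otherwise exceed the target record length, and rereading
turns each line break back into a space.  Breaks placed inside markup,
case (1), are not modeled.

```python
def wrap_data(text: str, target_len: int = 72) -> str:
    """Replace some spaces with line breaks so that lines stay within
    target_len where possible (a single word longer than target_len
    is left on a line of its own)."""
    lines = []
    current = None
    for word in text.split(" "):
        if current is None:
            current = word
        elif len(current) + 1 + len(word) > target_len:
            # The space before this word becomes the line break.
            lines.append(current)
            current = word
        else:
            current += " " + word
    lines.append(current)
    return "\n".join(lines)


def reread_data(text: str) -> str:
    """Treat each line break in non-verbatim data as a space."""
    return text.replace("\n", " ")


if __name__ == "__main__":
    sample = "Adept introduces line breaks only where they can be undone on rereading"
    wrapped = wrap_data(sample, target_len=40)
    print(wrapped)
    print(reread_data(wrapped) == sample)
```

With a very large target_len (say 30000) the sketch introduces no line
breaks at all, which mirrors the setting described above.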

The above is presented in the interest of factuality, and is not intended
as any statement of opinion on this bordering-on-counter-productive and 
rapidly-approaching-ludicrous discussion on record ends.  I'm reminded
that the role of the work group participants is one of input-givers to
the ERB.  I consider that the ERB has been overwhelmed with input on
this topic, and I plan to allow the ERB to incorporate the input into
the proposal that Tim and Michael are developing without further input
from me on this topic.
Received on Thursday, 3 October 1996 11:27:08 UTC
