
Re: RS/RE: basic questions



>According to my understanding, the only way to make all whitespace
>significant (i.e. to pass all whitespace to the application) is to do that
>SGML DECL RE remapping hack. So a LOT of software would have to be changed
>(i.e. almost all of the products you mentioned) if you are right that they
>do not support SGML DECL tricks.

As others have pointed out, the problem is really one for entity
managers, since they are the components that actually insert RE and
RS. In most SGML parsers I've seen, the entity manager doesn't even
do this (i.e. it doesn't *really* detect record boundaries and then
add RE or RS); instead, the parser is hardwired to treat CR and LF as
the characters to look for. So the argument about "SGML
compatibility" is something of a red herring: if you stick to the
letter of the law (so to speak), many if not most SGML applications
don't conform anyway (a good indication of a broken specification).
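
To make the distinction concrete, here is a minimal sketch of what a
record-oriented entity manager would do if it followed the letter of
the law: find the storage records, whatever the platform's line
terminator, and wrap each one in RS ... RE. The codes 10 and 13 are
the RS and RE assignments of the SGML reference concrete syntax; the
function names are hypothetical and not taken from any real parser.

```python
import re

RS = "\n"  # record start: code 10 in the SGML reference concrete syntax
RE = "\r"  # record end: code 13 in the SGML reference concrete syntax

def split_records(raw):
    """Split stored text into records on CR LF, bare CR, or bare LF."""
    parts = re.split(r"\r\n|\r|\n", raw)
    if parts and parts[-1] == "":
        # A trailing terminator closes the last record; it does not
        # open a new, empty one.
        parts.pop()
    return parts

def entity_manager_stream(raw):
    """Present every record to the parser as RS + record + RE."""
    return "".join(RS + record + RE for record in split_records(raw))

# The same two records, stored with different platform conventions,
# reach the parser as the same character stream.
dos_text = "a\r\nb\r\n"
unix_text = "a\nb\n"
same = entity_manager_stream(dos_text) == entity_manager_stream(unix_text)
```

A parser that instead scans the raw bytes for CR and LF skips this
step entirely, which is the hardwiring described above.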

In addition, having a parser remove what the author might have
intended to be significant whitespace (one can never be sure what the
author's intentions were) seems to me at least as bad as leaving
extra whitespace in. I would prefer to leave it all in and let
applications process it. I think this will result in considerably
*less* degradation of data over time, as whatever occurs between a
start and end tag would then always be the canonical content.
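
As a rough illustration of "leave it all in, and let applications
process it", the sketch below has the parser hand over element
content untouched and leaves the whitespace policy to each
application. All names here are hypothetical, and the two policies
are only examples of what an application might choose.

```python
import re

def parser_output(content):
    """Parser side: pass the content between the tags through as-is."""
    return content

def collapse_whitespace(text):
    """One application's policy: runs of whitespace become one space."""
    return re.sub(r"\s+", " ", text).strip()

def keep_verbatim(text):
    """Another application's policy: every character is significant."""
    return text

source = "line one\n   line two\n"

formatted = collapse_whitespace(parser_output(source))
preserved = keep_verbatim(parser_output(source))
```

Because the parser removes nothing, the second application still gets
the stored text back exactly, and the first loses nothing it cared
about.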

So, my final proposal:

  1) That RE handling follow the rules outlined by the ERB
     meeting, *if* any REs occur.
  2) That RE and RS be assigned codes in some private use area,
     so that there can be no confusion with CRLF.
  3) That we strongly recommend that XML entity managers adopt
     a stream view of storage objects, such that they do not
     recognise record boundaries (the common application view
     provided by most system libraries on most modern OSes).

Note that (3) is a *recommendation*, not a requirement. This would
then let the market decide which way the decision should fall.
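
A small sketch of how (2) and (3) would combine, under stated
assumptions: U+E000 and U+E001 are arbitrary private use code points
picked purely for illustration (the proposal does not name any), and
the function names are hypothetical. With a stream-view entity
manager nothing ever inserts RS or RE, so the RE rules in (1) have
nothing to act on, and CR and LF arrive as ordinary data characters.

```python
# Hypothetical private use assignments, for illustration only.
RS = "\uE000"
RE = "\uE001"

def read_as_stream(raw):
    """Stream view: hand the parser the characters exactly as stored,
    without looking for record boundaries."""
    return raw

def has_record_boundaries(text):
    """True only if something upstream actually inserted RS or RE."""
    return RS in text or RE in text

stored = "<p>first line\r\nsecond line</p>"
seen_by_parser = read_as_stream(stored)

# CR and LF are still present as plain data, and no RS/RE appears.
untouched = seen_by_parser == stored
any_records = has_record_boundaries(seen_by_parser)
```

An entity manager that chose the record view instead would be the
only source of RS and RE, so the two views cannot be confused.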

To be frank, I think doing RE+RS and I18N *correctly* is less
important than actually *doing* XML. I have every faith that
common usage will eradicate any bad decisions we make now (by either
killing XML or changing it). As such, I will reduce my participation
in discussions of both topics.

