Re: SGML and XML

At 2:23 PM 11/28/96, Charles F. Goldfarb wrote:
>The SGML technique for accomplishing this result has two steps:
>1. Declare a new SEPCHAR function character in the XML concrete syntax, using a
>character number that is not expected to occur naturally.  Call this
>"newsep". :
>2. In a shortref map used by the document element, map RE to NEWSEP.
>That's it. If you're unhappy with SGML's built-in whitespace handling, you can
>use this technique today. If we choose to use it for XML, it will be simple to
>explain, easy to implement, and totally conforming.
   Unfortunately, this does not work for XML, because a parser working
without a DTD cannot tell the difference between mixed content and
element content.
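
   A minimal sketch of the problem, assuming a non-validating parser that
never sees a DTD. Python's standard xml.dom.minidom is used here purely for
illustration, and the element names "list", "para" and "item" are
hypothetical. The record ends inside the two documents come back as the
same character data, so nothing in the parse says which ones were meant to
be ignorable.

```python
from xml.dom import minidom


def child_kinds(xml_text):
    """Describe each child of the document element as (kind, value)."""
    root = minidom.parseString(xml_text).documentElement
    kinds = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            kinds.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            kinds.append(("element", node.tagName))
    return kinds


# Assumption: "list" is intended to have element content and "para" is
# intended to have mixed content. No DTD is present to say so.
element_like = "<list>\n<item>a</item>\n</list>"
mixed_like = "<para>\n<item>a</item>\n</para>"

list_kinds = child_kinds(element_like)
para_kinds = child_kinds(mixed_like)

# Both parses have the same shape: newline text, element, newline text.
print(list_kinds)
print(para_kinds)
```

   With a DTD, a parser could treat the newlines in "list" as ignorable and
those in "para" as data. Without one, it has no basis for the distinction.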

   It also, unfortunately, requires throwing away a character code point.
Especially for 8-bit character sets, that might be a high price to pay.
Isn't there some chance that we could get a way to specify the behavior
we want directly, rather than through an arcane workaround? I want to be
able to turn off RS/RE handling and turn all whitespace into SEPCHARs
without having to restrict which other, non-whitespace, characters can
appear in a document. Is that too much to ask?

   Also, it's not clear to me that RS/RE are visible to shortrefs at the
beginning of an element. I thought SHORTREF applied to document data, not
to ignored characters, but I could well be wrong, as I have avoided delving
into the mysteries of SHORTREF (like why a shortref cannot include the
letter "B").

   -- David

I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________