- From: Kent M Pitman <kmp@harlequin.com>
- Date: Fri, 24 Apr 98 11:43:23 EDT
- To: xml-editor@w3.org
- Cc: kmp@harlequin.com
The introductory text in section 4, Physical Structures, is very
confusing. It uses a meaning for "parsed" which is alien to any
meaning of "parsed" that I am familiar with.
If I understand at all, after many readings, the word "parsed" could
usefully be replaced by the word "XML" (or "XML entity" or "XML document"),
and "unparsed" by "non-XML" (or "non-XML entity" or "non-XML document").
As nearly as I can tell from your use of "parsed",
(a) it has nothing to do with whether the text has
been changed from XML source characters to a structural
representation of XML [the thing I normally associate with parsing],
and
(b) it suggests that [for example] a database is not parsed, which is
insulting to implementors of other systems, not to mention wholly
confusing. The whole point of a database is that it IS parsed--it is
NOT a source representation [unparsed], but a highly structured
representation.
- - - - -
Here are some examples of confusions I had while reading this text, to help
you see why the current wording does not work:
(1) I was imagining that '<!ENTITY FOO "BAR">' was unparsed if
represented as the string [character vector]:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|<|!|E|N|T|I|T|Y| |F|O|O| |"|B|A|R|"|>|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
and that it was parsed if it was represented as some structured object:
+-------+----------------+
| Class | XML Markup |
+-------+----------------+
| Kind | General Entity |
+-------+----------------+ +-+-+-+
| NAME | +-------------------> |F|O|O|
+-------+----------------+ +-+-+-+ +-+-+-+
| VAL | +--------------------------------->|B|A|R|
+-------+----------------+ +-+-+-+
(2) Then I worried that maybe the "parsed" part was "BAR": that instead
of substituting the text vector "BAR", I was supposed to have pre-parsed
it. For example, if I'd seen
<!ENTITY % ZAP '<!ENTITY FOO "BAR">'>
I worried that I wasn't supposed to substitute
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|<|!|E|N|T|I|T|Y| |F|O|O| |"|B|A|R|"|>|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
for %ZAP; wherever it occurs, but was instead supposed to substitute
+-------+----------------+
| Class | XML Markup |
+-------+----------------+
| Kind | General Entity |
+-------+----------------+ +-+-+-+
| NAME | +-------------------> |F|O|O|
+-------+----------------+ +-+-+-+ +-+-+-+
| VAL | +--------------------------------->|B|A|R|
+-------+----------------+ +-+-+-+
But that didn't make sense, because some objects can't be parsed without
knowledge of their context, and parameter entity definitions contain no
notion of the context of their expansion.
(3) For a while, I also worried that "PEReference" meant "Parsed Entity
Reference" until I (fortunately) found mention of a "Parameter Entity
Reference". I *really* do not like cute, unintelligible two-letter
abbreviations like PE, and would prefer that definition [69] (and its
callers) refer to ParamEntityReference, not PEReference. ("cp" is another
two-letter abbreviation that annoyed me; my memory of SGML says it stands
for "content particle", but I use other systems where it means other
things, like "command processor", and a short name encourages that
confusion.)
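For readers who want the conventional sense of "parsing" from confusion (1) pinned down, the following is a minimal Python sketch (not anything the draft specifies) that turns the character vector <!ENTITY FOO "BAR"> into the structured record drawn beside it. It assumes only the simplest form of a general entity declaration, <!ENTITY Name "Value">: no parameter entities, external identifiers, single-quoted literals, or references inside the value. The names GeneralEntityDecl and parse_entity_decl are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class GeneralEntityDecl:
    """Structured form of a general entity declaration (hypothetical class)."""
    name: str
    value: str


# Matches only the simple form <!ENTITY Name "Value">; a deliberate simplification.
_DECL = re.compile(r'<!ENTITY\s+([A-Za-z_:][\w.\-:]*)\s+"([^"]*)"\s*>')


def parse_entity_decl(text: str) -> GeneralEntityDecl:
    """Turn the character vector into a structured record, i.e. 'parse' it
    in the ordinary sense of the word."""
    m = _DECL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a simple general entity declaration: {text!r}")
    return GeneralEntityDecl(name=m.group(1), value=m.group(2))


source = '<!ENTITY FOO "BAR">'      # source (character) representation
decl = parse_entity_decl(source)    # structured representation
print(decl)
```

In the draft's terminology FOO names a "parsed" entity whether an implementation holds the string or the record, which is exactly the mismatch described in (1).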
- - - - -
Here is what I *think* section 4, Physical Structures, is trying to say:
[By the way, I find the remark in the first paragraph that the external
DTD subset is not identified by name confusing. If it's external, how can
it be referred to at all if not by name??]
==============================================================================
4. Physical Structures
...
Entities may be either XML documents themselves, or documents of
other kinds not intended to be interpreted as XML. An XML document's
contents are referred to as the `replacement text' for the `entity
name' that names the XML document.
A non-XML entity is a resource whose contents are either not text or,
if text, are not to be interpreted as XML. Each non-XML entity has
an associated notation, identified by name. Beyond a requirement
that an XML processor make the identifiers for the entity and
notation available to the application, XML places no constraints on
the contents of non-XML entities.
XML entities are invoked by name using entity references; non-XML
entities are invoked by name, given in the value of ENTITY or ENTITIES
attributes.
...
==============================================================================
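The distinction in the proposed wording can also be sketched in Python. This is a simplified illustration under stated assumptions, not an implementation of the spec: it takes a declaration that has already been parsed into a hypothetical EntityDecl record, covers general entities only (parameter entities and %name; references are left out), and relies on the XML 1.0 rule that a non-XML ("unparsed") entity is an external entity declared with an NDATA notation name, as in <!ENTITY pic SYSTEM "pic.gif" NDATA gif>. The entity pic and the notation gif are made-up examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityDecl:
    """Hypothetical record for an already-parsed <!ENTITY ...> declaration."""
    name: str
    literal_value: Optional[str] = None  # internal entity, e.g. "BAR"
    system_id: Optional[str] = None      # external identifier's system literal
    notation: Optional[str] = None       # NDATA notation name, if any

    def __post_init__(self) -> None:
        # In XML 1.0 an NDATA notation may only follow an external identifier.
        if self.notation is not None and self.system_id is None:
            raise ValueError(f"{self.name}: NDATA requires an external identifier")


def is_xml_entity(decl: EntityDecl) -> bool:
    """True for what the draft calls a 'parsed' entity (an XML entity).

    An entity with an NDATA notation is what the draft calls 'unparsed'
    (a non-XML entity); every other entity is an XML entity."""
    return decl.notation is None


def how_invoked(decl: EntityDecl) -> str:
    """Describe how a document refers to the entity under the proposed wording."""
    if is_xml_entity(decl):
        return f"entity reference &{decl.name};"
    return f"by name, in the value of an ENTITY or ENTITIES attribute ({decl.name})"


foo = EntityDecl(name="FOO", literal_value="BAR")
pic = EntityDecl(name="pic", system_id="pic.gif", notation="gif")
for d in (foo, pic):
    print(d.name, is_xml_entity(d), how_invoked(d))
```

The classification turns only on whether a notation is present, not on whether anything has been "parsed", which is the point of the proposed renaming.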
By the way, I find the ", see below," in paragraph 1 of Physical Structures
visually confusing and unhelpful. Also, immediately following, I
don't understand why an "external DTD subset" is not referred to by name. How
can anything external ever be addressed if not by name? I tried to find a
definition of "external DTD subset" that answered this question, but found
nothing helpful.
Received on Friday, 24 April 1998 11:40:07 UTC