How to count infoset items w.r.t. entities? from Dan Connolly on 1999-05-20 (www-xml-infoset-comments@w3.org from April to June 1999)

From: Dan Connolly <connolly@w3.org>
Date: Thu, 20 May 1999 12:38:01 -0500
To: www-xml-infoset-comments@w3.org
Message-ID: <37444879.7CDC0E63@w3.org>
The spec says:

"There is one processing instruction information item for
every processing instruction in the document."
-- http://www.w3.org/TR/1999/WD-xml-infoset-19990517#infoitem.pi

but I don't see any specification of how to count how many
processing instructions there are in an XML document.

For example, how many processing instructions are there in the
following document?

	<!DOCTYPE foo [
	<!ENTITY piShorthand "<?any pi?>">
	]>
	<foo>&piShorthand;&piShorthand;&piShorthand;</foo>

Strictly following the XML 1.0 grammar, you won't encounter
the PI production *at all*.

In fact, I can't find anything in the XML 1.0 spec
that says the following has more than one element:

	<!DOCTYPE foo [
	<!ENTITY elShorthand "<xxx/>">
	]>
	<foo>&elShorthand;&elShorthand;&elShorthand;</foo>

The section "4.1 Character and Entity References" doesn't
say "when you see an entity reference, dereference it and
parse the contents inline." It doesn't even have
a reference to section 4.4
http://www.w3.org/TR/1998/REC-xml-19980210#entproc

There's some stuff in
http://www.w3.org/TR/1998/REC-xml-19980210#included

	An entity is included when its replacement text is
	retrieved and processed, in place of the reference
	itself, as though it were part of the document at the
	location the reference was recognized.

whatever that means.
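
For what it's worth, here is a rough sketch of how one existing parser answers the counting question. It uses Python's xml.parsers.expat bindings; the helper name count_items and the two document constants are made up for illustration, and the entity in the second document is declared as elShorthand so that it matches the references (entity names are case-sensitive). This shows what one implementation does, not what the XML 1.0 spec requires.

```python
import xml.parsers.expat

# The two example documents from this message. The second declares the
# entity as "elShorthand" so the declaration matches the references.
PI_DOC = """<!DOCTYPE foo [
<!ENTITY piShorthand "<?any pi?>">
]>
<foo>&piShorthand;&piShorthand;&piShorthand;</foo>"""

EL_DOC = """<!DOCTYPE foo [
<!ENTITY elShorthand "<xxx/>">
]>
<foo>&elShorthand;&elShorthand;&elShorthand;</foo>"""


def count_items(document):
    """Count the PIs and element start tags that expat reports.

    Hypothetical helper: it only tallies parser callbacks, so the result
    reflects expat's behaviour (internal entities are included inline).
    """
    counts = {"pi": 0, "element": 0}

    def on_pi(target, data):
        counts["pi"] += 1

    def on_start(name, attrs):
        counts["element"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.Parse(document, True)  # True marks this as the final chunk
    return counts


if __name__ == "__main__":
    print(count_items(PI_DOC))
    print(count_items(EL_DOC))
```

A parser that includes the replacement text at each reference reports one item per reference, which is presumably the count the infoset draft has in mind, but the draft should say so rather than leave it to implementations.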


More on counting...

under "2.1. The Document Information Item"

	"There is always one document information item in the
	information set, ..."

That makes it sound like there's only one information set in the
world, like there's only one set of integers in the world.

I suggest

	"There is always one document information item in the
	information set of an XML document, ..."

Under "2.1.1. Document: Required Properties"

	"2.An unordered set of notation information items,
	one for each notation declaration that the XML
	processor has read." 

That says that the information set is not just
a function of an XML document, but also a function of
the behaviour of a processor used to read it. Surely
that's not what you meant, right?

I suggest:

	2. a set of notation information items, one for
	each notation declaration in the XML document.

modulo the question about counting items in the first place.
I think you're going to have to talk about the parse tree
resulting from using the productions in the XML 1.0 spec
(which means that it matters that the grammar is ambiguous).
And you'll have to figure out how entities really interact
with those productions.

Another example:

	"Validating processors are required by XML 1.0 to provide this
	information; non-validating processors may always set this flag
	to false."

"set this flag"? The information set just is. It doesn't have
state that can be flipped on and off. It's a function of
an XML document alone, according to

	1. Introduction

	This document specifies an abstract data set called the
	XML information set (Infoset), a description of the
	information available in a well-formed XML document [XML].


Also: why mince words so much? Instead of:

	The document information item must have the following
	properties available in some form:

why not just:

	The document information item consists of:


under 2.2.2. Elements: Optional Properties

	6.A reference to the entity information item for the entity
	in which this element begins and ends. 

why "A reference to"? why not just:

	6. an entity information item for the entity
	in which this element begins and ends. 

if you're worried about identity, don't. You wouldn't say:

	a reference to the ISO 10646 character code of the character...

would you? Entity information items are identical or not
just like integers are identical or not.
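
To make the integer analogy concrete, here is a small sketch in Python. The class EntityItem and its two fields are hypothetical (the draft does not define them this way); the point is only that an item modelled as an immutable value needs no separate notion of "a reference to" it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityItem:
    """Hypothetical stand-in for an entity information item.

    frozen=True makes instances immutable and hashable, so two items with
    the same field values compare equal, the way two 3s do.
    """
    name: str
    replacement_text: str


a = EntityItem("elShorthand", "<xxx/>")
b = EntityItem("elShorthand", "<xxx/>")
c = EntityItem("piShorthand", "<?any pi?>")

same = a == b            # equal by value, like 3 == 3
distinct = len({a, b, c})  # a set collapses a and b into one member
```

Whether a and b happen to be the same object in memory is an implementation detail that never shows up in the comparison.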

(by the way: I think the term is "codepoint", not "character code";
cf. http://www.w3.org/TR/1999/WD-charmod-19990225#CharBytes)

-- 
Dan Connolly, W3C
http://www.w3.org/People/Connolly/
Received on Thursday, 20 May 1999 13:37:58 UTC