
Re: Character references in XML



In message <3333B8A9.309A@utila.ifi.uni-klu.ac.at> "Norbert H. MIKULA" writes:
> Thanks to Henry Thompson,
> 
> I was pointed to a severe misunderstanding of character references
> in XML, on my side.

At the risk of repeating myself, I think this is something the ERB/WG needs
to address.  The draft has the noble aim of being self-contained and
unambiguous in no more than 20 pages.  It's unclear whether the handling
of character references can be fully specified in so short a description.

> 
> My initial belief was that character references are resolved
> before the lexer gets to see them. Thus they really would be
> treated as if they were entered directly.
> 
> Henry pointed out to me that, at least if following the SGML
> philosophy, that's not the case. Also the SGML handbook
> confirms his point (357:10)
> 
> "a numeric character reference is always treated as data in
> the context in which the replacement occurs." 

My inclination (which may conflict with the philosophy of the
draft) is to take the SGML interpretation as our guide (assuming
this is not in conflict with other parts of XML).  Then we can
either:
	- refer the developer to ISO 8879 and/or the Handbook.  Some
		members of the WG don't like this since it breaks the
		ideal of a single document.
	- transcribe (subject to copyright, etc.) the verbatim parts
		of ISO 8879 that describe how to interpret references.
		This will almost certainly break the 20-page rule unless
		it's compressed.

> 
> However, I would like to see clarifications of that issue. 
> (Mostly for the sake of others that will have to deal
> with it.)
> 
> I don't want to get into a religious war, but to me it is
> more natural to treat char refs as if they were entered
> directly. Thus the lexer would never see them.

I suspect that if we are going to do something different from SGML
we shall run into severe difficulties:
	- XML may no longer be a 'subset'/'dialect' of SGML
	- people will look to SGML for guidance and assume that
		XML 'does the same'
	- there will be complex situations that are legal XML but which
		we may not have anticipated.  Nested entities are one
		such case, and Norbert's examples are another.
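
To make the difference concrete, here is a minimal sketch in Python of the
two readings.  It is only an illustration under loud assumptions: the
tokenizer is a toy that knows nothing of the real XML grammar, it handles
decimal references only, and the names expand, tokenize,
expand_before_lexing and expand_as_data are my own inventions, not taken
from the draft or from ISO 8879.

```python
import re

# Toy patterns (assumptions for illustration, not the XML grammar).
CHAR_REF = re.compile(r"&#([0-9]+);")   # decimal character reference
TAG = re.compile(r"<[^<>]+>")           # anything that looks like a tag


def expand(text):
    """Replace each decimal character reference with the character it names."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


def tokenize(text):
    """Split text into ("tag", s) and ("data", s) tokens."""
    tokens, pos = [], 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            tokens.append(("data", text[pos:m.start()]))
        tokens.append(("tag", m.group(0)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


def expand_before_lexing(text):
    """Reading 1: references are resolved before the lexer sees the text."""
    return tokenize(expand(text))


def expand_as_data(text):
    """Reading 2 (SGML): the replacement is always treated as data."""
    return [(kind, expand(s) if kind == "data" else s)
            for kind, s in tokenize(text)]


sample = "<p>&#60;b&#62;bold?</p>"
print(expand_before_lexing(sample))
print(expand_as_data(sample))
```

Under the first reading the references introduce a <b> start-tag that the
author never typed as markup; under the SGML reading the same characters
stay inside a single run of data between <p> and </p>.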

I don't think we have any option but to follow SGML, and for the sake of 
Norbert, other xml-dev'ers and myself we really need this clearing up soon.
(I am still struggling to get a valid parser that can be bolted into my system
:-).

	P.
 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

