Character references in XML

Thanks to Henry Thompson,

I was made aware of a serious misunderstanding on my part about
character references in XML.

My initial belief was that character references are resolved
before the lexer gets to see them, so that they would be
treated exactly as if they had been entered directly.

Henry pointed out to me that, at least under the SGML
philosophy, that is not the case. The SGML Handbook also
confirms his point (357:10):

"a numeric character reference is always treated as data in
the context in which the replacement occurs." 

However, I would like to see this issue clarified
(mostly for the sake of others who will have to deal
with it).

I don't want to get into a religious war, but to me it is
more natural to treat char refs as if they had been entered
directly, so that the lexer would never see them.

It all started when I thought that 

<!entity % yy '&#37;zz;'> 

would be changed to 

<!entity % yy '%zz;'>

before being handed to the lexer. The parser would then
immediately try to resolve the reference to the parameter entity
%zz;, which in turn would already have to be defined at that point.
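
To make the difference between the two readings concrete, here is a
small Python sketch. It is only an illustration, not a real parser:
the function names (pre_lexer_model, data_model) and the simplified
regular expressions are my own assumptions, and only decimal
character references are handled.

```python
import re

# Simplified patterns (assumptions for this sketch only).
CHAR_REF = re.compile(r"&#(\d+);")                # decimal char refs only
PE_REF = re.compile(r"%([A-Za-z_:][\w.:-]*);")    # parameter entity refs


def _expand_pe_refs(text, param_entities):
    """Replace each %name; with its value; fail if name is undefined."""
    def lookup(match):
        name = match.group(1)
        if name not in param_entities:
            raise KeyError(f"parameter entity %{name}; is not defined yet")
        return param_entities[name]
    return PE_REF.sub(lookup, text)


def pre_lexer_model(literal, param_entities):
    """Reading 1: char refs are replaced BEFORE the lexer sees the text,
    so a '%' produced by &#37; can start a parameter entity reference."""
    text = CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)
    return _expand_pe_refs(text, param_entities)


def data_model(literal, param_entities):
    """Reading 2: a character produced by a char ref is always data,
    so it is never rescanned for markup."""
    # split() with one capturing group alternates: text, code point, text, ...
    parts = CHAR_REF.split(literal)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(_expand_pe_refs(part, param_entities))  # as written
        else:
            out.append(chr(int(part)))                         # plain data
    return "".join(out)


if __name__ == "__main__":
    print(data_model("&#37;zz;", {}))          # zz need not be defined yet
    try:
        pre_lexer_model("&#37;zz;", {})        # zz must already be defined
    except KeyError as err:
        print("pre-lexer reading fails:", err)
```

Under the first reading the declaration of yy fails unless zz is
already declared; under the second, yy simply gets the literal text
%zz; as its value, and zz is only looked up when yy itself is
referenced.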


-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

Received on Saturday, 22 March 1997 05:46:09 UTC