- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Sat, 22 Mar 1997 11:32:12 GMT
- To: w3c-sgml-wg@w3.org
In message <3333B8A9.309A@utila.ifi.uni-klu.ac.at> "Norbert H. MIKULA" writes:

> Thank's to Henry Thompson,
>
> I was pointed to a severe misunderstanding of character references
> in XML, on my side.

At the risk of repeating myself, I think it's something the ERB/WG needs to address. The draft has the noble aim of being self-contained and unambiguous in no more than 20 pages. It's unclear whether this can be easily resolved in a short description.

> My initial believe was, that character references are resolved
> before the lexer gets to see them. Thus they really would be
> treated as if they were entered directly.
>
> Henry pointed out to me that, at least if following the SGML
> philosophy, that's not the case. Also the SGML handbook
> confirms his point (357:10)
>
> "a numeric character reference is always treated as data in
> the context in which the replacement occurs."

My inclination (which may conflict with the philosophy of the draft) is to take the SGML interpretation as our guide (assuming this is not in conflict with other parts of XML). Then we can either:

- refer the developer to ISO:8879 and/or the Handbook. Some members of the WG don't like this since it breaks the ideal of a single document.
- transcribe (subject to copyright, etc.) the verbatim parts of ISO:8879 that describe how to interpret references. This will almost certainly break the 20 page rule unless it's compressed.

> However, I would like to see clarifications of that issue.
> (Mostly for the sake of others that will have to deal
> with it.)
>
> I don't want to get into a religious war, but to me it is
> more natural to treat char refs as if they would be entered
> directly. Thus the lexer never would see them.
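
To make the two readings concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the toy lexer, the names expand_first and sgml_style, and the simplified tag pattern are hypothetical and are not taken from the draft or from ISO 8879. It shows only why the order of resolution matters for an input such as a&#60;b&#62;c.

```python
import re

# Hypothetical, simplified patterns: decimal character references and
# a crude notion of a "tag". Real XML/SGML lexing is far more involved.
CHAR_REF = re.compile(r"&#(\d+);")
TAG = re.compile(r"<[^<>]+>")


def lex(text, resolve_refs):
    """Toy lexer returning a list of ("TAG", s) and ("DATA", s) tokens.

    If resolve_refs is true, a character reference is replaced by its
    character and appended to the current data run, never re-scanned.
    """
    tokens, data, pos = [], [], 0
    while pos < len(text):
        m = TAG.match(text, pos)
        if m:
            if data:
                tokens.append(("DATA", "".join(data)))
                data = []
            tokens.append(("TAG", m.group()))
            pos = m.end()
            continue
        if resolve_refs:
            m = CHAR_REF.match(text, pos)
            if m:
                data.append(chr(int(m.group(1))))  # replacement is data
                pos = m.end()
                continue
        data.append(text[pos])
        pos += 1
    if data:
        tokens.append(("DATA", "".join(data)))
    return tokens


def expand_first(text):
    """Reading 1: references are replaced before the lexer sees the text."""
    expanded = CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)
    return lex(expanded, resolve_refs=False)


def sgml_style(text):
    """Reading 2: the lexer resolves references and treats them as data."""
    return lex(text, resolve_refs=True)


sample = "a&#60;b&#62;c"
print(expand_first(sample))  # [('DATA', 'a'), ('TAG', '<b>'), ('DATA', 'c')]
print(sgml_style(sample))    # [('DATA', 'a<b>c')]
```

Under the first reading the referenced characters become markup; under the second they stay data, which is what the Handbook passage quoted above describes.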

I suspect that if we are going to do something different from SGML we shall run into severe difficulties:

- XML may no longer be a 'subset'/'dialect' of SGML
- people will look to SGML for guidance and assume that XML 'does the same'
- there will be complex situations that are legal XML but which we may not have anticipated. Nested entities are one such case, and Norbert's examples are another.

I don't think we have any option but to follow SGML, and for the sake of Norbert, other xml-dev'ers and myself we really need this cleared up soon. (I am still struggling to get a valid parser that can be bolted into my system :-).

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

Received on Saturday, 22 March 1997 06:46:19 UTC