
Error in XHTML Modularisation Recommendation? (was: RE: [xml-dev] XHTML modularisation causes strange error with MSXML)

From: MAISONNY Benoit <Benoit.MAISONNY@eurocontrol.int>
Date: Sat, 29 Jun 2002 12:51:45 +0200
Message-ID: <779D622F93F6D311ADDA0008C70DA9F604491AA7@agnbe02.mis.eurocontrol.be>
To: Dare Obasanjo <dareo@microsoft.com>, www-html@w3.org, xml-dev@lists.xml.org

> From: Dare Obasanjo [mailto:dareo@microsoft.com]
> 
> Does this newsgroup post solve your problems?
> 
> 	
> http://groups.google.com/groups?selm=DXc2q8GFBHA.259%40cppssbbsa01.microsoft.com

YES! It does solve my problem. (I did perform an extensive search on Google,
but not in the groups... Shame on me!)

So: that post explains the situation for the entity "lt", whose declaration
does indeed seem to be wrong in the XHTML Modularisation specification (I
would like to hear the HTML gurus' opinion on this one). My (very similar)
problem was in fact with the "amp" entity, but of course the lt one popped up
right after.

Solution: I simply redeclared lt and amp in my "driver" file like this:
<!ENTITY lt "&#38;#60;">
<!ENTITY amp "&#38;#38;">
That was enough for MSXML to accept my modularised DTD.
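
Why the double escaping works can be sketched in a few lines of Python. This
is only a simplified model of the two steps involved (character references in
the entity literal are expanded once when the declaration is read, and the
resulting replacement text is parsed again when the entity is referenced). The
function names are made up for this illustration, and only decimal character
references are handled.

```python
import re

# Matches decimal character references such as &#38; or &#60;
CHAR_REF = re.compile(r"&#(\d+);")


def replacement_text(literal: str) -> str:
    """Step 1: expand character references in the entity literal, one pass only."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)


def is_safe_when_referenced(literal: str) -> bool:
    """Step 2: the replacement text is parsed again as content.

    Remaining character references are harmless (they become plain data),
    but a bare '<' or '&' would be read as the start of markup.
    """
    leftover = CHAR_REF.sub("", replacement_text(literal))
    return "<" not in leftover and "&" not in leftover


if __name__ == "__main__":
    for literal in ("&#38;#60;", "&#60;", "&#38;#38;", "&#38;"):
        print(literal, "->", replacement_text(literal),
              "safe" if is_safe_when_referenced(literal) else "NOT safe")
```

With the double-escaped literals the replacement text is itself a character
reference, so it ends up as character data; with the single-escaped ones it is
a bare "<" or "&", which the parser then tries to read as markup.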


I ran further tests with the "bogus" DTD and the other parsers, and I think
the results are worth reporting here. I simply added "&lt;" somewhere in
my sample XML instance and tried to validate it. Results:

- Xerces 2.0.1 validates, which is wrong.
- XMLSpy validates too.
- ElCel's xmlvalid 0.14.7 doesn't and reports:
	xhtml-special.ent [37:20] : Fatal error: entity reference 
	must start with a letter, '_' or ':'
  Note that the lt entity is in fact on line 35, not 37.
- libxml2 doesn't and reports:
	Entity: line 1: error: xmlParseEntityRef: no name
	&<
	^
	Entity: line 1: error: xmlParseStartTag: invalid element name
	&<
	 ^
	sample7-id.xml:14: error: Entity value required
	html:li id="N10058" class="Text" title="Text" xml:lang="Text">Text test lt: &lt
	^
I suppose the last two don't even check entities before they are actually
used. Call it a feature or a bug. Anyway, I think MSXML could be more
specific in pointing out the error: "Line 1 position 2" is not very helpful.

More generally, and speaking as a user of these parsers, I think they should
output a warning about such bogus entities (and of course an error when they
are encountered in the document instance).

> 
> if not can you update me on what the problems you have 
> afterwards are? 
> 
> Thanks. 

Thanks to you, Dare, and for the very appropriate words of wisdom :-)
May I suggest a Knowledge Base article about this issue?

Benoit
(PS: just for indexing purposes, I thought I should type modularization in
en_US somewhere. Now it's done.)

> 
> -- 
> PITHY WORDS OF WISDOM 
> Rule of Accuracy: When working toward the solution of a problem, it
> always helps if you know the answer. 
> Corollary: Provided, of course, that you know there is a problem. 
> 
> This posting is provided "AS IS" with no warranties, and confers no
> rights. 
> 
> 
> 
> > -----Original Message-----
> > From: MAISONNY Benoit [mailto:Benoit.MAISONNY@eurocontrol.int] 
> > Sent: Friday, June 28, 2002 5:09 AM
> > To: 'www-html@w3.org'; 'xml-dev@lists.xml.org'
> > Subject: [xml-dev] XHTML modularisation causes strange error 
> > with MSXML
> > 
> > 
> > Hello,
> > 
> > I wrote a DTD for a type of aeronautical document, using 
> > XHTML Modularisation. This worked fine until I tried to 
> > validate the DTD and sample documents with MSXML4 SP1 (and 
> > earlier). Needless to say they validate with Xerces and 
> > ElCel's xmlvalid (and even with XMLSpy, after adding a space 
> > into empty parameter entities).
> > 
> > I found some messages on xml-dev[1] and www-html[2] related 
> > to MSXML in the context of XHTML m12n and 1.1, but I can't 
> > find any solution/workaround for the following error message 
> > (output by xmlint.exe, Microsoft's validator):
> > 
> >     A name was started with an invalid character.
> >     URL: file:///H:/eAIP/_dev/xhtml/eAIP/sample7-id.xml
> >     Line 00001: &&
> >     Pos  00002: -^
> > 
> > As you can expect, that file begins with <?xml ... and not 
> > with && or whatever.
> > 
> > A page[3] on microsoft.com explains this can be caused by 
> > encoding issues, but that is not my case: I checked the files 
> > at byte level to be sure (they are UTF-8). It doesn't seem to 
> > be an OS issue either (same on NT4 & win98).
> > 
> > I suppose it's a bug in MSXML and I won't wait for MS to 
> > correct it for me :-)
> > So my question is rather: what causes this error? Is there a 
> > workaround? (I am not against a small change in the DTD that 
> > would magically please MSXML.) Should I give up XHTML m12n 
> > and instead adapt XHTML 1.0 extensively to suit my needs? 
> > (not that I want to!)
> > 
> > Thanks for any help and sorry for cross-posting,
> > Benoit
> > 
> > [1] http://lists.xml.org/archives/xml-dev/200106/msg00502.html
> >     http://lists.xml.org/archives/xml-dev/200106/msg00616.html
> > [2] http://lists.w3.org/Archives/Public/www-html/2002May/0036.html
> > [3] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwebteam/html/webteam08062001.asp
> 
> ......................................................................
> Benoit MAISONNY                        benoit.maisonny@eurocontrol.int
> eAIP development & support   http://www.eurocontrol.int/ais/ahead/eaip
> EUROCONTROL                        Aeronautical Information Management
> 
Received on Saturday, 29 June 2002 06:51:51 GMT
