Re: ampersands, angle brackets, errors, warnings and xml

Hi Marc,

Thanks for sending this message, especially after what was obviously
serious research.

I think your conclusions are correct (see below for details), but  
please note that I am not as much of an expert as others on this  
list. Hopefully if I say something completely wrong, they'll jump in :).

On 12 Jul 2005, at 00:28, Marc Richards wrote:
> 1) Should the validator be throwing an error instead of a warning  
> whenever it encounters an ampersand or left angle bracket as data  
> for a document served as application/xhtml+xml? i.e. was there a  
> conscious decision made to only throw a warning or is this simply  
> one of the XML parser limitations.

As far as I know, a bare ampersand or left angle bracket as data is
not legal in XML, but is allowed in SGML (with shorttags). Therefore,
in XML mode the validator should throw an error. Whether it should be
a warning in SGML mode is a source of controversy: you'll get an equal
number of people asking for it, for the sake of quality, and of people
complaining that the validator should not dare confuse people with
warnings for a valid construct.

Instead, what happens is:
- OpenSP's XML mode is "limited" (you saw the note)
- in XML mode, OpenSP throws a warning for such a construct
- in SGML mode, OpenSP accepts such constructs, unless asked to flag them
- XHTML is always parsed using XML mode (see also Bug 1500)

[Bug 1500] http://www.w3.org/Bugs/Public/show_bug.cgi?id=1500
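
To make the XML side concrete, here is a minimal sketch in Python. It
uses the standard library's expat-based parser rather than OpenSP, so it
only illustrates what the XML specification requires, not how the
validator behaves. The helper name is_well_formed is made up for the
example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A bare "&" or "<" in character data is a well-formedness error in XML,
# so a conforming XML parser must reject the first two fragments.
results = {
    "bare_amp": is_well_formed("<p>R & D</p>"),
    "bare_lt": is_well_formed("<p>1 < 2</p>"),
    "escaped": is_well_formed("<p>R &amp; D, 1 &lt; 2</p>"),
}
print(results)
```

With a conforming XML parser, only the escaped fragment is accepted,
which is why I would expect an error rather than a warning in XML mode.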


> If this *is* one of the XML limitations then I think it would be  
> helpful to compile a short list of common limitations and list them  
> on a w3c page in plain English.  I have read the OpenSP page[4] a  
> couple times and I am still not sure whether or not recognizing "<"  
> and "&" as invalid is a limitation of the parser; The language on  
> that page is fairly technical.  The validator could link to this  
> internal page directly and that page would then link to the OpenSP  
> page as well.

This could be a good idea. How about starting a scratchpad on the
wiki, e.g. somewhere like
http://esw.w3.org/topic/MarkupValidator/XML_Limitations
and encouraging people on the list to contribute?

> 2) Why are you issuing a warning for the use of ampersands and left
> angle brackets in xhtml but not html.  If the warning is in fact
> saying "this may be valid in some contexts, but it is recommended  
> to use &amp; or &lt;" then this is an SGML warning and should be  
> shown for both HTML and XHTML as text/html.  Ideally with an
> example like "R & D valid, R&D invalid".  Is there a bug open for
> issuing the warning for html doctypes as well?

See my remark above on the controversy over warnings in SGML mode,
which is what a "fussy" mode would cover. You could search this list
for "fussy" to get an idea of the discussions that happened a while
ago on this topic.
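
For what it's worth, escaping is the safe choice under either parser
mode, since &amp; and &lt; are valid in both HTML and XHTML. A small
sketch in Python, using the standard library's xml.sax.saxutils.escape
(the helper name escape_text is only for illustration):

```python
from xml.sax.saxutils import escape


def escape_text(text: str) -> str:
    """Escape character data so it is safe in both HTML and XHTML."""
    # escape() replaces "&" first, then "<" and ">", so text that is
    # already plain is never double-escaped by a single call.
    return escape(text)


print(escape_text("R & D"))  # R &amp; D
print(escape_text("a < b"))  # a &lt; b
```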

> Are there open bugs you can point me to? Are there bugs I should file?

I think bugs 798 and 1500 are the relevant ones. If you think they do
not cover the whole span of the issue, feel free to open others.

Hope this answered your questions.

Regards,
-- 
olivier

Received on Thursday, 14 July 2005 06:24:45 UTC