Re: ampersands, angle brackets, errors, warnings and xml from Marc Richards on 2005-07-18 (www-validator@w3.org from July 2005)

From: Marc Richards <contact_marcos@yahoo.es>
Date: Mon, 18 Jul 2005 09:59:19 -0400
To: olivier Thereaux <ot@w3.org>
Cc: www-validator@w3.org
Message-ID: <42DBB5B7.5070102@yahoo.es>
Oops, looks like I sent that email from a mis-configured account.  The 
from name should have been "Marc Richards", not "Marcos Rubino".

Marc


Marcos Rubino wrote:

>
> Hi Olivier,
>
> Thanks for your response.  I did quite a bit more digging and I now 
> think I understand the situation a little better.  See my responses 
> inline.
>
> olivier Thereaux wrote:
>
>> Hi Marc,
>>
>> Thanks for sending this message, especially after obvious serious  
>> research.
>>
>> I think your conclusions are correct (see below for details), but  
>> please note that I am not as much of an expert as others on this  
>> list. Hopefully if I say something completely wrong, they'll jump in :).
>>
>> On 12 Jul 2005, at 00:28, Marc Richards wrote:
>>
>>> 1) Should the validator be throwing an error instead of a warning  
>>> whenever it encounters an ampersand or left angle bracket as data  
>>> for a document served as application/xhtml+xml? i.e. was there a  
>>> conscious decision made to only throw a warning or is this simply  
>>> one of the XML parser limitations.
>>
>>
>>
>> As far as I know, it is not legal in XML and authorized in SGML 
>> (with  shorttags). Therefore, in XML mode it should throw an error. 
>> Whether it should be a warning in SGML mode is a source of controversy:
>> you'll get an equal number of people asking for it, for the sake
>> of quality,  and of people complaining that the validator should not 
>> dare confuse  people with warnings for a valid construct.
>>
>> Instead, what happens is:
>> - openSP's XML mode is "limited" (you saw the note)
>> - in XML mode, openSP throws a warning for such a construct
>> - in SGML mode, openSP accepts such constructs, unless asked to
>> - XHTML is always parsed using XML mode (see also Bug 1500)
>>
>> [Bug 1500] http://www.w3.org/Bugs/Public/show_bug.cgi?id=1500
>
>
> Isn't bug 1500 misdirected?  Correct me if I am wrong here, but even 
> if the XHTML as text/html pages were processed by the validator in 
> SGML mode with an XHTML DTD they would still be "valid" (since XML is 
> a subset of SGML) and as a result, bugs would still be filed against
> Mozilla, Opera and Safari as long as people weren't taking advantage 
> of the techniques outlined in appendix C.
>
> It may be useful to offer a XHTML 1.0 Appendix C conformance testing 
> service (and it seems there have been some forays in that direction[1])
> so that people could get an idea of how well their pages worked in 
> HTML4 UAs, but that doesn't mean that the validator is doing anything 
> wrong.
>
> [1]http://qa-dev.w3.org/~bjoern/appendix-c/validator/
>
> A legitimate question still remains: Should the validator be parsing 
> XHTML served as text/html in SGML mode or XML mode?
>
> While I think it makes sense for standard HTML4 user-agents to process 
> text/html documents in SGML mode for backwards compatibility, the 
> majority of the users who test their XHTML pages using the validator 
> are looking for forwards compatibility and the well-formedness that
> XML brings to the table.
>
> In an ideal world, HTML4 only UAs would be served the page as 
> text/html and XHTML UAs (including the validator) would be served the 
> same page as application/xhtml+xml, however the fact of the matter is 
> that
> (a) most people don't have content negotiation set up
> (b) serving docs as application/xhtml+xml to current browsers that 
> support it is very tricky/error prone (javascript issues, CSS issues, 
> browser issues, etc)
> (c) people have come to expect the validator to test XHTML pages for 
> xml well-formedness
>
> Given the way things stand now I think the best default is for the
> validator to parse and evaluate the pages as XML.  I can't see any 
> value to anyone (end-users, web-developers, UA-developers) in 
> evaluating the pages as SGML instead of as XML while still using the 
> XHTML DTD. If you are testing XML well-formedness, you already have 
> SGML well-formedness covered (right?).  There is some value in testing 
> Appendix C conformance, but that is a separate issue.
>
>
>>> If this *is* one of the XML limitations then I think it would be  
>>> helpful to compile a short list of common limitations and list them  
>>> on a w3c page in plain English.  I have read the OpenSP page[4] a  
>>> couple times and I am still not sure whether or not recognizing "<"  
>>> and "&" as invalid is a limitation of the parser; The language on  
>>> that page is fairly technical.  The validator could link to this  
>>> internal page directly and that page would then link to the OpenSP  
>>> page as well.
>>
>>
>>
>> This could be a good idea. How about starting a scratchpad on the
>> wiki, e.g. somewhere like:
>> http://esw.w3.org/topic/MarkupValidator/XML_Limitations
>> and motivate people on the list to contribute?
>
>
> Done[2].  Everybody feel free to add, subtract, enhance.
>
> [2] http://esw.w3.org/topic/MarkupValidator/XML_Limitations
>
>>> 2) Why are you issuing a warning for the use of ampersands and left
>>> angle brackets in xhtml but not html?  If the warning is in fact
>>> saying "this may be valid in some contexts, but it is recommended
>>> to use &amp; or &lt;" then this is an SGML warning and should be
>>> shown for both HTML and XHTML as text/html.  Ideally with an
>>> example like "R & D valid, R&D invalid".  Is there a bug open for
>>> issuing the warning for html doctypes as well?
>>
>>
>>
>> See above, my remark on the fussy mode. You could search this list  
>> for "fussy" and get an idea of the discussions that happened a while  
>> ago on this topic.
>
>
> Is it technically possible to get the validator to flag & and < as 
> warnings in SGML mode? I couldn't find a bug for this one.
>
> If it is technically doable, I think there is less likelihood of
> backlash from the community (ala fussy mode) if
> - users still got the bright green "this page is valid" at the top of 
> the page
> - the color of the warnings were made a little more neutral (yellow 
> instead of pale red).
> - the warning text is clear and helpful.
>
> I am not sure how much utility this solution would really have, so I 
> am not terribly gung ho about it, but I will file a bug if people 
> think it is likely to help users avoid potential errors.
>
>>> Are there open bugs you can point me to? Are there bugs I should file?
>>
>>
>>
>> I think 798 and 1500 are the relevant ones. If you think they do not  
>> cover the whole span of the issue, feel free to open others.
>
>
> Assuming that we are agreed about evaluating XHTML documents served as 
> text/html in XML mode, is it technically possible to get the validator 
> to flag & and < as errors? Unless I am mistaken, this seems to be the 
> most obvious area where UAs choke on the well-formedness test (when 
> parsing as XML), but the validator just lets you off with a warning.  
> Of course it would be very important to make it clear to users why
> their document doesn't validate using language they can understand,
> plenty of examples, and links to more detailed information.
>
> Is there a bug open for this?  Is it likely to be fixed without major
> architectural changes? Bug 798 seems to be mislabeled.  As far as the
> solution that was found is concerned it should be titled "warnings
> are mistakenly suppressed on valid pages".
>
>> Hope this answered your questions.
>>
>
> Sure did, which of course led to more questions.  Thanks for taking
> the time to answer.
>
>
> Marc
>
>
>
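
For anyone following the thread, here is a minimal sketch of the well-formedness point discussed above: in XML, a literal "&" or "<" in character data is not well-formed, so an XML parser should reject it outright rather than warn. It assumes Python's standard xml.etree.ElementTree as a stand-in XML parser; it does not model OpenSP or the validator itself, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised by the underlying parser for any well-formedness violation.
        return False


samples = [
    "<p>R&D</p>",                # raw ampersand, no spaces
    "<p>R & D</p>",              # raw ampersand surrounded by spaces
    "<p>1 < 2</p>",              # raw left angle bracket as data
    "<p>R&amp;D, 1 &lt; 2</p>",  # both characters escaped
]

for sample in samples:
    status = "well-formed" if is_well_formed(sample) else "NOT well-formed"
    print(f"{sample!r}: {status}")
```

Note that an XML parser rejects "R & D" as well as "R&D"; the distinction between the two only arises in SGML mode, which is why the two modes can disagree about the same page.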
Received on Monday, 18 July 2005 14:24:21 UTC