[XML] well-formedness & validation / section 4.2.8.1

Regarding the discussion point on section 4.2.8.1:

"The altered content MUST be well-formed" and "The altered content SHOULD 
validate to an appropriate published formal grammar".

Doubts have been expressed that the first sentence adds anything to the
guidelines. Here are the arguments for keeping it.

1. One intent of the CTG is to make sure that CT-proxies deliver correct 
content to terminals. Well-formedness is a formal, easily testable way to 
express this; this is justification to keep the requirement, all the more so 
since the XML standard states that violations of well-formedness are fatal 
errors (section 1.2).
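
To illustrate how easily this can be tested, here is a minimal sketch in
Python. It assumes that the non-validating parser behind the standard
xml.etree.ElementTree module is an acceptable stand-in for whatever check a
CT-proxy would actually run; the function name is_well_formed is purely
illustrative and does not come from the CTG.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    The underlying parser is non-validating: it reports syntax errors
    only and does not check the document against any DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # The parser treats a well-formedness violation as fatal.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<p>Hello<br/></p>"))  # properly nested and closed
    print(is_well_formed("<p>Hello<br></p>"))   # <br> is never closed
```

The check needs no DTD and no knowledge of the target markup language, which
is what makes well-formedness usable as a minimum requirement.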

2. There are good reasons not to impose the delivery of strictly valid markup
(hence SHOULD in the second sentence), but no cases have been found where 
syntactically incorrect XML (i.e. non-well-formed XML) is technically 
necessary to make a terminal browser operate properly. Hence, nothing is 
gained by suppressing the formulation on well-formedness.

3. There are essentially two cases where it is admissible not to follow the
second guideline:
i. When there is no "appropriate published" grammar, notably because one is 
using proprietary extensions not formally expressed in a DTD.
ii. When a published formal grammar must be violated because of erroneous
behaviour (quirks) in the device's browser.
If no requirement on well-formedness is imposed, then as soon as a CT-proxy is
freed, for whatever reason, from the (non-binding) validation guideline, there
is formally no minimum requirement left on the content delivered to terminals.
CT-proxies would then either do the maximum or be formally allowed not even to
satisfy a minimum -- this is surely not the intent of CTG, but it is what
suppressing the first sentence amounts to.
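
The two-tier reading of the guidelines can be sketched as follows. This is
only an illustration under stated assumptions: check_altered_content, the
result strings and toy_validator are hypothetical names, and toy_validator is
a crude stand-in for real validation against a DTD, which the Python standard
library does not provide.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def check_altered_content(
    markup: str,
    is_valid: Optional[Callable[[ET.Element], bool]] = None,
) -> str:
    """Apply the MUST (well-formed) and SHOULD (valid) rules in order."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return "reject"  # MUST violated: never acceptable
    if is_valid is None:
        return "accept"  # case i: no appropriate published grammar
    # SHOULD violated: tolerated (e.g. case ii, browser quirks), but flagged.
    return "accept" if is_valid(root) else "warn"


def toy_validator(root: ET.Element) -> bool:
    """Hypothetical stand-in for DTD validation: allow known attributes only."""
    allowed = {"href", "id", "class"}
    return all(set(element.attrib) <= allowed for element in root.iter())


if __name__ == "__main__":
    samples = [
        '<p><a href="x">t</a></p>',                 # well-formed and "valid"
        '<p><a href="x" vendor-ext="1">t</a></p>',  # proprietary extension
        '<p><a href="x">t</p>',                     # not well-formed
    ]
    for sample in samples:
        print(check_altered_content(sample, toy_validator), sample)
```

Without the first branch, content exempted from validation would pass with no
check at all, which is the gap described above.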

4. Last but not least, there is a widespread expectation that content delivered
to terminals should be valid, but also a widespread, long-standing explicit 
requirement that it must be at a minimum well-formed. A sample of documents 
about this point is listed in the appendix.

Conclusion: The first statement above on well-formedness does no harm and 
does good. There are excellent reasons to require that content delivered to 
terminals be at least syntactically correct, and no concrete reasons not to 
require it.


E.Casais
--------

Appendix: what guideline documents state about well-formedness and validation.


W3C: Mobile Web Best Practices 1.0, 2008. Section 5.4.7: "[VALID_MARKUP] Create 
documents that validate to published formal grammars. [...] If the page markup 
is invalid this will result in unpredictable and possibly incomplete 
presentation."

W3C: W3C mobileOK Basic Tests 1.0, 2008. Section 3.4: 
"[...] If the DOCTYPE refers to a known XHTML version, validate against 
that DOCTYPE and if invalid, warn. 
[...] If (regardless of its stated DOCTYPE) the document does not validate 
against the XHTML Basic 1.1 DTD: If (regardless of its stated DOCTYPE) it 
does not validate against the XHTML-MP 1.2 DTD, FAIL.
[...] If the Internet media type is "image/gif" or "image/jpeg", and the 
resource is not valid (see 2.4.9 Validity), FAIL.
[...] If the Internet media type is "text/css" and the content is not valid 
CSS (see 2.4.9 Validity), FAIL."
This proposed recommendation has the strongest wording of all documents, since
it requires validation -- also for linked resources.

RIM: BlackBerry Browser Version 4.3 Content Developer Guide, 2008. Section
"Creating XHTML-MP–compliant sites": "[...] Syntax: Although it uses many 
standard HTML tags, XHTML-MP follows XML syntax conventions. This means that,
unlike HTML, XHTML-MP content must be well-formed."
Section "WML design tips": "[...] Syntax: Use proper syntax. WML is an XML 
language; therefore, content must be well-formed."
Documents back to version 4.1 (2005) contain the same sentence. The document
on version 4.6 does not mention validation or well-formedness.

Google: Creating a Google-friendly site: Best practices, 2008. Section "Adding 
a mobile site to Google": "In addition, Google provides Webmaster Guidelines in
order to help webmasters design and configure sites in a way that Google can 
find, index and rank. [...] This page reiterates several of those guidelines 
and adds some new ones for mobile web sites.
-- Use well-formed markup (WML, cHTML, XHTML Basic or XHTML MP).
-- Validate your markup. For example, the W3C Validator can verify that your 
XHTML pages adhere to the markup's syntax.
-- Use the right DOCTYPE for the markup language you are using."

Luca Passani: Global Authoring Practices for the Mobile Web, 2008. Section 3.1:
"[VALID_XHTMLMP] Make sure that mobile pages are valid XHTML Mobile Profile 1.0.
Rationale: Most microbrowsers today are reasonably tolerant of validity and 
well-formdness errors in mark-up when it comes to HTML-based pages. In spite of
this, there are still devices out there that will break on not strictly 
well-formed XHTML MP or not valid XHTML-MP. For this reason, it is recommended 
that applications' pages are tested with a validator. [...] Most microbrowsers 
are tolerant to minor validation and well-formedness errors, but this will not
guarantee that an application would not appear broken when accessed with a 
different microbrowser. [...] While the usage of extra XHTML attributes will 
make an XHTML-MP page non-valid, opting for non-valid, but effective, mark-up 
may be preferable in some cases. Making sure that a page is well-formed is 
still recommended, since well-formed XHTML-MP mark-up is less likely to cause 
troubles than invalid XHTML-MP."

dotMobi: Mobile Web Developers Guide, 2007. Section "Always use well-formed
code": "For those not familiar with XHTML, the first thing to know is that all
code should validate (according to the doctype) and be well-formed (a valid XML
document)."

dotMobi: Switch On! Web Developer Guide, 2006. Section 4.1 "Mandatory registrant
rules": "[dotMobi] Requests for URIs consisting only of "example.mobi" or 
"www.example.mobi" must result in a response that is encoded in a format the 
device supports or valid XHTML-Mobile Profile 1.0 or later released version 
[XHTMLMP], where "example" stands for any domain name." Section 4.2.4 "Page
definition": "[W3C VALID_MARKUP] Create documents that validate to published 
formal grammars."

SonyEricsson: Web browser in Sony Ericsson phones, 2006. Section "DOCTYPE":
"[...] NOTE: When the browser renders the document, it is vital that the syntax
is correct. If the document is not syntactically correct, it may not be 
displayed correctly. The DTD, a strict definition of the valid syntax, is
what guarantees interoperability between browsers from different vendors."
Previous versions of the document dating back to 2003 contain the same 
statement. The twin document "NetFront Web browser in Sony Ericsson phones"
never mentions validation or well-formedness.

SonyEricsson: P800 Browsing Services, 2003. Section "The anatomy of an XHTML
document": "[...] Validation is the process of checking that the document is 
grammatically correct. Validation should take place before the document is sent
to the browser. Many Web authoring tools validate the document while the 
author is writing the document. The browser expects that the XHTML document is 
valid. An invalid XHTML document will not be displayed."

Microsoft: Windows Mobile Version 5.0 SDK, 2006. Section "HTML Support for
Internet Explorer Mobile": "Internet Explorer Mobile is more reliant on well
formed HTML than is Internet Explorer on a desktop computer. Internet Explorer 
on the desktop computer performs additional work to correct nonvalid HTML, but 
in the interest of performance and memory usage, Internet Explorer Mobile 
performs much less of such auto-correction."
The document "SDK Documentation for Windows Mobile-Based Pocket PCs", 2005, 
contains the same sentence.

Nokia: Nokia Web browser guideline, 2007. Section 4.1: "[...] Use different 
validators (such as the World Wide Web Consortium [W3C] service at 
http://validator.w3.org/) for ensuring compliance to markup standards."

Nokia: Series 40 Platform: Designing XHTML Mobile Profile Content, 2006.
Series 60 Platform: Designing XHTML Mobile Profile Content, 2004.
XHTML Guidelines, 2004.
Section 2.13/3.19: "[...] XHTML code should be validated to avoid any 
interoperability problems and to enhance performance."

Nokia: Series 80 Developer Platform 2.0: Designing XHTML/HTML Content, 2004.
Designing XHTML/HTML Content For The Nokia 7710 Device, 2004.
Developer Platform 2.0 for Series 90: Designing XHTML/HTML Content, 2003.
Section 6.1: "HTML/XHTML code should be validated to avoid any interoperability
problems and to enhance performance. Valid code is always less prone to 
incompatibilities and errors than pages that contain erroneous syntax."

Nokia: WAP Service Developer's Guide for Nokia Series 40 Phones with WML 
Browser, 2003.
WAP Service Developer's Guide for Nokia Series 30 Phones with WML Browser, 2003.
WAP Service Developer’s Guide for Nokia 9200 Communicator Series, 2002. 
Developer Platform 2.0 for Series 90: Designing XHTML/HTML Content, 2002.
WAP Service Developer’s Guide for Nokia 9110i, 2000.
WAP Service Developer’s Guide for Nokia 6210 and 6250, 2000.
Section 4.2/4.1/2.2: "There are several XML validators available that validate
your documents against WML Document Type Definition. It is recommended that 
authors validate their WAP pages, because invalid WML is always treated as an 
error and discarded (that is, it is not shown to the user!)."
The documents 
WAP Service Developer’s Guide for Nokia Series 60 Phones with XHTML Browser, 
2003.
WAP Service Developer’s Guide for Nokia Series 40 Phones with XHTML Browser,
2003.
WAP Service Developer’s Guide for Nokia Series 30 Phones with XHTML Browser,
2003.
simply list available validation tools.

NEXTEL WAP 2.0 STYLE GUIDE, 2005. Section 14: "It is important to write valid 
XHTML code. Valid XHTML markup ensures that your documents will render as 
expected. To help you achieve this, the following website provides an online 
validator to help you check your XHTML code: http://validator.w3.org." The
remainder of the section makes it clear that well-formedness is meant rather
than validity.

ATT Wireless, XHTML Programming Guide, 2005. Section 2.1: "[...] XML is a 
rigorous and strict language. A Document Type Definition (DTD) or an XML Schema
is used to formally define the data elements and their relationships. This 
degree of formality is not followed by HTML, but must be followed by any 
XML-compliant language. While this may seem like a burden on the developer, 
numerous validation tools are available to automate the validation of any 
XML-compliant language. Furthermore, by enforcing proper content, unnecessarily
robust browsers can be avoided which benefits mobile device performance and 
functionality in the long run."

Sprint: Sprint PCS Mobile Browser Technology Paper, 2004. Section 7.1: "All web
pages using Sprint Code Standards must validate as XHTML Transitional 1.0. The 
differences between HTML 4 and XHTML are few, the key difference being 
responsible code. Browsers have historically been forgiving of lax coding 
habits, such as not closing the <p> element. [...]" The ensuing description
of criteria for XHTML deals with well-formedness rather than validation.

Sprint: Usability Requirements for XHTML Basic Applications, 2003. Section 5
"Strong recommendations": "Ensure your code is valid. The XHTML Basic validator
is found at http://validator.w3.org/ [...]".
--------

Received on Monday, 1 December 2008 03:59:40 UTC