RE: [XML] well-formedness & validation / section 4.2.8.1

> User agents will just try to make educated guesses, based on years of experience playing with such cases.

User agents on mobile devices are not as sophisticated as their counterparts on large desktop machines, and consequently do not necessarily do as good a job of guessing the author's intent. This may be part of the reason that mobile Web content tends to be better formed than the tag soup we've come to expect in the rest of the Web. I do not believe that a well-designed adapting proxy would compound the problems of heuristic tidying. Indeed, it may do better, and so I still believe that delivering well-formed content to mobile devices is a good thing.

Where a site has already been developed to deliver a good mobile experience, the proxy should leave it alone (especially if some signalling indicates that it should remain untouched). This would apply even if the origin server is delivering malformed markup. In this case, the blame for any loss of quality in user experience as a result of poor markup should be levelled at the origin server. I know that proxy providers may be tempted to step in and correct the poor markup, but if we don't adhere to the inter-system signalling then what's the point of the signalling?

What you seem to be suggesting is that if the origin server permits proxy-based adaptation (possibly implicitly), and the proxy only wants to modify a small part of the total payload, then the amendment should be restricted to the fragment that demands alteration, and the rest of the document should remain as it was even if it is malformed.

I have a little bit of sympathy with that position, but not enough to agree. An adapter that would go so far as to parse the original document and discover the parts that needed amendment would surely be capable of emitting a version of the entire document that is well-formed, without (any noticeable) loss of fidelity to the original author's intent.

Of course, I admit that MobileAware's technology (which is the only one upon which I can speak with authority) is an origin-server technology, so I cannot really speak for those whose solutions are generic proxies in the middle of the origin-client connection. Nevertheless I do have some idea of the complexities involved in manipulating content for mobile devices and I'd be surprised if any proper solution would support the idea of limiting the well-formed markup to a mere fragment.

I'd be interested to hear the opinions of proxy solution providers.

---Rotan.

-----Original Message-----
From: Francois Daoust [mailto:fd@w3.org] 
Sent: 01 December 2008 09:53
To: Rotan Hanrahan
Cc: casays@yahoo.com; public-bpwg-ct@w3.org
Subject: Re: [XML] well-formedness & validation / section 4.2.8.1

I agree in theory but not so much in practice.

Mandating well-formedness is pretty cool, but I suspect well-formed 
content is still the exception to the rule on the Web, especially with 
legacy Web sites (I understand that the mobile Web is by far "cleaner" 
in that respect than the old desktop one, but that is not the point here).

I am not aware of any mechanism that could be used on non-well-formed 
content to make it well-formed while keeping the intent of the content 
provider intact. User agents will just try to make educated guesses, 
based on years of experience playing with such cases. Mandating the 
CT-proxy to do that as well would introduce another level of guesses, 
which does not seem to be such a good idea.

I can imagine a simple feature that CT-proxies could provide: linearize 
tables when the user agent does not support them. It could do that 
without fixing the well-formedness of the content, introducing as few 
changes as possible, and thus keeping the content provider's intent as 
intact as possible.

I think that this is a good reason not to impose the delivery of strictly 
valid markup.

What we could say is "When the initial content is well-formed, the 
altered content MUST be well-formed".
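
A minimal sketch of how that conditional rule could be checked, assuming Python's standard xml.etree.ElementTree parser as the test of well-formedness. The function names are hypothetical and nothing here comes from the CT guidelines themselves. The parser is non-validating, so entities defined only in an external DTD may be rejected:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML without a fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def satisfies_conditional_rule(original: str, altered: str) -> bool:
    """Check "well-formed in, well-formed out" for one transformation."""
    if not is_well_formed(original):
        # The conditional rule places no constraint on malformed input.
        return True
    return is_well_formed(altered)
```

Under this reading, a proxy that receives tag soup always passes the check, while a proxy that breaks a document that was well-formed on arrival fails it.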

I would add that a "SHOULD" statement is already strong, especially since we 
will require conformant deployments to justify the reasons for not 
following a SHOULD statement.

In the end, I guess I still fail to see the clear added value of 
requiring well-formedness in practice.

Francois.


Rotan Hanrahan wrote:
> Assuming that "content" refers to a markup language adhering to XML, the argument presented by Eduardo is entirely reasonable. I have already engaged in a public discussion on the subject, and came to the same conclusion.
> 
> ---Rotan
> 
> -----Original Message-----
> From: public-bpwg-ct-request@w3.org [mailto:public-bpwg-ct-request@w3.org] On Behalf Of Eduardo Casais
> Sent: 01 December 2008 03:58
> To: public-bpwg-ct@w3.org
> Subject: [XML] well-formedness & validation / section 4.2.8.1
> 
> 
> Regarding the discussion point on section 4.2.8.1:
> 
> "The altered content MUST be well-formed" and "The altered content SHOULD 
> validate to an appropriate published formal grammar".
> 
> Doubts have been expressed that the first sentence brings anything to the
> guidelines. Here are the arguments pro.
> 
> 1. One intent of the CTG is to make sure that CT-proxies deliver correct 
> content to terminals. Well-formedness is a formal, easily testable way to 
> express this; this is justification to keep the requirement, all the more so 
> since the XML standard states that violations of well-formedness are fatal 
> errors (section 1.2).
> 
> 2. There are good reasons not to impose the delivery of strictly valid markup
> (hence SHOULD in the second sentence), but no cases have been found where 
> syntactically incorrect XML (i.e. non-well-formed XML) is technically 
> necessary to make a terminal browser operate properly. Hence, nothing is 
> gained by suppressing the formulation on well-formedness.
> 
> 3. There are essentially two cases where it is admissible not to follow the
> second guideline:
> i. When there is no "appropriate published" grammar, notably because one is 
> using proprietary extensions not formally expressed in a DTD.
> ii. When a published formal grammar must be violated because of an erroneous
> behaviour (quirks) in the device's browser.
> If no requirement on well-formedness is imposed, there is formally no longer
> any minimum requirement on the content delivered to terminals as soon as one
> is freed, for whatever reason, from the (non-binding) validation guideline. So
> either the CT-proxies should do the maximum, or they are formally allowed not
> even to satisfy a minimum -- this is surely not the intent of CTG, but this is
> what suppressing the first sentence amounts to.
> 
> 4. Last but not least, there is a widespread expectation that content delivered
> to terminals should be valid, but also a widespread, long-standing explicit 
> requirement that it must be at a minimum well-formed. A sample of documents 
> about this point is listed in the appendix.
> 
> Conclusion: The first statement above on well-formedness does no harm and 
> does good. There are excellent reasons to enforce a minimum of syntactically 
> correct content to be delivered to terminals, and no concrete reasons not to 
> require it.
> 
> 
> E.Casais
> --------
> 
> Appendix: what guideline documents state about well-formedness and validation.
> 
> 
> W3C: Mobile Web Best Practices 1.0, 2008. Section 5.4.7: "[VALID_MARKUP] Create 
> documents that validate to published formal grammars. [...] If the page markup 
> is invalid this will result in unpredictable and possibly incomplete 
> presentation."
> 
> W3C: W3C mobileOK Basic Tests 1.0, 2008. Section 3.4: 
> "[...] If the DOCTYPE refers to a known XHTML version, validate against 
> that DOCTYPE and if invalid, warn. 
> [...] If (regardless of its stated DOCTYPE) the document does not validate 
> against the XHTML Basic 1.1 DTD: If (regardless of its stated DOCTYPE) it 
> does not validate against the XHTML-MP 1.2 DTD, FAIL.
> [...] If the Internet media type is "image/gif" or "image/jpeg", and the 
> resource is not valid (see 2.4.9 Validity), FAIL.
> [...] If the Internet media type is "text/css" and the content is not valid 
> CSS (see 2.4.9 Validity), FAIL."
> This proposed recommendation has the strongest wording of all documents, since
> it requires validation -- also for linked resources.
> 
> RIM: BlackBerry Browser Version 4.3 Content Developer Guide, 2008. Section
> "Creating XHTML-MP–compliant sites": "[...] Syntax: Although it uses many 
> standard HTML tags, XHTML-MP follows XML syntax conventions. This means that,
> unlike HTML, XHTML-MP content must be well-formed."
> Section "WML design tips": "[...] Syntax: Use proper syntax. WML is an XML 
> language; therefore, content must be well-formed."
> Documents back to version 4.1 (2005) contain the same sentence. The document
> on version 4.6 does not mention validation or well-formedness.
> 
> Google: Creating a Google-friendly site: Best practices, 2008. Section "Adding 
> a mobile site to Google": "In addition, Google provides Webmaster Guidelines in
> order to help webmasters design and configure sites in a way that Google can 
> find, index and rank. [...] This page reiterates several of those guidelines 
> and adds some new ones for mobile web sites.
> -- Use well-formed markup (WML, cHTML, XHTML Basic or XHTML MP).
> -- Validate your markup. For example, the W3C Validator can verify that your 
> XHTML pages adhere to the markup's syntax.
> -- Use the right DOCTYPE for the markup language you are using."
> 
> Luca Passani: Global Authoring Practices for the Mobile Web, 2008. Section 3.1:
> "[VALID_XHTMLMP] Make sure that mobile pages are valid XHTML Mobile Profile 1.0.
> Rationale: Most microbrowsers today are reasonably tolerant of validity and 
> well-formedness errors in mark-up when it comes to HTML-based pages. In spite of
> this, there are still devices out there that will break on not strictly 
> well-formed XHTML MP or not valid XHTML-MP. For this reason, it is recommended 
> that applications' pages are tested with a validator. [...] Most microbrowsers 
> are tolerant to minor validation and well-formedness errors, but this will not
> guarantee that an application would not appear broken when accessed with a 
> different microbrowser. [...] While the usage of extra XHTML attributes will 
> make an XHTML-MP page non-valid, opting for non-valid, but effective, mark-up 
> may be preferable in some cases. Making sure that a page is well-formed is 
> still recommended, since well-formed XHTML-MP mark-up is less likely to cause 
> troubles than invalid XHTML-MP."
> 
> dotMobi: Mobile Web Developers Guide, 2007. Section "Always use well-formed
> code": "For those not familiar with XHTML, the first thing to know is that all
> code should validate (according to the doctype) and be well-formed (a valid XML
> document)."
> 
> dotMobi: Switch On! Web Developer Guide, 2006. Section 4.1 "Mandatory registrant
> rules": "[dotMobi] Requests for URIs consisting only of "example.mobi" or 
> "www.example.mobi" must result in a response that is encoded in a format the 
> device supports or valid XHTML-Mobile Profile 1.0 or later released version 
> [XHTMLMP], where "example" stands for any domain name." Section 4.2.4 "Page
> definition": "[W3C VALID_MARKUP] Create documents that validate to published 
> formal grammars."
> 
> SonyEricsson: Web browser in Sony Ericsson phones, 2006. Section "DOCTYPE":
> "[...] NOTE: When the browser renders the document, it is vital that the syntax
> is correct. If the document is not syntactically correct, it may not be 
> displayed correctly. The DTD, a strict definition of the valid syntax, is
> what guarantees interoperability between browsers from different vendors."
> Previous versions of the document dating back to 2003 contain the same 
> statement. The twin document "NetFront Web browser in Sony Ericsson phones"
> never mentions validation or well-formedness.
> 
> SonyEricsson: P800 Browsing Services, 2003. Section "The anatomy of an XHTML
> document": "[...] Validation is the process of checking that the document is 
> grammatically correct. Validation should take place before the document is sent
> to the browser. Many Web authoring tools validate the document while the 
> author is writing the document. The browser expects that the XHTML document is 
> valid. An invalid XHTML document will not be displayed."
> 
> Microsoft: Windows Mobile Version 5.0 SDK, 2006. Section "HTML Support for
> Internet Explorer Mobile": "Internet Explorer Mobile is more reliant on well
> formed HTML than is Internet Explorer on a desktop computer. Internet Explorer 
> on the desktop computer performs additional work to correct nonvalid HTML, but 
> in the interest of performance and memory usage, Internet Explorer Mobile 
> performs much less of such auto-correction."
> The document "SDK Documentation for Windows Mobile-Based Pocket PCs", 2005, 
> contains the same sentence.
> 
> Nokia: Nokia Web browser guideline, 2007. Section 4.1: "[...] Use different 
> validators (such as the World Wide Web Consortium [W3C] service at 
> http://validator.w3.org/) for ensuring compliance to markup standards."
> 
> Nokia: Series 40 Platform: Designing XHTML Mobile Profile Content, 2006.
> Series 60 Platform: Designing XHTML Mobile Profile Content, 2004.
> XHTML Guidelines, 2004.
> Section 2.13/3.19: "[...] XHTML code should be validated to avoid any 
> interoperability problems and to enhance performance."
> 
> Nokia: Series 80 Developer Platform 2.0: Designing XHTML/HTML Content, 2004...
> Designing XHTML/HTML Content For The Nokia 7710 Device, 2004.
> Developer Platform 2.0 for Series 90: Designing XHTML/HTML Content, 2003.
> Section 6.1: "HTML/XHTML code should be validated to avoid any interoperability
> problems and to enhance performance. Valid code is always less prone to 
> incompatibilities and errors than pages that contain erroneous syntax."
> 
> Nokia: WAP Service Developer's Guide for Nokia Series 40 Phones with WML 
> Browser, 2003.
> WAP Service Developer's Guide for Nokia Series 30 Phones with WML Browser, 2003.
> WAP Service Developer’s Guide for Nokia 9200 Communicator Series, 2002. 
> Developer Platform 2.0 for Series 90: Designing XHTML/HTML Content, 2002.
> WAP Service Developer’s Guide for Nokia 9110i, 2000.
> WAP Service Developer’s Guide for Nokia 6210 and 6250, 2000.
> Section 4.2/4.1/2.2: "There are several XML validators available that validate
> your documents against WML Document Type Definition. It is recommended that 
> authors validate their WAP pages, because invalid WML is always treated as an 
> error and discarded (that is, it is not shown to the user!)."
> The documents 
> WAP Service Developer’s Guide for Nokia Series 60 Phones with XHTML Browser, 
> 2003.
> WAP Service Developer’s Guide for Nokia Series 40 Phones with XHTML Browser,
> 2003.
> WAP Service Developer’s Guide for Nokia Series 30 Phones with XHTML Browser,
> 2003.
> simply list available validation tools.
> 
> NEXTEL WAP 2.0 STYLE GUIDE, 2005. Section 14: "It is important to write valid 
> XHTML code. Valid XHTML markup ensures that your documents will render as 
> expected. To help you achieve this, the following website provides an online 
> validator to help you check your XHTML code: http://validator.w3.org." The
> remainder of the section makes it clear that well-formedness is meant instead
> of validation.
> 
> ATT Wireless, XHTML Programming Guide, 2005. Section 2.1: "[...] XML is a 
> rigorous and strict language. A Document Type Definition (DTD) or an XML Schema
> is used to formally define the data elements and their relationships. This 
> degree of formality is not followed by HTML, but must be followed by any 
> XML-compliant language. While this may seem like a burden on the developer, 
> numerous validation tools are available to automate the validation of any 
> XML-compliant language. Furthermore, by enforcing proper content, unnecessarily
> robust browsers can be avoided which benefits mobile device performance and 
> functionality in the long run."
> 
> Sprint: Sprint PCS Mobile Browser Technology Paper, 2004. Section 7.1: "All web
> pages using Sprint Code Standards must validate as XHTML Transitional 1.0. The 
> differences between HTML 4 and XHTML are few, the key difference being 
> responsible code. Browsers have historically been forgiving of lax coding 
> habits, such as not closing the <p> element. [...]" The ensuing description
> of criteria for XHTML deals with well-formedness rather than validation.
> 
> Sprint: Usability Requirements for XHTML Basic Applications, 2003. Section 5
> "Strong recommendations": "Ensure your code is valid. The XHTML Basic validator
> is found at http://validator.w3.org/ [...]".
> --------

Received on Monday, 1 December 2008 17:08:13 UTC