
Re: AW: Whitespace preservation mode; got HTML5 XHTML schema?

From: Don Brutzman <brutzman@nps.edu>
Date: Mon, 29 Feb 2016 16:18:52 -0800
To: "Peintner, Daniel (ext)" <daniel.peintner.ext@siemens.com>, Takuki Kamiya <tkamiya@us.fujitsu.com>
CC: "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <56D4DFEC.5030508@nps.edu>

1.  Editorial suggestion.  Excerpt:

	Not in all situations it is possible to respect whitespace handling rules.

Change for readability:

	It is not possible to respect whitespace handling rules in all situations.

2.  Not quite the same as your question, but a possible approach follows.  HTML is likely the predominant use case on the Web where whitespace handling matters.

HTML5 reconciles all differences between HTML syntax (which might not be well-formed) and XHTML syntax (which is well-formed XML).  This is done explicitly by aligning each syntax with the DOM.  Consistent rules about whitespace preservation can likely be deduced from the recommendation.

Excerpt from the HTML5 Recommendation:

	This section is non-normative.

	This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.

	The in-memory representation is known as "DOM HTML", or "the DOM" for short.

	There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

	The first such concrete syntax is the HTML syntax. This is the format suggested for most authors. It is compatible with most legacy Web browsers. If a document is transmitted with the text/html MIME type, then it will be processed as an HTML document by Web browsers. This specification defines version 5.0 of the HTML syntax, known as "HTML 5".

	The second concrete syntax is the XHTML syntax, which is an application of XML. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is treated as an XML document by Web browsers, to be parsed by an XML processor. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent a document labeled as XML from being rendered fully, whereas they would be ignored in the HTML syntax. This specification defines version 5.0 of the XHTML syntax, known as "XHTML 5".

	The DOM, the HTML syntax, and the XHTML syntax cannot all represent the same content. For example, namespaces cannot be represented using the HTML syntax, but they are supported in the DOM and in the XHTML syntax. Similarly, documents that use the noscript feature can be represented using the HTML syntax, but cannot be represented with the DOM or in the XHTML syntax. Comments that contain the string "-->" can only be represented in the DOM, not in the HTML and XHTML syntaxes.

Since the XHTML syntax can represent most HTML5 content, it might be possible for EXI to provide schema-informed compression for the vast majority of HTML5 documents, if an XML Schema were available for the HTML5 XHTML syntax.

Presumably such a schema would also support XML Encryption and XML Digital Signature.

No such schema is provided in the HTML5 Recommendation.

Found the following reference via search, which mentions possible informal alternatives.

	"Is there an xhtml.xsd equivalent available for HTML5?"

Found a W3C Note from 2002:

	XHTML 1.0 in XML Schema - W3C

Does anyone know whether any work has been done, or is planned, to define an XML Schema for the HTML5 XHTML syntax?  Or perhaps there is a list of incompatibilities somewhere?

If others in the EXI group think it valuable, perhaps we should propose creating such a schema.  A schema-aware EXI capability for HTML5 would certainly be useful for some applications.

On 1/11/2016 9:27 AM, Peintner, Daniel (ext) wrote:
> All,
> I started to define whitespace handling rules in the spirit of the current TTFMS rules [1].
> Please find a first draft here [2].
> I think we could add advice for users
> * to use preserve.LexicalValue if encoding fails
> * to use xml:space="preserve" if canonicalization is
>    expected to preserve as much whitespace as possible
> Do you have any comments and/or feedback?
> Thanks,
> -- Daniel
> [1] https://lists.w3.org/Archives/Public/public-exi/2015Oct/0008.html
> [2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#whitespaceHandling
> ________________________________
> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
> Sent: Tuesday, 1 December 2015 03:51
> To: public-exi@w3.org
> Subject: Whitespace preservation mode
> Hi,
> When there is a type associated with an element, content type information
> gives you an idea as to what to do with whitespace during encoding.
> However, in schema-less situations, the best you can do is to guess what
> is expected, unless xml:space is specified. I am not very sure if
> this heuristic is always correct.
> I think we may need to provide a canonicalization mode where canonicalization
> is expected to preserve as much whitespace as possible.
> Thank you,
> Takuki Kamiya
> Fujitsu Laboratories of America
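To make the schema-less case in the quoted message concrete, here is a minimal Python sketch using only the standard library. It assumes a hypothetical heuristic, not the rule in the Canonical EXI draft: whitespace-only text is dropped unless xml:space="preserve" is in scope, with xml:space inherited by descendants as XML 1.0 specifies. The function name strip_whitespace_schemaless is illustrative only.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared by XML Namespaces; ElementTree reports
# xml:space under this expanded "{namespace-URI}local-name" key.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# XML's whitespace characters (S production): space, tab, CR, LF.
# str.strip() with no argument would also strip other Unicode spaces.
XML_WS = " \t\r\n"


def is_ws_only(text):
    """True if text is present and consists solely of XML whitespace."""
    return text is not None and text.strip(XML_WS) == ""


def strip_whitespace_schemaless(elem, preserve=False):
    """Hypothetical schema-less heuristic (illustration only): drop
    whitespace-only text unless xml:space="preserve" is in scope.
    Mutates the tree in place."""
    # xml:space applies to the element and its descendants until a
    # descendant overrides it (XML 1.0, section 2.10).
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False  # assumption: "default" means strip in this sketch

    # elem.text is the character data before the first child element.
    if not preserve and is_ws_only(elem.text):
        elem.text = None

    for child in elem:
        strip_whitespace_schemaless(child, preserve)
        # child.tail is character data that follows the child inside *this*
        # element, so this element's preserve setting governs it.
        if not preserve and is_ws_only(child.tail):
            child.tail = None


doc = ET.fromstring(
    '<p>\n  <b> </b>\n  <pre xml:space="preserve">\n  <i> </i>\n</pre>\n</p>'
)
strip_whitespace_schemaless(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Here the whitespace inside `<pre>` survives because xml:space="preserve" is in scope, while the indentation between `<p>`, `<b>` and `<pre>` is dropped. Treating xml:space="default" as "strip" is a choice made for this sketch only; an actual canonicalization rule would need to settle that case explicitly.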

all the best, Don
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
Received on Tuesday, 1 March 2016 00:19:24 UTC