Re: content type for XHTML fragments: reformulated

Hi,

Am Dienstag, 17. Januar 2006 21:34 schrieb Garret Wilson:
> Patrick H. Lauke wrote:
> > Garret Wilson wrote:
> >>  The former is a fragment of a web page on XHTML
> >> explaining how to use the <em> element, and actually displays the
> >> literal string "<em>foo</em>".
> >
> > Should the < and > not be encoded as &lt; and &gt; in that particular
> > case?
>
> No! If you have a text file (e.g. "instructions.txt" written with
> notepad) containing the sentence, "this is how you use the emphasis tag:
> <em>foo</em>", should < and > be encoded? No. In fact, look at the
> source of this very email---you'll see that < and > are not encoded.
> That's what plain text means.
>
> Now, the application when constructing the larger XHTML document *will*
> have to encode those characters in the *destination* document. Those
> characters will not be encoded in the *source* text/plain fragment,
> though. (If the source fragment is an XHTML fragment, on the other hand,
> those characters, if meant to be taken literally rather than interpreted
> as markup, would need to be encoded.)
>
> Put another way, the effective text of the following two fragments is
> identical:
>
> "<em>foo</em>" (content type: text/plain)
> "&lt;em&gt;foo&lt;/em&gt;" (content type: XHTML fragment)
Yes.

> (Nit-picky point: the two fragments above don't technically represent
> identical content, because the former represents a string and the latter
> represents an XML Text node, but they should result in identical text in
> the destination document.)
Well, even from an XML point of view they are the same. For text/plain I'd
assume an XInclude with parse=text, while for XHTML fragments I'd expect an
entity reference substitution or, if possible, an XInclude with parse=xml.
In this example, the resulting character sequence in the text node would be
identical.
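A minimal Python sketch of that comparison, using only the standard library's
xml.etree.ElementTree. It merely simulates the two inclusion styles and is not
a real XInclude processor; the helpers p_from_text and p_from_xml are made up
for this example. parse=text copies the characters verbatim into a text node,
while the XHTML fragment is parsed as markup inside a wrapping <p>, so &lt; and
&gt; are resolved.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def p_from_text(payload: str) -> ET.Element:
    """Simulate parse=text: the payload's characters become the text node verbatim."""
    p = ET.Element(f"{{{XHTML_NS}}}p")
    p.text = payload
    return p

def p_from_xml(fragment: str) -> ET.Element:
    """Simulate entity substitution of an XHTML fragment: the fragment is parsed
    as markup inside a <p>, so predefined entities such as &lt; are resolved."""
    return ET.fromstring(f'<p xmlns="{XHTML_NS}">{fragment}</p>')

plain = p_from_text("<em>foo</em>")             # text/plain fragment
xhtml = p_from_xml("&lt;em&gt;foo&lt;/em&gt;")  # XHTML fragment

# Same character sequence in the text node, and no <em> child in either case.
print(plain.text == xhtml.text, len(plain), len(xhtml))  # True 0 0

# On output, the serializer re-encodes < and > in the destination document.
print(ET.tostring(plain, encoding="unicode", default_namespace=XHTML_NS))
```

The last line reflects the point made earlier in the thread: the application
encodes < and > in the destination document, not in the text/plain source.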

> It is impossible to represent the XHTML fragment "<em>foo</em>" in a
> text/plain fragment, because plain text (naturally) has no concept of
> syntactical structure.

On the text/plain issue: text/plain should be used only when a message body
labelled Content-Type: text/plain is actually meant to be interpreted as plain
text. For an XHTML fragment that is served for further processing, where it
will be combined with other fragments to form a complete XHTML document,
text/plain is not appropriate. If the fragments are external entities in the
XML sense, application/xml-external-parsed-entity fits best, and even
application/octet-stream would be much better than text/plain in that case.

From an XInclude / entity reference substitution point of view, I'd assume
that text/plain is meant to be included with parse=text, that
application/xml-external-parsed-entity corresponds to parse=xml, and that the
parse mode for application/octet-stream is undetermined.
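That mapping could be sketched with Python's xml.etree.ElementInclude, which
implements a subset of XInclude. Everything beyond the standard library is
hypothetical: the PARSE_MODE table, the in-memory FRAGMENTS store standing in
for HTTP responses, and the helpers xinclude_for and assemble. parse=xml only
works here because the example entity happens to have a single root element;
ElementInclude, like XInclude itself, expects a well-formed document there.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical mapping from a fragment's Content-Type to an XInclude parse mode.
PARSE_MODE = {
    "text/plain": "text",
    "application/xml-external-parsed-entity": "xml",
    # application/octet-stream is deliberately absent: the mode is undetermined.
}

# Hypothetical in-memory "server": href -> (Content-Type, body).
FRAGMENTS = {
    "howto.txt": ("text/plain", "<em>foo</em>"),
    "howto.ent": ("application/xml-external-parsed-entity", "<em>foo</em>"),
}

def xinclude_for(href: str, content_type: str) -> ET.Element:
    """Build an xi:include element whose parse mode follows the Content-Type."""
    try:
        parse = PARSE_MODE[content_type]
    except KeyError:
        raise ValueError(f"parse mode undetermined for {content_type}") from None
    return ET.Element(XI_INCLUDE, href=href, parse=parse)

def loader(href, parse, encoding=None):
    """Loader for ElementInclude: a string for parse=text, an Element for parse=xml."""
    _, body = FRAGMENTS[href]
    return body if parse == "text" else ET.fromstring(body)

def assemble(href: str) -> ET.Element:
    """Wrap one included fragment in a <p> and resolve the inclusion."""
    content_type, _ = FRAGMENTS[href]
    p = ET.Element("p")
    p.append(xinclude_for(href, content_type))
    ElementInclude.include(p, loader=loader)
    return p

as_text = assemble("howto.txt")
as_xml = assemble("howto.ent")
print(ET.tostring(as_text, encoding="unicode"))  # <p>&lt;em&gt;foo&lt;/em&gt;</p>
print(ET.tostring(as_xml, encoding="unicode"))   # <p><em>foo</em></p>
```

Refusing application/octet-stream, rather than guessing a mode, mirrors the
"undetermined" case above.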

Anyway, I don't think there is a general answer to whether the characters need
to be encoded.
If the XHTML fragment is included as part of an XML document, encoding the
characters that make up the XHTML markup is not a good idea. When the XHTML is
part of the XML markup of the containing document, XPath and XQuery can be
used on the XHTML; if the XHTML markup is encoded, that's impossible.
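A small illustration using the limited XPath subset supported by Python's
xml.etree.ElementTree (not full XPath or XQuery). The <doc>/<body> container
format and the emphasized helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# XHTML fragment embedded as markup in a (hypothetical) container document.
as_markup = ET.fromstring(
    '<doc><body><p xmlns="http://www.w3.org/1999/xhtml">'
    'Use <em>foo</em>.</p></body></doc>'
)

# The same fragment with its markup encoded as character data.
as_encoded = ET.fromstring(
    '<doc><body>&lt;p xmlns="http://www.w3.org/1999/xhtml"&gt;'
    'Use &lt;em&gt;foo&lt;/em&gt;.&lt;/p&gt;</body></doc>'
)

def emphasized(doc: ET.Element) -> list:
    """Return the text of every XHTML <em> element found in the document."""
    return [em.text for em in doc.iterfind(".//h:em", NS)]

print(emphasized(as_markup))   # ['foo']
print(emphasized(as_encoded))  # []
# In the encoded case the "markup" is just a string inside <body>:
print(as_encoded.find("body").text)
```

To query the encoded variant, the string would first have to be parsed again,
which is exactly the extra step that embedding the XHTML as markup avoids.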

Also, if the XHTML is contained as markup, schema or even DTD validation of
the XHTML fragment is possible.

These are further arguments for not encoding the markup of XHTML fragments.

-- 
Christian Hujer
Free software developer
E-Mail: Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/ http://daimonin.sf.net/

Received on Tuesday, 17 January 2006 21:33:58 UTC