Re: Unclear what: "For interoperability, authors are advised to avoid optional features of XML" means.

Maciej Stachowiak wrote:
>
>
> On Dec 1, 2007, at 2:55 AM, Dean Edridge wrote:
>
>>
>> From: http://www.w3.org/html/wg/html5/#xhtml5
>>> According to the XML specification, XML processors are not 
>>> guaranteed to process the external DTD subset referenced in the 
>>> DOCTYPE. This means, for example, that using entities for characters 
>>> in XHTML documents is unsafe (except for &lt;, &gt;, &amp;, &quot; 
>>> and &apos;).
>>
>> Can this be changed to something more along the lines of:
>> According to the XML specification, XML processors are not guaranteed 
>> to process the external DTD subset referenced in the DOCTYPE. This 
>> means, for example, that using named character entities for 
>> characters in XHTML documents is unsafe (except for &lt;, &gt;, 
>> &amp;, &quot; and &apos;). When using XHTML, it is recommended that 
>> authors use the UTF-8 charset. Additionally, authors can use 
>> numerical or hexadecimal entities, for example use the numerical 
>> reference &#8482; to display the trademark symbol.
>
> Technically, I believe the things often called "numeric entities" are 
> called "numeric character references" in XML: 
> <http://www.w3.org/TR/REC-xml/#include-if-valid>. (I'm sure the editor 
> is aware of this, just mentioning it to spread knowledge of correct 
> terminology).
>
>>> For interoperability, authors are advised to avoid optional features 
>>> of XML.
>>
>> It's unclear to me what this sentence is actually trying to say here. 
>> This could mean a lot of things.
>> Perhaps one of the editors could explain what the reader is supposed 
>> to take from this sentence.
>
> It's saying that although XML allows documents to use some optional 
> features, such as references to external entities, it's probably not a 
> good idea for public web content to rely on it, since not all XML 
> processors will support the optional features. This is with reference 
> to the previous sentence, and suggests that authors (at least of 
> public Web content) should not use entities such as &auml; or &mdash;.
>
> Regards,
> Maciej

So the original paragraph from 
http://www.w3.org/html/wg/html5/#xhtml5 was:
> According to the XML specification, XML processors are not guaranteed 
> to process the external DTD subset referenced in the DOCTYPE. This 
> means, for example, that using entities for characters in XHTML 
> documents is unsafe (except for &lt;, &gt;, &amp;, &quot; and &apos;). 
> For interoperability, authors are advised to avoid optional features 
> of XML.
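
To make the problem concrete, here is a rough sketch of how one XML 
processor behaves when it has no DTD to consult. I am assuming Python's 
xml.etree.ElementTree here purely as a convenient example of such a 
processor; parse_text is a made-up helper name, and browsers or other 
processors may well behave differently.

```python
import xml.etree.ElementTree as ET


def parse_text(markup):
    """Return the root element's text, or None if the parser rejects the input.

    parse_text is a hypothetical helper used only for this illustration.
    """
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        # Raised for an entity the parser has no definition for.
        return None


# Named entity that is normally defined only in the external DTD subset.
named = parse_text("<p>&trade;</p>")

# Decimal and hexadecimal numeric character references for U+2122.
decimal = parse_text("<p>&#8482;</p>")
hexadecimal = parse_text("<p>&#x2122;</p>")

# Predefined XML entities need no DTD.
predefined = parse_text("<p>&lt;&amp;&gt;</p>")

print(named, decimal, hexadecimal, predefined)
```

In this setup only the named entity is rejected; the numeric character 
references and the predefined entities go through without any DTD.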

I think that we need to rewrite the paragraph. We can change it to 
something more like:
[[
According to the XML specification, XML processors are not guaranteed to 
process the external DTD subset referenced in the DOCTYPE. This means, 
for example, that using named entity references for characters in XHTML 
documents is unsafe (except for the five predefined entities &lt;, &gt;, 
&amp;, &quot; and &apos;). Authors (at least of public Web content) 
should therefore not use entities such as &auml; or &mdash;. When using 
XHTML, it is recommended that authors use the UTF-8 encoding, which 
eliminates the need for most character references. Additionally, authors 
can use decimal or hexadecimal numeric character references where 
needed: for example, an author wishing to display the trademark symbol 
can use the numeric character reference &#8482; (or &#x2122;) instead of 
the named entity reference &trade;.
]]
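
For authors with existing content full of named entities, the 
substitution is mechanical. Below is a minimal sketch, assuming Python 
and its standard html.entities table (which covers the HTML 4 entity 
names); to_numeric_references and XML_PREDEFINED are names I made up for 
the example, not anything from the spec.

```python
import re
from html.entities import name2codepoint

# The five entities every XML processor must recognise; leave them alone.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}


def to_numeric_references(text):
    """Replace known named entities with decimal numeric character references.

    Hypothetical helper: unknown names and the predefined XML entities are
    left exactly as they were.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


converted = to_numeric_references(
    "Acme&trade; &mdash; caf&eacute; &amp; bar &unknown;"
)
print(converted)
```

Unknown names are deliberately passed through untouched, so that a typo 
in the source still shows up as an error rather than being silently 
changed.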

Let's see what other people and the editors think about that.

regards,
Dean Edridge

Received on Saturday, 1 December 2007 14:51:46 UTC