Re: Checkpoint 2.2: Proposal for revised provision regarding XML/SGML content

At 01:23 PM 2002-09-28, Ian B. Jacobs wrote:
>Dear UAWG,
>
>This is a proposal to fix a provision of checkpoint 2.2
>(text view) in UAAG 1.0. Al, I'm addressing you explicitly
>to ask whether the proposal is properly worded.

Does this situation perhaps fit your concept of "normative inclusions"?

In other words, the proposal quoted below reads as though it defines what
a text format is for the purposes of this checkpoint.  It is probably safer
to say that these two conditions define normative inclusions: any media
object satisfying either of them MUST be treated as a text object for the
purposes of this checkpoint.  But not every media object that the User Agent
SHOULD treat as text meets these criteria.  For example, a media object of
type message/rfc822 should be viewable as text.  I don't think we need to
add that as a normative inclusion, but I think it is better to present the
two conditions as normative inclusions (clarifying, but not exhausting, the
cases to be included).

I would also add

   c) [In the absence of authoritative MIME-type information to the contrary,]
any media object identified as XML or SGML by a DOCTYPE indication conforming
to the rules of those formats.

It is overly onerous to check a media object for text-ness in general.  But
it is not overly onerous to check the head of a file for a legal DOCTYPE
indication.
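
As a rough illustration of how cheap such a check can be, here is a minimal
Python sketch.  It is not a conforming XML or SGML parser.  The names
HEAD_BYTES, doctype_name, and file_has_doctype are hypothetical, the
1024-byte budget is an assumption, and the pattern assumes an
ASCII-compatible encoding (a UTF-16 file would not match).

```python
import re
from typing import Optional

HEAD_BYTES = 1024  # assumed budget; the message does not specify a size

# Only whitespace, processing instructions (including the XML declaration)
# and comments may come before the DOCTYPE declaration in this sketch.
# IGNORECASE is used because SGML applications such as HTML accept
# "<!doctype"; strict XML requires the upper-case keyword.
_PROLOG = re.compile(
    rb"\A(?:\xef\xbb\xbf)?"               # optional UTF-8 byte order mark
    rb"(?:\s|<\?.*?\?>|<!--.*?-->)*"      # whitespace, PIs, comments
    rb"<!DOCTYPE\s+([A-Za-z_:][A-Za-z0-9_:.\-]*)",
    re.IGNORECASE | re.DOTALL,
)


def doctype_name(head: bytes) -> Optional[str]:
    """Return the document type name declared in `head`, or None."""
    match = _PROLOG.match(head[:HEAD_BYTES])
    return match.group(1).decode("ascii") if match else None


def file_has_doctype(path: str) -> bool:
    """Read only the first HEAD_BYTES of a file and look for a DOCTYPE."""
    with open(path, "rb") as handle:
        return doctype_name(handle.read(HEAD_BYTES)) is not None
```

The check reads a bounded prefix and never parses the body, which is the
point: it is far less work than deciding text-ness in general.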

Note that User Agents receive media objects in many ways.  HTTP is
somewhat reliable in providing MIME-type information for the data it
delivers.  The file: and ftp: URI schemes, on the other hand, do not
provide this kind of information.  In cases like these, using the built-in
markings of XML and SGML objects is appropriate and should be expected.

Al

PS:

Yes, I think that the group overreacted before and that this approach
should address the developers' concern while better preserving the substance
of the provision.

PPS: The bit about 'in the absence of authoritative information to the
contrary' could be omitted above.  It's pretty picky.  But User Agents that
receive data identified in the HTTP headers as being of type
application/octet-stream, for example, should *not* be *required* to check
the contents at all, even if there is a DOCTYPE inside the file and the
whole thing is validatable XML or SGML.  There is a principle related to
this in the working papers [findings] of the TAG.  I still think that User
Agents should not be discouraged from offering a text view of anything that
may make some sense in that view.  On the other hand, they should not be
required to do so when there is strong evidence against this expectation.
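
To make the intended precedence concrete, here is a minimal Python sketch of
the decision, assuming the two conditions of the quoted proposal plus the
bracketed form of c) above.  The function name must_offer_text_view and the
short list of XML/SGML media types are illustrative assumptions, not an
exhaustive registry.

```python
from typing import Optional

# Illustrative only: application/xml and the "+xml" suffix convention come
# from RFC 3023; application/sgml is the registered SGML type.
XML_SGML_TYPES = {"application/xml", "application/sgml"}


def must_offer_text_view(media_type: Optional[str], has_doctype: bool) -> bool:
    """True when a text view is REQUIRED under conditions 1), 2) or c).

    `media_type` is the authoritative type (for example, from an HTTP
    Content-Type header), or None when the transport supplies none,
    as with file: and ftp:.
    """
    if media_type is not None:
        # Drop parameters such as "; charset=utf-8" before comparing.
        base = media_type.split(";", 1)[0].strip().lower()
        return (
            base.startswith("text/")       # condition 1)
            or base in XML_SGML_TYPES      # condition 2)
            or base.endswith("+xml")       # condition 2), RFC 3023 suffix
        )
    # Condition c): no authoritative type, so the DOCTYPE decides.
    return has_doctype
```

A False result means only that the text view is not required.  The User
Agent remains free to offer one, for instance for application/octet-stream
data that turns out to contain a DOCTYPE.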

>In the Candidate Recommendation, the checkpoint included:
>
>   "all SGML and XML applications, regardless of
>    Internet media type (e.g., HTML 4.01, XHTML 1.1, SMIL, SVG,
>    etc.)."
>
>Some developers during CR indicated that without Internet media type 
>information, one cannot reliably sniff content to determine that a piece 
>of content is an instance of an XML application. So in the last call 
>draft, we deleted this provision.
>
>Steven Pemberton's last call comments [1] include:
>
>   "I would beef up this definition to at least include XML."
>
>We resolved for issue 548 [3] to include the media type
>application/xhtml+xml in this checkpoint. But upon reflection,
>I think that's the wrong solution.
>
>I think that my proposal to delete the XML provision was the wrong 
>solution to the problem the developers raised in CR.
>
>I now understand the ambiguity in the phrase "any XML application, 
>regardless of Internet media type". One could interpret this
>to mean "the user agent has to figure out whether this is XML content, 
>whatever the media type." I think this ambiguity led
>to the comments in CR; I have not confirmed with the reviewers.
>
>I believe the intention of the second provision was to
>provide a text view for content identified by media type as
>XML and SGML content, even if UAAG 1.0 did not identify
>all such media types (hence "regardless of media type").
>
>I would like to reinstate a revised provision as a response
>to Steven Pemberton's comments.
>
><PROPOSAL>
>For the purposes of this checkpoint, a text format is either:
>
>    1) any media object given an Internet media type of
>      "text" (e.g., "text/plain", "text/html", or "text/*"),
>       as defined in RFC 2046 [RFC2046], section 4.1, or
>
>    2) any media object identified by Internet media type to
>       be an instance of an XML or SGML application.
>
>   Note: As an example, RFC 3023 [2] defines some Internet media
>   types for XML content.
></PROPOSAL>
>
>I think this proposal:
>
>  a) Would satisfy Steven Pemberton's suggestion to "beef up" the
>     checkpoint.
>
>  b) Is in line with our original intent (pre-last call) to provide
>     a text view of XML content. We deleted that requirement thinking
>     it was technically unsound, but that discussion was based on
>     misunderstanding.
>
>Thank you,
>
>  - Ian
>
>[1] http://lists.w3.org/Archives/Public/w3c-wai-ua/2002JulSep/0149
>[2] http://www.ietf.org/rfc/rfc3023.txt
>[3] http://www.w3.org/WAI/UA/issues/issues-linear-lc4.html#548
>
>--
>Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
>Tel:                     +1 718 260-9447

Received on Saturday, 28 September 2002 20:44:55 UTC