Re: Relation between markup and transport from Arjun Ray on 2001-07-03 (www-talk@w3.org from July to August 2001)

From: Arjun Ray <aray@q2.net>
Date: Mon, 2 Jul 2001 23:57:39 -0400 (EDT)
To: www-talk@w3.org
Message-ID: <Pine.LNX.4.21.0107022142460.2251-100000@info.q2.net>

On Mon, 2 Jul 2001, William F. Hammond wrote:
> Arjun Ray writes:
>> On Sun, 1 Jul 2001, Ian Hickson wrote:

>>> I'm still looking for a good reason to write websites in XHTML 
>>> _at the moment_, given that the majority of web browsers don't 
>>> grok XHTML.
> 
> 1. More reliable CSS styling.

I don't see how.  Or are you referring to a "quirks mode" being invoked
by mistake?
   
> 2. Namespace extensions.

A maguffin, frenetic W3C boosterism notwithstanding. 

> 3. Client-side XSL styling some day?

I'm reminded of The Great Leap Forward, where ordinary people were
going to have steel furnaces in their backyards.  Why use peashooters
when cannon are the kewlest things since sliced bread?  There's no
such thing as Economies of Scale, I tell you!!!

> To understand the sensible model consider two kinds of documents:
> 
> A. Oxford-TEI-Pizza-Chef-custom-brew-with-math under XML.  Serve as
>    "text/xml".  Browser provides tree portrayal if no external
>    application (with a general triage facility for all XML
>    applications) is specified.

Tree portrayal makes much more sense for application/xml than
text/xml.  Even so, I'm not sure what the point is here.  That
semantics are lost?  If so, that goes with the territory.

> B. Server-side translation of above as XHTML+MathML.  Serve as
>    "text/html".  New browsers can show school children at home and in
>    public libraries a properly typeset "1/2".  There is no
>    problem-level breakage in old browsers, and users begin to perceive
>    a need to upgrade.  

Actually, the perception of a need to upgrade is much more likely to
occur if and when something *does* break;)
 
>>> And even _then_, if the person in control of the content is using 
>>> XML tools and so on, they are almost certainly in control of the 
>>> website as well,
> 
> The hypothesis is seldom satisfied in large organizations where, for
> security reasons, distributed desktop platforms are not permitted to
> run HTTP servers.

In large organizations, the "person in control of the content" is the
server admin.  Assuming this functionary has at least two braincells
to rub together and is not in the event saddled with crippleware, I'd
say Ian's essential point still stands: from the point of view of the
WWW, failures in this context are organizational, not infrastructural.

> Sniffing is not required.  Reading the first line of an http body is
> not sniffing.  

Yes it is.  The one exception is when the semantics of the content-type
require that a "first line" be examined by a compliant processor (to
decide among in-paradigm alternatives).  But even here, it is still
sniffing if a semantics-unaware agent (like the UNIX file program,
which works with magic strings of various kinds, or worse, Microsoft's
documented "approach" to HTTP bodies) were doing it.
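
To make the distinction concrete, here is a minimal sketch in Python.
The magic-string table and both function names are made up for the
illustration; nothing here is taken from the file program or from any
real browser.  The point is only that a sniffer looks at bytes and
ignores what the sender declared.

```python
# Hypothetical magic-string table, in the spirit of the UNIX file program.
MAGIC = [
    (b"<?xml", "text/xml"),
    (b"<!doctype html", "text/html"),
    (b"<html", "text/html"),
    (b"%pdf-", "application/pdf"),
]

def sniff(body):
    """Guess a type from the leading bytes, ignoring any declared type."""
    head = body.lstrip()[:64].lower()
    for magic, guessed in MAGIC:
        if head.startswith(magic):
            return guessed
    return "application/octet-stream"

def declared(content_type_header):
    """Take the sender's word for it: drop parameters, normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()

# An XHTML document with an XML declaration, served as text/html.
body = b'<?xml version="1.0"?>\n<html xmlns="http://www.w3.org/1999/xhtml"/>'
print(sniff(body))                                # the sniffer's guess
print(declared("text/html; charset=iso-8859-1"))  # what the sender said
```

The two answers disagree for this body, which is exactly the case
under discussion.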

> Attaching meaning to comments is sniffing.

Actually, it's worse than sniffing:)

>> Is the perceived lack of Content Negotiation the real problem here,
>> that we have to scrounge for workarounds?
> 
> No.  (I've never been impressed with http content negotiation as an
> idea.)

It hasn't been given a fair shake yet.  The fact remains that as long
as a client does not specifically advertise support in an Accept:
header, a server has nothing better than guesswork to decide what */*
might really mean.  As it happens, server support for Content
Negotiation has existed for a number of years now.  We're still
waiting for the wowser brigade to catch up.
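
A rough sketch of the server's side of this, in Python.  It is a
simplified reading of Accept: handling (media ranges and q-values
only, no other parameters), and the function names are mine, not any
server's API.  The second return value records whether the client
actually named the chosen type or merely matched it with a wildcard.

```python
def parse_accept(header):
    """Parse an Accept header into (media-range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return ranges

def specificity(media_range):
    """0 for */*, 1 for type/*, 2 for a fully named type."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2

def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type

def choose(accept_header, available):
    """Return (chosen type or None, True if the client named it explicitly)."""
    ranges = parse_accept(accept_header)
    best = None
    for index, media_type in enumerate(available):
        hits = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not hits:
            continue
        spec, q = max(hits)          # the most specific matching range wins
        if q <= 0:
            continue
        key = (q, spec, -index)      # ties fall back to the server's own order
        if best is None or key > best[0]:
            best = (key, media_type)
    if best is None:
        return None, False
    return best[1], best[0][1] == 2

print(choose("*/*", ["text/html", "text/xml"]))
print(choose("text/xml, text/html;q=0.5", ["text/html", "text/xml"]))
```

With a bare */* the choice comes down to the server's own ordering,
which is the guesswork referred to above.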

> The issue is about two models for the relation between markup and
> transport content type.

Sorry, I'm not following this.  Each content-type has its own
semantics.  It's generally understood that an HTTP agent uses a lookup
mechanism to associate HTTP bodies with appropriate semantically aware
processors.  The issue, if any, would have to do with the semantic
capabilities of each such processor.  Any similarity in treatment by
distinct processors differentiated by content-type would be strictly
coincidental.
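
As a sketch of what I mean by a lookup mechanism (Python; the table
and the two stand-in processors are invented for the example and
don't describe any particular agent):

```python
def render_html(body):
    # Stand-in for a processor that knows text/html semantics.
    return "html renderer: %d bytes" % len(body)

def show_xml_tree(body):
    # Stand-in for a generic processor that only knows XML syntax.
    return "xml tree view: %d bytes" % len(body)

# Hypothetical lookup table from content-type to processor.
PROCESSORS = {
    "text/html": render_html,
    "text/xml": show_xml_tree,
    "application/xml": show_xml_tree,
}

def dispatch(content_type, body):
    """Pick a processor purely from the declared content-type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = PROCESSORS.get(media_type)
    if handler is None:
        return "no processor registered for " + media_type
    return handler(body)

print(dispatch("text/html; charset=iso-8859-1", b"<p>hi"))
print(dispatch("text/xml", b"<p>hi</p>"))
```

The same bytes end up in different processors depending only on the
label, and what each processor makes of them is its own business.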

For */xml, there are no semantics other than the parseability of the
document according to understood rules.  (That is, in a */xml entity,
we may parse out a name from between < and > signs, but the mere
XML-ness of this doesn't tell us what the name means or signifies.  
XML, like SGML, does not provide for semantic assertion of type.)
Nevertheless, sending out XHTML, or XHTML+MathML, or XHTML+FooML, or
XFoo+XBar, as */xml isn't controversial - yet.
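
A small illustration, using the ElementTree parser from Python's
standard library (the sample documents and the function name are
made up).  A generic XML processor can hand back the names, and
that's all it can do:

```python
import xml.etree.ElementTree as ET

def element_names(document):
    """Return the element names in document order; nothing more is known."""
    root = ET.fromstring(document)
    return [element.tag for element in root.iter()]

print(element_names("<pizza><topping>anchovy</topping></pizza>"))
print(element_names("<p><b>bold?</b></p>"))
```

Nothing in the second result says that "b" has anything to do with
bold text.  That association comes from somewhere other than the
XML-ness of the document.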

For text/html, it is generally expected that names extracted from
within < and > signs have well-known significance most of the time.  
Several flavors of Tag Soup and a smidgen of ersatz SGML applications
share this vague set of almost-understandings.  The semantic assertion
of type may be inchoate, but it's there nonetheless: the magic is in
the "html" part of the content-type.  Right now, the trouble seems to
be the need to differentiate XHTML+(Foo)? from Tag Soup (the "default"
understanding) within the text/html rubric.  We are still looking
for a good answer (assuming one exists at all).

(Personally, I think the best answer may be a separate text/xhtml
content-type, that clients need to assert explicitly in Accept:
headers, so that servers can send out text/html only as a fallback.  
This would be a no-brainer if the W3C stopped stonewalling about the
true meaning of 'text/html'.)
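
Sketched in Python, with the caveat that text/xhtml is only my
proposal and not a registered type, and that this toy ignores
q-values entirely:

```python
def response_type(accept_header):
    """Send text/xhtml only if the client names it; otherwise fall back."""
    named = {
        item.split(";", 1)[0].strip().lower()
        for item in accept_header.split(",")
    }
    # Wildcards such as */* or text/* do not count as asking for XHTML.
    return "text/xhtml" if "text/xhtml" in named else "text/html"

print(response_type("text/xhtml, text/html;q=0.5"))
print(response_type("*/*"))
```

A client that says nothing specific gets what it gets today.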

> The other model makes both "text/html" (for tag soup) and
> "text/xml" (for general XML) the exclusive domain of mass-market
> user agents 

How so?  I'm sorry, I'm missing something.

> and forecloses the possibility of external handling of XML
> document types transported by a mass market user agent that are
> not rendered (or processed) well by the agent when served under
> the transport type "text/xml".

If you're committed to fallbacks only, how could you expect to do
better?


Arjun
Received on Monday, 2 July 2001 23:42:11 UTC