Minutes from 4 Feb 2002 TAG teleconference from Ian B. Jacobs on 2002-02-11 (www-tag@w3.org from February 2002)

From: Ian B. Jacobs <ij@w3.org>
Date: Mon, 11 Feb 2002 15:39:33 -0500
To: www-tag@w3.org
Message-ID: <3C682C05.6030209@w3.org>
TAG teleconference
4 Feb 2002

All present: Tim Berners-Lee (TBL, Chair), Tim Bray (TB),
Dan Connolly (DC), Paul Cotton (PC), Roy Fielding (RF),
Chris Lilley (CL), David Orchard (DO), Norm Walsh (NW),
Stuart Williams (SW), Ian Jacobs (IJ)

Previous meeting 28 Jan:
    http://lists.w3.org/Archives/Public/www-tag/2002Jan/0235

Next meeting:    12 Feb face-to-face
    Regrets: CL

See also IRC log:
    http://www.w3.org/2002/02/04-tagmem-irc

A summary of open action items may be found at the end of
this message.

---------------------
Agenda:

1) uriMediaType-9: Why does the Web use mime types
     and not URIs?
2) whenToUseGet-7: How to handle idempotent queries?
3) namespaceDocument-8: What should a namespace
     document look like?
4) Language bindings
5) nsMediaType-3: Relationship between media types
     and namespaces?
6) Determining charsets
7) On using formal models for TAG work
---------------------

--------------------------------------------------
1) uriMediaType-9: Why does the Web use mime types
     and not URIs?
http://www.w3.org/2001/tag/ilist#uriMediaType-9
--------------------------------------------------

DC: I don't know how we can contribute to life as we know it
by addressing this.

TBL: We could suggest that it would be good if mime types
became first-class objects.

TB: This issue has an IETF feel to me.

DC: I don't agree there's a problem. I agree that people
talk about this a lot.

RF: MIME types are resources. As long as you have a
well-established namespace, they become URIs whether people
like it or not. The people who control MIME type space don't
think they should be URIs.

The TAG observed that different specifications (e.g., RDDL,
Canonical XML) are using different conventions for making a
URI of a media type.

Resolved: Accept issue uriMediaType-9.

Action RF: Summarize current approaches for making a URI of
a media type.

----------------------------------------------------
2) whenToUseGet-7: How to handle idempotent queries?
http://www.w3.org/2001/tag/ilist#whenToUseGet-7
----------------------------------------------------

In addition to the original question of the issue (When to
use GET?), the TAG added the question of how to handle
idempotent queries (with a new POST-like method? GET plus a
body?).

----------------------------------------------------
3) namespaceDocument-8: What should a namespace
     document look like?
http://www.w3.org/2001/tag/ilist#namespaceDocument-8
----------------------------------------------------

Resolved: Accept issue namespaceDocument-8.

----------------------------------------------------
4) Language bindings
----------------------------------------------------

On 24 Jan 2002, Jim Fuller sent a request [1] to the TAG to
consider the "issue of language binding": "Language binding
was explicitly dropped from XSLT 2.0, in the recognition
that a common approach was required across the W3C."

TBL: This is about API bindings.

NW: When we published XSLT 1.1, it included language
bindings (for how to do function calls from xslt). It
created a firestorm.  Nobody could agree that we should do
this (in the XSLT WG).  I have no confidence that this
should be done across all possible bindings.

CL: Most xslt implementations allow you to do this (via
extensions).

NW: Extension functions pose interoperability problems.

DC: I'm torn on this. I like how XSLT extensions work in
general (but for too many 404s). On the other hand, there
are a lot of places in W3C specs for APIs; they use central
registries for tokens.

NW: I observe that http://exslt.org/ publishes some common
extension functions. Publish definitions. Implemented in
various XSLT processors. You can use 'function-available' to
find if they are available.

The TAG spent some time debating whether W3C should
standardize a runtime library.

PC: The issue here is not whether there should be a standard
set of functionalities accessible from xslt, it is whether
XSLT (or another specification) should standardize the
extensibility mechanism that allows any extension function
to be used. This is a rat-hole since it would have to work
across languages. You'd have to map datatypes across
languages, which is no easy task.

DC: The 'function-available' bit is what I'd want to look
at, if anything. (SAX has such a thing, using URIs; yeah!
DOM has one, that doesn't, last I looked. Boo.)

TBL: This is normally done on the platform, not where W3C
has normally been.

Resolved: No action.

[1] http://lists.w3.org/Archives/Public/www-tag/2002Jan/0194

--------------------------------------------------
5) nsMediaType-3: Relationship between media types
     and namespaces?
http://www.w3.org/2001/tag/ilist#nsMediaType-3
--------------------------------------------------

[Note from scribe: The minutes attempt to piece together
several parallel discussion threads on the telephone and
IRC. Some comments are not presented in the exact
chronological order they were made in order to preserve the
different threads.]

The TAG discussed the example in section D.2 [2] of the XSLT
1.0 Recommendation. The style sheet starts:

    <html xsl:version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          lang="en"> ...

If this style sheet is fed to an HTML browser, the browser
may consider it to be HTML (due to <html> element) and,
while the document is not valid HTML, the browser might be
able to handle it. If fed to an XSLT processor, the result
will be entirely different: a generated HTML document.

Several participants pointed out that this template is
syntactic sugar: a simplification for:

    <xsl:stylesheet version="1.0"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           xmlns="http://www.w3.org/TR/xhtml1/strict">
      <xsl:template match="/">
        <html> ...

The TAG considered the question: Is this an HTML document or
an XSLT document?

TBL proposal: The namespace on the root element determines
subsequent behavior (i.e., the outermost piece rules).
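
The proposal can be sketched in a few lines. The following
Python is a minimal illustration only, not something
presented at the meeting: it dispatches on the namespace of
the root element using the standard library parser. The
handler table and its labels are hypothetical, and the two
sample documents are abbreviated forms of the two style
sheets quoted above.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical handler table; the labels are illustrative only.
HANDLERS = {
    XSLT_NS: "xslt-processor",
    "http://www.w3.org/1999/xhtml": "xhtml-renderer",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or '' if none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def dispatch_on_root(xml_text):
    """Pick a handler from the root namespace alone ("outermost rules")."""
    return HANDLERS.get(root_namespace(xml_text), "unknown")


# Abbreviated literal-result-element form (root is an un-namespaced <html>).
simplified = (
    '<html xsl:version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform" lang="en"/>'
)

# Abbreviated expanded form (root is <xsl:stylesheet>).
full = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'xmlns="http://www.w3.org/TR/xhtml1/strict">'
    '<xsl:template match="/"><html/></xsl:template>'
    '</xsl:stylesheet>'
)

if __name__ == "__main__":
    print(dispatch_on_root(simplified))
    print(dispatch_on_root(full))
```

Under these assumptions the two equivalent style sheets are
routed differently: the literal-result form has an
un-namespaced <html> root and matches no handler, while the
expanded form is recognized as XSLT.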

TB: There are important exceptions to that rule.

NW: The above template essentially says "copy me, except
when you encounter elements in XSLT namespace." The problem
I have with calling this an xhtml document is that it
wouldn't validate as an xhtml document.

DO noted that the same issue would be relevant for other
namespaces (e.g., SOAP) on the root element.

DC: I'm trying to figure out whether we're designing what we
want or describing what's already there. It's clear what
happens when you hand a mixed document (such as the one in
the example) to an XSLT processor. It's also clear if you
hand the document to an HTML processor (though that's
outside the HTML specification). There are lots of ways to
handle this today (what this HTML browser does, or that XSLT
processor does). If we're playing the "describe what exists"
game, the architecture is: an XML document doesn't say what
its purpose is; you have to have a protocol and a document
before you know what the "meaning" is.  If we're playing the
"design what we want" game, I might agree that we want to be
able to just look at a document to see what it means.

DO: In this example, XSLT is using xhtml at the top as a
shorthand. I don't think you can use the top level element
as a guaranteed deciding factor for establishing what a
document means. The fact that there is XSLT in the document
says that it's an XSLT document.  What happens if we have
two vocabularies that both say that they want to be the
"first thing". What about XSLT and XQUERY in the same
document? They will argue over who is "more important." It
sounds like the author needs to specify what the top-level
processor should be.

TB: I agree in general that namespace dispatching is
appropriate and is better done contextually. But I think
that trying to make a strong statement about the root
element namespace may create more problems than it solves.
I can come up with scenarios where you might want to reach
into the middle of a document and do some things without
regard to context.  I don't want to send everything as
application/xml and do everything based on namespaces.

PC: If you don't believe you should dispatch on namespaces,
what should you dispatch on?

TB: Media types if you can. It's more efficient for the
sender to tell the recipient what is being sent, when the
sender knows what the content is.

At this point, the following discussion on IRC diverged from
the discussion on the phone:

CL: This situation is a result of the architecture that we
have: a single document that has things "hanging off it." An
alternative might be to send a wrapper that says "here are
the pieces." As the "primary thing," the wrapper would
convey the "meaning"; other documents would be derived from
it.

DC: How is a "wrapper" different from a document?

CL: A wrapper is like a manifest, there might not be a
single top document. We can have a Web, not a tree. A
wrapper doesn't get presented. It's more like a zip file and
a table of contents.

DC: About the wrapper/manifest - it seems like the question
of RDDL v. XML Schema; I don't see any fundamental
difference.  Either one can point to the other.

The two conversations then rejoined on the subject of
packaging.

NW, DO: I agree with CL that this is a packaging issue.

TBL: The TAG should be able to tell people that they can
tell what a document is by looking at the bits in it and the
MIME type.

TBL: I'm worried by TB's statement that he will want to pull
out content from the middle of a document based on its
namespace.  I want to be able to say "I got the following
from so-and-so and I totally disagree with it."

TB scenarios:

   a) I want to build a table of the XLinks in a document.  I
   want to run through and look for content with xml:lang="ja"

   b) I want to be able to reach into a document and check for
   elements that have a digital signature and check them.
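
Scenario (a) can be illustrated with a short Python sketch.
It is a hedged example, not code discussed at the meeting:
it walks every element of a document and collects XLink
references and elements marked xml:lang="ja" without regard
to the enclosing elements. The sample document, its element
names, and the example.org URLs are invented for
illustration, and inheritance of xml:lang from ancestors is
deliberately not handled.

```python
import xml.etree.ElementTree as ET

# Attribute names in ElementTree's "{namespace-uri}local-name" notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def scan(xml_text):
    """Return (xlink hrefs, tags of elements carrying xml:lang="ja").

    Only elements that carry xml:lang directly are reported;
    inherited language values are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    links = []
    japanese = []
    for elem in root.iter():  # document order, any depth
        href = elem.get(XLINK_HREF)
        if href is not None:
            links.append(href)
        if elem.get(XML_LANG) == "ja":
            japanese.append(elem.tag)
    return links, japanese


# Invented sample: the container elements are unknown to the scanner.
sample = (
    '<wrapper xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<para xml:lang="ja">text</para>'
    '<ref xlink:href="http://example.org/a"/>'
    '<section><ref xlink:href="http://example.org/b"/></section>'
    '</wrapper>'
)

if __name__ == "__main__":
    print(scan(sample))
```

The scanner never inspects the root element or any
container, which is the kind of context-free reuse TB
describes.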

CL: If you look at the template, it tells you that, after
transform, what processor should get the content.  You get
advance warning that N is the namespace you'll end up with
as the root.

DC scenario: A link-checker.

TBL: How will I ever send a package without you delving into
it and pulling something out and considering it a document?

NW: That's not enforceable.

The discussion then shifted to the topic of whether meaning
was based on the author's intention or the meaning that the
recipient can gather from the content.

TBL: What decides when it's ok to look inside? What criteria
do you use?

NW: I (the recipient) decide. For instance, you may have
sent me something that I can't read (e.g., because it's in
Japanese), but I see that there's some SVG, so I look at the
picture.

IJ: This is the model we have for accessibility: the author
proposes, the user disposes.

TBL: But you do it when you understand that something is a
package.

RF: Packaging is not relevant to this discussion.

DC: Packaging seems to be relevant because people keep
saying "packaging" when this comes up. I don't understand
the relevance either, Roy, but I can't dispute it.

DC then imposed a reality check.

DC: I get nervous if we are designing the world we want
since no software does this.

TBL: I'd like to look at a clean world, then look at the
real world and see why people are doing different things. In
some cases, the only difference is attitude.

DC: If we had code that had a common model, I'd be happy to
specify it.

SW: In general, is the MIME type redundant information? Can
it always be derived from the content (e.g., what about
mixed content that might validate in different ways)?

Reply: No, one cannot always derive a unique media type by
looking at the content.

RF: All content can be recognized as multiple types.

DO: This is an interesting question - how do you know which
pieces of content are targeting different processors? SOAP
solves this problem.

DC: I disagree, Dave. SOAP doesn't "solve" the
multiple-protocols issue. I can take a SOAP document and
look at it in emacs, which is not what the SOAP headers call
for.

TB: I stand by my claim that you can legitimately look
inside a document and ignore container elements (see TB's
scenarios above). I have spent a lot of time fighting for
generic markup: one great virtue of generic markup is that
it may be reused in ways the author did not predict.

DC: This is the Principle of Least Power [3] in action.

RF: For the question "Does namespace always reflect media
type?", the answer is no. Can you inspect content and derive
a media type? No. Media type and namespace overlaps
structural content and the purpose of how the author intends
content to be used.  You can have whatever top-level element
you want. No matter how you look at a document, it can be
interpreted as being in different namespaces. The point of
the media type is to convey the author's intention, not to
dictate how the user must interpret the message.

DC: There are many ways to look at a document; I'm not sure
whether we are going to specify some preferred ones.

Discussion then shifted to the topic of the "meaning" of
content versus information about how to process it.

TBL: Some confusion here about what specs should say. There
is lots of talk about processing models. I think our job in
writing a specification is to say what a document type
means; not what you do with it.  Things get clearer when you
talk about what a document means rather than what to do with
it. For instance, commerce relies on knowing that something
is an invoice, whether you put it on the wall, trash it,
etc.

DC: It's not the bytes in the content that make something an
invoice, it's the context (message you sent me).

TBL: Format specifications should be written so that they
explain the meaning of a document if you get the bits and a
mime type.  Whether something is an invoice is based on a
human-understandable protocol. The meaning of a document is
independent of what you can do with it (see Axioms of Web
Architecture: the meaning of a document [4]).

DC: I totally disagree; the meaning of an invoice has
everything to do with what you're expected to do with it.

RF: It really sounds like these issues are wrapping
themselves around the general issue of what a media type is
on the Web.  We need to write down a philosophy in a livable
form.

SW: I'd question whether documents have multiple
interpretations and the meaning of a document can depend on
the interpretation it is subject to.

The TAG then reviewed a TB draft explaining TAG findings on
issues w3cMediaType-1, customMediaType-2, and nsMediaType-3.
Those findings are publicly available at:
      http://www.w3.org/2001/tag/2002/0129-mime

[2] http://www.w3.org/TR/xslt#data-example
[3] http://www.w3.org/DesignIssues/Principles#PLP
[4] http://www.w3.org/DesignIssues/Meaning

-----------------------
6) Determining charsets
-----------------------

DC: I verified that parameter names are local to each MIME
type.

TBL: Should we recommend that all xml mime types have a
charset parameter?

CL: I would object to assuming that everything has charset
despite what a given specification says.

Agenda item for next meeting: How do we resolve the problem
in processing where there is a reference to charset that may
not be defined?
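
As a hedged illustration of the processing problem, the
Python sketch below reads a Content-Type value and reports
the charset parameter only when the sender actually supplied
one. The function name charset_of is hypothetical. The
sketch does not decide whether charset is defined for a
given media type; that still has to come from the type's own
registration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None.

    No default is assumed: a missing parameter stays missing,
    because parameter names are local to each media type.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param returns None when the parameter is absent.
    return msg.get_param("charset")


if __name__ == "__main__":
    for value in ("text/xml; charset=utf-8", "image/png"):
        print(value, "->", charset_of(value))
```

Returning None rather than a guessed default leaves the
caller to handle the undefined case explicitly.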

--------------------------------------
7) On using formal models for TAG work
--------------------------------------

TBL: Should we set as our goal to use formal models where
appropriate (e.g., describe the mathematical relation
between HTTP requests and responses)? DC has used Larch [5]
to model HTTP.

RF: I think someone who wants to see a model happen should
write the model. With formal models, I usually run into a
problem with a need to make simplifying assumptions about
how the technology works. Usually, the things that make
technology difficult are the parts that you most need to
understand (and are difficult to model formally).  E.g.,
edge cases are added over time as people in a Working Group
realize that edge cases are tough to describe formally. And
if you don't understand the (messy) edge cases, you don't
understand the system.

TBL: Using a formal model lets you see the invariants of a
system.

PC: I'm a little concerned about spending time on formal
definitions when our charter is for higher-level
descriptions of architecture. While I agree that formal
modeling is useful, I'd prefer to allocate that to a Working
Group.

DC: I'd be surprised if we did something that didn't lend
itself to formalization.

No decision.

[5] http://www.w3.org/XML/9711theory/

=============================
Summary of action items
=============================

Open:

TBL: Find out what kind of editing access to the Web site
will be available to TAG participants.

   Status: Not done, but TBL has been in contact with systems
   team.
   Assigned: 7 Jan 2002.

PC: Draft a response to Duane Nickull on www-tag with
recommendation to contact Web Services Architecture Working
Group.
   Assigned: 28 Jan 2002.

RF: Summarize different approaches currently used for
making a URI of a media type.
   Assigned: 4 Feb 2002.

TB: Update draft findings on first three issues and make
public.
   Assigned: 4 Feb 2002.

Closed:

DC: Verify that parameter names are local to each MIME type.

   Assigned: 28 Jan 2002.
   Done: http://lists.w3.org/Archives/Public/www-tag/2002Feb/0012

Summary: "charset" is not defined across all media types.
   There are NO globally-meaningful parameters that apply to
   all media.

   Reference: RFC2045 http://www.ietf.org/rfc/rfc2045.txt


-- 
Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
Tel:                     +1 718 260-9447
Received on Monday, 11 February 2002 15:42:58 UTC