Comments on 01 Dec. Draft of "Authoritative Metadata" from noah_mendelsohn@us.ibm.com on 2005-12-03 (www-tag@w3.org from December 2005)

From: <noah_mendelsohn@us.ibm.com>
Date: Fri, 2 Dec 2005 20:03:26 -0500
To: www-tag@w3.org
Message-ID: <OF5052D0C5.4D6AEEB9-ON852570CC.00014E16-852570CC.0005CF1C@lotus.com>
I see that Roy has posted the promised update of "Authoritative Metadata" 
at [1].  Mostly I think it's spot on and ready to publish as a finding.  I 
do have several suggestions/comments/questions that I hope we can consider 
before it goes out.   I expect it's clear which are significant and which 
are nits.

* (1 Summary of Key Points) says "Specifications SHOULD NOT work against 
the Web architecture by requiring or suggesting that a recipient override 
authoritative metadata without user consent."  Question:  am I overriding 
the authoritative metadata if I treat the received representation as a 
supertype of that specified in the metadata?  For example, is it against 
Web architecture for me to process application/xml with a tool that 
understands UTF-8 (if that is the encoding) but not specifically XML?  I 
would assume that is allowed. 
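
To make the supertype question concrete, here is a minimal Python sketch. 
It treats a body labeled application/xml purely as characters in the 
declared charset and never parses it as XML.  The header value, the 
payload, and the helper name decode_as_text are all hypothetical, chosen 
only for illustration; nothing here comes from the draft itself.

```python
from email.message import Message


def decode_as_text(content_type: str, body: bytes) -> str:
    """Decode a body using only the charset parameter of Content-Type.

    The media type itself (e.g. application/xml) is deliberately ignored:
    the data is handled as plain characters, not parsed as XML.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Falling back to UTF-8 is an assumption made for this sketch.
    charset = msg.get_content_charset() or "utf-8"
    return body.decode(charset)


# Hypothetical response metadata and payload.
header = "application/xml; charset=utf-8"
payload = "<greeting>caf\u00e9</greeting>".encode("utf-8")

text = decode_as_text(header, payload)
print(len(payload), "bytes ->", len(text), "characters")
```

The tool relies only on the charset named in the metadata, so it does not 
contradict anything the sender asserted; it simply uses less of it.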

* I was really pleased to see the change in Section 2 "Representation 
metadata does not constrain the receiving agent to process the 
representation data in one particular way. What it does is allow the 
sender of a representation to express its intentions regarding how the 
data should be interpreted by a recipient.", but then I was somewhat 
disappointed to see in section 3.1 "The media type determines the default 
processing model used with a representation, including such issues as 
whether the data should be displayed, stored, or executed, and what 
handler should be dispatched for that purpose. "   Wouldn't it be more 
appropriate to say that "A media type licenses normative interpretations 
of the data, possibly including standard renderings, storage semantics, 
etc.  In practice, media types are thus usable for selecting handlers to 
implement such functions"?  I'm not convinced that a media type needs to 
be documented in terms of a processing model. 

Section 3.1 goes on to say "A media type, therefore, is not simply an 
indication of data format; it also refers to a standardized processing 
model for that data format. In fact, many different media types share a 
single data format."  For the same reason, I suggest rewording along the 
lines of "A media type, therefore, is not simply an indication of data 
format; it also refers to a standardized interpretation for that data 
format. In fact, many different media types share a single data format. " 
(You might give a simple example, such as two formats which both allow the 
string "true", one of which treats it as a character string and another 
which interprets it as the alternative to "false".)  The same 
"processing" vs. "interpretation" concern arises in later sections as 
well.
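
The "true" example might be sketched in Python as follows.  The choice of 
text/plain and application/json as the two media types is my assumption 
for illustration, as is the helper name interpret; the point is only that 
identical bytes receive different standardized interpretations depending 
on the label.

```python
import json


def interpret(media_type: str, body: bytes):
    """Return the value the bytes denote under the given media type."""
    text = body.decode("utf-8")
    if media_type == "text/plain":
        # Interpreted as a character string, nothing more.
        return text
    if media_type == "application/json":
        # Interpreted as a JSON value; "true" is the alternative to "false".
        return json.loads(text)
    raise ValueError(f"no interpretation known for {media_type}")


data = b"true"  # the same four bytes in both cases
as_string = interpret("text/plain", data)
as_boolean = interpret("application/json", data)
print(repr(as_string), repr(as_boolean))
```

Note that the sketch returns a value and does nothing else with it: the 
media type fixes what the data means, while display, storage, or 
execution is left to whoever calls the function.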

* From section 4: "In scenario 1, in terms of Web architecture, Stuart is 
innocent;"   That seems a bit strong.  Might it be better to say:  "Stuart 
has made a mistake, but he has not violated Web architecture; serving a 
document that appears to be text/html as text/plain is not prohibited."  
I'm just having a bit of trouble applying the word "innocent" to someone 
who has grossly misconfigured his system.  I do understand that you said 
"in terms of WebArch", but that seems to make WebArch look a bit goofy. 
Not a big deal.

* "Data is "self-describing" if it includes enough information to allow 
two parties to establish a consistent interpretation without additional 
clues."  I've been doing a lot of thinking about self-describing data, and 
I think this is too strong.   In fact, I think getting the story on 
self-description right in the short space available in this finding is 
going to be too hard, because there are subtleties.  I'm hoping we'll 
decide to do a separate finding on self-describing documents.  What's 
worrying me is that you always need shared context.  In the case of an XML 
document, both parties need to know XML, probably the Namespaces Rec., 
Unicode, some encoding such as UTF-8, and maybe more depending on the 
level of consistent interpretation you seek.  I think I could live with 
"Data is "self-describing" to the extent that it includes enough 
information to allow two parties to establish a consistent 
interpretation."  Anyway, I think this is an area where we should proceed carefully. 
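
A small Python sketch of the shared-context point, under stated 
assumptions: the document bytes and the namespace URI urn:example:notes 
are hypothetical.  The document declares its own encoding and namespace, 
yet a recipient recovers them only by already knowing the XML and 
Namespaces rules, which the standard library parser stands in for here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: it declares its own encoding and namespace.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<n:note xmlns:n="urn:example:notes">caf\xe9</n:note>'
)

# A recipient without the shared context might guess UTF-8 and fail.
try:
    raw.decode("utf-8")
    naive_guess_worked = True
except UnicodeDecodeError:
    naive_guess_worked = False

# A recipient that knows XML and Namespaces reads the declarations
# carried in the data and arrives at a consistent interpretation.
root = ET.fromstring(raw)
print(naive_guess_worked, root.tag, root.text)
```

The data describes itself only to the extent that both parties already 
agree on how to read those declarations.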

* Section 6: "Answer: The TAG does not believe that author-specified 
overrides in representation data offer the proper solution to social 
problems such as interactions with server"   I'm not sure it's best for 
the finding to speak about the TAG here.   Perhaps we might say:  "Answer: 
This finding suggests that author-specified overrides in representation 
data offer the proper solution to social problems such as interactions 
with server"  or maybe  "Answer: author-specified overrides in 
representation data SHOULD be used when they can be effective for solving 
social problems such as interactions with server".  Grammar question: 
don't all of these formulations suggest that an interaction with a server 
is a social problem?  Seems strange to me. 

* Section 7: if we decide to do a separate finding on self-describing 
documents, that should be listed under future work.

* The title page gives "This version" as 
http://www.w3.org/2001/tag/doc/mime-respect-20051201 but that's currently 
a broken link.

Notwithstanding the long list of comments above, I do like this work a 
lot.  It's an excellent finding and I'll be very glad to see it published 
ASAP.

Noah

[1] http://www.w3.org/2001/tag/doc/mime-respect.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Saturday, 3 December 2005 01:03:36 UTC