21 Comments on 9 December WebArch Draft from Sandro Hawke on 2004-02-13 (public-webarch-comments@w3.org from February 2004)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 13 Feb 2004 16:14:55 -0500
To: public-webarch-comments@w3.org
Message-Id: <200402132114.i1DLEtsL007983@roke.hawke.org>

This document has come a long way since I last read it.  Excellent.
Here are some comments on the first half or so.  Some are trivial,
some are more substantive.

Commenting on
http://www.w3.org/TR/2003/WD-webarch-20031209/

         -- sandro

===========================================
==  Comment 1, 1. Introduction:

      Identification. Each resource is identified by a URI. In this
      travel scenario, the resource is about the weather in Oaxaca and
      the URI...                       ^^^^^

This was jarring to read.  The text up to that point is simple and
direct, but suddenly here there's handwaving with "about."  What *is*
the resource identified by that URI?  Fortunately, in the picture you
answer this question.

Suggested text:

    Each resource is identified by a URI. In this travel scenario, the
    resource is a periodically-updated report on the weather in Oaxaca,
    and the URI ...

===========================================
==  Comment 2, 1. Introduction:

        The server responds with a representation that includes XHTML
        data and the Internet Media Type "application/xml+xhtml".   

In the graphic, you show the media type as text/html, which is
probably the better choice for simplicity's sake.

===========================================
==  Comment 3, 1.1.3 Principles, Constraints, and Good Practice

     This categorization is derived from Roy Fielding's work on
     "Representational State Transfer" [REST]. Authors of protocol
     specifications in particular should invest time in understanding
     the REST model and consider the role to which of its principles
     could guide their design: statelessness, clear assignment of
     roles to parties, uniform address space, and a limited, uniform
     set of verbs.   

The first sentence is fine; the second reads rather like a paid
product placement.  Is Fielding's thesis that much better than every
other work ever written on distributed systems design that it merits
strong recommendation in the section introducing labeling terms?   If
you want to save this text, put it in a Recommended Reading section.

===========================================
==  Comment 4, 1.2.1. Orthogonal Specifications:

    ... agents can interact with any identifier ...

That's ambiguous.  Replace "with" with "using" and I think you're
okay.  Otherwise it sounds rather like the identifier is one of the
parties doing something in an interaction.


===========================================
==  Comment 5, 1.2.1. Orthogonal Specifications:

     ... that the an image ...
              ^^^ 
                (typo)

===========================================
==  Comment 6, 1.2.2. Extensibility:

      The following applies to languages, in particular the
      specifications of data formats, of message formats, and
      URIs. Note: This document does not distinguish in any formal way
      the terms "format" and "language." Context has determined which
      term is used. 

I can't really parse the first sentence.  Maybe you mean something like:

      The data formats and (more generally) formal languages used
      in the bodies of messages and even in the text of URIs can be
      defined to have certain properties to promote evolution and
      interoperation.

===========================================
==  Comment 7, 1.2.4. Protocol-based Interoperability
 

     It is common for programmers working with the Web to write code
     that generates and parses these messages directly. It is less
     common, but not unusual, for end users to have direct exposure to
     these messages. This leads to the well-known "view source"
     effect, whereby users gain expertise in the workings of the
     systems by direct exposure to the underlying protocols. 

This seems out of place.  I get the point, but it's never summed up.
And I don't see how it belongs in 1.2 General Architecture Principles.

I think you mean:

     Good practice: design protocols and data formats which 
     people can view and reproduce with a minimum of special tools and
     effort. 

[ Ahhh, this is finally covered in Section 4.1; maybe a forward link? ]

and maybe:

     Good practice: user agents should allow users to look "inside" to
     see (and even manipulate) the protocol interactions the agent is
     performing on behalf of the user.


===========================================
==  Comment 8, 2. Identification (see also Comment 10 below)
      
     Parties who wish to communicate must agree upon a shared set of
     identifiers and on their meanings.

This is untrue for some reasonable meanings of "meaning", as Pat Hayes
has argued from time to time.  You could say instead:

     Parties who wish to communicate must agree on the practical
     effects of using certain identifiers.
or
     Parties who wish to communicate must agree upon a shared set of
     identifiers and (to a reasonable degree) on their meanings.

That is: some ambiguity of meaning is both reasonable and
unavoidable.  I don't think an unqualified "agree" normally means
"partially agree".

Does http://weather.example.com/oaxaca identify the weather report for
just Oaxaca or for the Oaxaca region?  When it starts to matter, you
can start to build a shared understanding of which it is.  But you
can't banish those ambiguities until you notice them.  There's also a
school of design where you choose not to banish them, even when you
see them, until you know they matter.


===========================================
==  Comment 9, 2.2. URI Ownership 

      ... the "uuid" scheme ...	  
and
      ... the "md5" scheme ...

but you don't give references.  They are not on IANA's list.  I pay
some attention, and I'm not aware of a stable specification for either
one.   The spec on DanC's list for UUID has long since expired; the
reference for MD5 is simply to a hypothetical use of it.

For uuid you could use urn:nid-5, but that's technically not a
"URI scheme":

   http://www.iana.org/assignments/urn-namespaces
   http://lists.research.netsol.com/pipermail/urn-nid/2002-July/000308.html

Maybe you can say "such as a possible 'UUID' scheme", etc, or you
could use WebDAV's unique-lock-token scheme.

===========================================
==  Comment 10, 2.3. URI Ambiguity (see also Comment 8 above)

     URI ambiguity should not be confused with ambiguity in natural
     language.

Well then use a different word!   Please!   Call it "URI Overloading"
and let "URI Ambiguity" be used for the unavoidable and quite
acceptable situation I talked about in my Comment 8.

"Overloading" seems to me the appropriate word.   Eg:


      (1) In programming languages, a feature that allows an object to
          have different meanings depending on its context. The term
          is used most often in reference to operators that can behave
          differently depending on the data type, or class, of the
          operands. For example, x+y can mean different things
          depending on whether x and y are simple integers or complex
          data structures.

	   -- http://www.webopedia.com/TERM/o/overloading.html
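
The programming-language sense quoted above can be seen in a few lines of
Python. This is only an illustrative sketch: the Celsius class is
hypothetical, invented for this example, and the point is simply that one
symbol ("+") takes its meaning from the types of its operands.

```python
class Celsius:
    """Hypothetical temperature type, used only to illustrate overloading."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __add__(self, other):
        # "+" on two Celsius values adds the underlying numbers.
        return Celsius(self.degrees + other.degrees)


int_sum = 2 + 3                      # integer addition
text_sum = "oax" + "aca"             # string concatenation
temp_sum = Celsius(20) + Celsius(5)  # user-defined meaning of "+"
```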

I suggest, then, that "URI overloading" is the bad practice of using
one URI to identify multiple things commonly known to be distinct and
useful to distinguish among, while "URI ambiguity" refers to the
fact that we can never communicate *exactly* what someone else means a
URI to identify.

===========================================
==  Comment 11, 2.3.1. URIs in other Roles:

     The fact that the URI serves other purposes in non-Web contexts
     does not lead to URI ambiguity. URI ambiguity arises [when] a URI
     is used to identify two different _Web_ resources.

What makes the example case okay is not that Nadia is not a "_Web_
resource".  She may well be.

There are two things that make it okay to refer to Nadia as
"mailto:nadia@example.com".

     1.  It's done in secret, in private.  People can redefine terms
         however they want in private.  It's probably a bad idea for
         the same reason defining private URI schemes is bad, of
         course; they leak out.    But until they do, it's basically
         okay. 

     2.  Their system is maintaining an implicit indirection between
         the mailbox and the person.   They're just abbreviating the
         screen prompt which should strictly say "Please enter the
         identifier for the mailbox of the person:" to "Please enter
         person id:".   That's just simplification for the human
         reader -- the programmer should be remembering the
         indirection in effect and when necessary (eg writing out the
         conference registration data in RDF to publish) using or
         documenting it explicitly.    Not a problem.

===========================================
==  Comment 12, 2.6. Fragment Identifiers:

      The fragment identifier of a URI allows indirect identification
      of a secondary resource by reference to a primary resource and
      additional information. The secondary resource may be some
      portion or subset ...

So if a URI contains a "#", then the "secondary resource identified by
the URI" is the same as the "resource identified by the URI."  That's
a rather odd use of the word "secondary".   

I would suggest instead that you:

     (1) Name the portion of the URI up to the "#".  TimBL has
         called this the "racine", but I like "stem", "trunk", or
         maybe even "non-fragment portion".
     (2) Call the resource identified by a URI's stem the "stem
         resource", or something like that.
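
As a rough sketch of the split proposed in (1) and (2), Python's standard
urllib.parse.urldefrag separates a URI at the "#". The variable names
"stem" and "fragment" follow the terminology suggested here, not any
established vocabulary, and the example URI is borrowed from Comment 14.

```python
from urllib.parse import urldefrag

uri = "http://weather.example.org/stations/oaxaca#ws17a"

# urldefrag returns the part before the "#" and the fragment after it.
stem, fragment = urldefrag(uri)

# A URI without a "#" has an empty fragment and is its own stem.
plain_stem, plain_fragment = urldefrag("http://weather.example.com/oaxaca")
```

Under this naming, the "stem resource" would be whatever the value of
stem identifies.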

I imagine the awkward primary/secondary bit is probably left over
from the idea that "URI-References" were not URIs.

Actually, that's worth talking about.   The newly issued RDF
Recommendations still call them "URI References":

       A URI reference within an RDF graph (an RDF URI reference) is a
       Unicode string ...

 -- http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref

Can you add some text saying why you've decided "URI References" are
now to be called "URIs", ... or something?

===========================================
==  Comment 13

References for OWL and RDF can now point to the Recs.

===========================================
==  Comment 14, 2.7.2. Assertion that Two URIs Identify the Same Resource

    Emerging Semantic Web technologies, including the "Web Ontology
    Language (OWL)" [OWL10], define RDF [RDF10] properties such as
    sameAs to assert that two URIs identify the same resource or
    functionalProperty to imply it.

I would add:

    Knowing two URIs identify the same resource does not, however,
    mean they are interchangeable.  For example, Oaxaca might have
    several government-run weather stations, and the measurements taken
    from each of these might be available from both
    weather.example.org and weather.example.com.

    The first might call a particular station
        http://weather.example.org/stations/oaxaca#ws17a
    while the second calls it 
        http://weather.example.com/rdfdump?region=oaxaca&station=ws17a

    These two URIs would both identify the same resource, a certain
    collection of weather measuring equipment.  They are owl:sameAs
    each other.  But an attempt to dereference them might well produce
    different content produced by different organizations (probably
    based originally on the same government-supplied data), so a user
    agent which substituted one for the other would be serving its
    user poorly.
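
The distinction can be sketched in Python. Everything here is a
hypothetical stand-in: SAME_AS plays the role of a store of owl:sameAs
assertions, REPRESENTATIONS fakes what each site would return, and no
network access or real RDF library is involved. The sketch only checks
direct assertions and does not compute any transitive closure.

```python
# The two URIs from the example above; both name the same weather station.
ORG_URI = "http://weather.example.org/stations/oaxaca#ws17a"
COM_URI = "http://weather.example.com/rdfdump?region=oaxaca&station=ws17a"

# Hypothetical stand-in for a set of owl:sameAs assertions.
SAME_AS = {frozenset({ORG_URI, COM_URI})}

# Hypothetical stand-in for dereferencing; the content strings are made up.
REPRESENTATIONS = {
    ORG_URI: "station ws17a as published by weather.example.org",
    COM_URI: "station ws17a as published by weather.example.com",
}


def same_resource(a, b):
    """True if the two URIs are equal or directly asserted owl:sameAs."""
    return a == b or frozenset({a, b}) in SAME_AS


def dereference(uri):
    """Return the made-up content a given publisher serves for the URI."""
    return REPRESENTATIONS[uri]


identical = same_resource(ORG_URI, COM_URI)
interchangeable = dereference(ORG_URI) == dereference(COM_URI)
```

Here identical is true while interchangeable is false, which is the
situation described above.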

===========================================
==  Comment 15, 3.2. Messages and Representations (#def-representation)

      ... includes a representation of the state of the resource. A
      representation is an octet sequence ...	 

      ... electronic data about resource state ...

Does "state" really mean anything there?  Is there a difference
between "data about Ian" and "data about the state of Ian"?  Maybe
this could be clarified with:

   Note: the phrases "representation of the state of the resource" and
   "representation of the resource" mean essentially the same thing;
   the term "state" is sometimes used to help convey that resources
   and thus their representations often vary over time.


===========================================
==  Comment 16, 3.2. Messages and Representations (#def-representation)

      ... electronic data about resource state ...

In theory, not all computers are electronic.   How about just "data"?
Or "information".


===========================================
==  Comment 17, 3.5. Safe Interactions
     
       and sends a message composed of form data using the POST
       method. Note that this is not a safe interaction; 

So you're suggesting it's not safe to engage in e-commerce?  :-)

I know you've worked hard to find a nice, simple way to explain When
To Use GET, but I don't think this is it.  Asking a non-committal
question can be very dangerous.  For instance:

    GET http://books.example.org/
                      check-order-status?credit-card=1234567877653453

That's non-committal, but it's certainly not "safe".
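
To make the risk concrete, here is a small Python sketch using the
standard urllib.parse module. It assumes the two lines of the request
above join into a single URI; the variable names are illustrative only.
The point is that the card number is part of the URI itself, so it goes
wherever the URI is stored or passed along.

```python
from urllib.parse import urlsplit, parse_qs

uri = ("http://books.example.org/"
       "check-order-status?credit-card=1234567877653453")

# Split the URI and decode its query component into a dict of lists.
parts = urlsplit(uri)
params = parse_qs(parts.query)

# The sensitive value is recoverable from the URI alone.
exposed = params["credit-card"][0]
```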

Similarly, using POST isn't necessarily "unsafe"; it just means you're
making a statement instead of asking a question.

It's not quite as snappy as "GET IS SAFE!", but I suppose Ian could
get people to chant "Never Punish People For Asking Questions!".

===========================================
==  Comment 18, on 3.6. Representation Management

     Dirk clicks on the link in the email he receives and is surprised
     to see his browser display a page about auto insurance.  

That seems pretty implausible.  It's a working weather site to one
user and a working car-insurance-ad site to another?  Based on IP
address, or what?  Did his IP address get listed in a database of
entries like, "please deliver this ad to that guy, if he ever
visits"...?

This seems more common and likely to me:

     Since Nadia finds the Oaxaca weather site useful, she emails a
     review to her friend Dirk recommending that he check out
     'http://weather.example.com/oaxaca'. Dirk clicks on the link in
     the email he receives and is surprised to see his browser respond
     "404 Not Found".  Dirk confirms the URI with Nadia, who now has
     the same problem.  Eventually, they figure out the site has been
     reorganized, and the page Nadia recommended is now called
     "http://weather.example.com/newserv?loc=oaxaca".  Embarrassed by
     this interaction, Nadia stops recommending the site to people.

===========================================
==  Comment 19, on URI Persistence

      There are strong social expectations that once a URI identifies
      a particular resource, it should continue indefinitely to refer
      to that resource; this is called URI persistence.  

I think "URI persistence" is more about observable behavior than the
URI->Resource binding.  What matters most is what the page is useful
for -- that's what guides how people try to categorize and store and
annotate and generally talk about it.  A daily news page which stops
being updated is nearly as bad as one which goes 404.  It's the same
resource ("Joe's Life"), it's just the representations aren't as high
quality any more.  Same resource being identified; different
experience.  Uncool URI.

===========================================
==  Comment 20, 3.7. Future Directions for Interaction

"MLdonkey" appears to be just an application/client.  The actual
network/protocol which MLdonkey and many other app/clients implement is
called "eDonkey2000".  (http://www.edonkey2000.com/)


===========================================
==  Comment 21, 4.2.3. Extensibility

        2. "Must understand": The agent treats unrecognized markup
        as an error condition.

"markup"?  This isn't just about XML.  How about "syntactic constructs"?
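
A minimal Python sketch of why the wording need not be XML-specific: the
"constructs" below are just the field names of a key/value message. The
set KNOWN and both function names are hypothetical, and the contrasting
"must ignore" policy is my assumption about what item 1 of the draft's
list says, included only for comparison.

```python
# Hypothetical set of constructs this agent recognizes.
KNOWN = {"location", "temperature"}


def process_must_understand(message):
    """Treat any unrecognized construct as an error condition."""
    unknown = sorted(set(message) - KNOWN)
    if unknown:
        raise ValueError("unrecognized constructs: " + ", ".join(unknown))
    return dict(message)


def process_must_ignore(message):
    """Assumed contrasting policy: silently drop unrecognized constructs."""
    return {name: value for name, value in message.items() if name in KNOWN}


extended = {"location": "Oaxaca", "temperature": "25C", "humidity": "40%"}
ignored_result = process_must_ignore(extended)
```
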
Received on Friday, 13 February 2004 16:12:34 UTC