RE: Use cases from Jonathan from Booth, David (HP Software - Boston) on 2008-05-06 (public-awwsw@w3.org from May 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 6 May 2008 20:03:56 +0000
To: Jonathan Rees <jar@creativecommons.org>
CC: "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-ID: <184112FE564ADF4F8F9C3FA01AE50009FCF2395380@G1W0486.americas.hpqcorp.net>
> From: Jonathan Rees [mailto:jar@creativecommons.org]
>
> On Apr 29, 2008, at 1:11 AM, Booth, David (HP Software -
> Boston) wrote:
> >> From: Jonathan Rees [mailto:jar@creativecommons.org]
> >> [ . . . ]
> >> Here are some of the cases I'm thinking of:
> >
> > I'll first just answer these in prose, since I can do that more
> > quickly than in N3.  I'll be using ftrr:IR definition of
> > "information resource" at
> > http://lists.w3.org/Archives/Public/public-awwsw/2008Apr/0046.html
> >
> >>
> >> 1. I have two URIs X and Y. By varying Accept-Language I
> >> learn that I
> >> can retrieve French and Spanish variants of something via
> >> URI X, and
> >> Spanish and German variants of something via URI Y. The Spanish
> >> variants retrieved via X and Y are the same. All responses
> >> are 200s.
> >>
> >>         - Is it possible that X and Y denote the same thing?
> >
> > Yes, X and Y can denote the exact same ftrr:IR.  Bear in mind that
> > one of the parameters to an ftrr:IR function is the request, and
> > the request includes the URI of the resource whose representation
> > is being requested, hence there is no way to use dereferencing alone
> > to reliably establish that two URIs denote different ftrr:IRs.
> > This reflects the actual capabilities of Web servers.
>
> OK. I would say this answer disagrees with the orthodox view that IRs
> are abstract documents or abstract information (see below). By
> definition of "abstract" an abstract document doesn't know how it is
> represented; choices of representation are a deployment detail. This
> is not just my interpretation; it agrees, as far as I can discern,
> with the answer Tim gave to a similar question of mine recently (in
> Vancouver?).

I don't know exactly which "orthodox view" you mean, and I wasn't there, so I don't know what was said.  Do you have a reference?

By "document" it sounds like you are talking about something that, at any particular time t, is a finite chunk of abstract information, and the returned awww:representations possible at that time may depend on conneg, but they still are limited to conveying a subset of that same finite information, as described here:
http://lists.w3.org/Archives/Public/www-tag/2008Apr/0169.html

However, the example that you gave does not stipulate that the returned awww:representation depends *only* on conneg.  Hence, we must assume that it could depend on other request inputs, such as the request URI or cookies.  We know that conneg is for conveying subsets of the same information, but we cannot make that assumption about other request inputs such as the request URI and cookies: different values for those inputs could produce radically different information that depends on those other inputs in arbitrarily complex ways.

In the example that you gave, if the Accept-Language was the *only* input that was changed, and the awww:representations were retrieved at the same time, and conneg was used in a manner faithful to its intent, then you could indeed conclude that X and Y do not denote the same IR.

>
> >>         - Is it possible that X and Y do *not* denote the
> >> same thing?
> >>             (assuming that responses are known to be time
> >> invariant.)
> >
> > Yes.  Since the set of <Time, Request, Representation> tuples that
> > make up a ftrr:IR can be infinite, dereferencing two URIs cannot
> > definitively establish that they denote the same ftrr:IR.  They
> > could denote ftrr:IRs that differ in <Time, Request,
> > Representation> tuples that your dereferencing did not test.
> >
> >>         - Is it necessary that X and Y do not denote the
> >> same thing?
> >
> > No, as explained above.
>
> Let me refine the scenario to better fit my intent to your
> definition. I meant to imply that we might have applied the same
> request at the same time to the two different URIs, and gotten
> different responses (say, a request with accept-language of French
> yielding a French response in one case, Spanish in the other because
> the requests via the second URI yield no French responses).

That was the way I understood the scenario.

> In this
> case it would be *necessary* that X and Y do not denote the same
> thing, by your definition.

Unfortunately, no, because the responses can also depend on the request URI, cookies and anything else that is in the request.
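
As a rough sketch of this point (in Python; the Request fields, the language table and the returned strings are invented for illustration and are not part of the ftrr:IR definition), a single function from time and request to representation can answer differently for X and Y simply because the request URI is one of its inputs:

```python
from typing import NamedTuple


class Request(NamedTuple):
    uri: str              # the request URI is itself part of the request
    accept_language: str  # one conneg input among the other request inputs


def ir(time: float, request: Request) -> str:
    """One hypothetical ftrr:IR: a single function (time, request) -> representation."""
    # Assumption for illustration: the languages offered depend on the request URI.
    offered = {"X": ["fr", "es"], "Y": ["es", "de"]}[request.uri]
    if request.accept_language in offered:
        lang = request.accept_language
    else:
        lang = offered[0]  # fall back to the first language offered
    return f"document in {lang}"


# Same time, same Accept-Language, different request URI.
via_x = ir(0.0, Request("X", "fr"))
via_y = ir(0.0, Request("Y", "fr"))
print(via_x, "|", via_y)
```

Here the French request yields French via X and Spanish via Y, yet both answers come from the same function, so the differing responses alone do not show that X and Y denote different ftrr:IRs.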

>
> In other words: two ftrr:IRs can differ even if they carry identical
> abstract information.  This IR = variable information idea is what I
> thought was Tim's model, and AWWW's, and Pat's, and what I was trying
> to capture in my diagram.

I don't know what you mean by "carry identical abstract information".

>
> I'm not advocating for or against the information-carrying view, and
> I certainly respect the definiteness of ftrr:IR. But it needs to be
> clear that there are two different notions here, and that the
> difference is consequential.

If by "information-carrying view" you mean that an IR is essentially a function with time and conneg parameters as input:

  fIcv: Time x ConnegParams --> Representation

or equivalently in its "curried" form:

  fIcvC: Time --> (ConnegParams --> Representation)

then yes, that is a different notion than ftrr:IR because it omits other request inputs such as request URI, cookies and other headers.

>
> >> 2. Suppose that the values I retrieve (in different languages, say)
> >> via a URI X say contradictory things - for example, one says that
> >> Rome is the capital of Italy, and another says that Paris is the
> >> capital of Italy.
> >>
> >>         - Does X denote an information resource, given that the
> >> values cannot both be representations of the same information?
> >
> > Yes, X still denotes an ftrr:IR, even though it apparently violated
> > the AWWW principle that each language-specific representation
> > carries the same abstract information.  Bear in mind that on the
> > Web, anyone can say anything about anything, including making
> > statements that are false.  In this case, the abstract information
> > that was carried was the assertion that both Paris and Rome are the
> > capital of Italy.  The assertion happens to be false, but its
> > falsity does not change the fact that X denotes an ftrr:IR.
>
> So you are saying that the idea that IRs are related to information
> is not part of the definition of IR (as it is in AWWW), but rather
> just some sort of good practices recommendation.

Sort of.  If a 200 response implies that the URI denotes an IR, then it denotes an IR, regardless of what awww:representations are returned.

However, if you fix the time and all request inputs other than the conneg parameters, then the result *can* be viewed as an abstract chunk of information.  In other words, if you wish to factor the conneg parameters out of ftrr:IR, then you could view ftrr:IR as a function fConneg:

  fConneg: Time x OtherRequestInputs x ConnegParams --> Representation

or equivalently in a curried form:

  fConnegC: Time x OtherRequestInputs --> AbstractInformation

where AbstractInformation is a function from ConnegParams to Representations, and OtherRequestInputs is all request inputs other than the conneg parameters.
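
A minimal Python sketch of this factoring follows. The tuple used for OtherRequestInputs and the string format of the result are assumptions made for illustration only:

```python
from functools import partial
from typing import Callable, Tuple

Representation = str
ConnegParams = str                    # e.g. an Accept-Language value
OtherRequestInputs = Tuple[str, str]  # hypothetical: (request URI, cookie)


def f_conneg(time: int, other: OtherRequestInputs, conneg: ConnegParams) -> Representation:
    # Time x OtherRequestInputs x ConnegParams --> Representation
    uri, cookie = other
    return f"{uri}@{time} cookie={cookie} lang={conneg}"


def f_conneg_c(time: int, other: OtherRequestInputs) -> Callable[[ConnegParams], Representation]:
    # Time x OtherRequestInputs --> AbstractInformation,
    # where AbstractInformation is a function ConnegParams --> Representation.
    return partial(f_conneg, time, other)


# Fix the time and the non-conneg inputs; what remains varies only with conneg.
abstract_information = f_conneg_c(0, ("X", "none"))
print(abstract_information("fr"))
print(abstract_information("es"))
```

The value returned by f_conneg_c is the thing that can be viewed as one chunk of abstract information: every representation it yields differs only in the conneg parameters.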

> That's fine with
> me, but the problem of defining when these good practices are being
> followed would remain, and any model of good practice would induce a
> subclass of ftrr:IR consisting of those ftrr:IRs compatible with
> these good practices. We'd still have the question of whether we want
> to do a model of good practice, or give up.

I think it makes sense to model both generic IRs and IRs that follow the intent of conneg, just as it is useful to model other subclasses of IRs, such as:

  ftrr:TimeInvariantIR rdfs:subClassOf ftrr:IR;
    rdfs:comment """An ftrr:IR whose representations do not depend on time,
        though they may depend on content negotiation or other request inputs.""" .

  ftrr:LanguageInvariantIR rdfs:subClassOf ftrr:IR;
    rdfs:comment """An ftrr:IR whose representations do not depend on
        Accept-Language, though they may depend on time or other request inputs.""" .

  ftrr:MediaInvariantIR rdfs:subClassOf ftrr:IR;
    rdfs:comment """An ftrr:IR whose representations do not depend on the
        Accept header specified in the request.""" .

  ftrr:AgentInvariantIR rdfs:subClassOf ftrr:IR;
    rdfs:comment """An ftrr:IR whose representations do not depend on the
        User-Agent header specified in the request.""" .

  ftrr:ConnegOnlyIR rdfs:subClassOf ftrr:IR;
    rdfs:comment """An ftrr:IR whose representations convey subsets of the
        same finite information and do not depend on any
        aspect of the request other than the following HTTP 1.1 content
        negotiation headers: Accept, Accept-Charset, Accept-Encoding,
        Accept-Language and User-Agent.
        See http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12.1 .""" .

Of course, these are described in HTTP 1.1 specific terms, so it may be better to come up with more generic phrasings of these descriptions.  These are in the spirit of notions like FixedResource that TimBL defined:
http://www.w3.org/2006/gen/ont.n3
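
To make the invariance subclasses concrete, here is a hedged Python sketch of how one might probe for them over a finite sample. The helper looks_invariant, the dict-based request shape and sample_ir are all hypothetical. Because the set of <Time, Request, Representation> tuples can be infinite, such a probe can only refute membership in a subclass, never establish it:

```python
from itertools import product


def looks_invariant(ir, times, requests, field):
    """Return False if changing only `field` changes a sampled representation.

    A True result means only that no counterexample was found in the sample.
    """
    values = {r[field] for r in requests}
    for t, r, v in product(times, requests, values):
        # Compare the original request with a copy that differs only in `field`.
        if ir(t, r) != ir(t, {**r, field: v}):
            return False
    return True


def sample_ir(time, request):
    # Hypothetical IR that varies by language but ignores time and User-Agent.
    return f"{request['uri']} in {request['Accept-Language']}"


requests = [
    {"uri": "X", "Accept-Language": "fr", "User-Agent": "a"},
    {"uri": "X", "Accept-Language": "es", "User-Agent": "b"},
]

agent_ok = looks_invariant(sample_ir, [0, 1], requests, "User-Agent")
language_ok = looks_invariant(sample_ir, [0, 1], requests, "Accept-Language")
print(agent_ok, language_ok)
```

In this sample, sample_ir is consistent with ftrr:AgentInvariantIR and is shown not to be an ftrr:LanguageInvariantIR.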

However, I should note that content negotiation as defined in
http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12.1
can actually depend on *any* part of the request:
[[
However, an origin server is not limited to these dimensions and MAY vary the response based on any aspect of the request, including information outside the request-header fields or within extension header fields not defined by this specification.
]]
Thus, the description of ftrr:ConnegOnlyIR is really for a restricted form of conneg that depends only on the specific headers listed.  If we take an unrestricted view of conneg (that it may depend on *anything* in the request) then this function:

  fConneg: Time x OtherRequestInputs x ConnegParams --> Representation

collapses to this function:

  fConnegOnly: Time x ConnegParams --> Representation

which of course is equivalent to:

  f: Time x Request --> Representation

which is how I defined ftrr:IR in
http://lists.w3.org/Archives/Public/public-awwsw/2008Apr/0046.html

In other words, if we take an unrestricted view of conneg, then the resulting representation can depend on the request input in arbitrary ways, so it would be a stretch to say that all of the possible representations at a fixed time somehow convey the same abstract information.  I think it is far more descriptive to say that the URI denotes the same information *source*, rather than saying it denotes the same *information*.

>
> It sounds like in your model that given a collection of simultaneous
> responses, there is no way to detect violation of the must-carry-the-
> same-information principle, since by plausible deniability the server
> can always claim that each response is a subset of the
> (inconsistent) union of all the information in all the responses.

Right, for an ftrr:IR this is true.  But for an ftrr:ConnegOnlyIR one could detect a violation, though, as I noted, ftrr:ConnegOnlyIR as described above uses a restricted view of conneg.  Using the general definition of conneg in
http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12.1
the resulting representation can depend on *any* part of the input, which, as you point out, would mean that there would be no way to detect a violation of the must-carry-the-same-information principle.

> This makes "carry the same information" almost tautologous and
> therefore a bit out of the spirit of conneg. (I say "almost" because
> there may be other ways to find inconsistencies, e.g. by reading and
> believing metadata about the IR.) I don't think this could have been
> what was intended.

Right, I think the "carry the same information" notion only applies to ftrr:ConnegOnlyIRs at a fixed time -- not to general ftrr:IRs.
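
One possible way to picture such a check, purely as an illustration: suppose each language variant retrieved at a fixed time could be reduced to a set of (subject, property, value) claims, and suppose each property is single-valued. Neither assumption comes from AWWW or RFC2616, and the contradictions helper below is hypothetical. Under those assumptions, a conflict between variants of an ftrr:ConnegOnlyIR could be found like this:

```python
def contradictions(variants):
    """Return the (subject, property) pairs asserted with more than one value.

    `variants` maps a conneg choice (here a language tag) to a set of
    (subject, property, value) claims extracted from that representation.
    """
    seen = {}
    for claims in variants.values():
        for subject, prop, value in claims:
            seen.setdefault((subject, prop), set()).add(value)
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


# The Rome/Paris scenario from use case 2, with invented language tags.
variants = {
    "en": {("Italy", "capital", "Rome")},
    "fr": {("Italy", "capital", "Paris")},
}

conflicts = contradictions(variants)
print(conflicts)
```

For a general ftrr:IR the same conflict proves nothing, since the differing claims could be attributed to other request inputs.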

>
> (I can't find this principle in AWWW by the way - can you point me to
> the correct passage? Maybe you're thinking RFC2616?)

Hmm, no I don't exactly see it either.  It seems to be implied but never stated outright.  RFC2616 uses the phrase "best available":
http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html

>
> >>         - If so, does it denote a "bad" information resource?
> >
> > I'm not sure what you mean by bad.  The ftrr:IR is perfectly fine
> > as an ftrr:IR, but it happens to convey false information.
>
> OK.  "bad" = does not follow good practice recommendations, as above.

It would be violating the intent of conneg, so in that sense it would be bad.

>
> >>         - If not, what does it denote, if anything?
> >>         - Assume unchanging whatever if necessary in order to
> >> make these questions nontrivial.
> >>
> >> 3. Suppose I set up a web server responding to requests
> >> for some URI X as follows:
> >>         - A URI for an IR on the web is chosen at random and
> >> a value is fetched using that URI
> >
> > Let's call this second URI Z, and assume Z != X.
> >
> >>         - The value is returned as the payload of a 200 response
> >
> > Okay, so when X is dereferenced, a representation from Z is
> > returned in a 200 response, right?
>
> That one time, yes. On subsequent requests it would be Z2, Z3, ...
> >
> >> Questions:
> >>         - Does X denote an information resource?
> >
> > Yes.  The dereference of X resulted in a 200 response, therefore X
> > denotes an ftrr:IR.
> >
> >>         - If so, what information do its referent's
> >> representations represent?
> >
> > A random representation chosen from the Web.
>
> No, that's what the representation *is*, not what it *represents*.
> But representation (in any sense other than the trivial got-a-
> response sense) has disappeared from your account, as has
> information, so the question is moot. Nothing wrong with that, just
> different from the other model(s).
> >
> > It sounds like you may be trying to view multiple representations
> > as (perhaps lossy) encodings of some abstract information.  That
> > view only applies to content negotiation, which is only *one*
> > possible use of the Request parameter of an ftrr:IR function:
> >
> >   f: Time x Request --> Representation
>
> "Some abstract information" - yes, that's just what I thought the IR
> idea was supposed to capture. According to what I thought was the
> orthodox view, the notion that a response represents something else
> applies always, not just when one chooses between representations. To
> have a 200 response that does not represent something is considered
> not "good practice" and a threat to accessibility (multiple
> languages, formats, sensory modes).

Right, it may not be good practice, but if the rule is "200 => IR" then the thing is still an IR even if it does not follow good practice.

> This is why you're never supposed
> to give an IR a URI ending with .html.  (Whether the information-is-
> abstract constraint is part of any protocol, or could be verified in
> an audit, is a different question.) (Again, I'm neutral on this point
> of view, just reporting that I have heard it.)
>
> Dividing IRs into those "subject to content negotiation", as RFC2616
> does, and those that aren't would be an interesting way to go.
> Certain good practices might apply to one set that don't to the
> other. I hadn't thought of that. Maybe this is like language and
> "representation" invariance from
> http://www.w3.org/DesignIssues/Generic.html .
>
> >>         - If not, what could X's referent be, if it has one?
> >> Is it a "bad" information resource, or something else?
> >
> > There is nothing wrong with it as an ftrr:IR, but whether you find
> > it useful is up to you.
>
> See above... I hear Tim as saying "don't do it". Useful-to-you is
> akin to saying that the web is self-correcting and doesn't need any
> good practice recommendations.

No, I do think good practice recommendations are needed.  I just meant that architecturally it is still an ftrr:IR, even if it violates good practice or is useless to most people.

> I think this is similar to what
> Xiaoshu has been saying. I am undecided.
> >
> >>         - Is the web site behaving within the limits
> >> specified by RFC2616 and/or AWWW?
> >
> > Yes, provided we assume that AWWW adopts the ftrr:IR definition of
> > "information resource".
>
> By AWWW I mean the 15 December 2004 recommendation. That particular
> version will never adopt anything, I hope.

Oh, right.  Yes, I agree.  :)

>
> My reading is that my randomly-responding URI can't denote an IR
> because it has no "essential characteristics" and/or bears no
> resemblance to anything that might be called "variable information"
> or "abstract information". If it does, then it makes all of these
> notions tautologous, and we might as well stop talking about them. I
> think that may be your suggestion.

Exactly.  *Except* in the more restricted case of conneg on language, media type, etc.



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.
Received on Tuesday, 6 May 2008 20:05:43 UTC