Re: Question on the boundaries of content negotiation in the context of the Web of Data from Dan Brickley on 2009-02-12 (www-tag@w3.org from February 2009)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 12 Feb 2009 13:52:53 +0100
To: Michael Hausenblas <michael.hausenblas@deri.org>
Cc: www-tag@w3.org
Message-ID: <49941BA5.5000905@danbri.org>
On 12/2/09 13:29, Michael Hausenblas wrote:
>
> Dear TAG members, dear subscribers,
>
> I would like to ask you about your opinion on the following scenario. Please
> note that (1) though I'm a member of the W3C Media Fragments WG I speak only
> for myself, and (2) that all URIs used in the following are dereferenceable
> and made out of 100% recycled electrons.
>
> Given three URIs, namely,
>
> <http://sw-app.org/sandbox/house>
>
> <http://sw-app.org/sandbox/house.png>
>
> <http://sw-app.org/sandbox/house.ttl>
>
> is it 'allowed' (that is, does it break the Web architecture) if one does
> the following:
>
> $ curl -I -H "Accept: image/png" http://sw-app.org/sandbox/house
> HTTP/1.1 200 OK
> Date: Thu, 12 Feb 2009 12:12:39 GMT
> Server: Apache/2.2.3 (CentOS)
> Content-Location: house.png
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Thu, 12 Feb 2009 11:54:07 GMT
> ETag: "5c0fd-2deb-462b760a7f5c0;462b77ce8a040"
> Accept-Ranges: bytes
> Content-Length: 11755
> Connection: close
> Content-Type: image/png
>
> $ curl -I -H "Accept: text/turtle" http://sw-app.org/sandbox/house
> HTTP/1.1 200 OK
> Date: Thu, 12 Feb 2009 12:13:01 GMT
> Server: Apache/2.2.3 (CentOS)
> Content-Location: house.ttl
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Thu, 12 Feb 2009 11:54:06 GMT
> ETag: "5c0fc-173-462b76098b380;462b77ce8a040"
> Accept-Ranges: bytes
> Content-Length: 371
> Connection: close
> Content-Type: text/turtle
>
> Please note that I don't ask if this works. It does. Obviously. The
> question, to put it in other words, is: is the PNG *representation* derived
> via conneg from the generic resource <http://sw-app.org/sandbox/house>
> equivalent to the RDF in Turtle?
>
> If not, why not? If it is, can you please point me to a finding, note, a
> specification, etc. that 'normatively' defines what 'equivalency' really is?
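
For readers following along, the negotiation in the quoted transcript can be sketched roughly as below. This is only an illustration under stated assumptions, not a description of how Apache actually selects a variant: the names VARIANTS, parse_accept, quality_for and choose_variant are made up for this sketch, and the variant table simply mirrors the two Content-Location values shown above.

```python
# Hypothetical variant table: media type -> Content-Location, mirroring
# the two responses in the quoted transcript.
VARIANTS = {
    "image/png": "house.png",
    "text/turtle": "house.ttl",
}


def parse_accept(header):
    """Split an Accept header value into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q parameter counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality_for(media_type, ranges):
    """Return the q of the most specific range matching media_type."""
    best_spec, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            spec = 2  # exact match, e.g. image/png
        elif media_range == "*/*":
            spec = 0  # full wildcard
        elif media_range.endswith("/*") and media_type.startswith(media_range[:-1]):
            spec = 1  # type wildcard, e.g. image/*
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def choose_variant(accept_header, variants=VARIANTS):
    """Return (content_location, content_type), or None if nothing fits."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type, location in variants.items():
        q = quality_for(media_type, ranges)
        if q > best_q:  # ties keep the earlier entry in the table
            best, best_q = (location, media_type), q
    return best
```

Calling choose_variant("image/png") or choose_variant("text/turtle") picks the matching entry, while a header naming only unsupported types yields None, where a real server might answer 406 Not Acceptable. Note that nothing in this selection step says anything about whether the two variants are 'equivalent'; it only matches labels.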

I'm very interested in the answers here. TimBL has often said (in 
hallway or IRC conversation) that some pair of representations are "too 
different", but I have never really got to the underlying intuitive 
principle.

It may be that we are balancing several things here, and there can 
really be no single principle:

1. intuitions
  - that common causal historical chain is important
  - derivation (one representation from the other, or from shared 
source) is important

2. user agent behaviour is important; we should consider issues around 
usability and bookmarkability, interactions with language negotiation...

3. role of reformatted content, eg. for mobile consumption or accessibility.

4. format granularity issues.
Media types are crude labels for complex things. Modern bitmaps 
encapsulate metadata (eg. PNG meta fields, PDF has XMP which includes an 
RDF subset, ... MP3s have ID3 ...). Modern markup has encapsulated 
metadata (eg. SVG includes RDF XML or RDFa, ditto Atom). And metadata 
can encapsulate content (eg. data: image URIs, SVG paths, XML Literals 
in RDF). Anything can go inside anything, without even looking to .zip 
or widget packaging scenarios.
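
To make the "PNG meta fields" point concrete, here is a rough sketch using only the Python standard library. It builds a 1x1 greyscale PNG that carries a textual description in a tEXt chunk, then reads that text back out. The function names and the embedded string are assumptions made up for illustration; nothing here is taken from the actual house.png.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialise one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_png_with_text(keyword, text):
    """Return the bytes of a 1x1 greyscale PNG holding one tEXt chunk."""
    # IHDR: width=1, height=1, bit depth 8, colour type 0 (greyscale),
    # default compression, filter and interlace methods.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # One scanline: filter byte 0 followed by a single black pixel.
    pixels = zlib.compress(b"\x00\x00")
    # tEXt payload: Latin-1 keyword, NUL separator, Latin-1 text.
    text_data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", text_data)
        + make_chunk(b"IDAT", pixels)
        + make_chunk(b"IEND", b"")
    )


def read_text_chunks(png_bytes):
    """Collect keyword -> text for every tEXt chunk in a PNG byte string."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# Hypothetical payload: a placeholder description, not real data.
example_png = build_png_with_text("Description", "a picture of a house")
example_text = read_text_chunks(example_png)
```

The resulting bytes would still be served as image/png, even though they carry a textual description inside them, which is one reason the media type alone tells us little about how "different" two representations are.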


And of course http-range-14 is lurking there in the background, as is 
the FRBR conceptualisation scheme from the Library community: If Hamlet 
is a Work, ... can we use HTTP to get media-typed representations of 
"Items" that exemplify the concrete "Manifestations" which embody each 
"Expression" of that over-arching "Work"? (and see the FRBR literature 
for a parallel discussion about when two things count as different 
Works, and how this varies with professional field and surrounding 
context).

Should http://hamlet.example.com/first_folio_expression be something 
that has different manifestations at different URIs, eg. 
http://hamlet.example.com/first_folio_expression/ or more realistically,

http://etext.virginia.edu/etcbin/toccer-new2?id=ShaHamF.sgm&images=images/modeng&data=/texts/english/modeng/parsed&tag=public&part=all 


My take is that webarch simply can't give us the answer here. We 
standards nerds don't get to decide what the right partitioning is of 
such content onto URIs. It might be that the above URI could 
content-negotiate to an MP3 or MPEG version. Or an abbreviated MP3 
version. Or a textual summary of the MP3. Or not. All we can do is try 
to document the various tradeoffs...

cheers,

Dan
Received on Thursday, 12 February 2009 12:53:33 UTC