Review of JAR's document from Harry Halpin on 2011-04-11 (public-awwsw@w3.org from April 2011)

From: Harry Halpin <hhalpin@w3.org>
Date: Mon, 11 Apr 2011 18:50:39 +0100 (BST)
To: "David Booth" <david@dbooth.org>
Cc: "Jonathan Rees" <jar@creativecommons.org>, "AWWSW TF" <public-awwsw@w3.org>
Message-ID: <5ba804919399b98244163e186677bcb7.squirrel@webmail-mit.w3.org>
Late, but better than never. Will try to make telecon tomorrow -
reviewing, although I started a few days ago and missed JAR's latest
changes:

http://www.w3.org/2001/tag/awwsw/issue57/latest/


The goal of this document should be to precisely define the problem,
perhaps iterate through a few possible solutions, and then finally settle
on a solution. I see we have just got to describing the problem and some
possible solutions. My big executive summary is:

- Add in IRW ontology or specialized vocabulary as one solution
- Add in the need for a Metadata protocol
- And I strongly support "application/rdf+xml" mimetype, and future RDF
mime-types, meaning simply that the URI denotes whatever things are
described by the RDF statements that use that URI and are accessible from
the URI itself.
- I'm becoming more partial to a quoting mechanism to describe the named
graph, i.e. the document itself. Historically in languages like LISP
quotes mean "do not interpret", which is precisely what we want them to
mean here.
- Add in a part about browser support for RDF.

- 1. Introduction -

Upon first reading, you use the term "Peak XY". Why not just use a URI,
like http://www.example.org/PeakXY? I think the problem space should be
constrained to "If a user-agent is presented with a URI, how can that
agent determine the *intended* meaning of the URI?" Right now, the section
reads like an introduction to a general theory of discovering the meaning
of symbols, which are difficult - albeit related - waters.

Given that caveat, I would add (before the "nature of definitions"
paragraph) "The primary way agents in natural language determine the
meaning of a term is by its use in context. However, on the Web the
context in which a URI is presented can often be limited, and so to enable
interoperability between user-agents there should be a clear algorithm to
follow that makes the intended meaning of the URI clear."

- intention of who?

- I am not entirely sure about this term "dereference". I would prefer the
term "access", as I think it's a bit more obvious. Can you explain the
difference between them a bit better? It seems when you access/dereference,
you can successfully use an HTTP code to retrieve an associated information
resource.


2. Glossary

- Put Glossary at end. Otherwise, I doubt anyone will get past it.

accessible via
    When a URI is dereferenceable, "the information resource accessible
via a URI" (abbreviated IR(that URI), see below) is the information
resource whose versions are the versions obtained by dereferencing
that URI.

definition:

The "information" could be prose, RDF, OWL, or some combination. ->
"The "information" could be human-readable prose in natural language,
machine-readable RDF, OWL, or some combination."

fixed information resource

I thought the entire point of this according to TimBL was that it was just
an information resource that *did* not change. I would merge this
definition with that of information resource, with fixed being just the
subset that is not intended to change, in particular over time.

term

A URI, word, name, or phrase that can serve in subject or object position
in a statement. -> To be pedantic, a URI can also serve as a predicate.
Just say "that can serve in a position that forms a statement. On the
Semantic Web, statements are RDF triples where a URI could be in the
subject, object, or predicate position."

refer

    For the purposes of this report, reference is just one way to mean.
There may be other ways to mean other than to refer, but none are
specified here. -> This just confuses me a bit. I tried to present a
more coherent theory in my dissertation distinguishing between
meaning/reference, but you can also just state that "To refer to
something, a term should be understood by an agent as "standing-in"
for some object in the universe of discourse, where that object can be
separated from the term in space or time."

version (of an information resource)

This just confuses me. An information resource associated with another
one? So is anything linked a "version"? I know you've done some
deep-thinking on this Jonathan, but I'm not convinced by this definition
quite yet.

    A fixed information resource associated with an information resource
is a version of the information resource. -> "When an information
resource that is fixed as an octet-stream is associated with another
information resource that changes, the fixed information resource can be
considered a version of the original changing information resource. For
example, a version can be a "snapshot" of a changing information resource
at a given time, or arise via forks, and so on."


Use-cases

3 - General methods in current use.

3.1 Colocate definition and use: "Just colocating definition and use is
not enough, as one of the features of URIs is that they can be removed
from a given context and then re-used in another one."

3.2 Link to documents containing definitions

One could say "Link to a URI with the definition using a special kind of
link", as I think you want to separate linking from just having the
definition accessible from the URI.

3.3 Register a URI scheme or URN namespace

I think the answer to this should be a strong "No" and should be
discouraged, rather than heavily described as it currently is. I feel too
much space is used on this example.

3.4 Use the LSID getMetadata() method

I understand why this is in here, but again, I'd say discourage it.

3.5 'Hash URI'

You might want to add "Combined with content negotiation, which determines
the media-type, there could be a problem where the hash URI is therefore
context dependent. So a hash URI for "http://example/sale#p16" could mean
a segment of a document (paragraph 16) if "text/html" was returned, and
could mean a resource describing a canoe if "application/rdf+xml" was
returned. This is obviously problematic, but seems to be ignored by the
RDF community so far in practice." You might want to add this to the
"Critique" bit of Section 4.

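A minimal sketch of this context dependence, assuming a hypothetical
helper named interpret_fragment (not part of any specification) that
reports what the same hash URI would identify under each returned media
type:

```python
from urllib.parse import urldefrag


def interpret_fragment(uri: str, media_type: str) -> str:
    """Describe what the fragment of `uri` identifies under `media_type`.

    Hypothetical helper for illustration only.
    """
    base, fragment = urldefrag(uri)
    if not fragment:
        return f"the resource at {base}"
    if media_type == "text/html":
        # In HTML, a fragment names a part of the document itself.
        return f"the part of the HTML document at {base} named '{fragment}'"
    if media_type == "application/rdf+xml":
        # In RDF/XML, the whole hash URI names whatever the RDF describes.
        return f"whatever the RDF at {base} describes as <{uri}>"
    return f"undefined for media type {media_type}"


uri = "http://example/sale#p16"
for media_type in ("text/html", "application/rdf+xml"):
    print(media_type, "->", interpret_fragment(uri, media_type))
```

The same URI gets two different readings depending only on what content
negotiation happened to return.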

3.6 'Hashless URI' with HTTP 303 See Other redirect

I'm going to point out yet another giant hole in the 303 story. How do
you get "back" from a URI pointed to by 303? See my comment to 4.6.

4.1 "Fragment identifiers are fragile" -> "fragment identifiers are
context-dependent"

See above at 3.5.

4.4 303 is difficult, sometimes impossible, to deploy

As the person who originally brought this up (you might want to cite my
email by URI), this is a total mess for people to deploy unless they are
using tools or are comfortable using .htaccess. Also, some server software
does not support .htaccess, and many people do not have access to edit
their server's .htaccess files.

Another problem is connecting the document URI back to the URI about the
"resource". So when one uses http://example/p16 one gets redirected to
http://example/about-p16. However, how does one go BACK from 
http://example/about-p16 to  http://example/p16? One could imagine a
back-link (we provide this type in IRW), but it's not clearly part of the
status-code and there's no natural back-link.
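
A minimal sketch of why there is no way back, assuming a simulated
redirect table (REDIRECTS_303 and the second entry, http://example/canoe,
are hypothetical and only there for illustration):

```python
# Hypothetical server-side 303 table; more than one URI may share a target.
REDIRECTS_303 = {
    "http://example/p16": "http://example/about-p16",
    "http://example/canoe": "http://example/about-p16",
}


def follow_303(uri):
    """Forward direction: what a client learns from the 303 response."""
    return REDIRECTS_303.get(uri)


def redirected_from(document_uri):
    """Reverse direction: needs the whole table, which only the server has.

    A client that starts at the document URI sees nothing in the HTTP
    exchange that exposes this mapping, and the answer may not be unique.
    """
    return sorted(u for u, t in REDIRECTS_303.items() if t == document_uri)


print(follow_303("http://example/p16"))
print(redirected_from("http://example/about-p16"))
```

Unless the document at http://example/about-p16 carries an explicit
back-link, the reverse lookup is simply not available to the client.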

On a referential level, I'm just going to point out that the use of the
303 status code cannot possibly tell us that the resource redirected from
was used for referring, because the 303 status code was specified before
the advent of the Semantic Web. As an HTTP response, there is no reason why
it can't be used simply to redirect from one information resource to
another information resource, and in fact that can be and is done.

As put by RFC 2616, "this method exists primarily to allow the output of a
POST-activated script to redirect the user agent to a selected resource,"
not to solve a logical problem about URIs on the Semantic Web and
information resources.

I'd add a critique:

4.8 There is no metadata discovery protocol

I would add "There is no easy way to discover metadata, given the
multitude of possible ways to access RDF about something, such as RDFa in
HTML, following 303, Link elements in HTML (e.g. Dublin Core), and
following Link headers.
Therefore, given a URI, a developer does not know how to get all the RDF
accessible from that URI, much less sort out contradictions if they arise
in OWL. Practically, this means that a developer cannot deploy RDF at a
URI and be assured of what RDF a consuming application will actually
find."
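
A rough sketch of what a consumer has to do today, assuming a hypothetical
function candidate_rdf_sources that inspects one already-fetched HTTP
response. It covers only 303, Link headers, and HTML link elements; RDFa
extraction is left out, and the Link header parsing is deliberately
simplified:

```python
import re
from html.parser import HTMLParser


class _LinkElementCollector(HTMLParser):
    """Collect href values of <link> elements in an HTML body."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def candidate_rdf_sources(status, headers, body=""):
    """Return (mechanism, uri) pairs a client would still need to fetch.

    `headers` is a plain dict with exactly-cased keys (an assumption).
    """
    found = []
    if status == 303 and "Location" in headers:
        found.append(("303 redirect", headers["Location"]))
    # Simplified: take every <...> target in the Link header, ignoring rel.
    for target in re.findall(r"<([^>]*)>", headers.get("Link", "")):
        found.append(("Link header", target))
    collector = _LinkElementCollector()
    collector.feed(body)
    found.extend(("HTML link element", href) for href in collector.hrefs)
    return found
```

Each mechanism can point somewhere different, and nothing tells the
client which ones the publisher actually intended it to follow.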

5.1 Use something other than a URI

Not bad as a reminder, but I'd delete and scope us to working with URIs.

5.2 'Hash URI' with fixed suffix

I also do not really see how this solves anything; it just introduces yet
another arbitrary convention, and it doesn't solve any of the problems
with hash URIs.

5.3 'Hashless URI' with site-specific discovery rules

I like this approach, but would note that the addition of using
.well-known and .host-meta will require a general metadata discovery
protocol.


5.4 'Hashless URI' with new HTTP request or response

I agree this might work, but you still have the "reversibility" problem
noted earlier, and it adds unnecessary complexity.

5.5 'Hashless' URI dereferences to its definition

I agree with Ed basically. There is no reason why a URI cannot refer
both to an object and its description, see URI rule: If IR(u) has a
version with media type 'application/rdf+xml', then take u to be defined
by IR(u), otherwise take u to refer to IR(u).
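
The rule can be sketched as a small function, assuming the media types of
the versions of IR(u) are already known (interpret_uri and its
version_media_types parameter are hypothetical names; in practice the
media types would come from dereferencing u):

```python
RDF_XML = "application/rdf+xml"


def interpret_uri(u, version_media_types):
    """Apply the URI rule to `u`.

    `version_media_types` lists the media types of the versions of IR(u).
    """
    if RDF_XML in version_media_types:
        # An RDF/XML version exists: u is defined by what IR(u) says.
        return f"<{u}> is defined by IR(<{u}>)"
    # No RDF/XML version: u just refers to the information resource.
    return f"<{u}> refers to IR(<{u}>)"


print(interpret_uri("http://example/p16", ["text/html", RDF_XML]))
print(interpret_uri("http://example/sale", ["text/html"]))
```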

This should just be part of the media-type definition. As RDF does not
constrain reference to a single *thing* (see the paper on "In Defense of
Ambiguity"), the best we can do with RDF is provide a description whose
interpretation can be a number of things, some of which may be other URIs
and others which may be things in the world outside the Web that we want
to refer to. We can assume when someone is publishing RDF at a URI that
their URI refers to *anything* that satisfies the interpretation of the
RDF statements available at that URI that use that URI.

If they want to refer to the document itself, they need to give that a
distinct name, i.e. a named graph. Then there should be some convention
that says we are talking about the description itself, not its
interpretation, which could be something as simple as using the URI of the
named graph in quotes (finally, a good use for distinguishing strings from
URIs). This is also done via quotes in N3.

5.6 'Hashless' URI dereferences to its definition (incompatibly)

This would also work for me, and I don't see the difference really between
this and 5.5.

I'm going to point out two other solutions:

5.7 Get browsers to do something with RDF

One of the reasons for Linked Data 303 has been that you can put the URI
in the browser and get something resembling human-readable HTML out via
303+conneg. However, it seems odd to use content negotiation and 303 when
it seems like the real problem is that browser vendors do not support
doing something interesting with RDF, so that when a page is uploaded

5.8 Use an ontology to describe the status of the resources

My one request is to *please* add this. The entire point of the IRW
ontology is to give people the option to do this. People who wish to
constrain interpretations in some meta-logical fashion, make distinctions
between IRs and NIRs, and so on should at least be given the option of
making what they want *explicit*, and they can do that in RDFa, RDF/XML
available via 303, RDF/XML published directly without 303, Link Headers,
and the like.

5.9 Combine all the various approaches in a unified Metadata Discovery
Protocol.

See above comments, but something for RDF modeled on Eran's "Web Linking"
draft would be ideal. To be honest, we really need to simplify the RDF
stack to get it to take off, and I think the largest simplification would
be "just publish RDF" and "here's a very clear protocol implementers can
follow to get all RDF from a given URI" that then includes all the various
cruft that the community has generated.
Received on Monday, 11 April 2011 17:50:42 UTC