Re: Review of JAR's document from Jonathan Rees on 2011-04-11 (public-awwsw@w3.org from April 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 11 Apr 2011 17:38:43 -0400
To: Harry Halpin <hhalpin@w3.org>
Cc: David Booth <david@dbooth.org>, AWWSW TF <public-awwsw@w3.org>
Message-ID: <BANLkTinZV4zB+SiekqGM_K9qpmH4KZUAQQ@mail.gmail.com>
Thanks for your detailed comments, and hoping to hear from you tomorrow.

On Mon, Apr 11, 2011 at 1:50 PM, Harry Halpin <hhalpin@w3.org> wrote:
> Late, but better than never. Will try to make telecon tomorrow -
> reviewing, although I started a few days ago and missed JAR's latest
> changes:
>
> http://www.w3.org/2001/tag/awwsw/issue57/latest/
>
> The goal of this document should be to precisely define the problem,
> perhaps iterate through a few possible solutions, and then finally settle
> on a solution. I see we have just got to describing the problem and some
> possible solutions. My big executive summary is:
>
> - Add in IRW ontology or specialized vocabulary as one solution

I did include :accessibleVia for exactly this purpose, inspired to
some extent by IRW. Would like to hear your other ideas regarding this
approach - particular properties that solve particular problems. From
where I stand I don't see any others needed at present, other than
maybe one relating a URI to its definition and maybe one relating an
IR to a version of the IR. (I have my own ontology somewhere, can dig
it up.)

> - Add in the need for a Metadata protocol

Good, this goes in 4.7 I think. (assuming by "metadata" you really
mean "URI data" or something of the sort)

> - And I strongly support "application/rdf+xml" mimetype, and future RDF
> mime-types, just meaning that this URI denotes whatever things the RDF
> statements that use that URI accessible from the URI itself describe.

Would this include RDFa? If so then all Creative Commons metadata
becomes wrong, doesn't it?

I think there are now about 8 or 10 RDF mime types. I think we might
need a registry to keep track of them.

Does the RDF representation take priority over all others? I.e. you
would have to conneg for RDF if you wanted to know what the URI meant?
What if RDF were succeeded by something better in the future?

How would you know whether the document defined the URI, and what
would you do if it didn't?

> - I'm becoming more partial to a quoting mechanism to describe the named
> graph, i.e. the document itself. Historically in languages like LISP
> quotes mean "do not interpret", which is precisely what we want them to
> mean here.

Yes, but how is this relevant?

> - Add in a part about browser support.

Good.

> - 1. Introduction -
>
> Upon first reading, you use the term "Peak XY". Why not just use a URI,
> like http://www.example.org/PeakXY? I think the problem space should be
> constrained to "If a user-agent is presented with a URI, how can that
> agent determine the *intended* meaning of the URI." Right now, Section
> reads like an introduction to a general theory of discovering the meaning
> of symbols, which is difficult - albeit related - waters.

Hmm. Wonder why what it says isn't clear - it seemed straightforward
to me. It's just stating a general pattern, not trying to relate to
any general theory of anything.

The danger we constantly run is that people think that "meaning" means
something different in RDF as compared to the real world. This may be
true of linked data but it's not true of the way I use RDF (which I
think agrees with TimBL, OWL, the KR folks, etc.). So this is why I
like to hammer home the point that there are real phenomena of
communication and meaning and that RDF/URI is just one vessel for
them. Perhaps you can correct me, or tell me how to say it better.

> Given that caveat, I would note that (before the "nature of definitions"
> paragraph) "The primary way agents in natural language determine
> the meaning of a term is by its use in context. However, on the Web the
> context in which a URI is presented can often be limited, and so to enable
> interoperability between user-agents there should be a clear algorithm to
> follow that lets the intended meaning of the URI be clear."

I don't see how the two situations are different. In RDF you get
meaning from context, and in natural language you get meaning from
dictionaries (e.g. French and the Academy's dictionary), and there is
nothing normative about FYN, it's just there to be helpful.

> - intention of who?

?

> - I am not entirely sure about this term "dereference". I would prefer the
> term "access", as I think it's a bit more obvious. Can you explain a bit
> better the difference between them? It seems when you access/dereference,
> you can successfully use an HTTP code to retrieve an associated information
> resource.

"Dereference" is something you do to a URI. "Access" is something you
do to a resource. To call a URI "accessible" would be simply
incorrect.

I get this term from RFC 3986 and it seems as good a term as any. It's
nice because it's not too objectionable, and it's normative.

Dereference is more general than HTTP.

> 2. Glossary
>
> - Put Glossary at end. Otherwise, I doubt anyone will get past it.

Sounds reasonable

> accessible via
>    When a URI is dereferenceable, "the information resource accessible
> via a URI" (abbreviated IR(that URI), see below) is the information
> resource whose versions are the versions obtained by dereferencing
> that URI.
>
> definition:
>
> The "information" could be prose, RDF, OWL, or some combination. ->
> "The "information" could be human-readable prose in natural language,
> machine-readable RDF, OWL, or some combination."

How about:
   any human-readable or machine-readable language,
   or combination of languages.

> fixed information resource
>
> I thought the entire point of this according to TimBL was that it was just
> an information resource that *did* not change. I would merge this
> definition with that of information resource, with fixed being just the
> subset that is not intended to change, in particular over time.

Hmm. I think I had it that way in an earlier draft. It seems pretty
important, so that's why I pulled it out.

It does need its own definition, since this is where everything
grounds out - information resource is defined in terms of it. This
goes back to whether to call them "representations" or "fixed
information resources", which I've brought up about ten times on this
list to little response.

If you try to combine the two definitions, the way it reads is: an
information resource is -- hey wait, here's what a fixed information
resource is -- and then an information resource is --
which is pretty awkward.

> term
>
> A URI, word, name, or phrase that can serve in subject or object position
> in a statement. -> To be pedantic, a URI can also serve as a predicate.
> Just say "that can serve in a position that forms a statement. On the
> Semantic Web, statements are RDF triples where a URI could be in the
> subject, object, or predicate position."

hmm.  will fix.

by the way, do you have a reference for 'semantic web'?

> refer
>
>    For the purposes of this report, reference is just one way to mean.
> There may be other ways to mean other than to refer, but none are
> specified here. -> This just confuses me a bit.

Why?

> I tried to present a
> more coherent theory in my dissertation distinguishing between
> meaning/reference, but you can also just state that "To refer to
> something, a term should be understood by an agent as "standing-in"
> for some object in the universe of discourse, where that object can be
> separated from the term in space or time."

That's more of a theory than I wanted to have here. That's why I said
"for the purposes of this document".  So not sure what to do.

> version (of an information resource)
>
> This just confuses me. An information resource associated with another
> one? So is anything linked a "version"? I know you've done some
> deep-thinking on this Jonathan, but I'm not convinced by this definition
> quite yet.

In order to be able to explain IRs, we need to be able to apply
metadata predicates at both the representation level and the IR level,
and relate the two levels via metadata. (This is the ONLY way, after
five years, that I've been able to make sense of 'information
resource', and I think it works very well - a longer conversation.)

"a version of" is the relation between the representation-like things
that have metadata, and the IRs that all have metadata.  We could say
that you can apply metadata to both representations and to IRs, and
(if need be) have a bigger class that's the union of the two, but I
feel that the IR idea comes across much better if IRs and
representations are seen as the same kind of thing.

Since many TAG members seem to hate the idea of considering
representations to be IRs, or applying metadata to representations, I
chose the "fixed IR having metadata is a version of an IR having
metadata" approach over the "representation having metadata is a
representation of an IR having metadata" approach, inspired to some
extent by what Roy Fielding is doing with HTTPbis (which I initially
found repulsive but have come to accept).

If I can manage to get enough of your time so that you can understand
what I'm saying, I'd be happy to take your advice on which of the two
approaches is more likely to be understood.

>    A fixed information resource associated with an information resource
> is a version of the information resource. -> "When an information
> resource is fixed as an octet-stream but is associated with another
> information resource that changes, the fixed information resource can
> be considered a version of the original changing information resource.
> For example, a version is a "snapshot" of a changing information
> resource at a given time, or arises via forks, and so on."

OK, this just tells me that my attempt to be precise, concise and
consistent has left me with something hard to understand. Maybe I
should go back to "representation" - although that has a long
*history* of confusing people...

Time for gensyms maybe.

> Use-cases
>
> 3 - General methods in current use.
>
> 3.1 Colocate definition and use: "Just colocating definition and use is not
> enough, as one of the features of URIs is that they can be removed from a
> given context and then re-used in another one."

I would say it this way: that this is fragile because the use can get
separated from the definition.  (SPARQL is a good example.)

This would of course apply to the "just be clear" idea that you've
proposed - i.e. for an "ambiguous" URI just provide additional triples
to clarify which sense is meant.

Hmm, so you're with David that the critiques should be inline with the
method descriptions.

> 3.2 Link to documents containing definitions
>
> One could say "Link to a URI with the definition using a special kind of
> link", as I think you want to separate linking from just having the
> definition accessible from the URI.

hmm.

> 3.3 Register a URI scheme or URN namespace
>
> I think the answer to this should be a strong "No" and should be
> discouraged, rather than heavily described as it currently is. I feel too
> much space is used on this example.

There are people advocating this and they need to feel that they are
heard, in this document at least. We can follow on with something that
takes a stand later.

And I think many readers will not be adequately informed on this
subject, and not realize that a URI scheme registration is the same
kind of thing as a URI definition, which they are.

However you are probably right about the space.

I don't think we're in any danger of anyone actually doing this.

> 3.4 Use the LSID getMetadata() method
>
> I understand why this is in here, but again, I'd say discourage it.

Again there is a specific audience who, like you and me, need to feel
they're heard; and this report shouldn't be encouraging or
discouraging, but just presenting information.

> 3.5 'Hash URI'
>
> You might want to add "Combined with content negotiation, which determines
> the  media-type, there could be a problem where the hash URI is therefore
> context dependent. So a hash URI for "http://example/sale#p16" could mean
> a segment of a document (paragraph 16) if "text/html" was returned, and
> could mean a  resource describing a canoe if "application/rdf+xml" was
> returned. This is obviously problematic, but seems to be ignored by the
> RDF community so far in practice." You might want to add this to the
> "Critique" bit of Section 4.

Yes, this is the kind of text I was intending to write
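
As a rough illustration of the conneg problem described in the quoted 3.5 comment (a sketch only; the function name and the wording of the two readings are hypothetical and not taken from any spec), the same hash URI can be read differently depending on the media type that comes back:

```python
from urllib.parse import urldefrag


def fragment_meaning(uri: str, media_type: str) -> str:
    """Return an illustrative reading of a hash URI for a given media type."""
    base, fragment = urldefrag(uri)
    if not fragment:
        return f"no fragment: {base}"
    if media_type == "text/html":
        # In HTML the fragment picks out a part of the document.
        return f"the element with id '{fragment}' in the document at {base}"
    if media_type == "application/rdf+xml":
        # In RDF the full URI names whatever the RDF describes.
        return f"whatever the RDF at {base} says <{uri}> is"
    return "undefined for this media type"
```

Called with "http://example/sale#p16", the two media types give two unrelated readings, which is the context dependence at issue.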
>
> 3.6 'Hashless URI' with HTTP 303 See Other redirect
>
> I'm going to point out yet another giant hole in the 303 story. How do
> you get "back" from a URI pointed to by 303? See my comment to 4.6

Why do you need to? And what if there are multiple ways to get back?

Certainly there should be a URI for the URI-defined-in-document
predicate, and then you could just use the inverse of the predicate.

> 4.1 "Fragment identifiers are fragile" -> "fragment identifiers are
> context-dependent"
>
> See above at 3.5.

There are many problems, and conneg/session/useragent/time sensitivity
is just one of them.  I guess each can have its own section heading;
will review.

I was hoping for more detail from you on this:
          "People forget to put it there
          when writing and cut and pasting URIs."
Because it's outside my experience I can't write this up very well.

> 4.4 303 is difficult, sometimes impossible, to deploy
>
> As the person who originally brought this up (you might want to cite my
> email by URI), this is a total mess for people to deploy unless they are
> using tools or comfortable using .htaccess. Also, some server software does
> not support .htaccess, and many people do not have access to edit their
> server's .htaccess files.
>
> Another problem is connecting the document URI back to the URI about the
> "resource". So when one uses http://example/p16 one gets redirected to
> http://example/about-p16. However, how does one go BACK from
> http://example/about-p16 to  http://example/p16? One could imagine a
> back-link (we provide this type in IRW), but it's not clearly part of the
> status-code and there's no natural back-link.

Yes, there needs to be a way to express the relationship. I'm hoping
that will be part of the follow-on consensus effort.
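
To make the back-link question concrete, here is a minimal sketch that assumes the 303 redirects are known as a simple table. The table itself, the extra http://example/p17 entry, and the function name are hypothetical. Inverting the table gives the back-links, and shows that there may be more than one way back:

```python
from collections import defaultdict

# Hypothetical 303 table: requested URI -> URI of the describing document.
SEE_OTHER = {
    "http://example/p16": "http://example/about-p16",
    "http://example/p17": "http://example/about-p16",  # assumed second entry
}


def back_links(redirects):
    """Invert a redirect table: document URI -> set of URIs that lead to it."""
    inverse = defaultdict(set)
    for source, target in redirects.items():
        inverse[target].add(source)
    return dict(inverse)
```

Nothing in the 303 response itself carries this inverse, so the table would have to be published some other way, for example as statements using a dedicated predicate.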

> On a referential level, I'm just going to point out that the reason that
> the use of the 303 status code can not possibly tell us that the resource
> redirected from was used for referring, arises because the 303 status code
> was specified before the advent of the Semantic Web. As an HTTP response,
> there is no reason why it can't be used simply to redirect from one
> information resource to another information resource, and in fact that
> can be and is done.

It was only used for POST. The HTTP WG felt that redefining it in the
case of GET was safe, so that's what they're doing for HTTPbis.

We can amplify this in the rec track document we're going to produce,
although I think HTTPbis pretty well has it covered.

> As put by RFC 2616, "this method exists primarily to allow the output of a
> POST-activated script to redirect the user agent to a selected resource",
> not to solve a logical problem about URIs on the Semantic Web and
> information resources.
>
> I'd add a critique:
>
> 4.8 There is no metadata discovery protocol
>
> I would add "There is no easy way to find all the RDF about something,
> given the multitude of possible ways to access it, such as RDFa in HTML,
> following 303, Link elements in HTML (i.e. Dublin Core), and following
> Link headers.
> Therefore, given a URI, a developer does not know how to get all the RDF
> accessible from that URI, much less sort out contradictions if they arise
> in OWL. Practically, this means that a developer cannot deploy RDF at a
> URI and be assured of what RDF a consuming application will actually
> find."

Yes, I think the document ought to say something like this, although
it's not a critique of any method, it's just a statement that
interoperability is desirable and therefore standardization is
desirable.
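
A minimal sketch of why this matters, assuming a response is available as a status code, a header dictionary, and an HTML body (the function and class names are hypothetical, and RDFa extraction is left out entirely). It lists the places a client might look, in an order that no standard fixes:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <link> elements typed as RDF/XML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("type") == "application/rdf+xml" and "href" in a:
            self.hrefs.append(a["href"])


def candidate_rdf_sources(status, headers, body):
    """List places a client might look for RDF; the order is arbitrary."""
    found = []
    if status == 303 and "Location" in headers:
        found.append(("303 redirect", headers["Location"]))
    if "Link" in headers:
        # The raw header value is kept; parsing it is out of scope here.
        found.append(("Link header", headers["Link"]))
    collector = LinkCollector()
    collector.feed(body)
    found.extend(("HTML link element", href) for href in collector.hrefs)
    return found
```

Two clients that check these sources in different orders, or skip some of them, can end up with different RDF for the same URI.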

> 5.1 Use something other than a URI
>
> Not bad as a reminder, but I'd delete and scope us to working with URIs.

Seemed like an important reminder - if the "take it at face value"
solution gets traction then Creative Commons will need to retool and I
think this would be the best approach.

> 5.2 'Hash URI' with fixed suffix
>
> I also do not really see how this solves anything, it just introduces yet
> another arbitrary convention, and it doesn't solve any of the problems
> with hash URIs.

It does not introduce a new convention that clients need to know
about. It doesn't solve all of the problems, but it does address one
complaint which is that the namespaces don't scale. We've heard this a
lot. In conjunction with other remediations such as avoiding conneg
and use of CURIEs I think it would work pretty well.

> 5.3 'Hashless URI' with site-specific discovery rules
>
> I like this approach, but would note that the addition of using
> .well-known and .host-meta will require a general metadata discovery
> protocol.

Yes, this method would require people to know about it - but only
those who care about the performance benefit. I'm not sure why this
wasn't clear - I do say "this is a new protocol".

Benefiting from this would not require a standard discovery protocol,
but it would be nice if it meshed well with information resource
discovery. Will need to think about that.

(Sandro and I have been discussing this.)

> 5.4 'Hashless URI' with new HTTP request or response
>
> I agree this might work, but you still have the "reversibility" problem
> noticed earlier, and it adds unnecessary complexity.
>
> 5.5 'Hashless' URI dereferences to its definition
>
> I agree with Ed basically. There is no reason why a URI cannot refer
> both to an object and its description, see URI rule: If IR(u) has a
> version with media type 'application/rdf+xml', then take u to be defined
> by IR(u), otherwise take u to refer to IR(u).

But not turtle, n3, OWL, RDFa, Manchester syntax, right?  And with a
priority system so that you *must* check for application/rdf+xml
first?
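
Read literally, the quoted rule can be sketched as follows. This is only one reading, under the assumption that the media types of the versions of IR(u) are already known; the function name and return strings are hypothetical:

```python
RDF_XML = "application/rdf+xml"


def interpret_uri(u, version_media_types):
    """Apply the quoted rule to u, given the media types of IR(u)'s versions."""
    if RDF_XML in version_media_types:
        return f"<{u}> is defined by IR(<{u}>)"
    return f"<{u}> refers to IR(<{u}>)"
```

Under this reading a URI whose only RDF version is served as text/turtle falls into the "refers to" branch, which is the gap the questions above are pointing at.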

> This should just be part of the media-type definition. As RDF does not
> constrain reference to a single *thing* (see the paper on "In Defense of
> Ambiguity"), the best we can do with RDF is provide a description whose
> interpretation can be a number of things, some of which may be other URIs
> and others which may be things in the world outside the Web that we want
> to refer to. We can assume when someone is publishing RDF at a URI that
> their URI refers to *anything* that satisfies the interpretation of the
> RDF statements available at that URI that use that URI.

Even if the URI doesn't occur in the document, right? Then it could
refer to anything at all.

> If they want to refer to the document itself, they need to give that a
> distinct name, i.e. a named graph.

No, I gave a solution in the draft, you just say [ :accessibleVia
"http://example/doc" ]
meaning the IR at that URI.

Can you explain what named graphs have to do with this?

I don't mean to be testy; I am really interested to understand your
view and am sort of desperate for interaction with someone who takes
this view - I've not been successful at getting others on this list to
represent it.

> Then there should be some convention
> that says we are talking about the description itself, not its
> interpretation, which could be something as simple as using the URI of the
> named graph in quotes (finally, a good use for distinguishing strings from
> URIs). This is also done via quotes in N3.

sorry, I don't get it.

> 5.6 'Hashless' URI dereferences to its definition (incompatibly)
>
> This would also work for me, and I don't see the difference really between
> this and 5.5.

Under 5.5 Creative Commons and Tabulator still work. Under 5.6 they
don't. So from my POV there is a huge difference.

> I'm going to point out two other solutions:
>
> 5.7 Get browsers to do something with RDF
>
> One of the reasons for Linked Data 303 has been that you can put the URI
> in the browser and get something resembling human-readable HTML out via
> 303+conneg. However, it seems odd to use content negotiation and 303 when
> it seems like the real problem is that browser vendors do not support
> doing something interesting with RDF, so that when a page is uploaded

I would like to include something like this but have not been able to
think of anything concrete enough.

If we were to ask for a miracle from the browser folks, I wonder if a
new URI scheme might be the answer...

Your sentence seems to have been cut off. I'd like to hear more.

> 5.8 Use an ontology to describe the status of the resources
>
> My one request is to *please* add this. The entire point of the
> IRW ontology is to give people the option to do this: people who
> wish to try to constrain interpretations in some meta-logical
> fashion, make distinctions between IRs and NIRs, and so on, should at
> least be given the option of making what they want *explicit*, and they
> can do that in RDFa, RDF/XML available via 303, RDF/XML published directly
> without 303, Link Headers, and the like.

As I said above this is something I want to do in the
consensus-building phase. It's sort of a meta-problem but I'll try to
say something about it.

There is no reason to distinguish IR vs. NIR, by the way - I really
would like to squash this meme. What you need is the distinction
between u defined by IR(u) and u refers to IR(u), which is different.

> 5.9 Combine all the various approaches in a unified Metadata Discovery
> Protocol.
>
> See above comments, but something for RDF modeled on Eran's "Web Linking"
> draft would be ideal. To be honest, we really need to simplify the RDF
> stack to get it to take off, and I think the largest simplification would
> be "just publish RDF" and "here's a very clear protocol implementers can
> follow to get all RDF from a given URI" that then includes all the various
> cruft that the community has generated.

I'd love to see clarity and consensus. This document is just the first
step, and building or altering consensus on httpRange-14 would be the
second.

I actually thought there was a fairly well understood procedure for
getting a definition, but what do I know, I'm not in the trenches
these days.

I can't do this alone. After this document goes out some volunteers
may appear, but who do you think would be interested in working on the
problem?

Jonathan
Received on Monday, 11 April 2011 21:39:13 UTC