Re: Review of JAR's document from David Booth on 2011-04-12 (public-awwsw@w3.org from April 2011)

From: David Booth <david@dbooth.org>
Date: Tue, 12 Apr 2011 09:21:48 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: Harry Halpin <hhalpin@w3.org>, AWWSW TF <public-awwsw@w3.org>
Message-ID: <1302614508.1983.66429.camel@dbooth-laptop>
Just a few comments inline . . .

On Mon, 2011-04-11 at 17:38 -0400, Jonathan Rees wrote:
> Thanks for your detailed comments, and hoping to hear from you tomorrow.
> 
> On Mon, Apr 11, 2011 at 1:50 PM, Harry Halpin <hhalpin@w3.org> wrote:
> > Late, but better than never. Will try to make telecon tomorrow -
> > reviewing, although I started a few days ago and missed JAR's latest
> > changes:
> >
> > http://www.w3.org/2001/tag/awwsw/issue57/latest/
> >
> > The goal of this document should be to precisely define the problem,
> > perhaps iterate through a few possible solutions, and then finally settle
> > on a solution. I see we have just got to describing the problem and some
> > possible solutions. My big executive summary is:
> >
> > - Add in IRW ontology or specialized vocabulary as one solution
> 
> I did include :accessibleVia for exactly this purpose, inspired to
> some extent by IRW. Would like to hear your other ideas regarding this
> approach - particular properties that solve particular problems. From
> where I stand I don't see any others needed at present, other than
> maybe one relating a URI to its definition and maybe one relating an
> IR to a version of the IR. (I have my own ontology somewhere, can dig
> it up.)
> 
> > - Add in the need for a Metadata protocol
> 
> Good, this goes in 4.7 I think. (assuming by "metadata" you really
> mean "URI data" or something of the sort)
> 
> > - And I strongly support the "application/rdf+xml" mime type, and future RDF
> > mime types, simply meaning that this URI denotes whatever things are
> > described by the RDF statements that use that URI and are accessible
> > from the URI itself.
> 
> Would this include RDFa? If so then all Creative Commons metadata
> becomes wrong, doesn't it?
> 
> I think there are now about 8 or 10 RDF mime types. I think we might
> need a registry to keep track of them.
> 
> Does the RDF representation take priority over all others? I.e. you
> would have to conneg for RDF if you wanted to know what the URI meant?
> What if RDF were succeeded by something better in the future?
> 
> How would you know whether the document defined the URI, and what
> would you do if it didn't?
> 
> > - I'm becoming more partial to a quoting mechanism to describe the named
> > graph, i.e. the document itself. Historically in languages like LISP
> > quotes mean "do not interpret", which is precisely what we want them to
> > mean here.
> 
> Yes, but how is this relevant?
> 
> > - Add in a part about RDF support in browsers.
> 
> Good.
> 
> > - 1. Introduction -
> >
> > Upon first reading, you use the term "Peak XY". Why not just use a URI,
> > like http://www.example.org/PeakXY? I think the problem space should be
> > constrained to "If a user-agent is presented with a URI, how can that
> > agent determine the *intended* meaning of the URI." Right now, Section 1
> > reads like an introduction to a general theory of discovering the meaning
> > of symbols, which are difficult - albeit related - waters.
> 
> Hmm. Wonder why what it says isn't clear - it seemed straightforward
> to me. It's just stating a general pattern, not trying to relate to
> any general theory of anything.
> 
> The danger we constantly run is that people think that "meaning" means
> something different in RDF as compared to the real world. This may be
> true of linked data but it's not true of the way I use RDF (which I
> think agrees with TimBL, OWL, the KR folks, etc.). So this is why I
> like to hammer home the point that there are real phenomena of
> communication and meaning and that RDF/URI is just one vessel for
> them. Perhaps you can correct me, or tell me how to say it better.

Perhaps I am in the camp of believing that '"meaning" means something
different in RDF as compared to the real world', because: (a) "meaning"
can be precisely defined in RDF (as a set of assertions, or a set of
satisfying interpretations per RDF semantics), whereas it seems doubtful
that humans could agree on a precise definition; and (b) RDF
applications can easily follow precise algorithms to ensure that they
are following proper protocols in determining meaning, whereas humans
are not so good at that.

> 
> > Given that caveat, I would add (before the "nature of definitions"
> > paragraph): "The primary way agents in natural language determine
> > the meaning of a term is by its use in context. However, on the Web the
> > context in which a URI is presented can often be limited, and so to enable
> > interoperability between user-agents there should be a clear algorithm to
> > follow that makes the intended meaning of the URI clear."
> 
> I don't see how the two situations are different. In RDF you get
> meaning from context, and in natural language you get meaning from
> dictionaries (e.g. French and the Academy's dictionary), and there is
> nothing normative about FYN, it's just there to be helpful.
> 
> > - intention of who?
> 
> ?
> 
> > - I am not entirely sure about this term "dereference". I would prefer the
> > term "access", as I think it's a bit more obvious. Can you explain a bit
> > better the difference between them? It seems that when you
> > access/dereference, you get a successful HTTP status code and retrieve
> > an associated information resource.
> 
> "Dereference" is something you do to a URI. "Access" is something you
> do to a resource. To call a URI "accessible" would be simply
> incorrect.
> 
> I get this term from RFC 3986 and it seems as good a term as any. It's
> nice because it's not too objectionable, and it's normative.
> 
> Dereference is more general than HTTP.
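A minimal sketch of the distinction, assuming Python's standard urllib as the retrieval machinery: dereferencing operates on a URI, separates off any fragment first (RFC 3986 says the fragment is separated from the rest of the URI prior to a dereference), and dispatches on the scheme, so nothing about it is specific to HTTP. The helper name `dereference` is hypothetical.

```python
from urllib.parse import urldefrag
from urllib.request import urlopen


def dereference(uri):
    """Dereference a URI: drop the fragment, then retrieve via whatever
    scheme handler urllib provides (http, https, ftp, file, data, ...).

    Returns (media_type, body_bytes) for the representation obtained.
    `dereference` is a hypothetical helper, not a standard API.
    """
    base, _fragment = urldefrag(uri)  # the fragment is never part of the retrieval
    with urlopen(base) as response:
        return response.headers.get_content_type(), response.read()


# A data: URI is dereferenceable without any HTTP involved.
print(dereference("data:text/plain,hello#ignored"))
```

The same call would work unchanged for an http: URI; only the handler urllib selects differs.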
> 
> > 2. Glossary
> >
> > - Put Glossary at end. Otherwise, I doubt anyone will get past it.
> 
> Sounds reasonable
> 
> > accessible via
> >    When a URI is dereferenceable, "the information resource accessible
> > via a URI" (abbreviated IR(that URI), see below) is the information
> > resource whose versions are the versions obtained by dereferencing
> > that URI.
> >
> > definition:
> >
> > The "information" could be prose, RDF, OWL, or some combination. ->
> > "The "information" could be human-readable prose in natural language,
> > machine-readable RDF, OWL, or some combination."
> 
> How about:
>    any human-readable or machine-readable language,
>    or combination of languages.
> 
> > fixed information resource
> >
> > I thought the entire point of this according to TimBL was that it was just
> > an information resource that *did* not change. I would merge this
> > definition with that of information resource, with fixed being just the
> > subset that is not intended to change, in particular over time.
> 
> Hmm. I think I had it that way in an earlier draft. It seems pretty
> important, so that's why I pulled it out.
> 
> It does need its own definition, since this is where everything
> grounds out - information resource is defined in terms of it. This
> goes back to whether to call them "representations" or "fixed
> information resources", which I've brought up about ten times on this
> list to little response.
> 
> If you try to combine the two definitions, the way it reads is: an
> information resource is -- hey wait, here's what a fixed information
> resource is -- and then an information resource is --
> which is pretty awkward.
> 
> > term
> >
> > A URI, word, name, or phrase that can serve in subject or object position
> > in a statement. -> To be pedantic, a URI can also serve as a predicate.
> > Just say "that can serve in a position that forms a statement. On the
> > Semantic Web, statements are RDF triples where a URI could be in the
> > subject, object, or predicate position."
> 
> hmm.  will fix.
> 
> by the way, do you have a reference for 'semantic web'?
> 
> > refer
> >
> >    For the purposes of this report, reference is just one way to mean.
> > There may be other ways to mean other than to refer, but none are
> > specified here. -> This just confuses me a bit.
> 
> Why?
> 
> >I tried to present a
> > more coherent theory in my dissertation distinguishing between
> > meaning/reference, but you can also just state that "To refer to
> > something, a term should be understood by an agent as "standing-in"
> > for some object in the universe of discourse, where that object can be
> > separated from the term in space or time."
> 
> That's more of a theory than I wanted to have here. That's why I said
> "for the purposes of this document".  So not sure what to do.
> 
> > version (of an information resource)
> >
> > This just confuses me. An information resource associated with another
> > one? So is anything linked a "version"? I know you've done some
> > deep-thinking on this Jonathan, but I'm not convinced by this definition
> > quite yet.
> 
> In order to be able to explain IRs, we need to be able to apply
> metadata predicates at both the representation level and the IR level,
> and relate the two levels via metadata. (This is the ONLY way, after
> five years, that I've been able to make sense of 'information
> resource', and I think it works very well - a longer conversation.)

But I don't think that is needed in *this* document, because *this*
document (AFAICT) is about the *mechanics* of providing and obtaining
definitions -- not about how they are interpreted.

> 
> "a version of" is the relation between the representation-like things
> that have metadata, and the IRs that all have metadata.  We could say
> that you can apply metadata to both representations and to IRs, and
> (if need be) have a bigger class that's the union of the two, but I
> feel that the IR idea comes across much better if IRs and
> representations are seen as the same kind of thing.
> 
> Since many TAG members seem to hate the idea of considering
> representations to be IRs, or applying metadata to representations, I
> chose the "fixed IR having metadata is a version of an IR having
> metadata" approach over the "representation having metadata is a
> representation of an IR having metadata" approach, inspired to some
> extent by what Roy Fielding is doing with HTTPbis (which I initially
> found repulsive but have come to accept).
> 
> If I can manage to get enough of your time so that you can understand
> what I'm saying, I'd be happy to take your advice on which of the two
> approaches is more likely to be understood.
> 
> >    A fixed information resource associated with an information resource
> > is a version of the information resource. -> "When an information
> > resource that is fixed as an octet-stream is associated with another
> > information resource that changes, the fixed information resources
> > can be considered versions of the original changing information
> > resource. For example, a version is a "snapshot" of a changing
> > information resource at a given time, or a version produced via a
> > fork, and so on."
> 
> OK, this just tells me that my attempt to be precise, concise and
> consistent has left me with something hard to understand. Maybe I
> should go back to "representation" - although that has a long
> *history* of confusing people...

Yes, I think it will be more easily understandable if existing terms are
used.

> 
> Time for gensyms maybe.
> 
> > Use-cases
> >
> > 3 - General methods in current use.
> >
> > 3.1 Colocate definition and use: "Just colocating definition and use is not
> > enough, as one of the features of URIs is that they can be removed from a
> > given context and then re-used in another one."
> 
> I would say it this way: that this is fragile because the use can get
> separated from the definition.  (SPARQL is a good example.)
> 
> This would of course apply to the "just be clear" idea that you've
> proposed - i.e. for an "ambiguous" URI just provide additional triples
> to clarify which sense is meant.
> 
> Hmm, so you're with David that the critiques should be inline with the
> method descriptions.
> 
> > 3.2 Link to documents containing definitions
> >
> > One could say "Link to a URI with the definition using a special kind of
> > link", as I think you want to separate linking from just having the
> > definition accessible from the URI.
> 
> hmm.
> 
> > 3.3 Register a URI scheme or URN namespace
> >
> > I think the answer to this should be a strong "No" and should be
> > discouraged, rather than described at length as it currently is. I feel too
> > much space is used on this example.
> 
> There are people advocating this and they need to feel that they are
> heard, in this document at least. We can follow on with something that
> takes a stand later.
> 
> And I think many readers will not be adequately informed on this
> subject, and not realize that a URI scheme registration is the same
> kind of thing as a URI definition, which it is.
> 
> However you are probably right about the space.
> 
> I don't think we're in any danger of anyone actually doing this.
> 
> > 3.4 Use the LSID getMetadata() method
> >
> > I understand why this is in here, but again, I'd say discourage it.
> 
> Again there is a specific audience who, like you and me, need to feel
> they're heard; and this report shouldn't be encouraging or
> discouraging, but just presenting information.
> 
> > 3.5 'Hash URI'
> >
> > You might want to add "Combined with content negotiation, which determines
> > the media type, there could be a problem where the hash URI is therefore
> > context-dependent. So a hash URI such as "http://example/sale#p16" could mean
> > a segment of a document (paragraph 16) if "text/html" was returned, and
> > could mean a resource describing a canoe if "application/rdf+xml" was
> > returned. This is obviously problematic, but seems to be ignored by the
> > RDF community so far in practice." You might want to add this to the
> > "Critique" bit of Section 4.
> 
> Yes, this is the kind of text I was intending to write
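To make the context dependence concrete: per RFC 3986, the semantics of a fragment identifier depend on the media type of the representation retrieved, so the same hash URI can be read two ways. A toy sketch; the function `fragment_reading` and its two readings are hypothetical simplifications that mirror the example above, not text from either media type's specification.

```python
from urllib.parse import urldefrag


def fragment_reading(uri, media_type):
    """Describe what `uri`'s fragment identifies, given the media type of the
    representation obtained by dereferencing the fragment-less URI.

    Simplified, hypothetical readings that mirror the example in the text.
    """
    document, fragment = urldefrag(uri)
    if media_type == "text/html":
        return f"the element with id '{fragment}' in {document}"
    if media_type == "application/rdf+xml":
        return f"whatever the RDF in {document} says <{uri}> denotes"
    return f"undetermined: no fragment semantics assumed for {media_type}"


uri = "http://example/sale#p16"
for media_type in ("text/html", "application/rdf+xml"):
    print(media_type, "->", fragment_reading(uri, media_type))
```

Nothing in the URI string tells a consumer which reading applies; it depends on what content negotiation happened to return.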
> >
> > 3.6 'Hashless URI' with HTTP 303 See Other redirect
> >
> > I'm going to point out yet another giant hole in the 303 story. How do
> > you get "back" from a URI pointed to by 303? See my comment to 4.6
> 
> Why do you need to? And what if there are multiple ways to get back?
> 
> Certainly there should be a URI for the URI-defined-in-document
> predicate, and then you could just use the inverse of the predicate.
> 
> > 4.1 "Fragment identifiers are fragile" -> "fragment identifiers are
> > context-dependent"
> >
> > See above at 3.5.
> 
> There are many problems, and conneg/session/useragent/time sensitivity
> is just one of them.  I guess each can have its own section heading;
> will review.
> 
> I was hoping for more detail from you on this:
>           "People forget to put it there
>           when writing and cut and pasting URIs."
> Because it's outside my experience I can't write this up very well.
> 
> > 4.4 303 is difficult, sometimes impossible, to deploy
> >
> > As the person who originally brought this up (you might want to cite my
> > email by URI), this is a total mess for people to deploy unless they are
> > using tools or are comfortable using .htaccess. Also, some server software
> > does not support .htaccess, and many people do not have access to edit
> > their server's .htaccess files.
> >
> > Another problem is connecting the document URI back to the URI about the
> > "resource". So when one uses http://example/p16 one gets redirected to
> > http://example/about-p16. However, how does one go BACK from
> > http://example/about-p16 to  http://example/p16? One could imagine a
> > back-link (we provide this type in IRW), but it's not clearly part of the
> > status-code and there's no natural back-link.
> 
> Yes, there needs to be a way to express the relationship. I'm hoping
> that will be part of the followon consensus effort.
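A minimal sketch of what such a back-link could look like, written as a WSGI application using only the standard library. The paths, the base `http://example`, and the choice of the `describes` link relation (the inverse of `describedby`) are assumptions for illustration; any agreed predicate would serve the same purpose.

```python
BASE = "http://example"
# Hypothetical mapping: URI path of a thing -> path of the document describing it
DESCRIPTION_OF = {"/p16": "/about-p16"}
# Inverse mapping, so each description can point back at what it describes
DESCRIBES = {doc: thing for thing, doc in DESCRIPTION_OF.items()}


def app(environ, start_response):
    """WSGI app: 303 from a thing to its description, plus a Link back-link."""
    path = environ.get("PATH_INFO", "/")
    if path in DESCRIPTION_OF:
        # 303 See Other: the answer is found at the describing document
        start_response("303 See Other", [("Location", BASE + DESCRIPTION_OF[path])])
        return [b""]
    if path in DESCRIBES:
        # The redirect leaves no trace at the target, so state the link explicitly
        back_link = f'<{BASE}{DESCRIBES[path]}>; rel="describes"'
        start_response("200 OK", [("Content-Type", "text/turtle"),
                                  ("Link", back_link)])
        return [b"# RDF describing p16 would go here\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

For local experiments it can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. The design point is that the back-link has to be published separately; the 303 status code alone does not provide it.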
> 
> > On a referential level, I'm just going to point out that the use of the
> > 303 status code cannot possibly tell us that the resource redirected
> > from was used for referring, because the 303 status code was specified
> > before the advent of the Semantic Web. As an HTTP response, there is no
> > reason why it can't be used simply to redirect from one information
> > resource to another information resource, and in fact that can be and
> > is done.
> 
> It was only used for POST. The HTTP WG felt that redefining it in the
> case of GET was safe, so that's what they're doing for HTTPbis.
> 
> We can amplify this in the rec track document we're going to produce,
> although I think HTTPbis pretty well has it covered.
> 
> > As RFC 2616 puts it, "this method exists primarily to allow the output of a
> > POST-activated script to redirect the user agent to a selected resource",
> > not to solve a logical problem about URIs on the Semantic Web and
> > information resources.
> >
> > I'd add a critique:
> >
> > 4.8 There is no metadata discovery protocol
> >
> > I would add "There is no easy way to find the RDF about something, given
> > the multitude of possible ways to access it, such as RDFa in HTML,
> > following 303, link elements in HTML (e.g. Dublin Core), and following
> > Link headers. Therefore, given a URI, a developer does not know how to
> > get all the RDF accessible from that URI, much less sort out
> > contradictions if they arise in OWL. Practically, this means that a
> > developer cannot deploy RDF at a URI and be assured of what RDF a
> > consuming application will actually find."
> 
> Yes, I think the document ought to say something like this, although
> it's not a critique of any method, it's just a statement that
> interoperability is desirable and therefore standardization is
> desirable.
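A rough sketch of why a standard would help: even collecting the *candidate* places where RDF about a URI might live means checking several unrelated mechanisms. Assumptions: a response is given as a status code, a dict of headers with canonical capitalization, and an HTML string; the Link-header parsing is deliberately naive; only `describedby` is treated as pointing to metadata; the set of RDF media types is illustrative; RDFa extraction is left out. `candidate_rdf_sources` is a hypothetical name.

```python
import re
from html.parser import HTMLParser

# Media types treated as RDF here (an assumption; the real list keeps growing)
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


class RdfLinkCollector(HTMLParser):
    """Collect href values of HTML <link> elements that advertise RDF."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("type") in RDF_TYPES and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def candidate_rdf_sources(status, headers, body):
    """List URIs where RDF about the requested URI *might* be found."""
    found = []
    # 1. A 303 redirect points at a (presumably describing) document
    if status == 303 and "Location" in headers:
        found.append(headers["Location"])
    # 2. HTTP Link headers; naive split that ignores commas inside quotes
    for part in headers.get("Link", "").split(","):
        match = re.search(r'<([^>]*)>.*?;\s*rel="?([^";]+)"?', part)
        if match and match.group(2) == "describedby":
            found.append(match.group(1))
    # 3. <link> elements in an HTML body
    if headers.get("Content-Type", "").startswith("text/html"):
        collector = RdfLinkCollector()
        collector.feed(body)
        collector.close()
        found.extend(collector.hrefs)
    # 4. RDFa embedded in the HTML would need a full RDFa parser (omitted)
    return found
```

Even this much leaves open which sources take priority and what to do when they disagree, which is the interoperability gap described above.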
> 
> > 5.1 Use something other than a URI
> >
> > Not bad as a reminder, but I'd delete and scope us to working with URIs.
> 
> Seemed like an important reminder - if the "take it at face value"
> solution gets traction then Creative Commons will need to retool and I
> think this would be the best approach.

I agree with Harry on this.  Better to just delete this and keep the
scope of the document to URIs.

> 
> > 5.2 'Hash URI' with fixed suffix
> >
> > I also do not really see how this solves anything, it just introduces yet
> > another arbitrary convention, and it doesn't solve any of the problems
> > with hash URIs.
> 
> It does not introduce a new convention that clients need to know
> about. It doesn't solve all of the problems, but it does address one
> complaint which is that the namespaces don't scale. We've heard this a
> lot. In conjunction with other remediations such as avoiding conneg
> and use of CURIEs I think it would work pretty well.
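As a sketch of why no new client convention is needed: with a fixed fragment suffix, each term gets its own small document, yet a client finds the definition exactly as for any hash URI, by dropping the fragment. The suffix `_` and the namespace `http://example/ns/` are hypothetical choices.

```python
from urllib.parse import urldefrag

NAMESPACE = "http://example/ns/"  # hypothetical namespace
SUFFIX = "_"                      # hypothetical fixed fragment used for every term


def mint(term):
    """Mint a term URI; each term gets its own small defining document."""
    return f"{NAMESPACE}{term}#{SUFFIX}"


def document_to_fetch(uri):
    """What any client already does with a hash URI: drop the fragment."""
    return urldefrag(uri).url


peak = mint("PeakXY")
print(peak, "is defined in", document_to_fetch(peak))
```

Compare a single namespace document holding every term, which must be fetched whole to learn about any one of them; that is the scaling complaint this method addresses.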
> 
> > 5.3 'Hashless URI' with site-specific discovery rules
> >
> > I like this approach, but would note that using
> > .well-known and host-meta will require a general metadata discovery
> > protocol.
> 
> Yes, this method would require people to know about it - but only
> those who care about the performance benefit. I'm not sure why this
> wasn't clear - I do say "this is a new protocol".
> 
> Benefiting from this would not require a standard discovery protocol,
> but it would be nice if it meshed well with information resource
> discovery. Will need to think about that.
> 
> (Sandro and I have been discussing this.)

FWIW, I think this approach has potential also.
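A minimal sketch of the client side, assuming the site publishes its rule in `/.well-known/host-meta` as a URI template with a `{uri}` variable (in the style of the host-meta/LRDD drafts); fetching and parsing the host-meta document is omitted, and the template shown is hypothetical. The performance benefit is that, once the site rule is cached, the client computes the describing document's URI directly instead of paying for a 303 round trip per URI.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def host_meta_location(uri):
    """Where the site-wide metadata for `uri`'s host lives."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/host-meta", "", ""))


def apply_template(template, uri):
    """Fill a '{uri}' template variable with the percent-encoded URI."""
    return template.replace("{uri}", quote(uri, safe=""))


# Hypothetical rule, as it might be read out of the host-meta document
TEMPLATE = "http://example/describe?uri={uri}"

target = "http://example/p16"
print(host_meta_location(target))
print(apply_template(TEMPLATE, target))
```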

> 
> > 5.4 'Hashless URI' with new HTTP request or response
> >
> > I agree this might work, but you still have the "reversibility" problem
> > noticed earlier, and it adds unnecessary complexity.
> >
> > 5.5 'Hashless' URI dereferences to its definition
> >
> > I agree with Ed basically. There is no reason why a URI cannot refer
> > both to an object and to its description; see the URI rule: If IR(u) has a
> > version with media type 'application/rdf+xml', then take u to be defined
> > by IR(u), otherwise take u to refer to IR(u).
> 
> But not turtle, n3, OWL, RDFa, Manchester syntax, right?  And with a
> priority system so that you *must* check for application/rdf+xml
> first?
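The rule quoted above is simple enough to state as code, which also makes the question just raised visible: as written, only `application/rdf+xml` triggers the "defined by" reading. This sketch assumes the consumer already knows the media types of the versions available at u (in practice that means probing with content negotiation); `interpret` is a hypothetical name.

```python
RDF_XML = "application/rdf+xml"


def interpret(u, version_media_types):
    """Apply the rule as stated: if IR(u) has a version with media type
    application/rdf+xml, u is defined by IR(u); otherwise u refers to IR(u).

    `version_media_types` is assumed to be the set of media types of the
    versions obtained by dereferencing u.
    """
    if RDF_XML in version_media_types:
        return ("defined by", f"IR({u})")
    return ("refers to", f"IR({u})")


doc = "http://example/doc"
print(interpret(doc, {"text/html", "application/rdf+xml"}))
print(interpret(doc, {"text/html", "text/turtle"}))  # Turtle alone does not count
```

Widening `RDF_XML` to a set of RDF media types would settle the Turtle case but runs into the registry problem raised earlier in the thread.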
> 
> > This should just be part of the media-type definition. As RDF does not
> > constrain reference to a single *thing* (see the paper on "In Defense of
> > Ambiguity"), the best we can do with RDF is provide a description whose
> > interpretation can be a number of things, some of which may be other URIs
> > and others which may be things in the world outside the Web that we want
> > to refer to. We can assume when someone is publishing RDF at a URI that
> > their URI refers to *anything* that satisfies the interpretation of the
> > RDF statements available at that URI that use that URI.
> 
> Even if the URI doesn't occur in the document, right? Then it could
> refer to anything at all.

RDF statements can still indirectly constrain the interpretations, even
if the URI does not appear in those statements.

> 
> > If they want to refer to the document itself, they need to give that a
> > distinct name, i.e. a named graph.
> 
> No, I gave a solution in the draft, you just say [ :accessibleVia
> "http://example/doc" ]
> meaning the IR at that URI.
> 
> Can you explain what named graphs have to do with this?
> 
> I don't mean to be testy; I am really interested to understand your
> view and am sort of desperate for interaction with someone who takes
> this view - I've not been successful at getting others on this list to
> represent it.
> 
> > Then there should be some convention
> > that says we are talking about the description itself, not its
> > interpretation, which could be something as simple as using the URI of the
> > named graph in quotes (finally, a good use for distinguishing strings from
> > URIs). This is also done via quotes in N3.
> 
> sorry, I don't get it.
> 
> > 5.6 'Hashless' URI dereferences to its definition (incompatibly)
> >
> > This would also work for me, and I don't see the difference really between
> > this and 5.5.
> 
> Under 5.5 Creative Commons and Tabulator still work. Under 5.6 they
> don't. So from my POV there is a huge difference.

The Creative Commons license use case is a really excellent use case for
demonstrating this ambiguity issue.  Since the point of *this* document
is to focus on the *mechanics* of providing and obtaining definitions --
not interpreting them -- then it seems to me that it falls outside of
the scope of this document.  But I do think it is a use case that we
should exploit in another document.

David

> 
> > I'm going to point out two other solutions:
> >
> > 5.7 Get browsers to do something with RDF
> >
> > One of the reasons for Linked Data 303 has been that you can put the URI
> > in the browser and get something resembling human-readable HTML out via
> > 303+conneg. However, it seems odd to use content negotiation and 303 when
> > it seems like the real problem is that browser vendors do not support
> > doing something interesting with RDF, so that when a page is uploaded
> 
> I would like to include something like this but have not been able to
> think of anything concrete enough.
> 
> If we were to ask for a miracle from the browser folks, I wonder if a
> new URI scheme might be the answer...
> 
> Your sentence seems to have been cut off. I'd like to hear more.
> 
> > 5.8 Use an ontology to describe the status of the resources
> >
> > My one request is to *please* add this. The entire point of the IRW
> > ontology is to give people the option to do this: if people wish to
> > try to constrain interpretations in some meta-logical fashion, make
> > distinctions between IRs and NIRs, and so on, they should at least be
> > given the option of making what they want *explicit*, and they can do
> > that in RDFa, RDF/XML available via 303, RDF/XML published directly
> > without 303, Link headers, and the like.
> 
> As I said above this is something I want to do in the
> consensus-building phase. It's sort of a meta-problem but I'll try to
> say something about it.
> 
> There is no reason to distinguish IR vs. NIR, by the way - I really
> would like to squash this meme. What you need is the distinction
> between u defined by IR(u) and u refers IR(u), which is different.
> 
> > 5.9 Combine all the various approaches in a unified Metadata Discovery
> > Protocol.
> >
> > See above comments, but something for RDF modeled on Eran's "Web Linking"
> > draft would be ideal. To be honest, we really need to simplify the RDF
> > stack to get it to take off, and I think the largest simplification would
> > be "just publish RDF" and "here's a very clear protocol implementers can
> > follow to get all RDF from a given URI" that then includes all the various
> > cruft that the community has generated.
> 
> I'd love to see clarity and consensus. This document is just the first
> step, and building or altering consensus on httpRange-14 would be the
> second.
> 
> I actually thought there was a fairly well understood procedure for
> getting a definition, but what do I know, I'm not in the trenches
> these days.
> 
> I can't do this alone. After this document goes out some volunteers
> may appear, but who do you think would be interested in working on the
> problem?
> 
> Jonathan
> 
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Tuesday, 12 April 2011 13:22:13 UTC