- From: Harry Halpin <hhalpin@w3.org>
- Date: Mon, 11 Apr 2011 18:50:39 +0100 (BST)
- To: "David Booth" <david@dbooth.org>
- Cc: "Jonathan Rees" <jar@creativecommons.org>, "AWWSW TF" <public-awwsw@w3.org>
Late, but better than never. Will try to make the telecon tomorrow. I'm reviewing, although I started a few days ago and missed JAR's latest changes: http://www.w3.org/2001/tag/awwsw/issue57/latest/

The goal of this document should be to precisely define the problem, perhaps iterate through a few possible solutions, and then finally settle on a solution. I see we have just got to describing the problem and some possible solutions.

My big executive summary is:
- Add in the IRW ontology or a specialized vocabulary as one solution.
- Add in the need for a Metadata protocol.
- And I strongly support the "application/rdf+xml" mimetype, and future RDF mime-types, just meaning that this URI denotes whatever things the RDF statements that use that URI accessible from the URI itself describe.
- I'm becoming more partial to a quoting mechanism to describe the named graph, i.e. the document itself. Historically in languages like LISP quotes mean "do not interpret", which is precisely what we want them to mean here.
- Add in a part about RDF support in browsers.

1. Introduction

- Upon first reading, you use the term "Peak XY". Why not just use a URI, like http://www.example.org/PeakXY?

I think the problem space should be constrained to "If a user-agent is presented with a URI, how can that agent determine the *intended* meaning of the URI." Right now, this section reads like an introduction to a general theory of discovering the meaning of symbols, which is difficult - albeit related - waters. Given that caveat, I would note that (before the "nature of definitions" paragraph) "The primary way agents in natural language determine the meaning of a term is by its use in context. However, on the Web the context in which a URI is presented can often be limited, and so as to enable interoperability between user-agents there should be a clear algorithm to follow that lets the intended meaning of the URI be clear."

- Intention of who?

- I am not entirely sure about this term "dereference".
I would prefer the term "access", as I think it's a bit more obvious. Can you explain a bit better the difference between them? It seems when you access/dereference, you can successfully use an HTTP code to retrieve an associated information resource.

2. Glossary

- Put the Glossary at the end. Otherwise, I doubt anyone will get past it.

accessible via
When a URI is dereferenceable, "the information resource accessible via a URI" (abbreviated IR(that URI), see below) is the information resource whose versions are the versions obtained by dereferencing that URI.

definition
The "information" could be prose, RDF, OWL, or some combination.
-> "The "information" could be human-readable prose in natural language, machine-readable RDF, OWL, or some combination."

fixed information resource
I thought the entire point of this according to TimBL was that it was just an information resource that *did* not change. I would merge this definition with that of information resource, with fixed being just the subset that is not intended to change, in particular over time.

term
A URI, word, name, or phrase that can serve in subject or object position in a statement.
-> To be pedantic, a URI can also serve as a predicate. Just say "that can serve in a position that forms a statement. On the Semantic Web, statements are RDF triples where a URI could be in the subject, object, or predicate position."

refer
For the purposes of this report, reference is just one way to mean. There may be other ways to mean other than to refer, but none are specified here.
-> This just confuses me a bit. I tried to present a more coherent theory in my dissertation distinguishing between meaning/reference, but you can also just state that "To refer to something, a term should be understood by an agent as "standing-in" for some object in the universe of discourse, where that object can be separated from the term in space or time."

version (of an information resource)
This just confuses me.
An information resource associated with another one? So is anything linked a "version"? I know you've done some deep thinking on this Jonathan, but I'm not convinced by this definition quite yet.

A fixed information resource associated with an information resource is a version of the information resource.
-> "When an information resource is fixed as an octet-stream but is associated with another information resource that changes, the fixed information resource can be considered a version of the original changing information resource. For example, a version is a "snapshot" of a changing information resource at a given time, or via forks, and so on."

Use-cases 3 - General methods in current use.

3.1 Colocate definition and use
"Just colocating definition and use is not enough, as one of the features of URIs is that they can be removed from a given context and then re-used in another one."

3.2 Link to documents containing definitions
One could say "Link to a URI with the definition using a special kind of link", as I think you want to separate linking from just having the definition accessible from the URI.

3.3 Register a URI scheme or URN namespace
I think the answer to this should be a strong "No" and it should be discouraged, rather than heavily described as it currently is. I feel too much space is used on this example.

3.4 Use the LSID getMetadata() method
I understand why this is in here, but again, I'd say discourage it.

3.5 'Hash URI'
You might want to add "Combined with content negotiation, which determines the media-type, there could be a problem where the hash URI is therefore context dependent. So a hash URI for "http://example/sale#p16" could mean a segment of a document (paragraph 16) if "text/html" was returned, and could mean a resource describing a canoe if "application/rdf+xml" was returned. This is obviously problematic, but seems to be ignored by the RDF community so far in practice."
You might want to add this to the "Critique" bit of Section 4.

3.6 'Hashless URI' with HTTP 303 See Other redirect
I'm going to point out yet another giant hole in the 303 story: how do you get "back" from a URI pointed to by a 303? See my comment to 4.6

4.1 "Fragment identifiers are fragile"
-> "fragment identifiers are context-dependent". See above at 3.5.

4.4 303 is difficult, sometimes impossible, to deploy
As the person who originally brought this up (you might want to cite my email by URI), this is a total mess for people to deploy unless they are using tools or are comfortable using .htaccess. Also, some server software does not support .htaccess, and many people do not have access to edit their server's .htaccess files.

Another problem is connecting the document URI back to the URI about the "resource". So when one uses http://example/p16 one gets redirected to http://example/about-p16. However, how does one go BACK from http://example/about-p16 to http://example/p16? One could imagine a back-link (we provide this type in IRW), but it's not clearly part of the status code and there's no natural back-link.

On a referential level, I'm just going to point out that the reason the use of the 303 status code cannot possibly tell us that the resource redirected from was used for referring is that the 303 status code was specified before the advent of the Semantic Web. As an HTTP response, there is no reason why it can't be used simply to redirect from one information resource to another information resource, and in fact that can be and is done. As put by RFC 2616, "this method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource", not to solve a logical problem about URIs on the Semantic Web and information resources.
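To make the back-link problem in 4.4 concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory table of responses (RESPONSES) instead of a real server, and the helper names follow_303 and back_link are made up for illustration; only the two URIs come from the example above.

```python
# Hypothetical stand-in for a server: URI -> (status code, response headers).
RESPONSES = {
    "http://example/p16": (303, {"Location": "http://example/about-p16"}),
    "http://example/about-p16": (200, {"Content-Type": "application/rdf+xml"}),
}

def follow_303(uri, responses, max_hops=5):
    """Follow 303 redirects and return (final URI, URIs redirected from)."""
    trail = []
    for _ in range(max_hops):
        status, headers = responses[uri]
        if status != 303:
            return uri, trail
        trail.append(uri)
        uri = headers["Location"]
    raise RuntimeError("too many redirects")

def back_link(uri, responses):
    """Return whatever the response for `uri` itself says about a back-link.

    A plain 200 response names no redirecting URI, so this is None unless
    the publisher chose to add an explicit link (assumed here to be a
    Link header)."""
    _status, headers = responses[uri]
    return headers.get("Link")

final, trail = follow_303("http://example/p16", RESPONSES)
print(final, trail)
print(back_link(final, RESPONSES))
```

The forward direction is recoverable only by a client that kept the trail itself; a client that starts at http://example/about-p16 gets nothing back unless a back-link is published explicitly.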
I'd add a critique:

4.8 There is no metadata discovery protocol
I would add "There is no easy way to find the RDF about something, given the multitude of possible ways to access it, such as RDFa in HTML, following 303, Link elements in HTML (i.e. Dublin Core), and following Link headers. Therefore, given a URI, a developer does not know how to get all the RDF accessible from that URI, much less sort out contradictions if they arise in OWL. Practically, this means that a developer cannot deploy RDF at a URI and be assured of what RDF a consuming application will actually find."

5.1 Use something other than a URI
Not bad as a reminder, but I'd delete it and scope us to working with URIs.

5.2 'Hash URI' with fixed suffix
I also do not really see how this solves anything; it just introduces yet another arbitrary convention, and it doesn't solve any of the problems with hash URIs.

5.3 'Hashless URI' with site-specific discovery rules
I like this approach, but would note that the addition of using .well-known and .host-meta will require a general metadata discovery protocol.

5.4 'Hashless URI' with new HTTP request or response
I agree this might work, but you still have the "reversibility" problem noted earlier, and it adds unnecessary complexity.

5.5 'Hashless' URI dereferences to its definition
I agree with Ed basically. There is no reason why a URI cannot refer both to an object and to its description. See the URI rule: If IR(u) has a version with media type 'application/rdf+xml', then take u to be defined by IR(u), otherwise take u to refer to IR(u). This should just be part of the media-type definition. As RDF does not constrain reference to a single *thing* (see the paper on "In Defense of Ambiguity"), the best we can do with RDF is provide a description whose interpretation can be a number of things, some of which may be other URIs and others which may be things in the world outside the Web that we want to refer to.
We can assume when someone is publishing RDF at a URI that their URI refers to *anything* that satisfies the interpretation of the RDF statements available at that URI that use that URI. If they want to refer to the document itself, they need to give that a distinct name, i.e. a named graph. Then there should be some convention that says we are talking about the description itself, not its interpretation, which could be something as simple as using the URI of the named graph in quotes (finally, a good use for distinguishing strings from URIs). This is also done via quotes in N3.

5.6 'Hashless' URI dereferences to its definition (incompatibly)
This would also work for me, and I don't really see the difference between this and 5.5.

I'm going to point out two other solutions:

5.7 Get browsers to do something with RDF
One of the reasons for Linked Data 303 has been that you can put the URI in the browser and get something resembling human-readable HTML out via 303+conneg. However, it seems odd to use content negotiation and 303 when it seems like the real problem is that browser vendors do not support doing something interesting with RDF, so that when a page is uploaded

5.8 Use an ontology to describe the status of the resources
My one request is to *please* add this. The entire point of the IRW ontology is to give people the option to do this. People who wish to try to constrain interpretations in some meta-logical fashion, make distinctions between IRs and NIRs, and so on, should at least be given the option of making what they want *explicit*, and they can do that in RDFa, RDF/XML available via 303, RDF/XML published directly without 303, Link headers, and the like.

5.9 Combine all the various approaches in a unified Metadata Discovery Protocol
See above comments, but something for RDF modeled on Eran's "Web Linking" draft would be ideal.
To be honest, we really need to simplify the RDF stack to get it to take off, and I think the largest simplification would be "just publish RDF" and "here's a very clear protocol implementers can follow to get all RDF from a given URI" that then includes all the various cruft that the community has generated.
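As a rough illustration of what "a very clear protocol implementers can follow" might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the order of the methods, the function names, and the canned answers of the stand-in lookups (including the made-up http://example/p16.rdf). A real version would do HTTP requests and parsing instead.

```python
from typing import Callable, Iterable, List, Tuple

# A lookup takes a URI and returns the URIs of RDF descriptions it found.
Lookup = Callable[[str], Iterable[str]]

def discover_rdf(uri: str, lookups: List[Tuple[str, Lookup]]) -> List[Tuple[str, str]]:
    """Run every lookup in a fixed order and return (method, description URI)
    pairs, skipping descriptions an earlier method already found."""
    seen = set()
    found = []
    for method, lookup in lookups:
        for description in lookup(uri):
            if description not in seen:
                seen.add(description)
                found.append((method, description))
    return found

# Hypothetical stand-in lookups with canned answers (no network access).
def via_303(uri):
    return ["http://example/about-p16"] if uri == "http://example/p16" else []

def via_link_header(uri):
    if uri == "http://example/p16":
        return ["http://example/about-p16", "http://example/p16.rdf"]
    return []

def via_rdfa(uri):
    return []

# The order here is an assumption, not a recommendation.
LOOKUPS = [("303", via_303), ("Link header", via_link_header), ("RDFa", via_rdfa)]

print(discover_rdf("http://example/p16", LOOKUPS))
```

The point of fixing the list and its order in one place is that a publisher and a consumer would then agree on which RDF will actually be found for a given URI.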
Received on Monday, 11 April 2011 17:50:42 UTC