- From: Jonathan Rees <jar@creativecommons.org>
- Date: Tue, 2 Nov 2010 14:18:46 -0400
- To: Larry Masinter <masinter@adobe.com>
- Cc: "www-tag@w3.org" <www-tag@w3.org>
On Tue, Nov 2, 2010 at 11:38 AM, Larry Masinter <masinter@adobe.com> wrote:
> This idea has been bouncing around for such a long time,
> but I updated the document
>
> http://tools.ietf.org/html/draft-masinter-dated-uri-07
>
> based on comments.
>
> While this isn't posed as a "TAG" submission, since the
> TAG has been discussing persistence for a long time,
> are there any changes you think I should make (references,
> discussions, etc.) I should make before asking for this
> to be published?
>
> Larry
> --
> http://larry.masinter.net

A few comments: (your draft is indented, my comments outdented)

I hope that you will be coordinating with others who are working on similar issues.

Somewhere you need to include a warning that two clients can observe completely different content for the same resource, at exactly the same time. Time is not adequate to "identify" anything about what anyone actually observed since it potentially depends on all the details of the observation (IP address, cookies, which physical server responded, etc.). All a DURI really says is that someone observed something at the given URI at a certain time - and this has to be taken on trust.

   Network Working Group                                    L. Masinter
   Internet-Draft                                                 Adobe
   Intended status: Informational                      October 22, 2010
   Expires: April 25, 2011

   The 'tdb' and 'duri' URI schemes, based on dated URIs
   draft-masinter-dated-uri-07

   Abstract

   This document defines two URI schemes. The first, 'duri' (standing for "dated URI"), allows indicating a URI as of a particular date (and time).

It is not the URI that is "of a particular date". Rather it is (according to your URI/resource theory articulated in RFC 3986) the binding of the URI to a resource and the condition (state, whatever) of the resource to which the URI is bound. I think you should say "indicating a resource as of a particular date" since that can be read as covering both bases.

   This allows explicit reference to the "time of retrieval", similar to the way in which bibliographic references containing URIs are used.

I would say "are written", not "are used". While many people will know what you're talking about, some of your audience won't. You need to either remove the "similar to" or expand on it.

   The second scheme, 'tdb' ( standing for "Thing Described By"), provides a way of using a way of minting URIs for anything that can be described,

Please don't propagate this "anything that can be described" meme. It's a silly and meaningless distinction. Just say "anything" or "any resource".

   with the ability to fix the description to a given date or time.

I would not put the time-binding feature directly into tdb:, but rather use an orthogonal design that uses the time-binding ability of the subject URI. That is, write tdb:duri:... and put the time in the DURI. Then, if the resource is already dated (via URI scheme, or some other "persistent" method) there's no need to repeat the time in the tdb:. This orthogonality also simplifies the specification.

   The 'tdb' URI scheme may reduce the need to define define new URN namespaces merely for the purpose of creating stable identifiers for concepts or abstractions: it provides a ready means for identifying "non-information resources" by semantic indirection -- a way of creating a URI for anything.

Have you checked this hypothesis out with anyone who has created or is thinking of creating a URN namespace? Remember that URN namespaces give something that duri: and tdb: don't, which is a registry (stored in the RFC corpus) of namespace definitions maintained by IETF. As this is the primary value proposition of URNs, I think it unlikely that a URN customer would forego it.

   1. Overview and Requirements

   The URI schemes defined here address several related problems:

   1.1. Persistent identifiers

   [RFC1737] defines several requirements for Uniform Resource Names.
   In particular, it requires "persistence":

      Persistence: It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name.

Interesting, hadn't heard that definition attached to that term before.

   Many people have wondered how to create globally unique and persistent identifiers. There are a number of URI schemes and URN namespaces already registered. However, an absolute guarantee of both uniqueness and persistence is very difficult.

Nay, impossible.

   In some cases, the guarantee of persistence comes through a promise of good management practice, such as is encouraged in "Cool URLs don't change" [COOL]. However, relying on promise of good management practice is not the same as having a design that guarantees reliability independent of actual administrative practice.

Adhering to the design would itself be an "administrative practice". This is a difference of degree, not of kind. Your point is a good one but it needs to be said in a way that does not oversell it.

   A primary design goal for URIs is that they are intended to mean the same thing, no matter in what context they appear: a "Uniform" way to Identify a Resource. However, even when URIs have Uniform meaning from the point of view of the source of the reference, they don't guarantee stability over time. Despite best efforts and intentions, identifying information can change in unpredictable ways: domain names can disappear or be reassigned, name assigning organizations can change structure, responsibility, disappear, merge, or change in unpredictable ways.

Again, the point is good, but it could be expressed better. You don't define "uniform meaning" or "stability" or "identifying information".
I don't know what you mean by "from the point of view of the source" -- are you referring to Humpty Dumpty's view that a word means precisely what he wants it to mean?

I think that again you are trying to simplify by sidestepping the 3986 factoring of URI -> representation into URI -> resource -> representation. Simplification is good, but in this case it will just amplify the current confusion around this point. Maybe there is a way to replace 3986 with a simpler theory and eliminate the intermediary "resource", but this document is not the place to do it. (I'm not suggesting you use the term "representation", or not.)

The way to go with duri: is to say that the duri: "identifies" a resource that is the condition (or state) of the subject resource at the given time. duri:T:X is this resource: "What the resource identified by 'X' at time T was like at time T". The dual "at time T" is needed because there are two time-varying mappings at play, one from URI to resource and another from resource to "representation" (or other time-dependent properties, for non-"information resources").

   There is a significant dependence in the interpretation of many URNs with the concept of "naming authority". The authority is presumably some individual or organization both to insure uniqueness of assignment and also to help with understanding the meaning of the link between the name and the named.

You are doing a disservice to URNs by talking of name assignment authority and name resolution service (not authority) in the same breath. These are orthogonal functions. At the birth of a namespace the functions are often carried out by the same organization, and this is why people tend to conflate the two ideas. But the distinction is essential, I think, to the nature of URNs, so it should be celebrated, not glossed.

   However, authorities, whether individuals or organizations, have a

That should be "resolution services", not "authorities".

   lifetime, and must be consulted at some point to understand the bindings. The functioning of names as unique identifiers and holders of meaning depends on having a reliable infrastructure of consulting the authority or the authorities records to determine the thing referenced.

   1.2. URIs for abstractions

"Abstractions" is a terrible ontological category. What's an example of something that's not an abstraction? I think the distinction you're looking for is - in the words of the passage you quote below - "resources not accessible via the Internet". Just say that instead.

   The description of URIs [RFC3986] describes a range for 'Resource' that is quite broad:

      This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

   One might use a URI such as "mailto:" email address to identify a person,

You cannot use "mailto:" to identify a person without contradicting the "mailto:" URI scheme registration. Please don't suggest this.

   or a "http:" URI to identify an abstract comment.
   However, this leaves the question of how one might identify, within the same context, both the system mailbox and the person to which it is assigned, or the web page at a http URI and the concept it describes.

Not "concept" - again web pages can describe anything, not just concepts (whatever those are!). How about "and what it describes" or "the entity it describes" or "the thing it describes" or "the resource it describes".

   The 'tdb' URI scheme allows ready assignment of URIs for abstractions that are distinguished from the media content that describes them.

Not "abstractions". What you are saying is that the "media content" describes something - not necessarily an abstraction - and the 'tdb' scheme allows ready assignment of a URI to whatever it describes (which is not the media content except in pathological situations).

   The goal, then, of the 'tdb' URI scheme is to provide a mechanism which is, at the same time:

   permanent: The identity of the resource identified is not subject to reinterpretation over time.

Well, anything written by anyone is subject to reinterpretation, that's just the way it goes. But I guess your intent is clear. Why not just say that the URI is not subject to reinterpretation over time, and skip the identity/identified bit? The interpretation *is* what's identified, right?

   explicitly bound: The mechanism by which the identified resource can be determined is explicitly included in the URI.

I don't get this. If an http: URI yields representations, then the *server* knows what the resource is, but it's unlikely any client does - all they know is a couple of representations that they happened to get; and if the server has been offline for ten years, how can *anyone* determine what the resource is? I think you mean to be talking about objectivity: what's meant by the DURI, or how it's to be interpreted, is explicit in the DURI.

   useful for non-networked items: Allows identification of resources outside the network: people, organizations, abstract concepts.

   no administration: The mechanism does not depend on reliable administrative processes of authorities for either assignment or interpretation.

other than adherence to this RFC, that is.

   2. Syntax

...

   3. Semantics

   3.1. 'duri' Semantics

   It is traditional in convention references and citations in printed conventional works to include the date of publication; this practice serves the important purpose that the context of the naming can be determined.

Since the context can't necessarily be determined, how about "important purpose of determining the context of the naming". Although determining anything about the context other than the time, if that's possible at all, would require investigative work.

   The meaning of a 'duri' URI is "the resource that was identified by the <encoded-URI> (after hex decoding) at the date(time) given".

If one resource can have one representation at one time, and a different representation at a future time, as is permitted by 3986, then this is not good enough for your purposes. You also want to say that it's the resource as it was at that time. See above.

   For example, "duri:2001:http://www.ietf.org" is a persistent identifier to "http://www.ietf.org" as of 2001. A 'duri' URI may not be a resource locator in a practical sense: the time of location has not yet arrived or has passed.

"may not" => "is not necessarily"

Not sure what you mean by "time of location". I would say something like "the binding and/or condition may not yet..."

   3.2. 'tdb' Semantics

   The 'tdb' URI scheme is intended to be useful for describing entities, concepts, abstractions, and other items which may not themselves be network accessible resources, but have been at some point described by network accessible resources.

Re "describing" please don't introduce yet another near miss for "identify" (term of art) - we already have "name" and "designate".
Also you're introducing "item" as a new near-synonym for "resource" or "thing".

   A 'tdb' URI is intended to be used where the <encoded-URI> identifies a 'document' (something a person could read, peruse, understand) or a fragment thereof, where the document describes some thing or concept.

Concepts are not things? That suggests there are *other* things that aren't things, too... How about just "describes some thing" or "describes something" or "describes some resource".

   The 'tdb' URI itself then identifies the subject of that document. It is common practice to give a reference for a concept by including a pointer to a document, segment, phrase that defines the concept; 'tdb' attempts to capture this practice in URI space.

What is the relation between concepts and resources? What if the document has more than one subject / defines more than one concept?

   For example, one might use "tdb:2008:http://www.ietf.org" as a persistent identifier for the Internet Engineering Task Force, as described by the "http://www.ietf.org" in 2008.

I prefer tdb:duri:2008:http://www.ietf.org

   The 'tdb' URI scheme differs from other URI or URN methods for identifying abstractions because the designation of what is actually identified by the 'tdb' doesn't depend on knowing the intention of the "assigner" of the identifier. Unlike "tag", "info", "cid", "mid" or related schemes, the identification is not dependent on the context of use. The 'tdb' URI scheme can be thought of as giving a way to invoke a level of semantic indirection to URI resolution.

   While one could imagine using 'tdb' without a date, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time.
   There are two ways that the date is useful for 'tdb' URIs: it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary.

Why is it that tdb: fixes the time of interpretation, but duri: doesn't? If I know the date of a document's publication, I will do my best to interpret the document in the way that it would have been interpreted around that time. Whether this needs to be specified by the URI scheme, or is just common sense, is not clear.

   3.3. Timestamp Semantics

   It is traditional in convention references and citations in printed conventional works to include the date of publication; this practice serves the important purpose that the context of the naming can be determined.

   While one could imagine using 'tdb' without a timestamp, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time. There are two ways that the date is useful for 'tdb': it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary.

   While normally, in a literary work in natural language which makes a reference to another work, both the reference itself and the work referenced are dated, e.g., a footnote in an article written in 1967 might talk about a "private communication" which itself had a date. The difference between a URI and a conventional literary reference is the desire to be able to extract the URI from its context and still retain its meaning.

   The meaning of a timestamp is the interval specified by the granularity of the time range indicated, in the UTC time zone, as described in [RFC3339]. If necessary, timestamps can include times and even fractional times, so that a generator of 'duri' or 'tdb' URIs can be arbitrarily precise.
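
As an illustration of the interval reading in the passage just quoted, here is a minimal Python sketch. It assumes date-only timestamps written as YYYY, YYYYMM or YYYYMMDD with optional hyphens (the forms "2001" and "20101102" appear in this thread; anything beyond that is an assumption, since the Syntax section is elided here), it ignores times of day and fractional seconds, and the function name timestamp_interval is made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def timestamp_interval(stamp: str) -> Tuple[datetime, datetime]:
    """Return the half-open UTC interval [start, end) named by a date-only stamp."""
    digits = stamp.replace("-", "")  # assumption: hyphens are optional separators
    if not digits.isdigit() or len(digits) not in (4, 6, 8):
        raise ValueError(f"unsupported timestamp: {stamp!r}")
    year = int(digits[0:4])
    month = int(digits[4:6]) if len(digits) >= 6 else 1
    day = int(digits[6:8]) if len(digits) == 8 else 1
    start = datetime(year, month, day, tzinfo=timezone.utc)
    if len(digits) == 4:
        # Year granularity: the interval runs to the start of the next year.
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    elif len(digits) == 6:
        # Month granularity: roll over to January of the next year after December.
        end = datetime(year + (month == 12), month % 12 + 1, 1, tzinfo=timezone.utc)
    else:
        # Day granularity: exactly one day.
        end = start + timedelta(days=1)
    return start, end
```

The end bound is exclusive, so under this reading "2001" covers everything from 2001-01-01T00:00:00Z up to, but not including, 2002-01-01T00:00:00Z.
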
   If there is any ambiguity of the resource within the range of time indicated (for example, if the timestamp consists only of a year, and the resource changes over the course of the year), then the resource state as of the very last instant of the range indicated should be used.

   Timestamps are allowed to be specified with as much precision as needed. This keeps most 'duri' and 'tdb' URIs relatively short.

   4. Use as a Locator

   A 'duri' URI is not directly useful as a resource locator, since many resources vary their content over time. A 'tdb' URI is not a resource locator in a practical sense, since it explicitly requires human interpretation. However, it allows one to know that a resource was described at some point in time; whether the description is still available, or whether that description is still meaningful, is not guaranteed.

The resource needn't have been described or even observed at the indicated time. All you know is what you've already said above, that the reference or description relates to the condition of a resource at a particular time, where the resource in question was the one that at that time was "identified" (or "described") by the subject URI.

...

   One might consider using 'tdb' with a "data" URI to designate concepts that can be described uniquely briefly inline. For example, tdb:2001:data:,The%20US%20president names the concept described by the (text/plain) string "The US president" at the very last instant of 2001.

The president is a concept?

   Of course, this practice is only useful if the referent of the data is (or was at the time) completely unique. Since "data" does not contain a way to designate content-language,

(hey, this is a bug, how about if we fix it?)

   the string in question would have to not be ambiguous as to its language.
   In the case of 'data', there is no assigning authority at all; the interpretation of the 'tdb' depend on the interpreting community.

Although your RFC doesn't explicitly say what resources data: URIs "identify" or "locate", it seems pretty clear from all the examples that they identify things that are pretty close to strings. Therefore RFC 2397 *is* the assigning authority. Interpreting the data: URI as a string is straightforward; interpreting that string as something else has nothing to do with the data: URI scheme, so the comment about "no assigning authority" is confused.

   Using 'tdb' or 'duri' with an embedded 'urn:' might not seem to be too useful,

tdb, very useful, why wouldn't it seem so?

   but it might be useful where the assignment of names in a URN namespace are not, in practice, permanent, or that one might want to refer to the assignment as of a given date. In this case, it is possible to use a "urn" within a 'duri', e.g., duri:2000:urn:ietf:std:50 might be used to refer to "the document that the IETF considered to be STD 50, as of the last instant of 2000".

   For 'tdb', many URIs identify resources which do not clearly describe anything at all. The "home page" for an organization isn't nearly as good a resource to use to describe an organization as the organization's "about" page.

Depends on what the home page says, and what the about page says.

   But it is up to the minter of the 'tdb' URI to choose wisely.

   6.2. Useful timestamps

   Timestamps far in the future are suspect, because the future content of a description resource cannot usually be reliably predicted. Timestamps which preceed the availability of the description resource should not be used either. For example, using a http URI with a timestamp before the description resource is also not recommended.
   However, although these practices are not recommended, there is no assurance that they haven't been used; by itself, a 'tdb' URI by itself does not constitute an assertion that the description resource was available or assigned at the date specified.

   Note that the use of the "very last instant" allows for the conventional bibliographic convention that a work published in 2009 can use "2009" as the date string, to refer to the work in the year of publication.

   6.3. Free assignment

   Because of the many possible schemes that can be used in the <encoded-URI> portion, there should be no difficulty in almost any computational process being able to assign 'duri' or 'tdb' URIs at will. Of course, it is necessary for there to be some resource which is available at some point in time, and to have a clock which is accurate to the granularity of the frequency of assignment.

   6.4. Resolution

   There are no direct resolution servers or processes for 'duri' or 'tdb' URIs. However, a 'duri' URI might be "resolvable" in the sense that a resource that was accessed at a point in time might have the result of that access cached or archived in an Internet archive service. See, for example, the "Internet Archive" project [archive]. And a 'tdb' URI is "resolvable" in the sense that the description resource can be accessed and interpreted.

   Clients without access to an Internet archive service might take the decoded <encoded-URI> of a 'duri' and attempt resolution of *that* identifier. This will give an approximation whose reliability depends on the what has happened in the time since the date indicated.

   6.5. Why Names with Semantics?

   There are a number of URI and URN schemes that create otherwise unbound "names", where the scheme only provides for uniqueness, with some other agent or process or context providing the authority to interpret the meaning of the identifier at some point in the future. 'duri' and 'tdb' is different, in that it is the agreement between are the describer (the agent creating the URI) and the receiver of the URI (the agent interpreting the URI) to agree upon the semantics without any reference to any third party.

   6.6. Avoiding MetaData

   One might consider the timestamp in a 'duri' or 'tdb' URI to be just one piece of additional metadata about the URI, and consider adding other pieces of metadata as annotation. However, the use of the timestamp is intended primarily as a mechanism of accomplishing uniqueness over time. No other bit of metadata or description readily fills that purpose. Further, the date is not descriptive (an assertion about the URI) but merely refining.

   6.7. Avoiding 'duri' and 'tdb'

   Many applications of URIs already provide a context of timestamp. For example, one could imagine a hypertext system where the URIs contained within a document were intended to refer to the resources as of the date of the enclosing document. This would be a reasonable interpretation of URIs within an Internet archive system, for example.

   Some applications of URIs already implicitly use the level of interpretive indirection that is explicit with 'tdb', For example, within an ontology language definition, the URIs used for abstract concepts, individuals and so forth are generally considered the "thing described by" the URI. In addition, the 'application/rdf+xml' Media Type [RFC3870] uses the fragment identifier resolution as an explicit way of identifying abstract concepts that are described by an RDF document.

"Abstract concept" is too limiting as RDF is used for concrete non-concepts as well.

   6.8. 'tdb' and levels of indirection

   The 'tdb' scheme introduces a level of semantic indirection. The puzzles and confusions about use and mention, name and reference, and levels of indirection have been puzzling and amusing for quite a while.

      "It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it--either it brings tears into their eyes, or else--"

      "Or else what?" said Alice, for the Knight had made a sudden pause.

      "Or else it doesn't, you know. The name of the song is called 'Haddock's Eyes.'"

      "Oh, that's the name of the song, is it?" Alice said, trying to feel interested.

      "No, you don't understand," the knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"

      "Then I ought to have said 'That's what the song is called'?" Alice corrected herself.

      "No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"

      "Well, what is the song, then?" said Alice, who was by this time completely bewildered.

      "I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."

      [LOOK]

I don't see what this section contributes. As an alternative I would suggest giving a reference to any useful exposition of the difference between an utterance and the meaning of an utterance, such as

duri:20101102:http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction
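
To make the orthogonal tdb:duri: design suggested earlier in these comments a little more concrete, here is a rough Python sketch. The names DatedURI, split_dated and to_orthogonal are hypothetical. The sketch assumes the timestamp contains no colon (date-only stamps such as 2008 or 20101102) and starts with a four-digit year, and it does no hex decoding of the embedded URI. It is not derived from the draft's Syntax section, which is elided above.

```python
from typing import NamedTuple


class DatedURI(NamedTuple):
    scheme: str     # "duri" or "tdb"
    timestamp: str  # e.g. "2008" or "20101102"
    subject: str    # embedded URI, left exactly as written


def split_dated(uri: str) -> DatedURI:
    """Split '<scheme>:<timestamp>:<embedded-URI>' into its three parts."""
    scheme, _, rest = uri.partition(":")
    timestamp, _, subject = rest.partition(":")
    if scheme.lower() not in ("duri", "tdb"):
        raise ValueError(f"not a duri or tdb URI: {uri!r}")
    # Assumption: a timestamp starts with a four-digit year.
    if len(timestamp) < 4 or not timestamp[:4].isdigit() or not subject:
        raise ValueError(f"missing timestamp or embedded URI: {uri!r}")
    return DatedURI(scheme.lower(), timestamp, subject)


def to_orthogonal(uri: str) -> str:
    """Rewrite 'tdb:T:X' as 'tdb:duri:T:X'; leave every other form alone."""
    if uri.lower().startswith("tdb:duri:"):
        return uri  # the date already lives in the embedded duri
    parts = split_dated(uri)
    if parts.scheme != "tdb":
        return uri
    return f"tdb:duri:{parts.timestamp}:{parts.subject}"
```

For example, to_orthogonal("tdb:2008:http://www.ietf.org") gives "tdb:duri:2008:http://www.ietf.org", the form preferred above, and a tdb: over an already dated subject needs no second timestamp.
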
Received on Tuesday, 2 November 2010 18:19:17 UTC