- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 31 Dec 2002 13:35:45 -0500
- To: www-tag@w3.org
The question here in the neighborhood of httpRange-14 [0] is which mental models of the web the TAG should recommend. In particular, how should it recommend people think about the relationship between http URIs and the bytes transmitted during HTTP protocol sessions?

It's fairly clear at some low levels how the protocol works, but even within the TAG, people use different higher-level abstractions when thinking about what the bytes do or should mean. So far, the TAG has not reached consensus [1]. People look elsewhere for guidance, or privately figure out something that works well enough for their own situation.

In this immediate discussion [3] we have two proposed high-level abstractions. In the interest of fairness and ego-separation, I'll name them using "od -d -N 1 < /dev/random":

   Abstraction 102: An http URI is most strongly associated with something in some domain of discourse (the problem domain) like a person, place, or thing. When you GET that URI, the bytes returned are a representation [4] of that thing.

   Abstraction 33: An http URI is most strongly associated with a repository or collection of information. When you GET that URI, the bytes returned convey that information.

Neither of these has much bearing on web architecture in general. Statements like Roy's [5]

   The reason we call it a URI that identifies a resource, rather than a UDI that identifies a document, is because we want a URI to reference things in the future -- to point to a source of future useful things. That's what resource means. It is therefore impossible to "retrieve" a resource, since the fact that it is available "over there" is an essential part of it being a resource; the resource remains over there, so the only thing that is retrieved is an instantaneous representation of the resource at the point in time at which it was generated by the origin.

don't look very different in #33 terms from how they look in #102's terms.
The real distinction is just whether you focus on the collection of information from which the web server generates its response or focus past the server, as Roy does, on the subject of that information.

It's tempting to go into why I like #33, and further debate the merits and shortcomings of these two ways of thinking about HTTP (or all the other possible abstractions), but let's not. Instead, let's think about what would constitute a good one. To know that, we'd have to know how this abstraction will be used...

What problems would an answer to httpRange-14 help with? Are there any use cases, or is this all just idle philosophy?

I have an application. I want people to publish data on the web, and I want them to do so in a format that is seriously self-describing, so that others can understand and re-use the data with minimal hassle. SGML and XML were great; they let people pick meaningful tag names, and that was a start, but I want to go farther. How about if users could just click on each tag (in some source view) to get to its full documentation, examples, support software, discussion list archives, etc.? That would be nice!

[ Stop me if you've heard this before. :-) And don't you dare say "Hey Sandro, you should look into this 'semantic web' stuff." ]

Maybe some software could even do its own kind of "clicking" on the links to download information about how to check the data, or display it nicely, or translate it into other formats I might want. Then I could just fetch some data from a dozen different sources and tell my computer to turn them all into a format I like, and merge them while it's at it.

Oh, and of course there will be lots of data *about* the web. I want to share bookmarks, blog feeds, web access control databases, etc.
So our data format will use the web in two ways: people will have data about websites and stuff (all the things they bookmark and blog about), AND the format itself will have links for each of its tags, linking to documentation, software, etc., about the tags.

Oh, and maybe we could use the web somehow to help us link more of the data. I've got a "friends" list with a bunch of people on it, and so does my friend Matt. Any friend of his is a friend of mine, so maybe we can merge our databases? If we could just agree on what database key to use in identifying people, that would help a lot. We just need some way to pick a string which unambiguously identifies a person....

So that's the setup. The test case will need to be much more precise before we can really see the difference between how 102 and 33 work.

The Model and Syntax:

We'll use the RDF model, where information is conveyed as subject-property-value triples. What we called "tags" above turn into just more objects, usually in the property role. We will set aside how RDF uses URIs, for now, because that's what we're trying to settle. For syntax we'll use N-Triples, but instead of terms like <uri> we'll use number<uri>, where the number lets us show how we're using URIs in different ways.

The Data:

Sandro has a dog, Taiko. (There's some text about this dog and photos of him at "http://www.drum.org/~natasha/pets/taiko.html".) Taiko is an Akita. (You can read more about Akitas on the AKC site, at "http://www.akc.org/breeds/recbreeds/akita.cfm".)

Let's consider Akita a class of dogs (as the AKC site does), and for the moment simply use the syntax "rdf:type" to name the "class" property. In other words, we want to say something like "Taiko rdf:type Akita", but we want "Taiko" and "Akita" to be clickable and mergeable.

Let's also tell people that Taiko appears in the picture they can see on the web at "http://www.hawke.org/sandro/dogsmile.jpg".
(We'll just assume a "depicts" property for now, and ignore the fact that I'm also in that picture.)

The #102 Version (A):

Here we simply take existing web pages as usable representations of the things we want to talk about.

   102a<http://www.drum.org/~natasha/pets/taiko.html> rdf:type 102a<http://www.akc.org/breeds/recbreeds/akita.cfm>.

works fine for the first part, but how do we do the second part? We can't do

   102a<http://www.hawke.org/sandro/dogsmile.jpg> depicts 102a<http://www.drum.org/~natasha/pets/taiko.html>

because "http://www.hawke.org/sandro/dogsmile.jpg" is not a representation of a picture. The naming authority (me) intends instead that it represent how Sandro and Taiko feel about each other, and intent is what matters [3].

Maybe this issue is a red herring; the essential point is that sometimes we do want to talk about web pages, web sites, etc., and 102a<...> doesn't do that. Instead we'll use a string literal, a local term (read "_:pic" as "something herein called 'pic'"), and another predicate:

   _:pic webAddress "http://www.hawke.org/sandro/dogsmile.jpg".
   _:pic depicts 102a<http://www.drum.org/~natasha/pets/taiko.html>.

There is something which has a web address "...dogsmile.jpg" and depicts Taiko.

The #102 Version (B, C, ...):

I've heard suggestions of other 102-style approaches, involving Content-Location headers and such, but I don't know the details. If someone has a suggestion, go ahead.

The #33 Version:

Here we introduce another property, linking web pages which have a single, primary subject with the thing which is that primary subject.

   33<http://www.drum.org/~natasha/pets/taiko.html> primarySubject _:taiko.
   33<http://www.akc.org/breeds/recbreeds/akita.cfm> primarySubject _:Akita.
   _:taiko rdf:type _:Akita.
   33<http://www.hawke.org/sandro/dogsmile.jpg> depicts _:taiko.

So, to have good Web Architecture, should RDF use 102a<...>, 33<...>, or what?
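
As a rough illustration only, the #33 triples can be written out as plain Python tuples. This is a sketch under assumptions, not a proposal for an RDF API: the tuple layout, the convention that "_:" strings are local terms, and the helper names pages_about and describe_depicted are all hypothetical. It shows that under #33 an http URI always names the page or image itself, and the thing the page is about is reached only through primarySubject.

```python
# Each triple is (subject, property, value). Under abstraction #33 a bare
# http URI names the web page (or image) itself; strings starting with "_:"
# are local terms for the things those pages are about. This layout is an
# assumption made for illustration, not part of RDF or N-Triples.
TRIPLES = [
    ("http://www.drum.org/~natasha/pets/taiko.html", "primarySubject", "_:taiko"),
    ("http://www.akc.org/breeds/recbreeds/akita.cfm", "primarySubject", "_:Akita"),
    ("_:taiko", "rdf:type", "_:Akita"),
    ("http://www.hawke.org/sandro/dogsmile.jpg", "depicts", "_:taiko"),
]


def pages_about(thing, triples):
    """Return the web pages whose primary subject is `thing` (hypothetical helper)."""
    return [s for (s, p, v) in triples if p == "primarySubject" and v == thing]


def describe_depicted(image, triples):
    """Map each thing `image` depicts to the pages primarily about that thing."""
    return {
        v: pages_about(v, triples)
        for (s, p, v) in triples
        if s == image and p == "depicts"
    }


if __name__ == "__main__":
    print(describe_depicted("http://www.hawke.org/sandro/dogsmile.jpg", TRIPLES))
```

The point of the sketch is that software never has to guess whether a URI means the dog or the page about the dog: the image URI is only ever an image, and the "clickable" documentation for Taiko is found by following primarySubject backwards.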
My personal suggestion [6] to the RDF community (which I encourage the TAG to reiterate) uses a hybrid which largely mirrors current practice. The idea is to say that <...> means 33<...> (the web page) if there is no "#" in the URI and 102<...> when there is a "#". (As above, I don't like the "intent" part of 102; I prefer an external formulation like primarySubject, but it can be used like 102.) Beyond this, people can use primarySubject and webAddress explicitly for the less common other cases.

Let's call this odd hybrid approach #89. I wouldn't recommend it in general, but it's good for RDF backward compatibility and brevity.

I'm now going to make this e-mail much MUCH too long by mentioning another approach, which /dev/random calls #130. 130<...> is 33<...> *or* 102<...>, and you can't tell which from looking at it. You need to use some other data. By my reading, this is what RDF uses today, and I'm not very fond of it. RDF today also tries to leverage media-types at the far end of the link, but I think that's a terrible idea.

    -- sandro

[0] http://www.w3.org/2001/tag/ilist#httpRange-14
[1] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0262
[2] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0243
[3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0257
[4] I think the intended meaning of "representation" is wordnet's sense #2:
    http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=2&word=representation&posnumber=1&searchtypenumber=2&senses=2&showglosses=1
[5] http://lists.w3.org/Archives/Public/www-tag/2002Jul/0253
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2002Dec/0125
Received on Tuesday, 31 December 2002 13:39:40 UTC