SemWeb use case for issue httpRange-14 from Sandro Hawke on 2002-12-31 (www-tag@w3.org from December 2002)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 31 Dec 2002 13:35:45 -0500
To: www-tag@w3.org
Message-Id: <200212311835.gBVIZj116644@wadimousa.hawke.org>

The question here in the neighborhood of httpRange-14 [0] is which mental
models of the web the TAG should recommend. In particular, how should
it recommend people think about the relationship between http URIs and
the bytes transmitted during HTTP protocol sessions? It's fairly
clear at some low levels how the protocol works, but even within the
TAG, people use different higher level abstractions when thinking
about what the bytes do or should mean. So far, the TAG has not
reached consensus [1]. People look elsewhere for guidance, or privately
figure out something that works well enough for their own situation.

In this immediate discussion [3] we have two proposed high-level
abstractions. In the interest of fairness and ego-separation, I'll
name them using "od -d -N 1 < /dev/random":

Abstraction 102: An http URI is most strongly associated with
something in some domain of discourse (the problem domain) like a
person, place, or thing. When you GET that URI, the bytes returned
are a representation [4] of that thing.

Abstraction 33: An http URI is most strongly associated with a
repository or collection of information. When you GET that URI, the
bytes returned convey that information.

Neither of these has much bearing on web architecture in general.
Statements like Roy's [5]

The reason we call it a URI that identifies a resource, rather than
a UDI that identifies a document, is because we want a URI to
reference things in the future -- to point to a source of future
useful things. That's what resource means. It is therefore
impossible to "retrieve" a resource, since the fact that it is
available "over there" is an essential part of it being a resource;
the resource remains over there, so the only thing that is
retrieved is an instantaneous representation of the resource at the
point in time at which it was generated by the origin.

doesn't look very different in #33 terms from how it looks in #102's
terms. The real distinction is just whether you focus on the
collection of information from which the web server generates its
response or focus past the server, as Roy does, on the subject of that
information.

It's tempting to go into why I like #33, and further debate the merits
and shortcomings of these two ways of thinking about HTTP (or all the
other possible abstractions), but let's not. Instead, let's think
about what would constitute a good one. To know that, we'd have to
know how this abstraction will be used. What problems would an
answer to httpRange-14 help with? Are there any use cases, or is this
all just idle philosophy?

I have an application.

I want people to publish data on the web, and I want them to do so
in a format that is seriously self-describing, so that others can
understand and re-use the data with minimal hassle. SGML and XML
were great; they let people pick meaningful tag names, and that was
a start, but I want to go farther. How about if users could just
click on each tag (in some source view) to get to its full
documentation, examples, support software, discussion list
archives, etc, ...? That would be nice! [ Stop me if you've heard
this before. :-) And don't you dare say "Hey Sandro, you should
look into this 'semantic web' stuff."] Maybe some software could
even do its own kind of "clicking" on the links to download
information about how to check the data, or display it nicely, or
translate it into other formats I might want. Then I could just
fetch some data from a dozen different sources and tell my computer
to turn them all into a format I like, and merge them while it's at
it.

Oh, and of course there will be lots of data *about* the web. I
want to share bookmarks, blog feeds, web access control databases,
etc. So our data format will use the web in two ways: people will
have data about websites and stuff (all the things they bookmark
and blog about), AND the format itself will have links for each of
its tags, linking to documentation, software, etc, about the tags.

Oh, and maybe we could use the web somehow to help us link more of
the data. I've got a "friends" list with a bunch of people on it,
and so does my friend Matt. Any friend of his is a friend of mine,
so maybe we can merge our databases? If we could just agree on
what database key to use in identifying people, that would help a
lot. We just need some way to pick a string which unambiguously
identifies a person....

So that's the setup. The test case will need to be much more precise
before we can really see the difference between how 102 and 33 work.

The Model and Syntax:

We'll use the RDF model, where information is conveyed as
subject-property-value triples. What we called "tags" above turn
into just more objects, usually in the property role. We will set
aside how RDF uses URIs, for now, because that's what we're trying
to settle.

For syntax we'll use N-Triples, but instead of terms like <uri>
we'll use number<uri>, where the number lets us show how we're using
URIs in different ways.
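
A rough sketch of how the number<uri> notation could be read
mechanically, in Python. The names Term and parse_term are made up
for illustration; they are not part of N-Triples or any RDF library,
and the pattern only covers the term shapes used in this message.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    # Abstraction label such as "102a" or "33"; None for terms that
    # are not tagged URIs (local terms, literals, property names).
    abstraction: Optional[str]
    value: str


# A tagged URI is digits, an optional lowercase letter, then <uri>.
_TAGGED_URI = re.compile(r"^(\d+[a-z]?)<([^<>]+)>$")


def parse_term(text: str) -> Term:
    """Split a term like 33<uri> into its abstraction label and URI."""
    text = text.strip()
    match = _TAGGED_URI.match(text)
    if match:
        return Term(match.group(1), match.group(2))
    # Anything else (e.g. _:pic or rdf:type) is passed through untouched.
    return Term(None, text)
```

The same URI string can then be carried around together with the
abstraction it is being read under, which is the whole point of the
numeric prefix.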

The Data:

Sandro has a dog, Taiko. (There's some text about this dog and
photos of him at "http://www.drum.org/~natasha/pets/taiko.html".)
Taiko is an Akita. (You can read more about Akitas on the AKC site,
at "http://www.akc.org/breeds/recbreeds/akita.cfm".) Let's consider
Akita a class of dogs (as the AKC site does), and for the moment
simply use the syntax "rdf:type" to name the "class" property. In
other words, we want to say something like "Taiko rdf:type
Akita", but we want "Taiko" and "Akita" to be clickable and
mergeable.

Let's also tell people that Taiko appears in the picture they can
see on the web at "http://www.hawke.org/sandro/dogsmile.jpg".
(We'll just assume a "depicts" property for now, and ignore the fact
that I'm also in that picture.)

The #102 Version (A):

Here we simply take existing web pages as usable representations of
the things we want to talk about.

102a<http://www.drum.org/~natasha/pets/taiko.html> rdf:type
102a<http://www.akc.org/breeds/recbreeds/akita.cfm>.

works fine for the first part, but how do we do the second part?
We can't do

102a<http://www.hawke.org/sandro/dogsmile.jpg> depicts
102a<http://www.drum.org/~natasha/pets/taiko.html>

because "http://www.hawke.org/sandro/dogsmile.jpg" is not a
representation of a picture. The naming authority (me) intends
instead that it represent how Sandro and Taiko feel about each
other, and intent is what matters [3]. Maybe this issue is a red
herring; the essential point is that sometimes we do want to talk
about web pages, web sites, etc, and 102a<...> doesn't do that.

Instead we'll use a string literal, a local term (read "_:pic"
as "something herein called 'pic'"), and another predicate:

_:pic webAddress "http://www.hawke.org/sandro/dogsmile.jpg".
_:pic depicts 102a<http://www.drum.org/~natasha/pets/taiko.html>.

There is something which has a web address "...dogsmile.jpg" and
depicts Taiko.

The #102 Version (B, C, ...):

I've heard suggestions of other 102-style approaches, involving
Content-Location headers and such, but I don't know the
details. If someone has a suggestion, go ahead.

The #33 Version:

Here we introduce another property, linking web pages which have a
single, primary subject with the thing which is that primary
subject.

33<http://www.drum.org/~natasha/pets/taiko.html> primarySubject _:taiko.
33<http://www.akc.org/breeds/recbreeds/akita.cfm> primarySubject _:Akita.
_:taiko rdf:type _:Akita.
33<http://www.hawke.org/sandro/dogsmile.jpg> depicts _:taiko.
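
As a sketch of how software might follow these four statements, here
they are as plain Python tuples with two small lookup helpers. The
helper names (objects, types_of_primary_subject) are invented for
this example and the terms are kept as opaque strings; a real RDF
toolkit would do this differently.

```python
# The four #33 statements, as (subject, property, value) tuples.
TRIPLES = [
    ("33<http://www.drum.org/~natasha/pets/taiko.html>", "primarySubject", "_:taiko"),
    ("33<http://www.akc.org/breeds/recbreeds/akita.cfm>", "primarySubject", "_:Akita"),
    ("_:taiko", "rdf:type", "_:Akita"),
    ("33<http://www.hawke.org/sandro/dogsmile.jpg>", "depicts", "_:taiko"),
]


def objects(triples, subject, prop):
    """Return every value stated for the given subject and property."""
    return [o for s, p, o in triples if s == subject and p == prop]


def types_of_primary_subject(triples, page):
    """Hop from a web page to its primary subject, then to that thing's classes."""
    found = []
    for thing in objects(triples, page, "primarySubject"):
        found.extend(objects(triples, thing, "rdf:type"))
    return found
```

The page and the dog stay distinct: asking for the rdf:type of the
page itself finds nothing, while asking about its primary subject
finds _:Akita.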

So, to have good Web Architecture, should RDF use 102a<...>,
33<...>, or what?

My personal suggestion [6] to the RDF community (which I encourage the
TAG to reiterate) uses a hybrid which largely mirrors current
practice. The idea is to say that <...> means 33<...> (the web page)
if there is no "#" in the URI and 102<...> when there is a "#". (As
above, I don't like the "intent" part of 102; I prefer an
external formulation like primarySubject, but it can be used like
102.) Beyond this, people can use primarySubject and webAddress
explicitly for the less common other cases. Let's call this odd
hybrid approach #89. I wouldn't recommend it in general, but it's
good for RDF backward compatibility and brevity.
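
The #89 rule is simple enough to state as a one-line test. A minimal
sketch in Python follows; the function name interpret_89 is made up
here, and the rule it encodes is only the "#" check described above,
not the primarySubject/webAddress escape hatches.

```python
def interpret_89(uri: str) -> str:
    """Pick the abstraction a bare <uri> gets under the hybrid #89 rule.

    Returns "102" (the thing talked about) when the URI contains a "#",
    and "33" (the web page) otherwise.
    """
    return "102" if "#" in uri else "33"
```

So <http://www.hawke.org/sandro/dogsmile.jpg> would be read as the
picture on the web, while a URI with a fragment (say the hypothetical
http://example.org/pets#taiko) would be read as the thing itself.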

I'm now going to make this e-mail much MUCH too long by mentioning
another approach, which /dev/random calls #130. 130<...> is 33<...>
*or* 102<...>, and you can't tell which from looking at it. You need
to use some other data. By my reading, this is what RDF uses today,
and I'm not very fond of it. RDF today also tries to leverage
media-types at the far end of the link, but I think that's a terrible
idea.

-- sandro

[0] http://www.w3.org/2001/tag/ilist#httpRange-14
[1] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0262
[2] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0243
[3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0257
[4] I think the intended meaning of "representation" is
wordnet's sense #2:
http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=2&word=representation&posnumber=1&searchtypenumber=2&senses=2&showglosses=1
[5] http://lists.w3.org/Archives/Public/www-tag/2002Jul/0253
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2002Dec/0125

Received on Tuesday, 31 December 2002 13:39:40 UTC