SemWeb use case for issue httpRange-14

The question here in the neighborhood of httpRange-14 [0] is which mental
models of the web the TAG should recommend.  In particular, how should
it recommend people think about the relationship between http URIs and
the bytes transmitted during HTTP protocol sessions?  It's fairly
clear at some low levels how the protocol works, but even within the
TAG, people use different higher level abstractions when thinking
about what the bytes do or should mean.  So far, the TAG has not
reached consensus [1].  People look elsewhere for guidance, or privately
figure out something that works well enough for their own situation.

In this immediate discussion [3] we have two proposed high-level
abstractions.  In the interest of fairness and ego-separation, I'll
name them using "od -d -N 1 < /dev/random":

 Abstraction 102: An http URI is most strongly associated with
 something in some domain of discourse (the problem domain) like a
 person, place, or thing.  When you GET that URI, the bytes returned
 are a representation [4] of that thing.

 Abstraction 33: An http URI is most strongly associated with a
 repository or collection of information.  When you GET that URI, the
 bytes returned convey that information.

Neither of these has much bearing on web architecture in general.
Statements like Roy's [5]

   The reason we call it a URI that identifies a resource, rather than
   a UDI that identifies a document, is because we want a URI to
   reference things in the future -- to point to a source of future
   useful things.  That's what resource means.  It is therefore
   impossible to "retrieve" a resource, since the fact that it is
   available "over there" is an essential part of it being a resource;
   the resource remains over there, so the only thing that is
   retrieved is an instantaneous representation of the resource at the
   point in time at which it was generated by the origin. 

don't look very different in #33 terms from how they look in #102's
terms.  The real distinction is just whether you focus on the
collection of information from which the web server generates its
response or focus past the server, as Roy does, on the subject of that
information.

It's tempting to go into why I like #33, and further debate the merits
and shortcomings of these two ways of thinking about HTTP (or all the
other possible abstractions), but let's not.  Instead, let's think
about what would constitute a good one.  To know that, we'd have to
know how this abstraction will be used.  What problems would an
answer to httpRange-14 help with?  Are there any use cases, or is this
all just idle philosophy?

I have an application.  

   I want people to publish data on the web, and I want them to do so
   in a format that is seriously self-describing, so that others can
   understand and re-use the data with minimal hassle. SGML and XML
   were great; they let people pick meaningful tag names, and that was
   a start, but I want to go farther.  How about if users could just
   click on each tag (in some source view) to get to its full
   documentation, examples, support software, discussion list
   archives, etc, ...?  That would be nice!  [ Stop me if you've heard
   this before. :-) And don't you dare say "Hey Sandro, you should
   look into this 'semantic web' stuff."]  Maybe some software could
   even do its own kind of "clicking" on the links to download
   information about how to check the data, or display it nicely, or
   translate it into other formats I might want.  Then I could just
   fetch some data from a dozen different sources and tell my computer
   to turn them all into a format I like, and merge them while it's at
   it. 

   Oh, and of course there will be lots of data *about* the web.  I
   want to share bookmarks, blog feeds, web access control databases,
   etc.  So our data format will use the web in two ways: people will
   have data about websites and stuff (all the things they bookmark
   and blog about), AND the format itself will have links for each of
   its tags, linking to documentation, software, etc, about the tags. 

   Oh, and maybe we could use the web somehow to help us link more of
   the data.  I've got a "friends" list with a bunch of people on it,
   and so does my friend Matt.  Any friend of his is a friend of mine,
   so maybe we can merge our databases?  If we could just agree on
   what database key to use in identifying people, that would help a
   lot.  We just need some way to pick a string which unambiguously
   identifies a person.... 

So that's the setup.  The test case will need to be much more precise
before we can really see the difference between how 102 and 33 work.

The Model and Syntax:
  
  We'll use the RDF model, where information is conveyed as
  subject-property-value triples.  What we called "tags" above turn
  into just more objects, usually in the property role.  We will set
  aside how RDF uses URIs, for now, because that's what we're trying
  to settle.

  For syntax we'll use N-Triples, but instead of terms like <uri>
  we'll use number<uri>, where the number lets us show how we're using
  URIs in different ways.
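
  As an illustration only, here is a minimal Python sketch of how the
  number<uri> notation could be split into its two parts.  The names
  Term and parse_term are hypothetical, and the pattern assumes the
  label is digits with an optional trailing letter (as in "102a").

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    mode: Optional[str]  # abstraction label such as "33" or "102a"; None if unannotated
    value: str           # the URI, or the term text itself when there is no label


# Matches annotated terms such as 33<http://example.org/page> (assumed label shape).
_ANNOTATED = re.compile(r"^(\d+[a-z]?)<([^<>]+)>$")


def parse_term(text: str) -> Term:
    """Split an annotated term into its abstraction label and its URI."""
    text = text.strip()
    match = _ANNOTATED.match(text)
    if match:
        return Term(match.group(1), match.group(2))
    # Local names like _:pic and bare names like rdf:type carry no label.
    return Term(None, text)
```

  With this, the same URI string can be carried around together with
  the abstraction under which it is being read.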

The Data:

  Sandro has a dog, Taiko.  (There's some text about this dog and
  photos of him at "http://www.drum.org/~natasha/pets/taiko.html".)
  Taiko is an Akita.  (You can read more about Akitas on the AKC site,
  at "http://www.akc.org/breeds/recbreeds/akita.cfm".)  Let's consider
  Akita a class of dogs (as the AKC site does), and for the moment
  simply use the syntax "rdf:type" to name the "class" property.  In
  other words, we want to say something like "Taiko rdf:type
  Akita", but we want "Taiko" and "Akita" to be clickable and
  mergeable.
 
  Let's also tell people that Taiko appears in the picture they can
  see on the web at "http://www.hawke.org/sandro/dogsmile.jpg".
  (We'll just assume a "depicts" property for now, and ignore the fact
  that I'm also in that picture.)

The #102 Version (A):

   Here we simply take existing web pages as usable representations of
   the things we want to talk about.

   102a<http://www.drum.org/~natasha/pets/taiko.html> rdf:type
                102a<http://www.akc.org/breeds/recbreeds/akita.cfm>.

   works fine for the first part, but how do we do the second part?
   We can't do

   102a<http://www.hawke.org/sandro/dogsmile.jpg> depicts
                  102a<http://www.drum.org/~natasha/pets/taiko.html>.

   because "http://www.hawke.org/sandro/dogsmile.jpg" is not a
   representation of a picture.  The naming authority (me) intends
   instead that it represent how Sandro and Taiko feel about each
   other, and intent is what matters [3].   Maybe this issue is a red
   herring; the essential point is that sometimes we do want to talk
   about web pages, web sites, etc, and 102a<...> doesn't do that.

   Instead we'll use a string literal, a local term (read "_:pic"
   as "something herein called 'pic'"), and another predicate:

   _:pic webAddress "http://www.hawke.org/sandro/dogsmile.jpg".
   _:pic depicts 102a<http://www.drum.org/~natasha/pets/taiko.html>.

   There is something which has a web address "...dogsmile.jpg" and
   depicts Taiko. 
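
   To show what the extra hop through webAddress costs a consumer,
   here is a small Python sketch.  It assumes triples are held as plain
   (subject, property, value) tuples of strings; the function name
   addresses_depicting is hypothetical.

```python
# The two triples from the #102 (A) example, as plain string tuples.
triples = [
    ("_:pic", "webAddress", "http://www.hawke.org/sandro/dogsmile.jpg"),
    ("_:pic", "depicts", "102a<http://www.drum.org/~natasha/pets/taiko.html>"),
]


def addresses_depicting(triples, subject):
    """Return the web addresses of everything that depicts `subject`."""
    # Step 1: find the local things that depict the subject.
    depicting = {s for s, p, v in triples if p == "depicts" and v == subject}
    # Step 2: follow webAddress from those things to something fetchable.
    return sorted(v for s, p, v in triples if p == "webAddress" and s in depicting)


taiko = "102a<http://www.drum.org/~natasha/pets/taiko.html>"
pictures = addresses_depicting(triples, taiko)
```

   Under 102a the dog is named directly by a URI, while the picture
   needs two triples and a lookup before anything can be fetched.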

The #102 Version (B, C, ...):

   I've heard suggestions of other 102-style approaches, involving
   Content-Location headers and such, but I don't know the
   details.  If someone has a suggestion, go ahead.
   
The #33 Version:

   Here we introduce another property, linking web pages which have a
   single, primary subject with the thing which is that primary
   subject.

   33<http://www.drum.org/~natasha/pets/taiko.html> primarySubject _:taiko.
   33<http://www.akc.org/breeds/recbreeds/akita.cfm> primarySubject _:Akita.
   _:taiko rdf:type _:Akita.
   33<http://www.hawke.org/sandro/dogsmile.jpg> depicts _:taiko.
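
   The merging use case from the setup can be sketched in Python under
   this reading.  This is only an illustration: it assumes that a page
   has exactly one primary subject, so two local names linked from the
   same page denote the same thing.  The function name, the second
   dataset, and its "name" property are all hypothetical, and the
   sketch ignores the case of one local name linked from several pages.

```python
def merge_on_primary_subject(*graphs):
    """Merge triple lists, unifying local names that share a primarySubject page.

    Each graph is a list of (subject, property, value) string triples.
    Local names such as "_:taiko" are only meaningful inside their own graph.
    """
    canonical = {}  # page -> (graph index, local name) first seen for that page
    rename = {}     # (graph index, local name) -> name used in the merged graph
    for index, graph in enumerate(graphs):
        for subject, prop, value in graph:
            if prop == "primarySubject":
                first = canonical.setdefault(subject, (index, value))
                rename[(index, value)] = "_:g%d_%s" % (first[0], first[1][2:])

    def resolve(index, term):
        if not term.startswith("_:"):
            return term  # URIs and literals are shared as-is
        # Unlinked local names stay distinct per source graph.
        return rename.get((index, term), "_:g%d_%s" % (index, term[2:]))

    merged = set()
    for index, graph in enumerate(graphs):
        for subject, prop, value in graph:
            merged.add((resolve(index, subject), prop, resolve(index, value)))
    return merged


sandro = [
    ("33<http://www.drum.org/~natasha/pets/taiko.html>", "primarySubject", "_:taiko"),
    ("33<http://www.hawke.org/sandro/dogsmile.jpg>", "depicts", "_:taiko"),
]
matt = [  # hypothetical second dataset using a different local name
    ("33<http://www.drum.org/~natasha/pets/taiko.html>", "primarySubject", "_:dog"),
    ("_:dog", "name", "Taiko"),
]
merged = merge_on_primary_subject(sandro, matt)
```

   Here the page URI acts as the agreed database key, while the dog
   itself never needs a URI of its own.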

So, to have good Web Architecture, should RDF use 102a<...>,
33<...>, or what?

My personal suggestion [6] to the RDF community (which I encourage the
TAG to reiterate) uses a hybrid which largely mirrors current
practice.  The idea is to say that <...> means 33<...> (the web page)
if there is no "#" in the URI and 102<...> when there is a "#".  (As
above, I don't like the "intent" part of 102; I prefer an
external formulation like primarySubject, but it can be used like
102.)  Beyond this, people can use primarySubject and webAddress
explicitly for the less common other cases.  Let's call this odd
hybrid approach #89.  I wouldn't recommend it in general, but it's
good for RDF backward compatibility and brevity.
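
The #89 rule is mechanical enough to sketch in a few lines of Python.
This is an illustration of the rule as stated above, not a proposal for
an API; the function name hybrid_reading and the example URI with a
fragment are hypothetical.

```python
from urllib.parse import urldefrag


def hybrid_reading(uri):
    """Apply the #89 rule: a "#" means a 102-style reading, otherwise 33-style.

    Returns (reading, page), where page is the URI without its fragment,
    i.e. the part an HTTP GET would actually be sent for.
    """
    reading = "102" if "#" in uri else "33"
    page = urldefrag(uri).url
    return reading, page


# A URI without "#" is read as the web page itself.
page_case = hybrid_reading("http://www.hawke.org/sandro/dogsmile.jpg")

# A hypothetical URI with "#" is read as the thing being described.
thing_case = hybrid_reading("http://www.hawke.org/sandro/pets#taiko")
```

The test is purely syntactic, which is the point: a consumer can tell
which reading applies without fetching anything or consulting other
data.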

I'm now going to make this e-mail much MUCH too long by mentioning
another approach, which /dev/random calls #130.  130<...> is 33<...>
*or* 102<...>, and you can't tell which from looking at it.  You need
to use some other data.   By my reading, this is what RDF uses today,
and I'm not very fond of it.    RDF today also tries to leverage
media-types at the far end of the link, but I think that's a terrible
idea.

   -- sandro

[0] http://www.w3.org/2001/tag/ilist#httpRange-14
[1] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0262
[2] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0243
[3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0257
[4] I think the intended meaning of "representation" is
    wordnet's sense #2:
    http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=2&word=representation&posnumber=1&searchtypenumber=2&senses=2&showglosses=1
[5] http://lists.w3.org/Archives/Public/www-tag/2002Jul/0253
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2002Dec/0125

Received on Tuesday, 31 December 2002 13:39:40 UTC