Address meanings, not contents! (Re: Storm blocks and metadata) from Reto Bachmann-Gmuer on 2003-03-27 (www-rdf-interest@w3.org from March 2003)

From: Reto Bachmann-Gmuer <reto@gmuer.ch>
Date: Thu, 27 Mar 2003 17:10:15 +0100
To: Benja Fallenstein <b.fallenstein@gmx.de>
Cc: GZZ developers <gzz-dev@nongnu.org>, www-rdf-interest@w3.org
Message-Id: <98A5A1A6-606E-11D7-BB42-003065CDBE5C@gmuer.ch>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Benja

> It is necessary for the interpretation of the data we get; and it's 
> usually easy to agree on (people won't too often assign different mime 
> types to the same bytes). One thing about content hashes is, when two 
> people put the same file into a hash-based system, they will use the 
> same identifier for it. With MIME types, that's still pretty much 
> true; with more elaborate metadata, it isn't.
I certainly wouldn't argue for putting even more metadata in the URI.
> Using the same identifier is important for queries like, "Which 
> documents include this image?" If the three documents that use the 
> image use three different kinds of IDs for it (because they refer to 
> three different kinds of metadata), you're out of luck.
In the common-sense meaning of the question "Which documents include
this image?", "this image" is not defined by the sequence of bytes that
make up a specific JPEG version of it, but rather by a specific visual
representation of a thing. Giving a URI to the image itself (in that
defined, encoding-independent, common-sense meaning) and referencing
this URI rather than the URI of the byte sequence wherever possible
allows answering queries that are closer to our real-world
understanding of things. (What is concrete for us is fairly abstract
for the computer: computers deal with abstractions over the raw data to
get the stuff non-mathematicians can deal with, and this "abstraction
process" has to be pushed further to get the Semantic Web.)
By the way, the MIME type isn't that unambiguous either: e.g. a text
using only a restricted set of characters may be encoded to the same
sequence of bytes using different encodings.
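To make the encoding point concrete, here is a small Python sketch (the sample string and the choice of charsets are assumptions for illustration): an ASCII-only text encoded as ASCII, UTF-8 and Latin-1 produces identical bytes, so a content hash cannot tell "text/plain; charset=utf-8" apart from "text/plain; charset=iso-8859-1".

```python
import hashlib

# A short text using only ASCII characters (sample string chosen for this sketch).
text = "Ulisses"

# The same characters encoded with three different charsets, i.e. what
# would be labelled with three different MIME types.
blobs = {enc: text.encode(enc) for enc in ("ascii", "utf-8", "latin-1")}

# All three encodings yield exactly the same byte sequence...
identical = len(set(blobs.values())) == 1

# ...so a hash over the bytes gives one identifier for all of them; the
# bytes alone cannot say which charset (and so which MIME type) was meant.
digests = {enc: hashlib.sha1(b).hexdigest() for enc, b in blobs.items()}

print(identical)                   # True
print(len(set(digests.values())))  # 1
```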

> (...)
>> Higher level applications should not use block-uris anyway but deal 
>> with an abstraction representing the content (like http urls should).
> You mean as in, with content negotiation applied? You use a single URI 
> which maps to different representations of the same resource?
Exactly, the *same* resource. (But each representation is also a
resource in itself.)

>
>> An example to be more explicit:
>> <urn:urn-5:G7Fj> <DC:title> "Ulisses"
>> <urn:urn-5:G7Fj> <DC:description> "bla bli"
>
> This, for example, I would not include here. :-) Firstly, it is 
> something I would want to be versioned independently: if I change the 
> description of an image, that should not create a new version of the 
> image.
Surely not! Where I used a literal in the examples, one could instead
use a URI representing the meaning of "bla bli". An attribute value of
this URI would then be a URI for the English expression of that
meaning; an attribute of that URI would be a URI representing this
expression as spoken by John; and an attribute of that one would be a
byte Storm block with the MP3 encoding of it.
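A rough sketch of that chain as plain Python triples. All URIs except urn:urn-5:G7Fj, and the "ex:" predicates, are hypothetical names made up for this illustration, not an existing vocabulary; the point is only that the MP3 Storm block sits at the end of a path that starts from the abstract meaning, instead of being the thing that is referenced.

```python
# Hypothetical URIs and predicates, invented for illustration only.
triples = [
    ("urn:urn-5:G7Fj", "dc:description", "urn:urn-5:Meaning1"),          # meaning of "bla bli"
    ("urn:urn-5:Meaning1", "ex:expressionIn", "urn:urn-5:EnglishExpr"),  # English expression
    ("urn:urn-5:EnglishExpr", "ex:spokenBy", "urn:urn-5:JohnSpeaking"),  # spoken by John
    ("urn:urn-5:JohnSpeaking", "ex:encoding", "urn:content-hash:MP3HASH"),  # MP3 Storm block
]


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(start, path):
    """Follow a list of predicates from start; return the node reached, or None."""
    node = start
    for predicate in path:
        found = objects(node, predicate)
        if not found:
            return None
        node = found[0]
    return node


mp3 = follow("urn:urn-5:G7Fj",
             ["dc:description", "ex:expressionIn", "ex:spokenBy", "ex:encoding"])
print(mp3)  # urn:content-hash:MP3HASH
```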
I think you need a generic versioning system for RDF statements rather
than for the data: later statements must have a means to put earlier
statements out of the graph (while the older ones should still be
accessible, in the style of the reification "I used to believe (s p v)").
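One way such statement-level versioning might look, sketched in Python under stated assumptions (the class and method names are hypothetical, and the replacement description "bla blo" is invented): statements are never deleted, a later statement can supersede earlier ones, and the superseded ones remain queryable as "used to believe" statements. The image block itself is untouched; only statements about it change.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Statement:
    triple: Triple
    superseded_by: Optional[int] = None  # id of the later statement that retracted it


@dataclass
class VersionedGraph:
    """Hypothetical store: statements are never deleted, only superseded."""
    statements: Dict[int, Statement] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def add(self, s: str, p: str, o: str, supersedes: Iterable[int] = ()) -> int:
        new_id = next(self._ids)
        for old_id in supersedes:
            # The older statement leaves the current graph but stays on record.
            self.statements[old_id].superseded_by = new_id
        self.statements[new_id] = Statement((s, p, o))
        return new_id

    def current(self) -> List[Triple]:
        """What is believed now."""
        return [st.triple for st in self.statements.values() if st.superseded_by is None]

    def used_to_believe(self) -> List[Triple]:
        """Retracted statements, still accessible."""
        return [st.triple for st in self.statements.values() if st.superseded_by is not None]


g = VersionedGraph()
old = g.add("urn:urn-5:G7Fj", "dc:description", "bla bli")
g.add("urn:urn-5:G7Fj", "dc:description", "bla blo", supersedes=[old])
print(g.current())          # [('urn:urn-5:G7Fj', 'dc:description', 'bla blo')]
print(g.used_to_believe())  # [('urn:urn-5:G7Fj', 'dc:description', 'bla bli')]
```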

> Secondly, I don't see a reason why the URI of the image would need to 
> refer to this.
Me neither ;-). There must be a misunderstanding here.
> Thirdly, I don't think that when a file is put into the system-- and 
> thus given its identifier-- is necessarily the time to create this 
> kind of metadata. It would seem to hold up the task at hand. Rather, 
> I'd like to be able to add it later on, and maybe someone else can do 
> that even better than me-- like a librarian who has scientific 
> background in giving metadata about stuff.
Of course. The application should probably add some metadata on its
own so that the user has a chance to find the data later, but it should
always be possible to enter a new version of the metadata.

> (...)
>
>> In this example the application should reference "urn:urn-5:G7Fj" (which 
>> does not have a mime type) rather than "urn:content-hash: 
>> Dj&/fjkZRT68" (which has a mime type in a specific context) wherever 
>> possible; in many cases a higher abstraction "urn:urn-5:lG5d" can be 
>> used.
>
> Um, using a urn-5 doesn't work since it's just a random number-- if we 
> use just a random number, we cannot check whether the data we may 
> retrieve from a p2p network is really what the person making the 
> reference wanted us to see. We would need to use "urn:foo:ref:[blah]", 
> which would be the above RDF data, from which we could then get the 
> specific representation.
The urn-5 URIs are intended to reference a certain
concept/idea/meaning/topic; people are free to associate attributes with
existing URIs. They may be subject to change, like terms in natural
language are. If somebody wants to use a term in a specific sense, she
has to make this explicit, maybe using digital-signature stuff, but
more often I think a key-free trust system
(http://www.w3.org/2002/03/key-free-trust.html) is not only enough but
better adapted to "fuzzy" trust levels in a P2P network.
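The difference both sides are pointing at can be sketched in Python. The "urn:content-hash:" form and SHA-1 are assumptions here, and a random UUID stands in for a urn-5 name (which, as noted above, is essentially a random number). A hash-based URI lets anyone check the bytes they retrieve; a random concept URI gives nothing to recompute, so believing statements about it rests on trust in whoever made them.

```python
import hashlib
import uuid


def content_hash_uri(data: bytes) -> str:
    """Identifier derived from the bytes (hypothetical 'urn:content-hash:' form)."""
    return "urn:content-hash:" + hashlib.sha1(data).hexdigest()


def verify_block(uri: str, data: bytes) -> bool:
    """Anyone can recompute the hash, so spoofed bytes are detected."""
    return uri == content_hash_uri(data)


def new_concept_uri() -> str:
    """Random identifier for a concept; stands in for a urn-5 name in this sketch."""
    return "urn:uuid:" + str(uuid.uuid4())


image = b"sample jpeg bytes"
block_uri = content_hash_uri(image)
print(verify_block(block_uri, image))             # True
print(verify_block(block_uri, b"spoofed bytes"))  # False

# A random concept URI carries no information about any data, so there is
# nothing to recompute: whether statements about it are believable has to
# be decided by trusting their source.
concept = new_concept_uri()
```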

>> While you can only deficiently use HTTP to serve a block,
>
> Why?
The only HTTP headers you can send back are the length and, if you put
it in the URI, the content type; most HTTP features go unused.

>> you could serve the URIs of both abstractions (urn:urn-5:G7Fj and 
>> urn:urn-5:lG5d) directly using HTTP 1.1 features.
> (Again, you'd have to use hashes, or you could be arbitrarily spoofed.)
(Again: no good networking without trust mechanisms ;-)

> (...)
>>>> And how do you split the metadata in blocks
>>>
>>> Well, depends very much on the application. How do you split 
>>> metadata into files? :-)
>> Not at all ;-). Splitting into files is rudimentarily represented 
>> metadata; if you use RDF, the filesystem is a legacy application.
>
> Um, but if you put metadata on an http server, you split it too?
My approach would be to split the data just in time. To make it
accessible over HTTP, for a standard request the server could return
all the statements in which a specific URI occurs, or only those where
it is the subject. An extended request could specify the level of
expansion requested.
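A minimal sketch of such just-in-time splitting, assuming an in-memory list of triples (the function name, its parameters and the example URIs other than urn:urn-5:G7Fj are hypothetical): subject_only corresponds to the "only where it is the subject" variant, and depth to the requested level of expansion. How these options would be carried in an HTTP request, e.g. as query parameters, is left open.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def is_uri(node: str) -> bool:
    # Simplification for this sketch: URIs start with "urn:", all else is a literal.
    return node.startswith("urn:")


def statements_about(graph: List[Triple], uri: str,
                     subject_only: bool = False, depth: int = 1) -> List[Triple]:
    """Cut a just-in-time 'block' of statements around uri.

    subject_only: only statements in which the node is the subject.
    depth: how many levels of neighbouring URIs to expand.
    """
    result: List[Triple] = []
    seen = set()
    frontier = [uri]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            for s, p, o in graph:
                if s == node or (not subject_only and o == node):
                    if (s, p, o) not in result:
                        result.append((s, p, o))
                    other = o if s == node else s
                    if is_uri(other):  # literals are not expanded further
                        next_frontier.append(other)
        frontier = next_frontier
    return result


graph = [
    ("urn:urn-5:G7Fj", "dc:title", "Ulisses"),
    ("urn:urn-5:G7Fj", "ex:representation", "urn:content-hash:AAA"),
    ("urn:content-hash:AAA", "ex:mimeType", "image/jpeg"),
    ("urn:doc:1", "ex:includes", "urn:urn-5:G7Fj"),
]

# Standard request: every statement in which the URI occurs.
print(statements_about(graph, "urn:urn-5:G7Fj"))
# Extended request: subject-only, expanded one level further.
print(statements_about(graph, "urn:urn-5:G7Fj", subject_only=True, depth=2))
```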

> (...)

Cheers,
Reto
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (Darwin)

iD8DBQE+gyJtD1pReGFYfq4RAgiFAKCEEvE6v/NwTl1ebjge5YPx9UAtqACgqXvF
RpcbVqiDuvMrGt9ReDMGZLI=
=TRAL
-----END PGP SIGNATURE-----
Received on Thursday, 27 March 2003 11:10:47 UTC