Re: Uniform access to descriptions

At 10:12 AM -0400 4/12/08, Tim Berners-Lee wrote:
>On 2008-04-12, at 06:08, Xiaoshu Wang wrote:
>
>>
>>Darn and thanks, Pat. I wish my English is that good.
>>
>>Xiaoshu
>>
>>Pat Hayes wrote:
>>>Reading this exchange (below), I think I might be able to make 
>>>Xiaoshu's case for him. (Xiaoshu, if I have misrepresented you at 
>>>all, please forgive (and correct) me. But I got to this point from 
>>>your recent emails (on and off list), so even if I'm wrong, you 
>>>have to bear some of the responsibility :-)
>>>
>
>
>Ok, thanks Pat, Whether or not you were successful in representing 
>what Xiaoshu meant, you have put the argument on the table.

Actually there are two separate arguments. One is about terminology 
(awww:represent vs. represent), and the other is more substantial. 
I'll try to keep them separated.

>
>>>The central point is that now that we have the technology and 
>>>ideas of the semantic web available, we have a wider range of ways 
>>>of representing, and a richer notion of what words like 
>>>"metadata" mean. If we are willing to take fuller advantage of 
>>>this new richness, we make available new ways to do semantic 
>>>things within the same overall design of the pre-semantic web.
>>>
>>>In particular,  awww:represents is a very narrow sense of 'represents'.
>
>Well, it is a specific part of the architecture which is now well 
>defined.  It is a technical term.

<terminology>
I only mentioned this in order to motivate the later possible 
generalization. But for the record, 'representation' already was a 
technical term which was well defined long before awww. I was writing 
papers about knowledge representation before the Internet was 
invented.
</terminology>

>>>Perhaps we can allow a wider sense of representation here.
>
>I'd prefer you to use a different term.   We have tried, with Pat's 
>guidance, to use terms like 'denote' in ways that the philosophical 
>community which came before the AWWW would be happy with.  But 
>'representation' in the AWWW is used in a technical sense, as 'Packet' 
>is in the Internet Protocol. It is part of a technical design, and we 
>are not free to take it in a wider sense without doing a great 
>disservice to the community.

<terminology>
There simply is no other word that will do. And the size, history 
and, I'm sorry, but scholarly and intellectual authority of the 
community which uses a wider sense of 'represent' so greatly exceeds 
the AWWW community that I don't think you can reasonably claim 
possession of such a basic and central term for such a very narrow, 
arcane and special (and, by the way, under-defined) sense. The 
analogy with 'packet' is inappropriate, unless you can point me to 
some large community, preferably of philosophers, which used 'packet' 
with a wider technical sense for, say, a century before 
packet-switching was invented. I've already conceded "entity" to XML, 
for God's sake.
</terminology>

>>>The REST story was always that URIs /identify/ resources, and that 
>>>the http response is a /representation/ of the resource. Nobody 
>>>has ever been able to say what exactly counts as a 'resource'.
>
>No one can ever, in English, say exactly what anything is, my friend.

Some people try harder than others, though.

>However, for better or worse, RDF  uses the word Resource to mean 
>basically thing, and the AWWW uses Information Resource to mean 
>basically document.
>
>You can understand them in two ways.  One is to read the English and 
>realize that your use of 'thing' and 'document' might not quite match 
>that of the writers, and go with the flow until you see how they are 
>used, or you can take them as technical terms, and just read them in 
>the context of the specs.

I do try to do the latter, as I hope you appreciate. But my 
(Xiaoshu's) current point is that it might be worth looking up from 
current technical usage, as it were, to see if the naive 
misunderstanding that one gets from misreading the AWWW terminology 
slightly might not actually be quite a good story after all. It seems 
worth running with it for a while to see what happens. Nothing 
ventured, etc.

At the very least, one might gain some insight into how someone might 
reasonably systematically misunderstand the intent of the AWWW 
without being a complete idiot, when this misunderstanding hangs 
together so coherently.

>
>>>We already have accepted the idea that a given resource may have 
>>>many awww:representations, to be resolved by content negotiation.
>>>
>>>Now, take that story exactly as expressed,  but let the word 
>>>'identify' mean simply /denote/ or /name/,
>
>As I think it does.

My name "Patrick John Hayes" denotes me. It does not 'identify' me in 
any architectural sense. If awww:identify just meant denote, then 
awww would, quite literally, have nothing at all to do with Web 
architecture, or indeed any architecture at all. Names don't need 
architecture in order to denote because denotation isn't an 
architectural matter.

>
>>>and allow that the /resource/ can be something entirely 
>>>unconnected to the Internet (such as, say, me), and allow 
>>>'representation' to include not /just/ the awww:representation 
>>>relationship between a byte stream and something like an html web 
>>>page, but more generally /any kind of representation of a thing/, 
>>>so that an image of me can be a representation of me, and an RDF 
>>>description can be another representation of me, and my home page 
>>>can be yet another representation of me - remember, here the 
>>>resource in question is /me/, not some information resource. So, 
>>>what follows from this vision? Well, it means that your insistence 
>>>that the RDF and a JPEG image must be different resources is 
>>>misplaced. Not that it's false, but it misses the point. Their role 
>>>here is not as resources, but as /representations/. And seen in 
>>>this light, it seems quite natural that one might use conneg to 
>>>decide which of them is most appropriate.
>>>
>>>Now, of course, this is not how 'representation' has traditionally 
>>>been used in Webarch discussions. It is not awww:representation. 
>>>But it is a perfectly good usage of the word 'representation': in 
>>>fact, somewhat better than the traditional webarch sense, which is 
>>>so special and peculiar as to almost be a distortion.
>
>The same is true of an Internet Packet.

OK, sorry about the potshot. My point was only to emphasize that if 
the story still makes sense (as I think it does) when generalized 
from a particularly narrow sense, then it is particularly worth 
checking the possibility out.

>  The traditional sense of a packet for me really involves physical 
>three dimensional wrapping, and almost always brown paper, and often 
>string.   The use of the term 'packet' for some string . 
>The technical world is full of such co-options of words, and complaining 
>that they don't have their original meaning is inappropriate.

<terminology>
As said above, there is a clear difference between a technical 
co-option of a nontechnical word (and by the way, internet 'packet' 
is a very good metaphorical usage), and a re-use of a technical word 
in a narrower sense. Particularly if you never actually say what that 
narrower sense actually is.
</terminology>
But OK, let's not quibble about terminology; in this forum we already 
have a workable distinction between awww:represent and represent. My 
point wasn't meant to be about terminology so much as to suggest that 
we can keep the terminology but understand it slightly more broadly, 
possibly with advantage.

>  Because there IS no English word which is perfect, because webarch 
>didn't exist before, it was invented. Like concepts in new software 
>systems every minute of each day.  The people who chose words to be 
>co-opted do so with the best of intentions, and with a success which 
>will clearly vary depending on the audience.   
>Others can bemoan an unfortunate choice, but the reader is not, for 
>a technical term, in a position to say "actually this means 
>something else".  This is how we communicate these days.

That all depends on who "we" is, of course. I wonder if the number of 
people who understand what AWWW means is greater or less than the 
number who are misunderstanding it in some way. The idea of 
'information resource' is very hard to grok, to be sure.

<terminology>
If AWWW had used a technical word in a new technical way, then this 
would likely have been harmless. Mathematics re-used 'field' without 
getting confused with agriculture. But the awww/semantics clash over 
the meaning of 'represent' is harmful because the senses are not 
independent: the awww usage is a (very) special case of the original 
meaning, so it is inherently ambiguous every time it is used; and, 
still worse, we need the broader meaning in these very discussions, 
because the TAG has decreed that URIs can denote anything: so we are 
here discussing semantics in a broad sense whether we like it or not. 
And if the word 'represent' is to be co-opted to be used only in one 
very narrow sense, then we have no word left for the ordinary 
semantic sense. To adopt a usage like this is almost pathological in 
the way it is likely to generate confusion (as it already has, and 
continues to do so, in spades.)
</terminology>

>>>It requires us to generalize the 'classical' webarch story to 
>>>allow a broader sense of '/representation/' and a broader sense of 
>>>'/resource/' and a broader sense of '/identify/'. And I think 
>>>Xiaoshu's main point is, let us try doing that, indeed, and see 
>>>what happens; and in fact, one gets a coherent, rational story 
>>>about how Web architecture should work. It isn't the REST model 
>>>any more: it generalizes it to include a much wider range of 
>>>possibilities. (We might call it REST++.) It is a Web much more 
>>>infused with semantics and descriptions than the current Web, one 
>>>which uses its own formalisms (RDF) more architecturally than the 
>>>current Web. In this vision, the semantic Web isn't simply an 
>>>application layer built on top of the pre-semantic Web, but 
>>>instead is something more like an architectural generalization of 
>>>the pre-semantic Web, with semantic technology built into its very 
>>>architecture all the way down.
>
>We could have done the same thing with the Web on top of the 
>internet.  We could have protested that it was unnatural to build 
>something which is fundamentally pages on top of something 
>fundamentally bitstreams.
>
>The point would be:
>
>"let us try doing that, indeed, and see what happens; and in fact, 
>one gets a coherent, rational story about how Internet architecture 
>should work. It isn't the inter-network model any more: it 
>generalizes it to include a much wider range of possibilities. (We 
>might call it IP++.) It is a Net much more infused with pages and 
>links than the current Net, one which uses its own formalisms (HTTP) 
>more architecturally than the current Net. In this vision, the Web 
>isn't simply an application layer built on top of the pre-web Net, 
>but instead is something more like an architectural generalization 
>of the pre-web Net, with web built into its very architecture all 
>the way down".
>
>It is always a choice.  Just think.  Routing tables in RDF.  In 
>fact, DNS in RDF and HTTP is now a very sensible solution, which 
>allowed digital signature of DNS using XMLDsig etc and a lot less 
>reinvention.
>
>Two strong arguments against.  1. We can move on more quickly if we 
>do not re-invent the lower layers, as the simple invariants which we 
>happily assume of the TCP layer in fact take huge amounts of careful 
>thought, engineering and administration to achieve.

Fair enough. I waxed too poetical above. And in any case, obviously 
the way to improve the internet or Web or anything else is to provide 
alternative mechanisms which can be adopted if/when they provide an 
advantage. But to be fair to Xiaoshu for a second, I don't think he 
was intending to be this poetic or far-reaching either. All the 
actual technical debate seems to be about the propriety (or lack of 
it) in using conneg to negotiate between representations that aren't 
awww:representations, and this is about using existing machinery in a 
new way.

>  2. We do not arrogantly assume that we will be the only net users 
>doing interesting things, so we want to interconnect with other 
>net-using services like email and peer-peer protocols and so on.

Quite. I did not intend to suggest otherwise.

>>>So, here's a typical Web transaction. A URI U /identifies/ a 
>>>resource R, and when U is given to http, the Web delivers a 
>>>/representation/ S of R. Typical classical case: R is a website (or 
>>>a webpage or a server or an http endpoint, or... but anyway, it's 
>>>something Internettish), U+http is a route to R and S is an 
>>>awww:representation of R, which is typically a byte-for-byte copy 
>>>of a file which comprises the bulk of R.  Alternative case using 
>>>the more general senses: R is me, U denotes R and S is an RDF 
>>>graph describing R, using FOAF. Describing is one way of 
>>>representing. Another alternative sense: R is me, U denotes R and 
>>>S is a JPEG image of R. Picturing is another way of representing. 
>>>Now, these representations aren't awww:representations of me, of 
>>>course; but they couldn't /possibly/ be, since I'm not the /kind 
>>>of thing that can possibly have/ an awww:representation. So if we 
>>>want to run the classical story with things like me - 
>>>non-information resources - as R, then we /must/ generalize the 
>>>classical notion of 'representation'.
>>>
>
>
>
>>>What these alternative cases have in common, and where they both 
>>>differ from the traditional one, is that the Web 'thing' that is 
>>>located by U+http and which returns the representation S simply 
>>>isn't mentioned. It's not part of the story at all: it's not the 
>>>resource, S doesn't represent it, and it's not what the URI 
>>>identifies/denotes. It's just part of the Web machinery, a 
>>>computational thing whose task is to transmit S when requested to 
>>>do so. It has a relationship to R, of course, but rather an 
>>>indirect one: it is a thing that delivers representations of R, 
>>>using http. We might call it a /storyteller/ for R. R might have a 
>>>whole lot of storytellers, each capable of telling different kinds 
>>>of story about R.  The classical case is where R is its own 
>>>storyteller. This is different from the classical REST/webarch 
>>>story, indeed: but then, as soon as we allow URIs to identify 
>>>things that can't be accessed by transmission protocols, the 
>>>classical story stopped working. We have to broaden our horizons. 
>>>But notice that it follows the same basic description as the 
>>>classical story, just using the terminology more broadly.
>
>So the pictures and the web pages and the RDF documents are not 
>first class objects, and do not have names.

No, I did not say that. Obviously they are first class objects and 
have names. The way we name Web pages is a special case of this 
picture, where the 'storyteller' is the same thing as the resource. 
Things that can be their own storytellers fit nicely within current 
AWWW, with its official understanding of words like 'represent'. (In 
fact, being capable of being one's own storyteller might be a way to define 
'information resource'.) But the nice thing about this picture is 
that other kinds of resource, which do not fit at all within the AWWW 
- things that aren't documents, 'non-information resources' - also 
fit within it; still, ironically, using the AWWW language, but with a 
semantic rather than AWWW sense of 'represent'.

Right now, the semantic web really does not have a coherent story to 
tell about how it works with non-information resources, other than it 
should use RDF (plus whatever is sitting on it in higher levels) to 
describe them; which says nothing, since RDF can describe anything. 
URIs in RDF are just names, their Web role as http entities 
semantically irrelevant. Http-range-14 connects access and denotation 
for document-ish things, but for other things we have no account of 
how they should or should not be related, or what anything a URI 
might access via http has got to do with what it denotes.

The way that the three participants (denoted-thing, URI-name and 
Web-information-resource 'storyteller') interact must be basically 
different when the denoted-thing isn't an information resource from 
when it is. All that is being suggested here is that there is an account 
that we could give about this, one that works in both cases and which 
fits the language of AWWW quite, er, nicely.
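
To make the three roles concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name Storyteller, the example.org URI, the placeholder bytes, and the simplified negotiation (an ordered list of media types, with no q-values or wildcards). It is not anything specified by AWWW or HTTP; it only shows a URI U that names R, while the thing reached via U+http merely hands back some representation S of R.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Storyteller:
    """Hypothetical Web-side thing that delivers representations of R."""
    tells_about: str                                          # informal label for R
    stories: Dict[str, bytes] = field(default_factory=dict)  # media type -> S

    def tell(self, accept: List[str]) -> Optional[Tuple[str, bytes]]:
        # Simplified conneg: pick the first media type in the client's ordered
        # list that this storyteller can supply. q-values and wildcards ignored.
        for media_type in accept:
            if media_type in self.stories:
                return media_type, self.stories[media_type]
        return None


# U is a hypothetical URI; it names R and also routes to R's storyteller.
U = "http://example.org/people/pat#me"

STORYTELLERS = {
    U: Storyteller(
        tells_about="a person (a non-information resource)",
        stories={
            "text/turtle": b"<http://example.org/people/pat#me> "
                           b"a <http://xmlns.com/foaf/0.1/Person> .",
            "image/jpeg": b"<placeholder for JPEG bytes>",
        },
    )
}


def dereference(uri: str, accept: List[str]) -> Optional[Tuple[str, bytes]]:
    """Return (media type, S) for the resource named by uri, or None."""
    return STORYTELLERS[uri].tell(accept)
```

In this toy model the storyteller is never what U denotes and never what S represents. The classical case would correspond to an entry whose only story is a copy of the resource itself.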

>  It certainly is not the web.  
>Sure, you could build it.  Semantics Transfer Protocol. It would be 
>an interesting study.
>
>I contend that it is actually not very useful to get back S without 
>knowing what its relationship to R is.

I agree. That is the situation we are in at present, when R is a 
non-information resource. Literally nothing is specified about what this 
relationship is, in this case.  In the story being suggested, we 
would at least know that S is a representation of R in some sense. 
Hopefully it would also somehow indicate in what sense.

>  Of course, if it is RDF about R it can say of its own accord.

<terminology>
Of course. But the only point being suggested is that when this 
happens, we might not unreasonably say that this RDF is indeed a 
representation of the (non-information) resource, just as we say this 
for an awww:representation of an information resource.
</terminology>

>   If it is a JPEG we don't know whether it is R or is a JPEG 
>encoding of R or a single frame taken from R or a picture of R one 
>night in a bar.

Right. Life is like that when you start insisting that you can refer 
to things other than documents.

>Two designs suggest themselves.

I wasn't going to suggest any new design, though this is an 
interesting direction of thinking.

>   In one, the relationship is negotiated.   The client sends a 
>request including a header something like:
>
>Accept-response: pictureOf, meaningOf, directionsToHouseOf, stuffAbout
>
>and the server responds including a header something like
>
>Response: pictureOf  ; env="bar"; time="00:26"
>
>The other design is that the returned thing is always just a set of 
>assertions by the server,  explaining the relationships involved. 
>If you like, you can attach anything but the cover note has the 
>semantics of a message from the publisher to the reader.  It might 
>say things like "The R you requested is a person, their name is 
>Archibald, and we know of two photos, the first being a mugshot and 
>the second a holiday snap"
>
>The trouble is there is no way for the client to direct the search.

Is there at present? As I understand the technical aspect of the 
current debate Xiaoshu is engaged in, his suggestion is to use conneg 
to do exactly this kind of negotiation. It doesn't seem to me like an 
obviously unreasonable idea. All the opposition to it seems to be of a 
doctrinaire nature (that isn't a 'normal and acceptable' use of 
conneg) rather than anything technical. Some of the emails have had a 
slightly stentorian, even harrumphing, tone. I haven't read any 
technical objection yet.
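
For what it is worth, the first of the two designs quoted above is easy to sketch. The following Python is only an illustration: the header names Accept-response and Response are the hypothetical ones from the quoted message, not real HTTP headers, and the function names and the shape of the `available` table are assumptions introduced here.

```python
def parse_accept_response(value):
    """Split a hypothetical Accept-response header into relation names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def format_response(relation, **params):
    """Build a hypothetical Response header value such as
    pictureOf; env="bar"; time="00:26"."""
    parts = [relation] + [f'{key}="{val}"' for key, val in params.items()]
    return "; ".join(parts)


def negotiate(accept_value, available):
    """Pick the first requested relation the server can satisfy.

    `available` maps a relation name to (params, body).
    Returns (Response header value, body), or None if nothing matches.
    """
    for relation in parse_accept_response(accept_value):
        if relation in available:
            params, body = available[relation]
            return format_response(relation, **params), body
    return None


# What one server might hold about R (placeholder body bytes).
available = {
    "pictureOf": ({"env": "bar", "time": "00:26"}, b"<placeholder JPEG bytes>"),
}

header, body = negotiate(
    "pictureOf, meaningOf, directionsToHouseOf, stuffAbout", available
)
```

The point of the sketch is only that the relationship between S and R travels with the response, so the client is told in what sense S represents R.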

>Suppose the client wants to get a mugshot of R.   This may or may 
>not have a URI itself, Rm.
>In either case, the client can ask as long as it likes but may 
>always get back the information that Rm is a photo of R.   It asks 
>for a JPEG, and gets back a picture of the relationship between Rm 
>and R as circles and arrows.  Well, that is in the new world a 
>representation of Rm, so I guess it has to be content.  Or maybe all 
>photos are served in http: space, not stp: space.
>
>>>In this view, then, content negotiation is a much wider topic than 
>>>it has traditionally been. We are dealing with a much wider notion 
>>>of what a 'resource' is, and a much wider notion of what a 
>>>'representation' is. Some resources have /all kinds/ of possible 
>>>representations. So yes, we have to be prepared to go beyond 
>>>'accepted and expected usage'. Who would have thought otherwise?
>
>
>Well, the interesting thing about IP is that it built on top of the 
>Ethernet system without going beyond Ethernet's 'accepted and 
>expected usage' one single bit.   And the web was built on top of 
>TCP/IP without going outside TCP/IP's 'accepted and expected usage' 
>enough for us to actually modify TCP/IP at all.   So an agent 
>capable of induction might well have thought otherwise.

I was referring to the usage, not the machinery. Is a picture of me a 
representation of me? Yes, it is, though not in the awww sense. An 
RDF description of me? Again, yes but not in the awww sense. OK, 
then, one might reasonably ask, what would be a representation of me 
in the awww sense? And the answer is: there isn't such a 
representation. In the awww sense of 'represent', I am a thing that 
cannot be represented. So are you, and indeed so is almost everything 
in the entire universe, almost everything that, in the brave new 
Semantic-Web world, can be denoted by a URI. Not a very promising 
start for a theory of Web semantics.

>
>
>          http:
>Internet:      //www.w3.org/
>Web:                        People/Berners-Lee/card
>SemWeb:                                            #i
>
>When you look at the URI  you can see the archaeology, you can count 
>the rings of the tree.  You can see how each layer leverages the 
>previous layer.  #i denotes a person

Stop right there. A person exists and has properties entirely 
separate from the Web. Many people have nothing to do with the Web in 
their entire lives. People are not Web objects. And when the URI is 
being used  in an RDF graph to refer to a person, the fact that it 
starts with http: is nothing more than a lexical accident, which has 
no bearing whatever on the role of the URI as a name denoting a 
person. (BTW, if this is incorrect, then the RDF semantics is 
seriously incomplete. I'd be delighted to be told I'm wrong, and given 
enough new information to write a better semantics for it.)

>as described by a document

A person as described by something? What does that mean? Does a 
person become a different person when they are described in a 
different way? The layers of the Web (or any other) description are 
not the layers of reality.

>People/Berners-Lee/card in a domain controlled by the owners of www.w3.org.

Aside from the above remarks, I simply don't think that any of this 
is true. At best it is a hopeful fantasy. I can publish some RDF that 
contains the URI  "http://www.w3.org/People/Berners-Lee/card#foodle", 
and what it asserts might well be true, because I invented that URI 
myself, and I intended it to denote my cat. Its internal syntactic 
structure is relevant only to what happens when it is given to http, 
but I am not using it in that way, and I don't care what http does 
with it: I'm using it, as sanctioned by the RDF model theory, purely 
to denote, and indeed to denote a non-information resource. I believe 
that in doing this I am not disobeying any of the precepts of AWWW, 
since AFAIK these precepts simply do not say anything about using 
URIs to denote non-information resources (other, of course, than that 
they can denote such things.)
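
A small sketch of the distinction being drawn, in Python with only the standard library. The class URI under example.org and the helper name `ntriple` are hypothetical, introduced purely for illustration. The first part uses the URI as a bare name in a hand-written N-Triples statement, with no HTTP involved; the second shows the only place its internal structure matters, namely the split an HTTP client would make before dereferencing (the fragment is not sent to the server).

```python
from urllib.parse import urldefrag

# The URI from the text, used here purely as a name.
FOODLE = "http://www.w3.org/People/Berners-Lee/card#foodle"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CAT = "http://example.org/vocab#Cat"  # hypothetical class URI


def ntriple(subject, predicate, obj):
    """Serialize one triple of URI references in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Asserting something about what the name denotes requires no network access.
statement = ntriple(FOODLE, RDF_TYPE, CAT)

# Only when the URI is handed to HTTP does its syntax come into play:
# the client requests the part before '#' and keeps the fragment to itself.
document_uri, fragment = urldefrag(FOODLE)
```

Nothing in the first half depends on what, if anything, the second half would retrieve.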

>The semantic web in this way builds on a lot of existing social and 
>technical architecture.

Where in any SWeb specification document is any of this social and 
technical architecture related to anything at all to do with 
denotation and semantics?

>
>Feel free, Pat(Xiaoshu), to build such an stp: system.   Feel free 
>to use it to inform the design of HTTP and maybe help us adjust 
>HTTP. But  do not feel free to misrepresent what technical terms in 
>the web architecture mean -- you have to pick other terms.

<terminology>
I think this particular shoe is on the other foot. If you can 
actually say, clearly enough to prevent continual trails of endless 
email debate, what AWWW actually means by 'represent', then I'd be 
delighted if you would use some technical word to refer to that 
elusive notion. But the word 'represent' and its cognates have been a 
technical word in far larger and more precisely stated forums for 
over a century; and since the day that Web science has included the 
semantic web, AWWW has taken an irrevocable step into the same 
academy. You are using the language of semantics now. If you want to 
be understood, you have to learn to use it correctly.
</terminology>

Pat



-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
http://www.ihmc.us/users/phayes      phayesAT-SIGNihmc.us
http://www.flickr.com/pathayes/collections

Received on Sunday, 13 April 2008 06:44:50 UTC