Re: Uniform access to descriptions from Xiaoshu Wang on 2008-04-13 (www-tag@w3.org from April 2008)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Sun, 13 Apr 2008 01:42:33 +0100
To: Tim Berners-Lee <timbl@w3.org>
CC: Pat Hayes <phayes@ihmc.us>, Michaeljohn Clement <mj@mjclement.com>, "www-tag@w3.org WG" <www-tag@w3.org>, noah_mendelsohn@us.ibm.com, Jonathan Rees <jar@creativecommons.org>, Phil Archer <parcher@icra.org>, "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Message-ID: <480156F9.6080705@musc.edu>
Tim Berners-Lee wrote:
>
> On 2008-04-12, at 06:08, Xiaoshu Wang wrote:
>
>>
>> Darn and thanks, Pat. I wish my English were that good.
>>
>> Xiaoshu
>>
>> Pat Hayes wrote:
>>> Reading this exchange (below), I think I might be able to make 
>>> Xiaoshu's case for him. (Xiaoshu, if I have misrepresented you at 
>>> all, please forgive (and correct) me. But I got to this point from 
>>> your recent emails (on and off list), so even if I'm wrong, you have 
>>> to bear some of the responsibility :-)
>>>
>
>
> OK, thanks Pat. Whether or not you were successful in representing 
> what Xiaoshu meant, you have put the argument on the table.
>
>>> The central point is that now that we have the technology and ideas 
>>> of the semantic web available, we have a wider range of ways of 
>>> representing, and a richer notion of what words like 'metadata' 
>>> mean. If we are willing to take fuller advantage of this new 
>>> richness, we make available new ways to do semantic things within 
>>> the same overall design of the pre-semantic web.
>>>
>>> In particular,  awww:represents is a very narrow sense of 'represents'.
>
> Well, it is a specific part of the architecture which is now well 
> defined.  It is a technical term.
>
>>> Perhaps we can allow a wider sense of representation here.
>
> I'd prefer you to use a different term.   We have tried, with Pat's 
> guidance, to use terms like 'denote' in ways that the philosophical 
> community which came before the AWWW would be happy with.  But 
> 'representation' in the AWWW is used in a technical sense, as 'Packet' 
> is in the Internet Protocol. It is part of a technical design, and we 
> are not free to take it in a wider sense without doing a great 
> disservice to the community.
>
>>> The REST story was always that URIs /identify/ resources, and that 
>>> the http response is a /representation/ of the resource. Nobody has 
>>> ever been able to say what exactly counts as a 'resource'.
>
> No one can ever, in English, say exactly what anything is, my friend.
>
> However, for better or worse, RDF  uses the word Resource to mean 
> basically thing, and the AWWW uses Information Resource to mean 
> basically document.
>
> You can understand them in two ways.  One is to read the English and 
> realize that your use of 'thing' and 'document' might not quite match 
> that of the writers, and go with the flow until you see how they are 
> used, or you can take them as technical terms, and just read them in 
> the context of the specs.
>
>>> We already have accepted the idea that a given resource may have 
>>> many awww:representations, to be resolved by content negotiation.
>>>
>>> Now, take that story exactly as expressed, but let the word 
>>> 'identify' mean simply /denote/ or /name/,
>
> As I think it does.
>
>>> and allow that the /resource/ can be something entirely unconnected 
>>> to the Internet (such as, say, me), and allow 'representation' to 
>>> include not /just/ the awww:representation relationship between a 
>>> byte stream and something like an html web page, but more generally 
>>> /any kind of representation of a thing/, so that an image of me can 
>>> be a representation of me, and an RDF description can be another 
>>> representation of me, and my home page can be yet another 
>>> representation of me - remember, here the resource in question is 
>>> /me/, not some information resource. So, what follows from this 
>>> vision? Well, it means that your insistence that the RDF and a JPEG 
>>> image must be different resources is misplaced. Not that it's false, 
>>> but it misses the point. Their role here is not as resources, but 
>>> as /representations/. And seen in this light, it seems quite natural 
>>> that one might use conneg to decide which of them is most appropriate.
>>>
>>> Now, of course, this is not how 'representation' has traditionally 
>>> been used in Webarch discussions. It is not awww:representation. But 
>>> it is a perfectly good usage of the word 'representation': in fact, 
>>> somewhat better than the traditional webarch sense, which is so 
>>> special and peculiar as to almost be a distortion.
>
> The same is true of an Internet Packet.  The traditional sense of a 
> packet for me really involves physical three dimensional wrapping, and 
> almost always brown paper, and often string.   The use of the term 
> 'packet' for some string .    The technical world is full of such 
> co-options of words, and complaining that they don't have their 
> original meaning is inappropriate.  Because there IS no English word 
> which is perfect, because webarch didn't exist before, it was 
> invented. Like concepts in new software systems every minute of each 
> day.  The people who choose words to be co-opted do so with the best of 
> intentions, and with a success which will clearly vary depending on 
> the audience.   Others can bemoan an unfortunate choice, but the 
> reader is not, for a technical term, in a position to say "actually 
> this means something else".  This is how we communicate these days.
>
>
>>> It requires us to generalize the 'classical' webarch story to allow 
>>> a broader sense of '/representation/' and a broader sense of 
>>> '/resource/' and a broader sense of '/identify/'. And I think 
>>> Xiaoshu's main point is, let us try doing that, indeed, and see what 
>>> happens; and in fact, one gets a coherent, rational story about how 
>>> Web architecture should work. It isn't the REST model any more: it 
>>> generalizes it to include a much wider range of possibilities. (We 
>>> might call it REST++.) It is a Web much more infused with semantics 
>>> and descriptions than the current Web, one which uses its own 
>>> formalisms (RDF) more architecturally than the current Web. In this 
>>> vision, the semantic Web isn't simply an application layer built on 
>>> top of the pre-semantic Web, but instead is something more like an 
>>> architectural generalization of the pre-semantic Web, with semantic 
>>> technology built into its very architecture all the way down.
>
> We could have done the same thing with the Web on top of the 
> internet.  We could have protested that it was unnatural to build 
> something which is fundamentally pages on top of something 
> fundamentally bitstreams.
>
> The point would be:
>
> "let us try doing that, indeed, and see what happens; and in fact, one 
> gets a coherent, rational story about how Internet architecture should 
> work. It isn't the inter-network model any more: it generalizes it to 
> include a much wider range of possibilities. (We might call it IP++.) 
> It is a Net much more infused with pages and links than the current 
> Net, one which uses its own formalisms (HTTP) more architecturally 
> than the current Net. In this vision, the Web isn't simply an 
> application layer built on top of the pre-web Net, but instead is 
> something more like an architectural generalization of the pre-web 
> Net, with web built into its very architecture all the way down".
>
> It is always a choice.  Just think.  Routing tables in RDF.  In fact, 
> DNS in RDF and HTTP is now a very sensible solution, which allowed 
> digital signature of DNS using XMLDsig etc and a lot less reinvention.
>
> Two strong arguments against.  1. We can move on more quickly if we do 
> not re-invent the lower layers, as the simple invariants which we 
> happily assume of the TCP layer in fact take huge amounts of careful 
> thought, engineering and administration to achieve.  2. We do not 
> arrogantly assume that we will be the only net users doing interesting 
> things, so we want to interconnect with other net-using services like 
> email and peer-peer protocols and so on.
Tim, with due respect, I don't know if it is just me, but I think you 
are arguing for my point of view.  (I don't like to use *my* here; it 
sounds as if I were the one who envisioned it, but the truth is that I 
came to this point by trying to comprehend what the existing web has.  
But I don't want to drag others into this fight.)

My model, in fact, does NOT change anything.  It simply requires a 
readjustment of our thinking.  In other words, I simply want to view the 
semantic web from a more abstract level.  I try (perhaps too vehemently) 
to oppose adding the LINK header because I think the current web and HTTP 
are well designed and sufficient.  My explanation is simply a 
re-interpretation of the web; it doesn't require any change to the 
existing web architecture.  It only requires some adjustment of our 
thinking and attitude, and new ways to put the data.  Of course, along 
the way, we might need some new vocabularies, but will that hurt?  If so, 
what?

Honestly, maybe I am too self-centered or delusional, but all your 
arguments seem to me to be arguing /for/ my point of view, not 
/against/ it. 
>>> So, here's a typical Web transaction. A URI U /identifies/ a 
>>> resource R, and when U is given to http, the Web delivers a 
>>> /representation/ S of R. Typical classical case: R is a website (or a 
>>> webpage or a server or an http endpoint, or... but anyway, it's 
>>> something Internettish), U+http is a route to R and S is a 
>>> awww:representation of R, which is typically a byte-for-byte copy of 
>>> a file which comprises the bulk of R.  Alternative case using the 
>>> more general senses: R is me, U denotes R and S is an RDF graph 
>>> describing R, using FOAF. Describing is one way of representing. 
>>> Another alternative sense: R is me, U denotes R and S is a JPEG 
>>> image of R. Picturing is another way of representing. Now, these 
>>> representations aren't awww:representations of me, of course; but 
>>> they couldn't /possibly/ be, since I'm not the /kind of thing that 
>>> can possibly have/ an awww:representation. So if we want to run the 
>>> classical story with things like me - non-information resources - as 
>>> R, then we /must/ generalize the classical notion of 'representation'.
>>>
>
>
>
>>> What these alternative cases have in common, and where they both 
>>> differ from the traditional one, is that the Web 'thing' that is 
>>> located by U+http and which returns the representation S simply 
>>> isn't mentioned. It's not part of the story at all: it's not the 
>>> resource, S doesn't represent it, and it's not what the URI 
>>> identifies/denotes. It's just part of the Web machinery, a 
>>> computational thing whose task is to transmit S when requested to do 
>>> so. It has a relationship to R, of course, but rather an indirect 
>>> one: it is a thing that delivers representations of R, using http. 
>>> We might call it a /storyteller/ for R. R might have a whole lot of 
>>> storytellers, each capable of telling different kinds of story about 
>>> R.  The classical case is where R is its own storyteller. This is 
>>> different from the classical REST/webarch story, indeed: but then, 
>>> as soon as we allow URIs to identify things that can't be accessed 
>>> by transmission protocols, the classical story stopped working. We 
>>> have to broaden our horizons. But notice that it follows the same 
>>> basic description as the classical story, just using the terminology 
>>> more broadly.
>
> So the pictures and the web pages and the RDF documents are not first 
> class objects, and do not have names.  It certainly is not the web.  
> Sure, you could build it.  Semantics Transfer Protocol. It would be an 
> interesting study.
>
> I contend that it is actually not very useful to get back S without 
> knowing what its relationship to R is.  Of course, if it is RDF about 
> R it can say of its own accord.  If it is a JPEG we don't know whether 
> it is R or is a JPEG encoding of R or a single frame taken from R or a 
> picture of R one night in a bar.
>
> Two designs suggest themselves.   In one, the relationship is 
> negotiated.   The client sends a request including a header something 
> like:
>
> Accept-response: pictureOf, meaningOf, directionsToHouseOf, stuffAbout
>
> and the server responds including a header something like
>
> Response: pictureOf  ; env="bar"; time="00:26"
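
A minimal Python sketch of how this first design might be negotiated. The `Accept-response` and `Response` headers are the hypothetical ones quoted above, not part of HTTP; the function names, and the assumption that the client lists relationships in order of preference, are illustrative only.

```python
# Hypothetical sketch of relationship negotiation. Neither header exists in
# HTTP; both are the illustrative headers from the quoted design.

def parse_accept_response(header):
    """Split a comma-separated list of relationship names, dropping blanks."""
    return [token.strip() for token in header.split(",") if token.strip()]


def choose_relationship(accept_response, offered):
    """Return the first requested relationship the server can offer.

    Assumes the client lists relationships in order of preference.
    Returns None when nothing the client asked for is available.
    """
    for relationship in parse_accept_response(accept_response):
        if relationship in offered:
            return relationship
    return None


def format_response_header(relationship, params):
    """Render a value such as: pictureOf; env="bar"; time="00:26"."""
    parts = [relationship]
    parts += ['{}="{}"'.format(name, value) for name, value in params.items()]
    return "; ".join(parts)
```

With the quoted request, a server that can offer only `stuffAbout` and `pictureOf` would pick `pictureOf`, and could then state the relationship explicitly in its reply instead of leaving the client to guess.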
>
> The other design is that the returned thing is always just a set of 
> assertions by the server,  explaining the relationships involved.   If 
> you like, you can attach anything but the cover note has the semantics 
> of a message from the publisher to the reader.  It might say things 
> like "The R you requested is a person, their name is Archibald, and we 
> know of two photos, the first being a mugshot and the second a holiday 
> snap"
>
> The trouble is there is no way for the client to direct the search.
> Suppose the client wants to get a mugshot of R.   This may or may 
> not have a URI itself, Rm.
> In either case, the client can ask as long as it likes but may always 
> get back the information that Rm is a photo of R.   It asks for a 
> JPEG, and gets back a picture of the relationship between Rm and R as 
> circles and arrows.  Well, that is in the new world a representation 
> of Rm, so I guess it has to be content.  Or maybe all photos are 
> served in http: space, not stp: space.
This is not true at all; it is just the opposite.  To everyone I have 
some influence over, this is the design pattern I recommend: for any 
resource they would like to put on the web, try to provide at least two 
content types.  One is HTML and the other is RDF.  Beyond those, they 
can serve whatever content types suit their needs: image, binary, text, 
MPEG, ...
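
A minimal Python sketch of this pattern, assuming the server keeps a table of representations for one URI and picks one from the client's Accept header. The media types and bodies are placeholders, and the matching is deliberately simplified (q-values and wildcards only, without the full precedence rules of HTTP content negotiation).

```python
# Hypothetical sketch: one URI, several representations, chosen by Accept.
# The table below is illustrative; the bodies are placeholders.
REPRESENTATIONS = {
    "text/html": "<html>...human-readable description...</html>",
    "application/rdf+xml": "<rdf:RDF>...machine-readable description...</rdf:RDF>",
    "image/jpeg": b"...picture bytes...",
}


def parse_accept(header):
    """Return (media_range, q) pairs, highest q first, ties in header order."""
    ranges = []
    for index, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((q, index, media_range))
    ranges.sort(key=lambda item: (-item[0], item[1]))
    return [(media_range, q) for q, _, media_range in ranges]


def matches(media_range, media_type):
    """True if media_type falls under media_range (supports */* and type/*)."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available, default="text/html"):
    """Pick the media type to serve; fall back to HTML for human readers."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return default
```

A browser sending `text/html,*/*;q=0.8` would get the HTML, while an agent sending `application/rdf+xml` would get the RDF, both from the same URI.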

The HTML is there to serve human users and the RDF is there for future 
machine agents.  These are the two major kinds of intelligent client we 
will deal with in the future.  Once a human is given a URI, their first 
instinct is to fire up a browser and see what it is.  I guess that a 
machine agent's first instinct should be to get the RDF content of that 
URI.

Hence, I also suggest that, in both the HTML and the RDF, they describe 
their resources meaningfully and tell readers what other content types 
they serve under the same URI.

Such a design pattern in fact helps search rather than, as you 
suggested, hindering it.  A search engine, whether based on human 
language or on a machine language, cannot search against other kinds of 
media types.  Binding HTML and RDF to the URI can only help search 
engines and web users.  The same would work for machines. 

I really cannot see how that could make it hard for the client to direct 
the search.  It should be just the opposite of what you suggested.
>>> In this view, then, content negotiation is a much wider topic than 
>>> it has traditionally been. We are dealing with a much wider notion 
>>> of what a 'resource' is, and a much wider notion of what a 
>>> 'representation' is. Some resources have /all kinds/ of possible 
>>> representations. So yes, we have to be prepared to go beyond 
>>> 'accepted and expected usage'. Who would have thought otherwise?
>
>
> Well, the interesting thing about IP is that it built on top of the 
> Ethernet system without going beyond Ethernet's 'accepted and expected 
> usage' one single bit.   And the web was built on top of TCP/IP 
> without going outside TCP/IP's 'accepted and expected usage' enough 
> for us to actually modify TCP/IP at all.   So an agent capable of 
> induction might well have thought otherwise.
>
>
>      http:
> Internet:        //www.w3.org/
> Web:                 People/Berners-Lee/card
> SemWeb:                                    #i
>
> When you look at the URI  you can see the archaeology, you can count 
> the rings of the tree.  You can see how each layer leverages the 
> previous layer.  #i denotes a person as described by a document 
> People/Berners-Lee/card in a domain controlled by the owners of 
> www.w3.org.  The semantic web in this way builds on a lot of existing 
> social and technical architecture.
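
The layering in the diagram above can be read off mechanically. A minimal Python sketch using the standard library's `urlsplit`; the `layers` function and its dictionary keys are illustrative labels, and the URI is the one assembled from the diagram.

```python
from urllib.parse import urlsplit


def layers(uri):
    """Split a URI into the parts the diagram attributes to each layer."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # http:
        "internet": parts.netloc,    # DNS name, resolved over the Internet
        "web": parts.path,           # document served over HTTP
        "semweb": parts.fragment,    # thing described by that document
    }


card = layers("http://www.w3.org/People/Berners-Lee/card#i")
```

Here `card["semweb"]` is `"i"`. The fragment is not sent to the server in the HTTP request; it is interpreted by the client against the retrieved document.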
See above.  I don't follow.  My point of view is to be more abstract, and 
you give me an example in the opposite direction. 
>
> Feel free, Pat (Xiaoshu), to build such an stp: system.   Feel free to 
> use it to inform the design of HTTP and maybe help us adjust HTTP. 
> But  do not feel free to misrepresent what technical terms in the web 
> architecture mean -- you have to pick others.
Isn't this what I am trying to ask the TAG to do?  To relax the 
restriction, to free people from an ambiguous and arbitrary definition.  
Isn't this what I tried to argue: do not adopt the rule that if 
HTTP(x)=200, then x=IR? 

In fact, I don't think it matters at all. 

So far, I haven't seen anyone propose a rule of the form: if HTTP(x)=?, 
then x != IR.  Without that converse logic, tagging a resource as an IR 
at most puts a label on the resource, which has no practical value and 
only increases cost.  Sure, someone can invoke their own arbitrary logic, 
such as dealing only with IRs.  But why should I have to worry about 
that?  When I publish something on the web, there will always be someone 
who likes it and someone who dislikes it.   We are, indeed, free in this 
sense regardless of httpRange-14.
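
To make the asymmetry concrete, here is a minimal Python sketch of the inference a client could draw from a status code under the httpRange-14 rule. The function name and the returned labels are illustrative only.

```python
def infer_from_status(status):
    """What a client may conclude about x from the status of GET x.

    A 2xx response licenses the conclusion that x is an information
    resource. Every other response, including 303, licenses nothing
    either way.
    """
    if 200 <= status < 300:
        return "information resource"
    return "unknown"


# No status code ever yields the converse conclusion.
conclusions = {infer_from_status(status) for status in range(100, 600)}
```

The set `conclusions` contains only the two labels above; there is no branch that returns "not an information resource", which is the missing converse rule.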

Regards,

Xiaoshu
Received on Sunday, 13 April 2008 00:43:35 UTC