- From: <Patrick.Stickler@nokia.com>
- Date: Fri, 31 Jan 2003 10:20:05 +0200
- To: <sandro@w3.org>
- Cc: <www-tag@w3.org>
Apologies in advance for the lengthy post, and appreciation for those having the patience to read it...

> -----Original Message-----
> From: ext Sandro Hawke [mailto:sandro@w3.org]
> Sent: 30 January, 2003 21:46
> To: Stickler Patrick (NMP/Tampere)
> Cc: www-tag@w3.org
> Subject: Re: RDDL and XML Schema instances are not valid representations of namespaces
>
>
> > If an HTTP GET returns a representation of a resource, and RDDL or
> > XML Schema instances are considered valid representations of an
> > XML Namespace, then I see no useful value to the concept of
> > representation, since there apparently are no bounds as to what it
> > might be, and very well might be random.
>
> Indeed....
>
> Each URI string can be used to point to several different things.

If you mean indirectly, fine, but not directly. I am very much opposed to the view that a URI can contextually denote different resources. The only mechanism that even remotely resembles contexts in the present Web architecture is XML Namespaces, which of course are opaque to RDF and in fact to most Web applications.

And though I agree that URIs can be used to indirectly refer to multiple resources, I consider that out of scope for this discussion.

What I am focusing on here is (a) a URI denotes one single thing and (b) if that URI is meaningful to HTTP, there are no well defined boundaries on how far a "representation" returned by HTTP can diverge from the inherent characteristics of that single denoted resource.

From what I can see, a representation need not embody *any* characteristics of the resource itself, but can be any arbitrary content. I consider that to constitute a complete breakdown in any real interface between the Semantic Web and the Web since what is denoted by the former has no reliable representation by the latter.
The lack of an authoritative and well defined concept of the nature of and constraints on valid representations, as well as canonical (bit-equal) representations for digital resources, is a significant omission in the interface between Web and SW.

> In thinking about what a URI string points to, while working with RDF or
> namespaces, I find it useful to ask:
>
>    1. What knowledge base might it be pointing to (if any)?
>       For every successful GET, over time, will I get a
>       serialization of the information in that knowledge base at that
>       time? If GET doesn't get me anything, or what it gets can't be
>       thought of as the contents of a knowledge base, then the URI
>       is not identifying a knowledge base.

Great. But an XML Namespace URI does not denote a knowledge base. It denotes a simple set of strings (names). It does not include any semantics that might *elsewhere* be associated with such names, and in fact, different resources may assign different semantics to the same name (again, my usual example of xhtml:html for Strict versus Frameset, see below).

Thus, any content that could be construed as a knowledge base (e.g. RDDL) returned as a representation of an XML Namespace is highly suspect and IMO is not a valid and reasonable representation of a namespace.

Now, I'm not arguing that such a representation would not be useful to certain applications, but rather that if we are to have a consistent Web and SW architecture, then we should refrain from associating such things as representations of XML Namespaces, as that far exceeds IMO what a valid representation of a resource is.

There's a good bit of nudge-nudge-wink-wink going on here. The W3C should play by its own rules and promote exemplary solutions reflecting sound use of the Web architecture. Not hacks that further confuse the foundational concepts and principles of the Web and SW.
If there is to be *any* meaningful interface between the Web and SW, the concept of "representation", the bounds of what is a valid and acceptable representation, and the concept of a canonical representation must be given clear and formal treatment. At present, it is woefully obscure and thus we get folks suggesting that RDDL or XML Schema instances are valid representations of XML Namespaces (which they are not).

>    2. What subject (as in topic maps) might the URI be pointing to
>       (if any)? Does it seem like the text or pictures returned via
>       GET are conveying information about some one thing, every time,
>       all the time? Alternatively, has the URI's owning authority
>       made it clear in some other way what the URI identifies?

I don't see this question as relevant to the issue at hand. There is a W3C Recommendation which explicitly states what an XML Namespace URI denotes -- a particular set of names. That's it. There is no ambiguity there.

The XML Namespace http://www.w3.org/1999/xhtml denotes a set of names. There happen to exist other different resources which assign semantics to those terms, and in fact some of those resources conflict with one another, as

   PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"

defines xhtml:html as

   <!ELEMENT html (head, body)>

yet

   PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
   SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"

defines xhtml:html (same term! same namespace!) as

   <!ELEMENT html (head, frameset)>

so clearly *neither* of the above DTD resources is a representation of the namespace itself, since both include knowledge that is not in any way inherent in the namespace resource *and* one would expect representations of a resource to not be in conflict with one another semantically (but of course, since there is no reasonable definition of what a valid representation is, so...)
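To make the conflict concrete, here is a minimal Python sketch that parses the two element declarations quoted above and compares their content models. The helper `parse_element_decl` is hypothetical and handles only flat, comma-separated content models; it is an illustration, not a DTD parser.

```python
import re

# Element declarations as quoted from the Strict and Frameset DTDs.
STRICT_DECL = "<!ELEMENT html (head, body)>"
FRAMESET_DECL = "<!ELEMENT html (head, frameset)>"

# The namespace from which both DTDs take the name "html".
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Matches only simple declarations of the form <!ELEMENT name (a, b, ...)>.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)>")


def parse_element_decl(decl):
    """Return (name, children) for a flat, sequence-only ELEMENT declaration."""
    match = ELEMENT_RE.fullmatch(decl.strip())
    if match is None:
        raise ValueError(f"unsupported declaration: {decl!r}")
    name, model = match.groups()
    children = tuple(part.strip() for part in model.split(","))
    return name, children


strict_name, strict_children = parse_element_decl(STRICT_DECL)
frameset_name, frameset_children = parse_element_decl(FRAMESET_DECL)

# The same (namespace, local name) pair is used by both DTDs...
same_term = (XHTML_NS, strict_name) == (XHTML_NS, frameset_name)
# ...yet the content models assigned to it disagree.
models_conflict = strict_children != frameset_children

print(same_term, models_conflict)
```

The name is identical in both cases; only the semantics attached to it elsewhere differ, which is the sense in which neither DTD can be a representation of the namespace itself.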
The groundwork of REST and the pairing of the concepts of resource and representation are great, and serve the needs of the Web, but they *must* be taken much further if they are to address the needs of the Semantic Web and facilitate a seamless integration of the Web and Semantic Web.

What is or is not a valid representation cannot be left up to arbitrary human intuition on a case by case basis but must be expressed in a sufficiently explicit manner to serve automated agents and the machinery added so that automated agents are provided some clue as to the nature of the representations being obtained.

This is particularly true for canonical (bit-equal) representations of specific digital resources. If I have a URI denoting a particular revision of a particular digital resource, there should be some way inherent in the Web architecture to (a) reliably GET a bit-for-bit exact copy of that particular revision of that particular resource and (b) know that that is what I got, or be told otherwise.

To date, it's just been hit and miss, and lots of good luck, as the Web architecture has no concept of such a canonical representation even though most folks presume it, and expect it, and will complain loudly when they don't get "files" from the server exactly as they conceive of them.

Now, if a human gets a representation that isn't what was expected, it's fair to presume they can figure it out more quickly than a dumb automated SW agent where the "error" may not be detected until much further along a given process or operation, and likely after the content has passed between numerous agents.

> This isn't too different from how I think about URIs in general. In
> writing HTML or talking to people, I mostly use URIs to point to
> reliable, authoritative sources of information. Often that
> information is about some particular subject (like a book, e-mail
> message, or world event), but I still have to pick a good URI for that
> subject.
> I do so based on the qualities of the information source.
> But humans jump quickly to the subject, so when I say "look at
> http://yyy" where that URI points them at a news story about a virus,
> we'll often talk directly about the virus (with no need to focus on
> the news story itself).

And your point is?

If the URI denotes an information source, it denotes an information source. If it denotes an abstract resource, it denotes an abstract resource. One may indirectly refer to all kinds of things by a given URI, but the URI ONLY DENOTES ONE THING and representations provided by HTTP GET should be valid and reasonable representations OF THAT ONE THING and not of any arbitrary resource that might be indirectly referred to in terms of that URI!

That's the point. It is not valid to presume that a RDDL instance is a valid representation of an XML Namespace just because all of the resources described can be *indirectly* referred to by the namespace URI. Those other resources and the information about them provided by a RDDL instance are not inherent to the XML Namespace resource itself and as such have no business in any valid representation of that resource.

> Still, if I say "check out http://yyy", and it's a web page about a
> book, you might wonder if it's the web page that's interesting, the
> book that's interesting, or even the subject of the book that's
> interesting. I try to straighten this out in RDF by making the
> URI-to-whatever mapping explicit and very well documented.

Again, I have no problem with indirect reference to arbitrary resources by any URI, but we're talking about (a) what a URI denotes and (b) what is a valid representation of the specific resource denoted by that URI.

If you want to be able to more clearly say things like

   <#Sandro> x:recommends [ x:bookDescribedBy <http://yyy> ] .
   <http://yyy> rdf:type <#WebPage> .

to recommend a book described by a web page, or

   <#Sandro> x:recommends [ x:webPageDescribing <http://xxx> ] .
   <http://xxx> rdf:type <#AbstractConcept> .

to recommend a web page describing some abstract concept, or whatever, great. But in either case, the URI itself denotes just one thing, and if you dereference that URI with HTTP GET you should get a representation of that one thing, not of something else that happens to be indirectly referenceable by the URI.

Taking the above, if you dereference http://yyy you should get a representation of a web page. If you dereference http://xxx you should get a representation of an abstract concept (which might very well be a web page).

> The problem in RDF is when people use URIs directly as node labels [as
> almost everyone does] because then it can be very hard to tell which
> mapping (which kind of pointing) they had in mind. TimBL is the main
> force here arguing for what mapping everyone should have in mind, but
> with the WGs sitting out on this issue, consensus seems unlikely, and
> URIs in RDF will continue to be only marginally better signifiers than
> English words.

Well, I see this as being the whole point of RDF. To be able to say what URIs mean in more explicit terms rather than having to guess in terms of whatever arbitrary representations one gets from HTTP.

Still, once some software agent (or human) has sufficient knowledge about a URI to know what it denotes and the nature of the resource denoted, it should be able to expect that representations of that resource, if obtainable, would be reasonably (a) accurate, (b) complete, (c) concise, and (d) precise. And if the resource is a digital resource and the representation is canonical, then in addition to the above, (e) exact (a bit for bit copy).

Once we start talking about representations for SW agents rather than representations for Web (human) users, the needed precision and consistency of the definition of representations goes up -- and that is what I think most folks are missing here.
Good enough for the Web is not necessarily (and in this case, isn't) good enough for the SW.

> Back to XML: XML Namespace Names are URI strings for which sense #2
> always holds; to me they always identify an XML Namespace[1].

I agree. Though it appears we don't agree what an XML Namespace actually is... (see below)

> They
> may also work in sense #1, where the identified knowledge base is the
> collection of information, which you talked about, about schemas,
> tools, etc for working with XML documents using the namespace.

I strongly disagree.

*If* we are to base the Web and SW architecture on the concept of resource and representation, then XML Namespace URIs do not denote knowledge bases, and knowledge bases of the kind embodied in RDDL instances are not valid representations of those XML Namespace resources.

XML Namespaces are simple sets of names. That's all. Anything more and we're talking about some *other* resource(s).

The W3C needs to play by its own rules...

> -- sandro
>
>
> [1] But what is an XML Namespace? It's often described as a
> collection of strings, but I find that insufficient.

Too bad. That's what it is. If you find that insufficient, then work to have the specification revised. But until and unless it is, neither you nor anyone else has the right to redefine it according to your own preferences -- if you intend to play fair with everyone else in the playground (maybe you don't ;-)

> Two
> namespaces which conceptually have exactly the same strings in
> them

Are exactly the same namespace. Period.

> still may have different semantics and so are different
> namespaces.

No. They may not. XML Namespaces define *no* semantics for their members.

You are making the error of equating 'namespace' with 'model'. As I've pointed out several times before

   namespace != vocabulary != model != schema

These are all distinct concepts and instantiated by distinct resources and if you wish to talk about all of them, you must assign each of them a distinct URI.
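The requirement that each of the four kinds of resource gets its own URI can be sketched as a simple check. This is only a minimal Python illustration under stated assumptions: the example.org URIs and the `overloaded_uris` helper are hypothetical, and nothing like this is defined by any W3C specification.

```python
from collections import defaultdict

# Hypothetical URIs, invented for illustration only: one per resource.
DISTINCT = [
    ("http://example.org/ns/widgets", "namespace"),
    ("http://example.org/vocab/widgets", "vocabulary"),
    ("http://example.org/model/widgets", "model"),
    ("http://example.org/schema/widgets.xsd", "schema"),
]

# One URI made to denote all four resources at once.
OVERLOADED = [
    ("http://example.org/ns/widgets", "namespace"),
    ("http://example.org/ns/widgets", "vocabulary"),
    ("http://example.org/ns/widgets", "model"),
    ("http://example.org/ns/widgets", "schema"),
]


def overloaded_uris(denotations):
    """Return {uri: kinds} for every URI said to denote more than one kind."""
    kinds_by_uri = defaultdict(set)
    for uri, kind in denotations:
        kinds_by_uri[uri].add(kind)
    return {
        uri: sorted(kinds)
        for uri, kinds in kinds_by_uri.items()
        if len(kinds) > 1
    }


print(overloaded_uris(DISTINCT))
print(overloaded_uris(OVERLOADED))
```

With distinct URIs nothing is flagged; with a single shared URI, an agent has no way to tell which of the four resources a statement about that URI concerns.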
Most models are expressed modularly in multiple schemas, and in variant schema languages, and employ multiple functional vocabularies, which have terms grounded in multiple namespaces. Yet nowhere in the W3C recs or notes is the inequality between these types of resources stated explicitly and hence the confusion persists.

Even if there were, by coincidence, a 1:1 relationship between a particular namespace, vocabulary, model, and schema such that all terms in a vocabulary were all grounded in one namespace and no other vocabulary used terms from that namespace, and a model only used terms from the single vocabulary and no other model used terms from that vocabulary, and the model had one single schema defining it, etc. there would *still* be four distinct resources there, all needing their own URI to talk about them reliably and accurately -- but too many folks (lazily) use one URI to ambiguously denote all four resources, and that is where we get all this confusion. They take the URI that denotes the namespace, and then (over)use it to also denote (not just refer to) the vocabulary, the model, and the schema. Bad, bad, bad.

Can humans twist and coerce the Web to do useful things given such bad practice? Sure. Can SW agents work with such bad practice? I don't think so. Can we get the Web folks to appreciate and support the additional needs of the SW? It's not looking very promising...

Hence this tension between the definition of the Web architecture and the greater and more demanding needs of the SW.

Regards,

Patrick

--
Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com
Received on Friday, 31 January 2003 03:20:11 UTC