- From: Jon Hanna <jon@hackcraft.net>
- Date: Mon, 13 Sep 2004 02:22:10 +0100
- To: <www-tag@w3.org>
> a way of telling which URIs are
> about things that are not web
> pages (things qua things minus the web), and which are about web
> pages qua web pages.

This doesn't seem good to me as a member of the URIs-can-identify-anything camp, and I don't think it will satisfy others or the other camps. In a debate between people who say URIs identify anything (including web pages, though rarely), people who say URIs identify "conceptual documents" (is that what you mean by "web pages"?), a few remaining people who don't accept that saying URIs identify given HTML documents (is that what you mean by "web pages"?) is demonstrably wrong, and a few positions in between, the best such a system can hope to do is add to the debate, not help resolve it. Determining what a *particular* URI identifies is not the issue; most camps have solved this to our own satisfaction at least, though we would not agree with members of the other camps.

> Expanded Web Proper Names allow the entire format to
> be given in some form as a http: URI as as well.

Doing so requires you to buy into one side of the current debate.

> 3) A URI can potentially be used as a name of a thing (which for this
> discussion is something not "on the web") in an ontology, whether
> or not any actual statements are made about such a thing.

No. Either a URI can be used as a name for a thing, and this makes it "on the web", or it can't be used as a name, depending on your position in the debate. Whether an ontology is involved or not is irrelevant, though I would certainly say that using URIs in ontologies only makes sense if you allow URIs to mean anything; I would accuse some who hold other positions of jumping through hoops to make ontologies work, but I'm trying my best not to be partisan right now.

> 4) Alternatively a URI could be used as a name for a representation
> about a thing, whether or not a representation is actually
> retrieved from it.

No.
Either a URI can be used as a name for a representation of a thing, because a representation is clearly a thing in itself just as a car, a hippogryph and Sir Isaac Newton are things, or it cannot be a URI of a representation, though it can be a URI of a conceptual document for which only one representation is available.

> If we are making statements (RDF or otherwise), we want to be able to
> make a statement about either a thing denoted by a URI or the
> representation denoted by a URI.

To one camp there is a clear relationship between what a URI identifies (whether that can be anything, or a conceptual document) and its representation. RDF is quite capable of expressing this if a suitable ontology is produced. (In fairness I must 410 my own attempt at such an ontology; it is quite badly flawed.) To the other camp URIs do not denote things.

> Depending on the
> context, that may or may not be obvious.

URIs do not depend on context. Anything that depends on context is not a Universal Identifier, since it fails to identify universally (or uniformly, for that matter).

> So, we need a
> mechanism to tell whether a URI is about a thing or a representation
> of a thing. One solution (explored by WPN and
> Larry) is a new scheme (such as wpn: or tbl:).

This adds nothing, and removes much. It's just another URI.

> The fact of the matter, and the problem, is that URIs are "universal",
> they can be used for *lots* of things, from naming namespaces
> and ontologies to retrieving webpages. Thus keeping the
> definition vague is actually rather useful.

There is nothing vague about a URI being used to name an ontology and to retrieve a webpage - even if it's the same URI. There is similarly nothing vague about the Inland Revenue, health boards, social services and other agencies retrieving different information when they input my PPSN (what some of you might know as a National Insurance or Social Insurance number), which is an identifier of me.
It's just not a universal identifier, so I shouldn't expect it to work in Canada or for the keyboard I'm typing on to be issued one. Hence a URI can identify an ontology and, when input into a given system (the web), it can retrieve a particular piece of information.

(I'm going to give up trying to argue this from both sides; it's tiring trying to do justice to views I don't support, so I'm just going to be partisan from here on in.)

> If we are determined to stick to a http URI scheme, we
> should at least have canonical representations for "things"
> to resolve ambiguity.

I think that would remove a useful indirection.

Looking at the document at <http://www.cogsci.ed.ac.uk/~ht/webpropernames/index.html>:

"Having done this, I know at a glance if the page is actually about the Eiffel Tower, or a hotel near the Eiffel Tower, as opposed to the object-oriented programming language Eiffel, or the film The Lavender Hill Mob, and so on. Yet this knowledge depends on fundamental aspects of human intelligence such as language understanding, scene recognition and so forth, which have proved distressingly resistant to automation."

rdf:type

"These two metadata sentences in fact say the same thing--the first URIs of each triple stand for the Eiffel Tower, the third URIs of each stand for Gustave Eiffel, its architect. However there is no obvious way for an automatic process to detect this fact."

There's no obvious way for a human process to detect this either (whether they are looking at representations for the URIs you give or they are reading natural language descriptions). There are mechanisms for both humans and machines to be told this, and also for both humans and machines to realise this by drawing inferences from what is commonly stated about the problematic resources for which we have multiple identifiers.

"Just knowing when two pages describe the same thing would be a huge step forward."

That's not always easy with human interpretation of natural language.
It's not always hard with machine processing of RDF and OWL.

"Proper names are names that refer uniquely to one referent, at least in an ideal situation."

URIs are identifiers that refer uniquely to one referent, at least when nobody's made a right hames of things. The distinction between "tiddles" and "cat" is that "tiddles" refers to one cat and "cat" refers to the class of entities which share various characteristics: for example, that they are a proper subset of the class called "mammals", have strong limbs, are carnivorous and so on. The distinction between proper names and names is really a distinction between types of referent, and as long as we have a means of making that distinction with referents (we have), we can make it with names (though really the distinction between referents is the more valuable one, except when deciding whether to capitalise a word in certain natural languages). Besides which, we need both types of names anyway, unless we are to be prohibited from naming one sort of referent (classes).

"Our take on the ordinary understanding of URIs is that a URI addresses a Web-based encoding of a description or depiction of a denotation."

I disagree with this "take". This is what HTTP does with URIs, not what URIs do with HTTP.

"Also web pages will be used informally to cover both encodings and expressions in one term, and so will both cover the everyday language use of the term (as for HTML pages) but also refer to a wider set of phenomena (such as a URI addressing an audio stream)."

Ah, so that's what you mean by "web pages". Please forgive me for not editing the earlier piece where I placed two possible interpretations into the http-range debate; I think I'd rather let them stand, as both are worth mentioning. I did say that saying URIs identify these is demonstrably wrong, so I shall demonstrate it.
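The demonstration turns on HTTP content negotiation: the server holds more than one representation for a single URI and chooses between them per request. Assuming the choice is made from the client's Accept header (the usual mechanism, though a server could key on other request headers), the selection step can be sketched in Python as below. `REPRESENTATIONS`, `parse_accept` and `negotiate` are hypothetical names, not the configuration of any real server, and the Accept parsing is simplified: it honours `q` values and `*/*` but not partial wildcards such as `text/*`.

```python
# Toy model of server-driven content negotiation (hypothetical data).
REPRESENTATIONS = {
    "text/html": "<html>...</html>",
    "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",
}


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges.append((fields[0], q))
    # sorted() stays stable with reverse=True, so equal-q ranges keep header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(representations, accept):
    """Return the media type of the representation the server would send."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_range == "*/*":
            return next(iter(representations))  # fall back to the server's default
        if media_range in representations:
            return media_range
    raise LookupError("406 Not Acceptable")


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(negotiate(REPRESENTATIONS, browser))                # text/html
print(negotiate(REPRESENTATIONS, "application/rdf+xml"))  # application/rdf+xml
```

A client that prefers HTML and one that asks only for `application/rdf+xml` send the same URI and get different documents back.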
Point a web browser and an RDF parser (the validator at <http://www.w3.org/RDF/Validator> will do) at <http://www.hackcraft.net/jon/>: the former will receive an HTML document, the latter an RDF document. 1 URI, 1 world wide web, 2 web pages - clearly the URI has not identified a web page. (The matter for debate is whether it has identified me, or identified a conceptual document about me [if the latter is true then the RDF is inaccurate].)

"We can now be more precise about what's going on with respect to Web searches. When searching, a user typically wants to fetch expressions constituting descriptions (such as HTML or XML pages) or depictions (such as JPG or SVG images) that actually describe or depict some thing they are interested in."

I'm not sure I buy, say, a piece of fiction as matching that. A piece of fiction I'd say is a "conceptual document", because sometimes a cigar is just a cigar. (This isn't ceding much to the "conceptual document" camp, though - conceptual documents are things too, so we can play your game when we want to and you can't play ours :-). A software patch is even further from this. I will agree that your description does match many common searches, though.

"http://www.w3.org/People/thompson/ http://purl.org/dc/elements/1.1/creator http://www.ltg.ed.ac.uk/~ht/"

This is an unusual way of expressing RDF triples; in particular, it makes the difference between use and mention of URIs unclear.

"we have to interpret the first URI as a mention but the second as a use:"

The first URI is being mentioned? That would be true in the natural language sentence "http://www.w3.org/People/thompson/ is 34 characters long and uses no character escaping to make use of characters which have special meaning or are prohibited in URIs". To mention a URI in RDF we put it in a literal, possibly of type <http://www.w3.org/2001/XMLSchema#anyURI>.
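In RDF terms, using a URI means writing it as a URI reference node, while mentioning it means writing its characters as a literal. A minimal Python sketch of that distinction follows; `URIRef`, `Literal` and `make_triple` are hypothetical names (not the API of any real RDF library), blank nodes are left out for brevity, and the sketch encodes RDF's rule that a literal may be the object of a triple but never its subject.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"


@dataclass(frozen=True)
class URIRef:
    """A URI that is *used*: the statement is about what it identifies."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A string that is *mentioned*, optionally typed (e.g. as xsd:anyURI)."""
    lexical: str
    datatype: Optional[str] = None


Node = Union[URIRef, Literal]


def make_triple(subject: Node, predicate: Node, obj: Node) -> Tuple[Node, Node, Node]:
    """Build a triple, enforcing that literals may only appear as objects."""
    if isinstance(subject, Literal):
        raise ValueError("a literal cannot be the subject of an RDF triple")
    if not isinstance(predicate, URIRef):
        raise ValueError("the predicate of an RDF triple must be a URI reference")
    return (subject, predicate, obj)


# Use: a statement about whatever the first URI identifies.
used = make_triple(
    URIRef("http://www.w3.org/People/thompson/"),
    URIRef("http://purl.org/dc/elements/1.1/creator"),
    URIRef("http://www.ltg.ed.ac.uk/~ht/"),
)

# Mention: the URI's characters as a typed string value.
mentioned = Literal("http://www.w3.org/People/thompson/", XSD_ANYURI)
print(len(mentioned.lexical))  # 34
```

The length is a fact about the mentioned string; nothing in `used` says anything about how many characters its URIs have.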
Since this isn't allowed with the subject of an RDF triple, I assume you mean:

<http://www.w3.org/People/thompson/> <http://purl.org/dc/elements/1.1/creator> <http://www.ltg.ed.ac.uk/~ht/> .

This means that the entity identified by <http://www.w3.org/People/thompson/> was created by the entity identified by <http://www.ltg.ed.ac.uk/~ht/>. If the first entity is a (conceptual or otherwise) document about Henry S. Thompson, and <http://www.ltg.ed.ac.uk/~ht/> is Henry S. Thompson, then this may well be true. If either the former is Henry S. Thompson or the latter is a document, then this is not true. We can deal with these situations, though:

<http://www.w3.org/People/thompson/> <xx:rep> <_genid:1> .
<_genid:1> <dc:creator> <http://www.ltg.ed.ac.uk/~ht/> .

("<http://www.w3.org/People/thompson/> has a representation; this representation was made by <http://www.ltg.ed.ac.uk/~ht/>.") Of course we can usefully merge the two named and one unnamed nodes here with a bit more knowledge, but we don't have to in order to be correct.

<http://www.w3.org/People/thompson/> <dc:creator> <_genid:2> .
<_genid:2> <foaf:homepage> <http://www.ltg.ed.ac.uk/~ht/> .

("<http://www.w3.org/People/thompson/> was created by some entity. That entity has the homepage <http://www.ltg.ed.ac.uk/~ht/>.") (This entails that the entity in question is a foaf:Agent, that is, a person or organisation, though again this is incidental to our being correct.)

"In the context of the Web, there is clearly a non-arbitrary, although not strictly necessary, relationship between the descriptive terms and whatever the recovered web pages denote. Insofar as we've hinted that a Web Proper Name is a collection of search terms, this analogy is encouraging, particularly because the first step, from search terms to URIs, is automated and distributed."

I'm not convinced the connection is strong enough to be usefully reliable.
Google searches change not only over time, but also according to the language you search in, whether you allow results Google knows aren't in that language to be returned, whether you are restricting your search to pages from a particular country, and whether you ask for sexual content to be filtered out (and I'm told that "Tour Eiffel" is Parisian slang for an erection, so your example could be affected). Other search engines have yet more variants. Not all of these variants are explicitly dealt with in your system, and this may affect it.

"Moreover, when we want to refer to Tim Berners-Lee, we don't have to redescribe him using his title or the book he's written. A name alone determines its referent, at least where all parties involved attach the name to the same referent. Furthermore, this is achieved without appeal to descriptions."

The second sentence would be true if I mentioned "Tim Berners-Lee" to a colleague. It would not be true if I mentioned him to my mum. In order to know if it's true, I need to refer to the context in which I am using the name (albeit subconsciously); the context is therefore a part of the naming. I am not using "Tim Berners-Lee", I am using "Tim Berners-Lee"#WebAwareAudience.

The only context I can imagine endorsing when it comes to naming web resources is that which allows relative URI references to operate. Anything else has to be included explicitly, and this is generally heavier than making the name work outside of context.

Section 4 generally. I'm tempted to suggest http://www.google.com/search?q=eiffel+tower+paris+-hotel+-webcam as an alternative URI, though I would take that to mean "the results of searching for eiffel tower paris -hotel -webcam on Google" rather than to mean eiffel+tower. I *have* used such URIs to refer to things before, though; I'm not sure whether or not this is a good idea.
I am sure it's not a great idea, and I don't see what you offer beyond that, except for the date (which isn't going to tell me much in 10 years' or even 10 days' time) and the guarantee of the degree of accuracy (likewise).

"Allows use of Web names to be easily distinguished from mention of URIs"

Sorry, I just don't buy this. I can use a WPN and I can mention it. I can use http URIs and I can mention them. The only difference I see is that, with current technology, I can use http URIs more fruitfully and indeed with less ambiguity.

"Allows for efficient and reliable determination of whether two URIs identify resources which are about the same thing"

I don't buy this either. Certain wpn URIs may be similar enough in terms, and even in short name (though many things have multiple short names used often in normal speech that are not at all alike), that they are probably about the same thing. Unless the terms are exactly the same, the dates particularly close and the percentages such that it is mathematically impossible that they refer to different entities, I'm not buying. Even when there is a definite match, this could be because of overlapping terms, or terms which are hyponyms of another sense of themselves (the sense of "cat" that has "lion" as a hyponym has another sense of "cat" as another hyponym). In comparison, smushing on inverse functional properties or owl:sameAs statements is pretty reliable. Granted, I may not have all such statements available to do this perfectly, but that is true of the terms used in WPNs too.

4.2 HTTP URIs, on the other hand, are always strong.

4.3 Why not just use <http://www.ihmc.us/users/phayes/PatHayes.html> and be done with it? Of course there are those who say you can't use <http://www.ihmc.us/users/phayes/PatHayes.html> at all. So, depending on one's position on HTTP-Range, this is either pointless or it's impossible.

5. This answers some of my less pressing objections above, but it's just another representation.
I'm not sure it's even a particularly good one (though I'm biased against RDDL at the best of times).

6.1 I'm not sure this is a problem. I think it's a category error about what a URI means, and any resolution to the http-range issue will remove it.

6.2 I don't see how this is particularly authoritative. I don't see how it is any more useful than any other representation to either human or machine processing.

6.3 "Interesting", "useful" and "nice to work with" seem to me more important criteria for bookmarks than the degree to which they are relevant to a given search. For that matter, many (at times most) of my bookmarks are items I found that were completely irrelevant to what I was researching at the time, but which seemed worth reading anyway. I'm not sure what problem is being solved here.

6.4 Sem-web development already has a large bottom-up component in pretty much everything but the core specs (and those are produced in an open manner). I think the sameAs is very dicey indeed. I don't think the criteria for saying "yep, that's the Eiffel Tower I mean" are strong enough to say "This URI denotes the Eiffel Tower". The same test could be applied to URIs meaning "The Eiffel Tower during construction", "The Eiffel Tower with the flag of the Third Reich during the occupation of Paris", "The Eiffel Tower", "View from my hotel window during my holiday" and "Man jumps from Eiffel Tower". Further cases would cause some of these URIs to be declared owl:sameAs construction, the Nazi occupation of France, holidays and suicide. This could happen especially quickly, given that the way Google works would make pages with exactly the kind of ambiguity that causes this more likely to be returned in a set of results. The result is a semantic web grey-sludge scenario.
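Because owl:sameAs is symmetric and transitive, smushing (whether driven by explicit owl:sameAs links or by shared values of an inverse functional property such as foaf:mbox) amounts to computing equivalence classes, and a union-find structure is a standard way to do that. The Python below is a minimal illustrative sketch, not any reasoner's actual code: `SameAsClosure` and `smush_on_ifp` are hypothetical names, the mailbox values are made up, and the `ex:` names are hypothetical stand-ins for URIs like those described above. It shows both sides of the argument: an IFP match merges nodes for a sound reason, while a chain of individually plausible sameAs links ends up merging construction with holidays.

```python
from collections import defaultdict


class SameAsClosure:
    """Union-find over owl:sameAs links; each link merges two whole classes."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[node] != root:  # path compression
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def classes(self):
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self.find(node)].add(node)
        return list(groups.values())


def smush_on_ifp(triples, ifp, closure):
    """Subjects sharing a value of an inverse functional property denote
    the same thing, so link each one to the first subject seen with it."""
    first_subject = {}
    for s, p, o in triples:
        if p != ifp:
            continue
        if o in first_subject:
            closure.same_as(s, first_subject[o])
        else:
            first_subject[o] = s


# A sound merge: two blank nodes with the same (made-up) mailbox.
people = SameAsClosure()
smush_on_ifp(
    [("_:a", "foaf:mbox", "mailto:someone@example.org"),
     ("_:b", "foaf:mbox", "mailto:someone@example.org"),
     ("_:c", "foaf:mbox", "mailto:someone-else@example.org")],
    "foaf:mbox",
    people,
)
print(people.find("_:a") == people.find("_:b"))  # True
print(people.find("_:a") == people.find("_:c"))  # False

# Grey sludge: each link looked like "yep, that's the Eiffel Tower I mean".
sludge = SameAsClosure()
for a, b in [
    ("ex:EiffelTower", "ex:EiffelTowerDuringConstruction"),
    ("ex:EiffelTowerDuringConstruction", "ex:Construction"),
    ("ex:EiffelTower", "ex:ViewFromMyHotelWindow"),
    ("ex:ViewFromMyHotelWindow", "ex:Holidays"),
]:
    sludge.same_as(a, b)
print(sludge.find("ex:Construction") == sludge.find("ex:Holidays"))  # True
```

The union-find cannot tell the two cases apart; the difference lies entirely in how trustworthy each individual link is.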
Received on Monday, 13 September 2004 01:22:36 UTC