Re: Review of SWEO "Cool URIs for the Semantic Web" from Leo Sauermann on 2008-03-16 (public-sweo-ig@w3.org from March 2008)

From: Leo Sauermann <leo.sauermann@dfki.de>
Date: Sun, 16 Mar 2008 15:07:10 +0100
To: Tim Berners-Lee <timbl@w3.org>
CC: public-sweo-ig@w3.org, tag@w3.org
Message-ID: <47DD298E.2020200@dfki.de>
 Hi Tim,

we closed our round of reviews for the 20071217 version of "cool uris" 
and incorporated feedback.
The feedback and resolved issues are here:
https://gnowsis.opendfki.de/repos/gnowsis/papers/2006_11_concepturi/feedback/index.htm

After another telco with TAG, we will publish a last-call working draft.

The current version of the editors draft is here:
https://gnowsis.opendfki.de/repos/gnowsis/papers/2006_11_concepturi/html/cooluris_sweo_note.html

all answers inline,
 we should talk about the "generic document URI" solution
 in the TAG telco next week.

 Also, I want to thank you very much for the review,
 and wonder if we should add you as another author to the document,
 as you have significantly contributed.
 Would you accept being listed as a third author next to Danny
 and Max Völkel?
 
It was Tim Berners-Lee who said at the right time 24.02.2008 06:25 the 
following words:
 > I am commenting on the Working Draft of 17 December 2007.
 > http://www.w3.org/TR/2007/WD-cooluris-20071217/
 > The comments vary in weight, but I keep them in document order. My 30
 > comments are marked by **. I hope they make sense.
 >
 > I think this is an important document. It contains a large amount of
 > very valuable material, a few places which are confusing, and a small
 > number of places which are, I believe, actively misleading.
 > There are also one or two places where I disagree about the
 > recommendation it makes.
 > On the whole, though, the document is important and I hope energy is
 > found to incorporate these comments.
 >
 > timbl
 > ______________________________________________
 >
 > 1. Introduction
 >
 > ** Suggest add a reference to the N3 Primer as an introduction to RDF
 > which some find easier to get into.
 >
 
 
 done,
 (but too tempting not to say:
 "although usually we don't add references to
 articles that were written by the reviewers themselves"
  :-))))))))
 
 
 >
 > 2. Last para before 2.1.
 >
 > ** Delete the sentence "In short, to locate a Web document—hence the
 > term URL (Uniform Resource Locator)". (Don't go there. Don't try to
 > distinguish between names and locations. See for example
 > http://www.w3.org/DesignIssues/NameMyth.html)
 
 argh..ok.
 All this is about the URL vs. URI discussion, so deleting the last
 reference to URL may be ok, but some replacement sentence might have helped.
 Whatever, we are running out of time;
 sentence deleted.
 
 
 
 >
 > ** I would note that "URL simply identifies whatever we see when we
 > type it into a browser" is simplistic, as users are aware of the
 > difference between links and permalinks in a blog for example.
 
 on the underlying levels, the note is correct.
 we do not change the text though, the comparison with the browser
 is a good metaphor for what happens underneath.
 (no change done)
 
 
 > 2.1
 >
 > ** After
 >
 > "HTTP/1.1 200 OK
 > Content-Type: text/html
 > Content-Language: en"
 >
 > add
 > "Content-Location: alice-en.html
 > "
 
 hm, ok. But this is two-steps already...
 
 
 >
 > ** Start a new paragraph at "...English <p> Content ..."
 > ** Add text to show people how conneg works and how the client
 > understands the content-type specific URI.
 
 good, this is also what I thought is needed,
 
 I added this:
 "Here we see Content negotiation [TAG-Alt] in action.
 The server interprets the Accept-Language headers in the request
 and decides to return the English representation of the resource
 in question. Note that the URI of this representation is passed
 back in the Content-Location header; this is not required but is a
 recommended good practice (see [CHIPS], 7.2). Clients see that this
 URI is connected to the specific representation (in this case English)
 and search engines can refer to the different representations by using
 the different URIs. This implies that it is possible to have multiple
 representations of the same resource."
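
 For readers who want to see the mechanics, here is a minimal Python sketch
 of the negotiation step described in that paragraph. It is only an
 illustration under assumptions: the function names and the German variant
 alice-de.html are hypothetical, only alice-en.html comes from the example
 above, and a real server would normally rely on its built-in conneg support.

```python
def negotiate_language(accept_language, available, default="en"):
    """Pick the best available language for an Accept-Language header value."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a language range without q counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranked.append((-quality, position, tag))
    # Highest quality first; ties keep the order given by the client.
    for neg_quality, _, tag in sorted(ranked):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        primary = tag.split("-")[0]
        if tag in available:
            return tag
        if primary in available:
            return primary
    return default


def respond(accept_language):
    """Return (status, headers) for a request to the generic URI of Alice's page."""
    language = negotiate_language(accept_language, {"en", "de"})
    headers = {
        "Content-Type": "text/html",
        "Content-Language": language,
        # URI of the specific representation that was chosen (hypothetical naming)
        "Content-Location": f"alice-{language}.html",
        # Tell caches that the response depends on this request header
        "Vary": "Accept-Language",
    }
    return 200, headers
```

 Calling respond("de, en;q=0.5") yields a 200 whose Content-Location points
 at the German variant, so a client or search engine can refer to that
 representation by its own URI.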
 
 
 
 
 >
 > ** Technical bug: The example uses 302 Found to redirect according to
 > the Accept: headers. This is *not* advisable IMHO. It uses an extra
 > round trip to no advantage. Conneg should be done directly.
 > I suggest replacing this example with one without the 302 "twist".
 
 The point of this example is to intentionally show that 302 exists
 BEFORE the semantic web and 303-redirects, and that it is used on
 some websites to do the conneg.
 It should help the reader to understand that redirection is something
 that existed before this article.
 
 [x] says "Note that even though the current practice is to use the 302
 Found", so I stick to 302.
 
 is this ok?
 
 [x] http://www.w3.org/TR/chips/#gl7
 
 >
 > 3. URI for Real-world objects
 >
 > ** In general, remove the term "non-information resource" from the
 > entire document. Replace it with "thing". It is wrong. It is used
 > misleadingly to mean "A thing, which is not necessarily an information
 > resource".
 >
 > ** It would, I think, in a document like this be best to stick with "web
 > document" instead of "information resource" too, but that is just for
 > readability. It is already done in places.
 >
 > ** Delete "We call all these real-world objects or (according to WWW-
 > Arch) non-information resources." (It is a bad term, as explained
 > above, and the AWWW does not use it at all).
 
 thanks for this feedback, you tipped the balance now,
 you are not the only one saying this!
 We were in dire need of such feedback, and having it in an official
 feedback mail helps us much.
 
 for the other feedback, see also:
 https://gnowsis.opendfki.de/repos/gnowsis/papers/2006_11_concepturi/feedback/index.htm#i2
 
 
 > 3 .. Box "1 Be on the web".
 >
 > ** Important architectural philosophical point.
 > Replace "Machines should get RDF data and humans should get a readable
 > representation, such as HTML." with "Machines, and humans through user
 > agents, should get data in RDF (and related standards). In some cases,
 > it may be useful to provide a view of the data in HTML for users with
 > conventional web browsers without data functionality"
 
 Tim, truly an architectural and philosophical point.
 The goal of the semantic web is expressed in your suggested replacement!
 I tripped over this sentence:
 "In some cases, it may be useful to provide a view of the data in HTML
 for users with conventional web browsers"
 
 Looking at the current state of affairs, I see many conventional browsers
 and a marginal use of
 tabulator or other semweb-browsers. Although technologies like fresnel may
 be able to render RDF, they don't do it in a
 user-friendly-web2.0-ajaxy-beautiful way.
 
 Thus, in my humble view, I think this sentence is too ambitious.
 We may give the false impression of "we force everyone
 to use an RDF browser and declare the end of HTML".
 
 I would not change the box at all, and given the other changes
 coming in the next point, I think the message is conveyed...
 
 If you insist (and you state this point as "important"),
 maybe you can rephrase it in a way involving RDFa, where
 HTML and RDF live happily together.
 
 
 >
 > ** Add text after the box. "This document describes ways of serving
 > both raw data and hypertext views of data. Remember that the most
 > important duty of the provider of data is to provide the data as soon
 > as possible, and raw. [ref to the blog "Give Us the Data Raw, and Give
 > it to Us Now"
 > http://blog.okfn.org/2007/11/07/give-us-the-data-raw-and-give-it-to-us-now/
 > ]. Other sites and other applications can often produce hypertext and
 > graphical views of the data. Data such as calendar events, RSS
 > events, bank statements, etc are much more powerfully displayed using
 > multiple client-side views.
 >
 > That said, the ability to dereference a URI in an existing browser and
 > get meaningful results is valuable, and so provision of HTML, if it
 > can be done without undue cost, is valuable. This document describes
 > various ways of doing this."
 
 Ok, I added a reference to dataportability.org,
 and defused your text, which was revolutionary.
 "duty of the provider is to provide the data..."
 --> this flunks the business-model of facebook, linkedin
 and other data-silos.
 There is no need to inflict a duty on these
 valuable and respected institutions,
 despite our personal and institutional quest for open linked data.
 
 Your text also fits better in the introduction as a motivation to
 read the article; at the lower point the reader is not so interested
 anymore in the motivation behind our ideas. I removed "bank statement"
 as this would cause friction with security-aware people, and replaced it
 with "address book contacts".
 
 result in the introduction, after "The Semantic Web ... and software
 applications to find them":
 
 "Users benefit from getting data raw and now [Give] and in portable
 data formats [DP]. Providers who host calendar events, RSS feeds, contact
 details, etc. often publish data embedded in a fixed user interface, in HTML.
 Publishing it in RDF allows the user to pick any application to view and work
 with the same data, for example in an integrated calendar, and find new uses
 for the information."
 
 
 >
 >
 > Diagram before 3.1
 >
 > ** The relationships are a bit vague. I think the relationships
 > expressed by the arrows in the diagram are both "description". The
 > two describing documents have different content-types. Maybe change
 > the arrows to read "description", and add "read by semantic web
 > applications" under the RDF and "Read by web browsers" under the HTML.
 >
 
 The text already explains in much detail what we mean, we assume the reader
 is now familiar with conneg.
 no change.
 
 >
 > 3.1 Distinguishing between web documents and real-world objects
 >
 > ** This section has major flaws in its argumentation. It says "Above
 > we assumed that there is a distinction between web documents
 > (information resources) and real-world, non-document objects (non-
 > information resources). The question is where to draw the line between
 > them. "
 >
 > That is, with respect, NOT the question. That is a question
 > which has proved unproductive. It is not fruitful to try to define
 > from scratch "Information resource". The question is to distinguish
 > between something and a document about something. That distinction
 > has been introduced already in the document and explained well. Now
 > we have to explain that 200 means "Here is the content of the document
 > you requested" and 303 means "Here is the URI of a document about the
 > thing you requested". When that has been explained, then the class
 > of things which get a 200 will be clear by people understanding the
 > protocol.
 >
 > Later, it says "The problem now is that web documents are also part of
 > our perceived world, hence they are real-world objects in their own
 > right." But this is NOT a problem, once you have thrown out "non-
 > information resources" and replaced it with "things". ((For
 > example, mobydick#this may denote a book, and mobydick may denote a
 > library catalog card about the book. Both the book and the card are
 > documents, one is about the other. That is the relationship which is
 > important.))
 >
 > I propose removing section 3.1
 
 Good comment!
 It sucks somehow as we ENLARGED section 3.1 and refined it
 based on this TAG feedback:
 https://gnowsis.opendfki.de/repos/gnowsis/papers/2006_11_concepturi/feedback/Feedback_2007_09_19_TAG.html#sec3
 
 Given your suggested changes, the explanation given in section 3.1 is still
 needed to refer to the formal definition given in AWWW, 2.2. Based on your
 suggestions and the feedback from other reviewers [x], section 3.1
 now reads like this:
 
 <section 3.1 begin>
 Above we assumed that there is a distinction between web documents and
 things (real-world objects). According to W3C guidelines ([AWWW], section 2.2),
 we have a Web document (there called information resource) if all its
 essential characteristics can be conveyed in a message. Examples are a Web
 page, an image or a product catalog. The URI identifies both the entity and
 indirectly the message that conveys the characteristics. Real-world objects
 are therefore entities whose characteristics cannot be conveyed in a message,
 but are entities on their own. The key to understanding the difference is that
 a Web document often describes a thing. For example, the person Alice is
 described on her homepage. Bob may not like the look of the homepage, but
 fancy the person Alice.
 
 Our recommendation is to err on the side of caution: Whenever an object of
 interest is not clearly and obviously a document (all its essential
 characteristics can be conveyed in a message), then it's better to use two
 distinct URIs, one for the resource and another one for the document
 describing it.
 </end>
 
 >
 > 4.1. Hash URIs
 >
 > ** Change "and therefore cannot identify a Web document" to "and
 > therefore does not necessarily identify a Web document"
 >
 
 done.
 (ah, now it gets easier again...)
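
 A small aside on why hash URIs work this way: the client strips the
 fragment before it sends the HTTP request, so the server only ever sees the
 document part. A minimal Python sketch, assuming a hypothetical URI on
 www.example.com and a hypothetical helper name:

```python
from urllib.parse import urldefrag


def split_hash_uri(thing_uri):
    """Split a hash URI into the document URI that is actually requested
    over HTTP and the fragment that stays on the client side."""
    document_uri, fragment = urldefrag(thing_uri)
    return document_uri, fragment


# Hypothetical example URI, not taken from the draft
print(split_hash_uri("http://www.example.com/about#alice"))
```

 The URI with the fragment can thus name the thing, while the URI without it
 is the document that gets retrieved with a plain 200.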
 
 >
 > The diagram just before 4.2
 >
 > ** Remove "303 redirect". I hope that was a typo (copy/pasteo).
 >
 > ** Please add the Content-Location: headers to this diagram.
 
 This was not a typo, it was us accidentally making the thing too
 complicated.
 You are right, with conneg and Content-Location it's much easier!
 result code is 200, I assume...
 will be done.
 
 
 >
 > 4.2. 303 URIs
 >
 >
 > ** Change "to a different (information) resource which can be
 > represented as a document and can give you the information that you
 > want." to "to a document which has information *about* the thing you
 > asked about."
 
 good idea! thx, done
 
 
 >
 > ** Major technical question about the implementation of 303. I know
 > that dbpedia does it the way described, but there are a lot of good
 > reasons to do it by a 303 to a generic URI for the document, which
 > then itself does a conneg to RDF and HTML.
 >
 > - It is no more round trips than the dbpedia way
 > - It gives the client a URI to bookmark which is generic. This is
 > important:
 > - It allows the user with an RDF-capable client to bookmark the
 > document, and mail it to another user (or another device) which then
 > dereferences it and gets the HTML view. This use of generic resources
 > is important.
 > - It provides the server with the ability to add representation in new
 > languages in the future.
 > - It is standard conneg and so probably more supported on servers
 >
 > Just because client started with the URI of a thing, it doesn't mean
 > that the document involved is not a first class document on the WWW.
 > Best practices for this document apply. One of these is the use of
 > Generic Resources. (See for example
 > http://www.w3.org/DesignIssues/Generic.html
 > and the new ontology )
 
 You are right in your argumentation, but it's a tricky area.
 The fact is, though, that we will still have URIs for the various
 representations, such as RDF/HTML/EN/DE, etc. In the "Content-Location"
 header, the http server will return those URIs anyway.
 I assume the client should bookmark the thing-URI, not the generic-document
 URI, or a representation-URI. hm.... but he doesn't see the thing-URI
 anymore in the browser address bar after the 303 has been executed.
 
 We will discuss this in the 20th March TAG telco?!
 
 I hesitate though, to change the whole document now.
 We have no time left.
 To make everyone happy, I humbly suggest adding
 your conneg-document-URI-for-bookmarking solution
 as another solution, after the existing ones,
 and also explaining the problem that people
 SHOULD send/bookmark/use/reference-in-triples
 the thing-URI, but if this is not feasible in browser
 bookmarks because of redirects, it would be good to use
 the generic-document-URI.
 
 We don't know yet what is better for the semantic web, do we?
 Let's enable the reader to understand both sides and decide for himself?
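
 To make the discussion concrete, the generic-document variant could be
 sketched roughly as follows in Python. Everything here is an assumption for
 illustration: the paths /id/alice and /doc/alice, the two tables and the
 handle() function are hypothetical, and the Accept parsing ignores q values
 for brevity.

```python
# thing URI -> generic document URI (hypothetical paths)
THINGS = {"/id/alice": "/doc/alice"}

# generic document URI -> media type -> representation-specific URI
REPRESENTATIONS = {
    "/doc/alice": {
        "application/rdf+xml": "/doc/alice.rdf",
        "text/html": "/doc/alice.html",
    }
}


def handle(path, accept="text/html"):
    """Return (status, headers) for a GET request on path."""
    if path in THINGS:
        # The thing itself cannot be sent; point to a document about it.
        return 303, {"Location": THINGS[path]}
    if path in REPRESENTATIONS:
        variants = REPRESENTATIONS[path]
        # Simplification: take media types in the order listed, ignoring q values.
        wanted = [item.split(";")[0].strip() for item in accept.split(",")]
        media_type = next((m for m in wanted if m in variants), "text/html")
        return 200, {
            "Content-Type": media_type,
            "Content-Location": variants[media_type],
            "Vary": "Accept",
        }
    return 404, {}
```

 The client makes two requests (303, then 200), and after the redirect the
 address bar shows the generic document URI, which negotiates to RDF or HTML
 for whoever dereferences it next.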
 
 
 >
 >
 > 4.3 Choosing ...
 >
 > ** I think a whole sentence at least could elaborate that if you use
 > 303 for an ontology, like FOAF, then the network delay can be
 > intolerable for any client looking up a set of terms, even though the
 > client has already loaded everything there is to know.
 
 I added this:
 "When using 303 URIs for an ontology, like FOAF, the network delay can
 reduce a client's performance considerably. A client looking up a set of
 terms through 303 may use many requests, even though the first request has
 already loaded everything there is to know."
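
 The effect can be illustrated with a rough request count. This Python
 sketch assumes a naive client that caches fetched documents but has to ask
 for every 303 URI separately; the vocabulary URIs under example.org are
 hypothetical.

```python
from urllib.parse import urldefrag


def count_requests(term_uris, redirects=None):
    """Count the HTTP requests needed to look up all term URIs.

    redirects maps 303 URIs to the document they redirect to; URIs that are
    not in the mapping are treated as hash URIs.
    """
    redirects = redirects or {}
    seen_documents = set()
    requests = 0
    for uri in term_uris:
        if uri in redirects:
            requests += 1  # round trip that only returns the 303
            document = redirects[uri]
        else:
            document, _fragment = urldefrag(uri)  # fragment never hits the wire
        if document not in seen_documents:
            requests += 1  # the document itself is fetched once
            seen_documents.add(document)
    return requests


hash_terms = ["http://example.org/vocab#name",
              "http://example.org/vocab#knows",
              "http://example.org/vocab#Person"]
slash_terms = ["http://example.org/vocab/name",
               "http://example.org/vocab/knows",
               "http://example.org/vocab/Person"]
redirects = {uri: "http://example.org/vocab.rdf" for uri in slash_terms}

print(count_requests(hash_terms))              # one request for the document
print(count_requests(slash_terms, redirects))  # one 303 per term plus the document
```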
 
 
 >
 > ** The text says: "To address scalability issue with the management of
 > a large set of URIs in case of the 303 solution, the usage of a SPARQL
 > endpoint or comparable services is advised". Why? There is no
 > justification for this. The 303 to an encoded SPARQL endpoint is IMHO
 > clumsy and a proxied normal URI would be better. In future, we may
 > have ways of associating whole URI subtrees with a SPARQL server, but
 > we don't yet. Suggest remove the sentence or expand and explain it.
 
 what I actually meant is that you shouldn't download the whole set of
 data with many 303s but rather see if you find a SPARQL endpoint that
 can answer your questions more effectively by running a possible query
 on the server.
 
 I added this:
 
 ""
 When hosting large-scale datasets with the 303 solution, clients may be
 tempted to download all data using many requests. We advise additionally
 providing SPARQL endpoints or comparable services to answer complex queries
 on the server directly, rather than letting the client download a large set
 of data via HTTP.
 ""
 
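 As an illustration of what I mean, a client could put one question to the
 endpoint instead of crawling. The sketch below only builds the request URI
 with the "query" parameter of the SPARQL protocol; the endpoint address and
 the URI for Alice are hypothetical, and it assumes the endpoint URI has no
 query string of its own.

```python
from urllib.parse import urlencode


def sparql_request_url(endpoint, query):
    """Build the GET request URI for a SPARQL query (endpoint without '?')."""
    return endpoint + "?" + urlencode({"query": query})


QUERY = (
    "SELECT ?name WHERE { "
    "<http://www.example.com/id/alice> <http://xmlns.com/foaf/0.1/name> ?name }"
)

print(sparql_request_url("http://www.example.com/sparql", QUERY))
```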
 >
 > ** The text says: "Note also, that both 303 and Hash can be combined,
 > allowing to spread a large dataset into multiple parts and have an
 > identifier for a non-document resource. An example for a combination
 > of 303 and Hash is:
 > http://www.example.com/bob#thisBob, the person with a combined URI."
 > This is strange. Where is the 303 in this? This (bob#this) is an
 > important way of generating URIs, and deserves a section (insert new
 > 4.3) of its own. For when databases are exposed for example, or other
 > virtual RDF linked data spaces generated from underlying systems.
 
 now I have to admit that we are out of time, especially for a proper review
 for a new section.
 I humbly suggest to leave it as is, at least we have a reference to this
 idea in now, and see that we are able to publish the note within SWEO.
 We only have a few weeks left, and writing and reviewing a new section
 is not possible in this time.
 
 In short: Tim, this is needed but let's blog it later...
 
 
 >
 > 4.3 ... Conclusion
 >
 > ** In first para, change "grow much" to "grow out of control" or "grow
 > extremely".
 
 I like "out of control", done.
 
 >
 > ** Change "303 URIs should be used for large sets of data that are, or
 > may grow, beyond the point where it is practical to serve all related
 > resources in a single document." to
 > "URIs of the bob#this form can be used for large sets of data that
 > are, or may grow, beyond the point where it is practical to serve all
 > related resources in a single document.</p><p>
 > 303 URIs may also be used for such data sets, making neater-looking
 > URIs, but with an impact on run-time performance and server load."
 
 ok,  done.
 
 
 >
 > ** Delete the paragraph "If in doubt, it's better to use the more
 > flexible 303 URI approach.".
 
 even better, Danny Ayers suggested using the phrase "follow your nose"
 somewhere in the document; here is the perfect place. I used:
 
 ""If in doubt, follow your nose.""
 
 
 >
 > 4.5 Linking
 >
 > ** After the example box, change "This allows RDF-aware clients to
 > find a human-readable version of the resource" to "This allows RDF-
 > aware clients to find a human-readable resource". (The ?x! foaf:page
 > is not at all guaranteed to be an HTML version of ?x!
 > rdfs:isDefinedBy .)
 
 Ok, this is (I think) a little less fluent to read, but conveys more of the
 AWWW. done.
 
 
 >
 > ** "authoritative". In what way is the document authoritative? When
 > an ontology defines a term, then rdfs:isDefinedBy really means
 > the document gives definitive information from the owner of the term.
 > With alice's company giving data about alice, it is not clear that
 > this is authoritative. I would delete the rdfs:isDefinedBy unless
 > changing the example. I am not sure though whether the semantics of
 > this are that closely defined.
 
 Assuming that the company publishes *some* data about Alice in RDF,
 rdfs:seeAlso or rdfs:isDefinedBy are both ok.
 rdfs:isDefinedBy has stronger semantics (there is usually only one of them),
 and in this case we have exactly these semantics - the company basically
 "owns" (in a humorous-capitalistic view now, don't take me too seriously here)
 Alice. Hence the company "defines" the public view of its employee
 Alice. It also controls both URIs we are using here.
 
 I would not change it, it conforms to RDFS semantics.
 
 
 
 >
 > ** Add a paragraph:
 >
 > "The client also can deduce similar link information directly from the
 > HTTP headers: that a thing is described by the document its URI
 > redirects to with a 303s; that the content-location resource is a
 > content-specific version of the generic document, and so on.
 > Ontologies for these relations are not discussed here"
 
 I changed your sentence slightly and added it AFTER the
 "this allows RDF-aware..." para you
 refer to below:
 
 ""The client also can deduce similar link information directly from the HTTP
 headers: that a thing is described by a web document which can be found at
 the end of a 303 redirect; that the Content-Location resource is a content-
 specific version of the generic document, and more. Ontologies for these
 relations are not discussed here""
 
 
 >
 > (Note the AWWSW group is looking at formalizing that more).
 
 hey, I was always hoping that we had something on the level of RDF/S
 that captures foaf:page or skos:isSubjectOf!
 hope to see it soon....
 
 
 >
 >
 > ** In the para <<This allows RDF-aware Web clients to discover the RDF
 > information. The approach is recommended in the RDF/XML specification
 > ([RDFXML], section 9). If the information on the Web page differs
 > significantly from the RDF version, then we recommend using rel="meta"
 > instead of rel="alternate".>> rewrite:
 >
 > <<This allows RDF-aware Web clients to discover the RDF information.
 > The approach is recommended in the RDF/XML specification ([RDFXML],
 > section 9). If the RDF data is *about* the web page, rather than an
 > expression of the information in it, then we recommend using
 > rel="meta" instead of rel="alternate".
 > >>
 >
 > (I think this distinction is important, and very much in line with the
 > distinctions made throughout the document)
 
 Yes, that's much clearer than what we had before.
 I like it, it connects very well to the previous argumentation about
 things and documents about things...
 done.
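
 For illustration, an RDF-aware client could discover such links roughly
 like this. This is a minimal Python sketch using only the standard library;
 the class and function names and the sample page are hypothetical, and it
 only looks for RDF/XML links.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect <link> elements that point to RDF/XML data."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (rel, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if attributes.get("type") != "application/rdf+xml" or not href:
            return
        for rel in rels:
            # "alternate": the RDF expresses the information in the page
            # "meta": the RDF is *about* the page
            if rel in ("alternate", "meta"):
                self.links.append((rel, href))


def find_rdf_links(html):
    finder = RdfLinkFinder()
    finder.feed(html)
    return finder.links


PAGE = """<html><head>
<title>Alice</title>
<link rel="alternate" type="application/rdf+xml" href="alice.rdf">
<link rel="stylesheet" type="text/css" href="style.css">
</head><body>Hello</body></html>"""

print(find_rdf_links(PAGE))
```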
 
 
 >
 >
 > 5. Examples from the web
 >
 > Last line of section 5:
 >
 > Change "A better URI would be for example
 > http://ontoworld.org/rdf/Karlsruhe ." to "A better URI would be for example
 > http://ontoworld.org/data/Karlsruhe ." This is a cooler URI as it allows
 > conneg to be introduced to allow
 > the same data to be expressed in rdf/xml or n3 or RIF or whatever we
 > think of next.
 
 you are the king, you are the king,
 coolness is given by though :-)))
 done.
 
 

 
 

-- 
____________________________________________________
DI Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________
Received on Sunday, 16 March 2008 14:08:13 UTC