Comments on "Architecture of the World Wide Web, First Edition" from Patrick Stickler on 2004-02-03 (public-webarch-comments@w3.org from February 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 3 Feb 2004 13:42:06 +0200
To: public-webarch-comments@w3.org
Message-Id: <FE2F4E9C-563D-11D8-886D-000A95EAFCEA@nokia.com>
I applaud the TAG for their efforts in bringing this document to
maturity. Overall, I find it to be clear and consistent, and its
value to the future evolution of the web and semantic web will
surely be immense.

That said, I have some general comments on this final revision, and
unfortunately, also a few non-trivial objections to how some particular
topics are addressed.

I hope that the TAG will find this input to be constructive and useful.

--

Section 2.1, para 11, "Applications may apply rules...":

Consider adding an example of an acceptable presumption of equivalence
when two URIs differ in case only in the authority component. E.g.

"http://Weather.Example.Com/Oaxaca" = 
"http://weather.example.com/Oaxaca"

but not necessarily

"http://weather.example.com/Oaxaca" = 
"http://weather.example.com/oaxaca"

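A rough sketch of such a comparison, in Python, might look like the
following. It is only an illustration: the function names are my own,
and it folds case in the scheme and host alone, leaving the path, query
and fragment untouched.

```python
from urllib.parse import urlsplit


def comparison_key(uri):
    """Build a key that ignores case only where URI syntax allows it."""
    parts = urlsplit(uri)
    return (
        parts.scheme.lower(),            # scheme is case-insensitive
        parts.username,                  # userinfo stays case-sensitive
        parts.password,
        (parts.hostname or "").lower(),  # host is case-insensitive
        parts.port,
        parts.path,                      # path stays case-sensitive
        parts.query,
        parts.fragment,
    )


def presumed_equivalent(a, b):
    """True when two URIs differ at most in scheme or host case."""
    return comparison_key(a) == comparison_key(b)


print(presumed_equivalent("http://Weather.Example.Com/Oaxaca",
                          "http://weather.example.com/Oaxaca"))
print(presumed_equivalent("http://weather.example.com/Oaxaca",
                          "http://weather.example.com/oaxaca"))
```

The first comparison holds and the second does not, since only the
origin server can say whether the two paths name the same resource.
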
--

Section 2.3, last para:

Consider adding example of URI ambiguity which illustrates the problem.
E.g. if the same URI is used to denote two distinct resources, then
when statements are made using that URI, it is unclear which resource
is meant, resulting in confusion and potentially undesirable side
effects.

--

Section 2.3, last sentence:

Correct "URI ambiguity arises a URI is used..." to "URI ambiguity
arises _when_ a URI is used...".

Also, consider adding a link from "Web resources" at end of sentence
to the informal note at the end of section 3 -- as this is the first
time that term occurs in the document, and this is quite a central
term (even if informal). Possibly even consider adding the term
to the list of terms in section 5.

--

Section 3.1, para 1:

Consider changing text to "...  Access may take many forms, including
retrieving a representation of the state of the resource (for instance,
by using HTTP GET or HEAD), adding or modifying a representation of
the state of the resource (for instance, by using HTTP POST or PUT,
which in some cases may change the actual state of the resource if
the submitted representations are interpreted as instructions to that
effect), and deleting some or all representations of the state of the
resource (for instance, by using HTTP DELETE, which in some cases may
result in the deletion of the resource itself)."

--

Section 3.2, para 1, last sentence:

Consider changing to "A message may even include metadata about the
message itself (for message-integrity checks, for instance)."

-- 

Section 3.3.1, para 2:

Para 2 does not seem quite correct.

In order to know the authoritative interpretation of a fragment
identifier, one does not dereference the entire URI *containing*
the fragment identifier, but must first obtain a different URI by
truncating the fragment identifier, dereference that different URI,
and based on the MIME type of the representation returned (if any)
interpret the fragment identifier.

It may be the case that certain clients, such as browsers, perform
that fragment identifier truncation automatically -- but that is not
the same as actually dereferencing the *complete* URI reference with
fragment identifier.

Granted, paragraph 3 seems to get it right, but the language of
paragraph 2 does not read correctly to me.
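
To make the sequence I have in mind concrete, here is a small Python
sketch. The function names and the per-media-type rule are hypothetical
and heavily simplified; the real interpretation rules live in each media
type's specification, and no network access is performed here.

```python
from urllib.parse import urldefrag


def plan_dereference(uri_reference):
    """Split a URI reference into the URI to dereference and the fragment."""
    base_uri, fragment = urldefrag(uri_reference)
    return base_uri, fragment


def interpret_fragment(media_type, fragment):
    """Hypothetical lookup: how a fragment is read for a given media type."""
    if media_type in ("text/html", "application/xhtml+xml"):
        return "element whose id or name is " + repr(fragment)
    return "not defined for " + media_type + " in this sketch"


# Step 1: truncate the fragment to obtain the URI that is actually dereferenced.
base_uri, fragment = plan_dereference("http://weather.example.com/oaxaca#tomorrow")
print(base_uri)
print(fragment)

# Step 2 (assumed): the server returned a representation typed text/html.
print(interpret_fragment("text/html", fragment))
```

Only the truncated URI is ever sent to the server; the fragment is
interpreted on the client side, after the media type is known.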

Question: are the methods PUT, POST or DELETE meaningful for
URI references with fragment identifiers, in terms of interacting
with the state of the secondary resources denoted? If not, then
it seems there is a good principle that one should use URIs
without fragment identifiers whenever possible to maximise
the utility of those URIs.

--

Section 3.3.1, last para, last sentence:

This sentence seems misleading, as if one can infer something
about the nature of a secondary resource by interpreting a
URI reference with fragment identifier.

One cannot infer the nature of any URI denoted resource based
either on the URI *or* based on any representation obtained by
dereferencing that URI, either directly, or for URI references
with fragment identifiers, by first dereferencing the base URI
and interpreting the fragment in terms of the MIME type of
the returned representation.

This last sentence could either be removed or clarified/reworked.

--

Section 3.4, para 1:

Consider changing "...parties can identify and communicate about
a Web resource." to simply "...parties can identify and communicate
about a resource.".

Or else, again, link "Web resource" to a definition of that term.

-- 

Section 3.4, para 1, last sentence:

The phrase "authoritative interpretation of representations of
the resource" is a bit unclear. The owner of the URI can specify
the denotation of the URI and what representations of that
resource are accessible, but is it not the case that the MIME
type specifications define the interpretation of any given
representation -- insofar as the web architecture is concerned?

I.e., for a given representation, it is the MIME type specification
that defines the interpretation of that representation, not the
owner of the URI denoting the represented resource. ???

--

Section 3.4, para 2:

The text of this paragraph is a bit too strong regarding URI owner's
rights.

The owner of a URI has the right to decide which representations
of the denoted resource are accessible via that URI -- but in fact
anyone has the license to create a representation of that resource,
and indirectly associate that representation via another URI
that is declared (e.g. using owl:sameAs) as semantically equivalent.
I.e. the rights of the owner of a URI are limited to the access of
representations via that particular URI, not (necessarily) to total
control of the resource denoted as well as any and all representations
of that resource (accessible via other URIs).

--

Section 3.5.1:

Para 3 seems to contradict the last statement of para 1. In para 1
it is said that POST requests and responses cannot be referenced
by URIs, yet para 3 describes a means to do just that.

It seems that what is meant to be said in para 1 is that, per the
default behavior of POST, the request and response are not normally
assigned distinct URIs by which they can be later referenced. ???

--

Section 3.6, para 1:

How can "...they both conclude that the resource is unreliable"
since (a) they cannot determine from either the URI or any
representation what resource the URI actually denotes, and
(b) the behavior of a given server providing access to
representations of a resource is all that can be unreliable?
The resource itself is (typically) not part of the system.

For all Nadia and Dirk know, the URI actually denotes the
union of the weather for Oaxaca and the #1 insurance
company in Oaxaca -- and getting representations reflecting
either of those facets of the resource is perfectly acceptable,
and *consistent* with the nature of the resource denoted.

A better example of "unreliability" might be a service which
frequently returns 404 responses rather than useful representations
or one which often returns representations which do not
accurately reflect the state of the weather in Oaxaca, or
one which sometimes returns XHTML but other times returns
plain text. Yet in such cases, it is the service resolving
the URI to representations that is unreliable or inconsistent,
not the resource itself.

--

Section 3.2.1:

Is it really necessary to posit this good practice? The SHOULD
suggests that owners of URIs who don't provide representations
of the denoted resources (either initially, or ever) are not
"doing the right thing".

This issue has been under recent discussion relating to the info:
URI scheme -- where there are many organizations not (presently)
prepared to provide representations for term resources which
could be denoted using http: URIs rather than creating the new
info: URI scheme -- yet failure to provide representations for
those terms (even if temporarily) is not really bad practice.

And encouraging folks to mint http: URIs to denote resources
which initially will not have representations, facilitates
providing representations at a later date without impacting
the use of those URIs as identifiers.

Owners of URIs should be free to decide whether any representations
are made available, and should *NOT* feel obligated to provide
representations if they themselves have no need to do so. URIs
without representations may simply be less valuable/useful
than those with representations. But it shouldn't be considered
bad practice to not provide any representations.

I recommend that this particular "good practice" be removed,
even though language should remain which reflects that URIs
with accessible representations are usually more useful than
those without.

--

Section 4.3, para 1, 2nd sentence:

Consider changing "cell phones" to "mobile phones", consistent
with subsequent paragraphs.

--

Section 4.5.4:

I see a lot of problems in this section... sorry.

I do not see the necessity to introduce the concept of a "namespace
document" into this particular publication. If a given URI does
in fact denote a "namespace" resource, and that URI can be dereferenced
to obtain a representation, then that representation is a representation
of the namespace. Period. Nothing more need be said. No special kind
of representation need be highlighted here.

Furthermore, because *any* URI may be used as a namespace name, and XML
Namespaces imposes *no* requirements that the URI used as a namespace
name actually denote a "namespace" resource (it might very well
denote the concept of 'slightly watery fudge pudding') the language
in this section suggests (incorrectly so) that (a) users may
presume that URIs used as namespace names denote "namespace" resources,
and therefore (b) if a URI used as a namespace name is dereferencable
to a representation, that representation will describe the namespace
in question. Both are false and will result in confusion.

[the definition of 'namespace document' in section 5 is simply false]

The TAG should significantly rework this section, stressing the
fact that the interpretation by an XML processor of a URI used as a
namespace name (i.e. used as syntactic punctuation) should be presumed
to be disjunct from the interpretation of that URI as a resource
identifier in the broader web context.

None of the web-significant meaning of URIs when used as namespace names
need have any relevance whatsoever to the terms grounded in those namespaces
nor to the models or vocabularies employing those terms.

It is incorrect to suggest that there is any semantic relation between
the meaning of a URI used as a namespace name and the meaning of terms
grounded in that namespace.

Per the good practices "QNames indistinguishable from URIs" and
"QName mapping", what counts is the URI denoting the term -- and discovery
of knowledge about that term, vocabularies to which the term belongs, models
employing that term, schemas for validating proper usage of that term, style
sheets defining visual presentation per that term, etc. should
begin with the term URI *alone*. The namespace name has *no* semantic
significance whatsoever to the meaning and intended usage of that term.
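
As an illustration of working from the term URI alone, the Python sketch
below maps a qualified element name to a single URI by concatenating the
namespace name and the local name. Simple concatenation is only one
possible mapping convention, and the sample element, the choice of the
Dublin Core namespace and the function name are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Sample element chosen for illustration only.
DOC = '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Oaxaca weather</dc:title>'


def term_uri(element):
    """Map an element's qualified name to one URI by concatenation."""
    tag = element.tag
    if tag.startswith("{"):
        # ElementTree stores qualified names as "{namespace-name}local-name".
        namespace_name, local_name = tag[1:].split("}", 1)
        return namespace_name + local_name
    return tag  # element is not in any namespace


root = ET.fromstring(DOC)
print(term_uri(root))
```

Once the term URI is in hand, the prefix and the namespace name play no
further role; any lookup of schemas, vocabularies or style sheets would
start from that one URI.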

Namespace names are syntactic machinery introduced solely so that folks
wouldn't have to use full URIs as element and attribute names. Semantically,
they are no different from entities. The Web would have been far better
off if syntax such as <&dc;title> had been adopted rather than
namespace names and prefixes -- yet the result is the same: the
contraction of URIs to manageable aliases.

It is disappointing to see the TAG continuing to promote the idea
that any semantics associated with a URI used as a namespace name
has any relation whatsoever to the semantics of terms grounded in
that namespace.

--

Section 5:

Consider adding the term "Web resource" with a definition such as
"A resource identified by a web-dereferencable URI".

--

Section 5: Dereference a URI:

Consider expanding to "Indirectly access the resource identified by
the URI via a representation of that resource."

--

Section 5: Namespace document:

Strongly advise removing this term from the publication entirely,
and in particular this incorrect definition (see discussion
above). The assertion that every URI used as a namespace name denotes
a namespace document is false.

--

Section 5: Secondary resource:

This definition is difficult to read and seems to be grammatically
ill-formed. It should be reworked a bit. Perhaps "A resource that
is indicated by a fragment identifier component of a URI reference,
which must be interpreted in terms of a representation obtained
by dereferencing the base URI of the URI reference along with
the media type of that representation". ???

--

End of comments...



Regards,

Patrick


--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Tuesday, 3 February 2004 11:11:27 UTC