Re: non-authoritative syntaxes for fragment identifiers from Al Gilman on 2004-09-03 (uri@w3.org from September 2004)

From: Al Gilman <Alfred.S.Gilman@IEEE.org>
Date: Fri, 3 Sep 2004 11:44:04 -0400
To: Myriam Amielh <myriam.amielh@cisra.canon.com.au>, uri@w3.org
Message-Id: <p06110400bd5e36caa594@[10.0.1.2]>
[standard caveat: just my $AUS .02 -worth, here.]

At 1:31 PM +1000 9/3/04, Myriam Amielh wrote:
>If the use of non-authoritative fragment identifier syntaxes in URIs 
>is allowed, although at the user's own risk, such URIs should be 
>valid. Therefore, I suggest that RFC2396bis clarifies whether a URI 
>with non-authoritative fragment identifier is still a valid URI or 
>not.

Sorry, what do you mean by "is a valid URI or not"?

I don't recognize this as a question that this specification sets out
to answer.

What is really true is not that the interpretation of the fragment
"is specified by the MIME type of the representation recovered" but rather
that it may depend on the type of the representation recovered,
which is not known for sure until the recovery transaction is complete
in the case of HTTP recovery, and hence one cannot be assured
in general that one knows the type of the recovered representation prior
to recovery.

Since the fragment *cannot be interpreted for sure prior to
ascertaining the type of the recovered representation,* its
interpretation *is not further specified in this document.* It's
outside of the scope of scheme-generic or even scheme-utility
processing for this document at this time.

[watch this space for scheme-utility additions in the areas of #id
practices tied to XML-ID and ID schemes in SGML variants or more
widely that work enough like ID in XML (once the latter has been
clarified). You could come forward with a similar utility module that
would be for slicing components in composite media presentations.]

This document by its scope cannot tell you that your URI is right, is
'valid.' It can only tell you that your URI or your media-type-specific
processing is *wrong* because it uses a syntactic decomposition that
is at odds with what we have in this document said may be done in
a generic URI-processing library.

If your outlaw fragment syntax forces URIs to violate the syntax
set out here for all URIs, then this specification says that the processors
who process URIs in that way and the authors who publish URI-claimants
with the intention that they be processed in that way are wrong, are
injecting noise into the Web system where there ought to be
duly-modulated signal.

But failing that, this specification says nothing about the validity
or invalidity of your pattern of practice.  All it says is that those strings
cannot be distinguished from URIs by the rules set out here.
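
To make the "generic URI-processing library" point concrete, here is a
minimal Python sketch of a purely syntactic decomposition. It uses the
regular expression given in Appendix B of RFC 2396; the function name
split_uri and the example URI with its fragment are hypothetical and
only for illustration. Note that the fragment comes back as an opaque
string: nothing at this level says what it means or whether the
practice behind it is 'valid.'

```python
import re

# Regular expression from Appendix B of RFC 2396 for breaking a URI
# reference into its generic components.
URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into generic components (hypothetical helper).

    The fragment is returned as-is; no media-type-specific
    interpretation is attempted here.
    """
    match = URI_PARTS.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


# Hypothetical URI with a made-up fragment syntax.
parts = split_uri("http://example.org/clip#hypothetical(syntax)")
print(parts["fragment"])
```

As long as a decomposition like this is not contradicted by the way the
fragment is written, the generic rules have nothing more to say.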

On the narrower question (sorry, I think that this is what you really
want to know):

First, I don't think that you should consider a fragment syntax that
is controlled by an MPEG specification to be 'non-authoritative' in
the sense of the AWW quote you cite where it says "such
interpretations are not authoritative because they are not licensed
by specification."

In your case the question would be: did you act on inferences which
were based on the syntax of the fragment, committing such action
before recovering an MPEG resource? Or did you commit such actions
after dereferencing the URI and ascertaining that the recovered
resource claims to conform to the MPEG specification in which the
interpretation of the fragment syntax is set out?

If you had recovered an MPEG-type resource, then using the MPEG rules
to interpret the #fragment is pretty darn safe. Even if the MIME-type
registration document fails to explain it or explains it wrong. That's
just a bug in the synchronization of the type metadata across
different documents attempting to describe the same type.

If you interpret the contents of the fragment according to MPEG rules
before recovering an MPEG representation, for many schemes such as
'http' you are on thin ice and should treat the interpretation as a
working hypothesis but not a given.
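
One way to picture the difference between a working hypothesis and a
confirmed reading is the following Python sketch. Everything in it is
an assumption made for illustration: the FragmentReading record, the
read_fragment function, and the "t=10" fragment are hypothetical and
are not taken from any MPEG or URI specification. The only point is
that a reading made before retrieval is flagged as unconfirmed, and
that the type of the representation actually recovered replaces the
guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FragmentReading:
    fragment: str
    media_type: str   # type whose rules are being applied to the fragment
    confirmed: bool   # True only once the recovered type is known


def read_fragment(fragment: str,
                  guessed_type: str,
                  recovered_type: Optional[str] = None) -> FragmentReading:
    """Hypothetical helper: pair a fragment with the type used to read it."""
    if recovered_type is None:
        # Nothing recovered yet: the guess is only a working hypothesis.
        return FragmentReading(fragment, guessed_type, confirmed=False)
    # The recovered representation's type overrides any earlier guess.
    return FragmentReading(fragment, recovered_type, confirmed=True)


before = read_fragment("t=10", "video/mpeg")
after = read_fragment("t=10", "video/mpeg", recovered_type="text/html")
print(before.confirmed, after.media_type)
```

In the second call the guess turned out to be wrong, and the caller can
see that and re-read the fragment under the rules for the type that was
actually recovered.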

My personal suggestion for a friendly amendment to the 'opacity'
clause in the AWW would be to say just that:

"Processors MAY form educated guesses based on examining the contents
of the URI, but they MUST retain the user or downstream process's
ability to correct errors arising out of such guesses. Processors
MUST NOT interpret the contents of a URI in any mechanical, invisible
and irrevocable way based on assertions not imposed by the governing
specifications."

Al

PS:

The reliance on atomic MIME types for type-qualification of recovered
resources is a bone of contention as regards the Web Architecture.  It
would be more fitting to insert a remark in the AWW to this effect than
to try to extend this document to define what is "a valid URI."  All
this document defines is "a string meeting the general requirements on
URIs imposed by this document."


>Hello,
>
>The issue I would like to submit here is the following: Does the use 
>of a non-authoritative fragment identifier syntax make a URI 
>invalid? In relation to this problem, I have a suggestion for the 
>Last Call on RFC2396bis.
>
>In the AWWW document, Paragraph 4 of clause 3.3 specifies:
>
>"Parties that draw conclusions about the interpretation of a 
>fragment identifier based solely on a syntactic analysis of all or 
>part of a URI do so at their own risk; such interpretations are not 
>authoritative because they are not licensed by specification."
>
>This clause seems to allow the use of a non-authoritative fragment 
>syntax although there is no guarantee it can always be processed. I 
>think it is reasonable to allow the use of non-authoritative 
>fragment syntaxes, especially considering that:
>
>- although in some cases Internet media types owners may not 
>need/want to define a syntax, content owners may want to address 
>fragments of content, and have to define non-authoritative syntaxes,
>- in the future, it may be beneficial to establish common 
>conventions for addressing fragments consistently across multiple 
>representations of a content. Indeed at the moment, very few 
>Internet media types have defined a syntax for fragment identifiers.
>
>At the moment, both the RFC2396bis and the AWWW specify that:
>
>The semantics of a fragment identifier are defined by the set of
>    representations that might result from a retrieval action on the
>    primary resource.  The fragment's format and resolution is therefore
>    dependent on the media type [RFC2046] of a potentially retrieved
>    representation, even though such a retrieval is only performed if the
>    URI is dereferenced.
>
>This does not clearly state whether the use of a non-authoritative 
>scheme is valid or not. Another situation could happen if a 
>non-authoritative fragment syntax is widely used on the web for a 
>particular representation and later on an Internet media type owner 
>registers a fragment syntax. Both schemes could potentially coexist 
>and be deployed assuming that the syntaxes use a mechanism to help 
>the processor identify which scheme applies (for instance using a 
>scheme name as for the Xpointer Framework).
>
>If the use of non-authoritative fragment identifier syntaxes in URIs 
>is allowed, although at the user's own risk, such URIs should be 
>valid. Therefore, I suggest that RFC2396bis clarifies whether a URI 
>with non-authoritative fragment identifier is still a valid URI or 
>not.
>
>Best regards
>Myriam
Received on Friday, 3 September 2004 17:32:54 UTC