Re: fragment identifier as uri? (Was: 000106 Minutes) from Joseph M. Reagle Jr. on 2000-01-28 (w3c-ietf-xmldsig@w3.org from January to March 2000)

From: Joseph M. Reagle Jr. <reagle@w3.org>
Date: Fri, 28 Jan 2000 18:06:49 -0500
To: Dan Connolly <connolly@w3.org>
Cc: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>, "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "Henry S. Thompson" <ht@cogsci.ed.ac.uk>, Tim Berners-Lee <timbl@w3.org>
Message-Id: <3.0.5.32.20000128180649.04939530@localhost>
http://www.w3.org/Signature/2000/01/URI-IDREF.html

[Dan's take on URIs and IDREFs is below and is a worthwhile read; I'm
trying to summarize the WG's position in response.]

The reason I asked the XML schema editors about the URI datatype is
because I needed to understand the syntactical validation constraints
placed over that type. If it permits fragments, we will likely have to
create a user-generated type by specifying a [1]pattern facet over
the string type. This is admittedly awkward, and I'm of two minds
on it, as are many of the WG members, but the reasons for this follow:

   [1] http://www.w3.org/TR/xmlschema-2/#dt-pattern

The WG is presently doing two things "oddly" in its treatment of
references.
 1. Our present course is to define a URI-clean (sans the fragment),
    such that:
    URI-clean = [ absoluteURI | relativeURI ]
    This is done because the treatment of XPath/XSLT or other fragment
    expressions in the context of a URI can be confusing. As XPath is
    a feature some WG members will want to use, the semantics of the
    transform are very important to the signature, and it makes sense
    that they be explicitly represented as part of a transform. For a
    transform that we identify, the WG _can_ properly specify
    any serialization or canonicalization necessary for XPath/XSLT to
    work for our application. (Given that serialization and attribute
    order are purposefully not specified by those specs and are punted
    to the application, I wonder how other applications will address
    this issue (consistent serialization) when the expressions appear
    merely as part of a URI...)
 2. However, we still need to support signature references to XML
    elements within a local document (where a signature is enveloped
    by or [2]enveloping XML content in the same document). Given our
    URI definition, it makes sense to rely upon IDREFs for this purpose
    for the following reasons:

 1. I believe this was the intent of ID/IDREF as specified in XML1.0.
 2. This method permits those members not keen on XPath to reference
    local XML elements (within the same document) by using presently
    implemented XML and not having to support XPath immediately.

   [2] http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/#def-SignatureEnveloping
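
To make the URI-clean rule above concrete, here is a minimal Python
sketch. It is only an illustration, not WG text: the helper names
is_uri_clean and split_reference are hypothetical, and the check simply
assumes that a URI-clean value is any reference with no "#" in it.

```python
from urllib.parse import urldefrag


def is_uri_clean(reference: str) -> bool:
    """Return True if the reference has no "#fragment" part.

    "#" is not a legal character inside an absoluteURI or relativeURI,
    so its mere presence means a fragment delimiter was used.
    """
    return "#" not in reference


def split_reference(reference: str) -> tuple:
    """Split a URI-reference into (URI-clean part, fragment).

    The fragment could then be carried by an explicit transform
    instead of living inside the URI itself.
    """
    clean, fragment = urldefrag(reference)
    return clean, fragment


print(is_uri_clean("http://example.com/"))   # True
print(is_uri_clean("../foo#bar"))            # False
print(split_reference("../foo#bar"))         # ('../foo', 'bar')
```

A same-document reference such as "#bar" splits into an empty URI-clean
part and the fragment "bar", which is exactly the case the IDREF
approach is meant to cover.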

However, there are a number of reasons/arguments not to do this:
 1. It is my understanding that ID/IDREFs are not thought of that
    highly by Berners-Lee as they permit "closed-world" references:
    "The local identifier space is a subset of URI space. When an
    attribute is defined as a URI, the simple "#" prefix gives access
    to the local ID space - while still allowing great power of
    expression by reference to anything else on the Web. When the
    "IDREF" form is used, this is not possible. The IDREF form is a
    weak form IMHO and  not wise for new designs which are not to be
    deliberately constraining."
    [3]http://www.w3.org/DesignIssues/Syntax.html
 2. For XML applications to understand IDREFs they need access to the
    DTD. However, I've heard arguments that this is not the case.
    (Though I'm not sure how relevant the DTD is in any case, as
    this document will have element types from two different
    DTD/schemas: the document and the signature.)
 3. The end result of this is rather kludgey as already noted.

   [3] http://www.w3.org/DesignIssues/Syntax.html
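
The DTD concern in point 2 can be illustrated with a small Python
sketch. It assumes a parser that does not read DTD attribute
declarations (xml.etree.ElementTree is one), so the application has to
be told out of band which attribute acts as the ID. The element and
attribute names (Envelope, Payload, Id, IDREF) and the helper
resolve_idref are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a signature and the content it refers to
# live side by side in the same XML document.
DOC = """<Envelope>
  <Payload Id="order-1">42 widgets</Payload>
  <Signature><Reference IDREF="order-1"/></Signature>
</Envelope>"""


def resolve_idref(root, idref, id_attr="Id"):
    """Return the element whose id_attr equals idref, or None.

    Without a DTD the parser cannot know which attributes are of
    type ID, so the attribute name is an assumption passed in here.
    """
    for elem in root.iter():
        if elem.get(id_attr) == idref:
            return elem
    return None


root = ET.fromstring(DOC)
reference = root.find("./Signature/Reference")
target = resolve_idref(root, reference.get("IDREF"))
print(target.tag, target.text)
```

If the document and the signature come from different DTDs/schemas,
each may name its ID attribute differently, which is why the attribute
name is a parameter rather than a constant in this sketch.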

Consequently the following two arguments were put forward:
 1. [4]Boyer has proposed we use XPath (or some profile subset/hack)
    for doing local references. Everyone must support this particular
    XPath instance, though not the whole specification.
 2. [5]Karlinger has seconded Boyer's argument, or even suggested that
    any XPath specification of a URI needs to be interpreted in the
    context of our application serialization and canonicalization
    profile.

   [4] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0011.html
   [5] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0028.html

However, [6]this issue was discussed at the FTF meeting last week with
the result that:

   [6] http://www.w3.org/Signature/Minutes/SanJose/#IDREF

Schaad: let's stay with what we have until we hear a compelling
argument that we understand and agree with before we move away from
what we have.
Reagle: what about the "clean-URI" type? There is no such thing.
Result: Define our own 'clean-URI' XML datatype.

__

At 22:30 00/01/26 -0600, Dan Connolly wrote:
 >[copy to w3c-archive in case I write something useful; feel free
 >to forward to anywhere, including public forums like the dsig WG]
 >
 >"C. M. Sperberg-McQueen" wrote:
 >> 
 >> At 14:32 00/01/11 -0500, you wrote:
 >> >The URI schema data type does envision "#fragment" being a valid URI, right?
 >
 >Yes, I believe "#fragment" is supposed to be a happy value in
 >the case where the schema says the datatype is the one given at
 >http://www.w3.org/TR/1999/WD-xmlschema-2-19991217/#uri
 >
 >> The type we define almost certainly should allow values like
 >> "#fragment" -- we just have to be careful to use the right term for
 >> it.  If people need both types, and wish to distinguish them,
 >> then that's a good requirement for version 2.
 >
 >Like somebody said (David Beech?), I think it's somewhat misleading to
 >call that datatype "URI". uriRef (or URI-Reference or whatever) is more
 >consistent with the URI spec,
 >http://www.ietf.org/rfc/rfc2396.txt
 >
 >
 >> I don't know.  Dan has persuaded me to be cautious in using
 >> the terms 'URI' and 'URI reference', but so far I have not managed
 >> to get fully straight on which is which.  In general, I believe
 >> 'URI reference' is more general, but at the last Schema ftf, Dan
 >> persuaded me that that was only true on some axes, and on other
 >> axes the generality ran the other way.  Result:  I am terminally
 >> confused.
 >
 >Perhaps you haven't read
 >	URI terminology, esp. in XML specs
 >	From: Dan Connolly (connolly@w3.org)
 >	Date: Mon, Jan 10 2000 
 >	http://lists.w3.org/Archives/Public/uri/2000Jan/0002.html
 >
 >But in case you have, and you're still confused, I'll try again...
 >
 >What I know for certain is that RFC2396 clearly defines two syntactic
 >constructs and much of their semantics:
 >	URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
 >and
 >	absoluteURI   = scheme ":" ( hier_part | opaque_part )
 >
 >URI-reference is what you know and love from HTML as the thing inside
 >the href="..." (except for the I18N-friendly but mathematically awkward
 >conventions in HTML 4 for non-ascii characters in URIs. Let's don't go
 >there for now). The following are all URI-references:
 >
 >	../foo
 >	#bar
 >	../foo#bar
 >	http://example.com/
 >	http://example.com/#bar
 >
 >absoluteURI is the thing that you give to lynx or wget or libwww when
 >you want to suck some bytes (and maybe a MIME type) down from the
 >network. It generally refers to a resource you can get at via the
 >network (but not always; e.g. uuid:23j23lkj32 or isbn:nnnn or whatever).
 >Of those above, only the following is an absolute URI:
 >
 >	http://example.com/
 >
 >You may have seen the term "URI" used as the union of those syntactic
 >constructs. But as I worked out a formalization of all this stuff,
 >	http://www.w3.org/XML/9711theory/URI.html in
 >	http://www.w3.org/XML/9711theory/
 >I decided that it doesn't make sense to look at their union
 >semantically. It would be like taking the union of time-points and
 >time-offsets. Yes, the value 3 makes sense both as the time point
 >'3 seconds since the epoch' and as the time-offset between 12:00:00
 >and 12:00:03, but they're quite different beasts.
 >
 >Similarly, http://example.com/ should be looked at differently when
 >it's used as a URI-reference than when it's used as an absoluteURI.
 >When it's a URI-reference, it's not something you can hand to your
 >network layer and get content back; you have to combine it with
 >a base absoluteURI to get the referent of the URI-reference,
 >another absoluteURI; then you can hand that to the network layer
 >and get bytes back. Don't let the fact that
 >	X + http://example.com/ = http://example.com/
 >for all X confuse you.
 >
 >Now let's check your understanding; try this: the URI-reference
 >http://example.com/ refers to the absoluteURI http://example.com/
 >regardless of what base absoluteURI that URI-reference is...
 >um... added to.
 >
 >The hard part is generalizing that sentence:
 >	With respect to some base absoluteURI, a URI-reference
 >	refers to a ?????.
 >
 >There's no standardized term to put in the ????, even though it's
 >the one the Namespace spec needs so badly. absoluteURI almost
 >works, except when the URI-reference in question has a fragmentID.
 >
 >i.e.
 >
 >	http://example.com/xyz + ../foo#bar = http://example.com/foo#bar
 >
 >but what do you call http://example.com/foo#bar ? it doesn't match
 >the syntax of absoluteURI, so that's no good.
 >
 >It was just called a URI in RFC1630, but the IETF folks objected cuz
 >the #bar part doesn't affect the network operation of the thingy;
 >but it doesn't make sense, web-architecturally, to treat URIs
 >with #fragids as second-class citizens.
 >
 >But it's not a URI-reference; it's the referent of a URI-reference.
 >It's a time-point now, no longer a time-offset.
 >
 >I use the term absolute-uri-with-optional-fragid in conversations
 >like this one sometimes, and I abbreviated that to URIwf in my
 >formalism:
 >
 >   URIwf tuple of abs: absoluteURI, fragment: Fragment 
 >   absoluteURI tuple of scheme: URISchemeID, path: PathName 
 >
 >[...]
 >
 >   asserts
 >
 >     \forall i1, i2: absoluteURI, if1, if2: URIwf,
 >        r1, r2: URI_reference, frag: Fragment
 >
 >        i1 # frag == [i1, frag];
 >
 >        combine(if1.abs, asRef(if2)) == if2;
 >
 >        combine(if1.abs, wrt(if2, if1)) = if2;
 >
 >        combine(i1, r1).fragment = fragment(r1);
 >
 >        % asRef is 1-1
 >        asRef(if1) = asRef(if2) => if1=if2;
 >
 >
 >> I hope against hope that this helps, though I recognize that "I
 >> do not know" cannot be a reassuring answer.  I'll have to look it
 >> up and ask Dan to consult the entrails of a goat or two ...
 >
 >I hope this explanation is less mystical than goat entrails ...
 >
 >I think it might make a nifty informational RFC, especially if I
 >elaborate on both the historical notes and the formalism.
 >
 >After all...
 >
 >"Dan received a B.S. in Computer Science [...] His research interest is
 >investigating the value of formal descriptions of
 >chaotic systems like the Web, especially in the consensus-building
 >process."
 >	-- my bio
 >	http://www.w3.org/People/all#connolly%40w3.org
 >
 >-- 
 >Dan Connolly
 >tel:+1-512-310-2971
 >http://www.w3.org/People/Connolly/
 >
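
[Illustration added to the quoted explanation above, not part of Dan's
message.] The rule that a base absoluteURI combined with a
URI-reference yields an absolute URI with an optional fragment can be
tried out with a short Python sketch. It assumes Python 3.5 or later,
where urllib.parse.urljoin drops surplus ".." segments. The function
name combine only mirrors the name used in the formalism, and the
(absolute URI, fragment) pair it returns stands in for the URIwf tuple.

```python
from urllib.parse import urldefrag, urljoin


def combine(base: str, reference: str) -> tuple:
    """Resolve a URI-reference against a base absoluteURI.

    Returns an (absolute URI, fragment) pair; the fragment is the
    empty string when the reference has none.
    """
    resolved = urljoin(base, reference)
    abs_uri, fragment = urldefrag(resolved)
    return abs_uri, fragment


# The example from the message: a relative reference with a fragment.
print(combine("http://example.com/xyz", "../foo#bar"))
# ('http://example.com/foo', 'bar')

# A reference that already looks absolute resolves to itself,
# whatever the base is.
for base in ("http://example.com/xyz", "ftp://ftp.example.org/pub/"):
    print(combine(base, "http://example.com/"))
# ('http://example.com/', '') both times
```

The fragment of the result always comes from the reference, never from
the base, which matches the assertion
combine(i1, r1).fragment = fragment(r1) in the formalism.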


_________________________________________________________
Joseph Reagle Jr.   
Policy Analyst           mailto:reagle@w3.org
XML-Signature Co-Chair   http://www.w3.org/People/Reagle/
Received on Friday, 28 January 2000 18:07:00 UTC