RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

I have updated the bug description to reflect the proposal discussed on 9/20. I also mentioned there that the WG reached consensus on the amended proposal on 9/20.


From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of Pratul Dublish
Sent: Wednesday, September 26, 2007 9:09 AM
To: John Arwe; public-sml@w3.org
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

The consensus is reflected in the latest SML-IF spec:
3.4.1 URI equivalence

SML-IF uses URI equivalence extensively to resolve references among documents in the interchange set. To determine whether two URIs are equivalent, consumers MUST perform case sensitive simple string comparison based on codepoint-by-codepoint comparison of the corresponding characters in the URIs. Whenever a relative URI is tested for equivalence with another URI, SML-IF uses the [base URI] property as specified in the Infoset [XML Information Set<http://dev.w3.org/cvsweb/~checkout~/2007/xml/sml/build/sml-if.html?content-type=text/html;%20charset=utf-8#XMLInfoset>] to define a base URI for relative URIs. The [base URI] property can be defined on any of the information items in the interchange set.
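
A minimal sketch of the comparison described in 3.4.1, in Python. It assumes the [base URI] value is already available as a string. The helper name `sml_if_equivalent` and the example.org URIs are hypothetical and not part of the spec. `urllib.parse.urljoin` stands in for relative-reference resolution; it may apply some cleanup of its own (for example dot-segment removal), so treat it as an approximation rather than the spec's exact procedure.

```python
from urllib.parse import urljoin


def sml_if_equivalent(uri_a: str, uri_b: str, base_uri: str) -> bool:
    """Hypothetical helper: resolve both references against base_uri, then
    compare them case-sensitively, code point by code point."""
    resolved_a = urljoin(base_uri, uri_a)
    resolved_b = urljoin(base_uri, uri_b)
    # Python's == on str compares code points; it applies no case folding
    # and no Unicode normalization.
    return resolved_a == resolved_b


base = "http://example.org/models/"  # assumed [base URI] value

# A relative reference matches the absolute form it resolves to.
print(sml_if_equivalent("doc1.xml", "http://example.org/models/doc1.xml", base))

# References that differ only in case are not equivalent.
print(sml_if_equivalent("doc1.xml", "Doc1.xml", base))
```

The first call prints True and the second prints False, which is the case-sensitive behavior the WG agreed on.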

However, the bug description does not mention the resolution.

Kumar – please update the bug to record the resolution.

From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of John Arwe
Sent: Wednesday, September 26, 2007 9:01 AM
To: public-sml@w3.org
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986


Not re-opening if I now correctly understand it to be case sensitive.  The original 9/12 proposal said case INsensitive, Kumar's 9/24 email seems to say the consensus was for case sensitive comparison which I agree with.

Since I did not see a proposal updated with the consensus view I was/am not positive that the consensus view was case sensitive comparison.

If the consensus was case INsensitive comparison then I will see if the intervening discussion has changed my mind about it, and if not raise it again.  Fair enough?

Best Regards, John

Street address: 2455 South Road, Poughkeepsie, NY USA 12601
Voice: 1+845-435-9470      Fax: 1+845-432-9787

From: Pratul Dublish <Pratul.Dublish@microsoft.com>
Sent: 09/26/2007 11:25 AM
To: John Arwe/Poughkeepsie/IBM@IBMUS, "public-sml@w3.org" <public-sml@w3.org>
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

John
The WG reached consensus on this issue. Are you asking that the issue be reopened for discussion?

Thanks!
Pratul

From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of John Arwe
Sent: Wednesday, September 26, 2007 7:35 AM
To: public-sml@w3.org
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986


So it sounds like the net in the spec is case-sensitive, something I did not observe as a change in the email thread.  Presumably consensus on this was reached during a call, which is fine.

> Proposal: URI equivalence in SML-IF should be defined as case insensitive simple string comparison based on codepoint-by-codepoint comparison of the corresponding characters in the URI.

To some degree, this is a redux of several other discussions we have had.  In the end, much of the disagreement we found centered on differing interpretations of words like "defined as".  Some read that to mean "==", i.e. this and only this, floor=ceiling.  Others read it to be setting a floor and no ceiling.  If we can get such a range of interpretations (that are quite different really, if you are an implementer) within our relatively small workgroup, I think we can expect no more clarity in the wider audience that will read the spec later.  Hence I have and will continue to comment that such cases need to be explicitly phrased to ack the floor/ceiling constraints separately (and usually using rfc2119 keywords).

> As for normalization, we do not preclude it.

See above; a valid (although not your intent, based on later qualifications in the proposal) reading of "defined as case insensitive" would be that a paranoid consumer must normalize.  Just because the spec places responsibility on the producer does not mean the producer always correctly keeps up its end of the bargain.  Fine as a spec writer to say that such a producer is non-compliant, but if figuring out non-compliance is hard enough it's not all that practical.

> We put that [JA: normalization] burden on the producer rather than the consumer. Once the producer guarantees it, the consumer need not perform any further normalization since a simple string comparison is guaranteed to achieve interop.

Again, somewhat a language issue.  "when a producer is writing out an SML-IF document, it can apply normalizations" does not (my reading) place any rfc2119 burden on a producer, and it certainly doesn't guarantee anything... even if it had been actually part of the proposal's text, which it was not.

If the current draft clearly says (paraphrased, more detail allowed) case sensitive URI comparison is the floor, a consumer may climb further up the ladder (no ceiling), then it sounds like not only is the equine deceased, but the autopsy is finished and the offending bacterium's DNA has been sequenced and published.  woo-hoo, next bug.

Best Regards, John

Street address: 2455 South Road, Poughkeepsie, NY USA 12601
Voice: 1+845-435-9470      Fax: 1+845-432-9787

From: Kumar Pandit <kumarp@windows.microsoft.com>
Sent: 09/24/2007 03:16 AM
To: John Arwe/Poughkeepsie/IBM@IBMUS, "public-sml@w3.org" <public-sml@w3.org>
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

I do not have a strong opinion on the case issue. Since the WG seemed to prefer case sensitive comparison, that is how I have defined the comparison in the checked in spec.

As for normalization, we do not preclude it. We put that burden on the producer rather than the consumer. Once the producer guarantees it, the consumer need not perform any further normalization since a simple string comparison is guaranteed to achieve interop.


From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of John Arwe
Sent: Friday, September 21, 2007 11:07 AM
To: public-sml@w3.org
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986


As I interpret "case insensitive", it means "a"=="A".  Is that how you mean it?

If so then I fail to see how this is simpler than "case sensitive" comparison.  Case-insensitive requires normalization (to one case), which is in direct opposition to your own first argument.  Thus, to me, it is confusing unless there is some other factor not yet expressed that makes this a tradeoff worth doing.  Requiring case normalization _might_ also cause burdens on implementations that choose to support encodings other than utf-8/16.  I am _far_ from an expert on encodings, but I do remember hearing about cases involving Katakana I believe where case-folding was quite expensive.  If this becomes a critical factor in the decision I will have to talk to some globalization folks.
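
A small Python illustration of why case-insensitive comparison implies an extra folding step. The function names are hypothetical. The example uses the German sharp s only because its behavior under Python's `str.lower()` and `str.casefold()` is well documented; it does not address the Katakana case mentioned above.

```python
def equal_case_sensitive(a: str, b: str) -> bool:
    # Plain code point comparison, no preprocessing.
    return a == b


def equal_case_insensitive(a: str, b: str) -> bool:
    # Both operands must be folded to a common form before comparing.
    return a.casefold() == b.casefold()


print(equal_case_sensitive("http://example.org/A", "http://example.org/a"))
print(equal_case_insensitive("http://example.org/A", "http://example.org/a"))

# Two common folding choices disagree on the same input, so a spec that
# says only "case insensitive" leaves room for interpretation.
print("stra\u00dfe".lower() == "STRASSE".lower())
print("stra\u00dfe".casefold() == "STRASSE".casefold())
```

The four lines print False, True, False, True: the case-insensitive result depends on which folding the implementation picks, whereas the case-sensitive comparison needs no such choice.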

This seems to me like it might be another floor/ceiling discussion.  3986 clearly discusses the tradeoff between computational cost and the risk of concluding that two URIs identify distinct resources when they do not, because of incomplete normalization (I think it refers to this case as false negative, but the language makes my head hurt so I wrote it out).  It seems as if we can allow simple string comparison as a floor without precluding more sophisticated implementations from climbing further up the comparison ladder.  Remember that the risk in our context of believing two URIs refer to different things is the risk of getting the model boundary wrong in the interchange case.  Interop would only be guaranteed at the floor of course, as usual, but it is still sufficient for interop.
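
The floor/ceiling idea can be sketched in Python as follows. This is an illustration under stated assumptions, not proposed spec text: `floor_equal`, `normalize`, and `ladder_equal` are hypothetical names, `normalize` covers only a small part of the RFC 3986 syntax-based normalizations (scheme and host case, hex digits in percent-encodings), and it assumes the authority carries no userinfo.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def floor_equal(a: str, b: str) -> bool:
    """The floor: simple case-sensitive string comparison."""
    return a == b


def _upper_pct(match: "re.Match[str]") -> str:
    # Uppercase the hex digits of one percent-encoded octet.
    return match.group(0).upper()


def normalize(uri: str) -> str:
    """Partial syntax-based normalization (illustrative, not complete)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # assumption: no userinfo in the authority
    path = re.sub(r"%[0-9a-fA-F]{2}", _upper_pct, parts.path)
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def ladder_equal(a: str, b: str) -> bool:
    """A consumer that climbs one rung above the floor."""
    return floor_equal(a, b) or normalize(a) == normalize(b)


a = "HTTP://Example.ORG/models/a%2fb.xml"
b = "http://example.org/models/a%2Fb.xml"
print(floor_equal(a, b))   # the floor treats these as two different URIs
print(ladder_equal(a, b))  # the extra rung detects that they are aliases
```

Anything that is equal at the floor is also equal higher up, so a consumer that climbs further never contradicts a positive result from the floor; it only removes some false negatives.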

Best Regards, John

Street address: 2455 South Road, Poughkeepsie, NY USA 12601
Voice: 1+845-435-9470      Fax: 1+845-432-9787

From: Kumar Pandit <kumarp@windows.microsoft.com>
Sent by: public-sml-request@w3.org
Sent: 09/19/2007 06:15 PM
To: Sandy Gao <sandygao@ca.ibm.com>, "public-sml@w3.org" <public-sml@w3.org>
Cc: Kumar Pandit <kumarp@windows.microsoft.com>
Subject: RE: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

I added the ‘case insensitive’ clause because it fits nicely with an implementation based on a file system that uses case insensitive paths. That said, I do not have a strong bias towards that option. I am ok with defining the comparison as case sensitive while keeping the rest of the definition as is.

It is not clear from your reply if you agree with the proposal (sans the case-insensitive part) or if you want to base your decision on whether the URI/IRI gurus agree with it first. Can you please clarify?

Since no one has disagreed with the proposal (except the concern about case-insensitive part), if you agree with the amended wording, we may actually be able to get this into the second draft today.


From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of Sandy Gao
Sent: Wednesday, September 19, 2007 11:11 AM
To: public-sml@w3.org
Subject: Re: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986


This is a simple proposal, and being simple is normally good, but I'll leave this to the URI/IRI gurus to determine whether the simple solution is good enough to cover real-life scenarios.

One thing that worries me is the "case insensitive" part. Why? As far as I can tell, this doesn't match any of the steps in "6.2. Comparison Ladder" of RFC 3986. If we want the simplest possible solution, then we should use what's defined in 6.2.1 and compare strings character-by-character case-sensitively.

Thanks,
Sandy Gao
XML Technologies, IBM Canada
Editor, W3C XML Schema WG<http://www.w3.org/XML/Schema/>
Member, W3C SML WG<http://www.w3.org/XML/SML/>
(1-905) 413-3255 T/L 969-3255

From: Kumar Pandit <kumarp@windows.microsoft.com>
Sent by: public-sml-request@w3.org
Sent: 2007-09-12 11:02 PM
To: "public-sml@w3.org" <public-sml@w3.org>
Cc: Kumar Pandit <kumarp@windows.microsoft.com>
Subject: [w3c sml][4665] Clarify URI equivalence in reference to RFC 3986

Here is my proposal to resolve this issue.

Proposal:
URI equivalence in SML-IF should be defined as case insensitive simple string comparison based on codepoint-by-codepoint comparison of the corresponding characters in the URI.

Justification:
1. Performance: Simple string comparison provides highest performance. Although it is true that two aliases of the same URI may not compare as equal without normalization, the problem does not exist in the specific context of an SML-IF producer. This is because, when a producer is writing out an SML-IF document, it can apply normalizations (if necessary) such that a given URI always appears in the same way. This allows consumers to perform fast string comparison without needing to perform any type of normalization.

RFC 3986 section 6.2 (Comparison Ladder) describes many different forms of normalization (syntax-based/case/percent-encoding/path-segment/scheme-based/protocol-based). If we want a consumer to perform normalizations, we not only make a consumer less efficient but also need to add very specific normalization step definitions in the SML-IF spec. On the other hand, if we leave the burden of normalization to the producer, we can keep the SML-IF spec much simpler and allow consumers to be more efficient. This way the spec does not need to talk about any specific comparison ladder step(s) to be performed by a producer. The producer is free to apply any normalization steps (or none) as long as it knows it will write a given URI in the same format.
2. Precise definition: RFC 3986 section 6.2.1 (Simple String Comparison) discusses issues involved in performing a string comparison but does not provide a precise definition of how the comparison must be performed. In other words, it leaves some room for interpretation. We should avoid this by presenting an unambiguous definition based on that discussion.
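
The producer-side argument in the justification above can be sketched in Python. Everything here is hypothetical: the `Producer` class, the choice of case folding as the producer's private alias key (which would only suit a producer backed by a case-insensitive store), and the file URIs. The point is only that the consumer side needs nothing beyond exact string matching once the producer writes each URI one way.

```python
class Producer:
    """Hypothetical SML-IF producer that writes one spelling per document."""

    def __init__(self) -> None:
        # Private alias key -> the spelling first written for that document.
        self._spelling: dict[str, str] = {}

    def emit(self, uri: str) -> str:
        # Assumption: this producer treats case variants as aliases.
        # The key is never written out; only the chosen spelling is.
        key = uri.casefold()
        return self._spelling.setdefault(key, uri)


producer = Producer()
written = [
    producer.emit(u)
    for u in (
        "file:///Models/Doc1.xml",
        "file:///models/doc1.xml",  # alias of the first document
        "file:///Models/Doc2.xml",
    )
]

# Consumer side: exact string keys, no normalization of any kind.
index: dict[str, list[int]] = {}
for position, uri in enumerate(written):
    index.setdefault(uri, []).append(position)

print(written)
print(index)
```

Both references to the first document come out as "file:///Models/Doc1.xml", so the consumer's dictionary groups them without knowing anything about the producer's aliasing rule.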

Received on Wednesday, 26 September 2007 21:31:41 UTC