- From: Mark Baker <distobj@acm.org>
- Date: Fri, 25 Sep 2009 00:20:22 -0400
- To: Robin Berjon <robin@berjon.com>
- Cc: public-webapps <public-webapps@w3.org>
On Tue, Sep 15, 2009 at 7:26 AM, Robin Berjon <robin@berjon.com> wrote:
>>>> The regex could just as easily have been written to exclude the
>>>> authority component of the URI.  Do you have a better example?
>>>
>>> It could have, but it wasn't — interoperability isn't what happens when
>>> people write to a W3C working group to get their code debugged, it's what
>>> happens when real people write code on their own.
>>
>> Sure, some people will write really bad code.  I just don't think we
>> have to accommodate all of them.
>
> Of course, but the above piece of code isn't bad at all. It gets the job
> done, and it's not more generic than one has reason to expect it to be,
> especially with web development background. What we're accommodating are
> expectations of interoperability and least surprise — in my book that hardly
> qualifies as catering to "really bad code". Am I missing something?

The code will only work on implementations which use http URIs and
with a local domain.  How many of them do that?  If not all, then that
is bad code.

>>> Let us assume that we don't at all say what is returned by the many
>>> attributes that normally expose URIs. What regex would you "just as
>>> easily have written" to match an unspecified value? Here are some
>>> samples from several implementations given an image linked to as
>>> /img/dahüt.svg:
>>>
>>>  A: http://magic.local/img/dahüt.svg
>>>  B: file://mushroom.local/img/dahüt.svg
>>>  C: file:///img/dahüt.svg
>>>  D: file:///C|/img/dahüt.svg
>>>  E: \\myphone\img\dahüt.svg
>>>  F: C:\MY DOCUMENTS AND SETTING\MY USERS\MY MARKB\MY DOCUMENTS\MY WIDGETS\MY ARSE\DAH~1.VML
>>>  G: http:///img/dah%FCt.svg
>>>  H: cool-product:/img/dah%u0055%u0308t.svg
>>>  I: inode:DEADBABEC0EDBEEF
>>>  J: many more things...
>>
>> Some of those aren't URIs, and some aren't hierarchical.  Of the
>> others, "[:/]//?*/(.*$)" should cover it.
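
For illustration only, here is a minimal Python sketch of a pattern that
pulls the path out of the hierarchical samples quoted above. It is not the
regex from the thread: Python's re module rejects the "?*" sequence in the
quoted pattern, so this sketch spells out scheme, optional authority, and
path explicitly. The pattern and the name path_of are assumptions made for
this example.

```python
import re

# Assumed pattern (not taken from the thread): a scheme, an optional and
# possibly empty authority, then a path starting with "/". Any query string
# or fragment is dropped.
HIERARCHICAL_URI = re.compile(
    r"^[A-Za-z][A-Za-z0-9+.-]*:"   # scheme, e.g. "http:" or "cool-product:"
    r"(?://[^/?#]*)?"              # optional authority, may be empty
    r"(/[^?#]*)"                   # path (captured)
    r"(?:[?#].*)?$"                # optional query and/or fragment
)


def path_of(value):
    """Return the path of a hierarchical URI, or None if value is not one."""
    match = HIERARCHICAL_URI.match(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "http://magic.local/img/dahüt.svg",
        "file://mushroom.local/img/dahüt.svg",
        "file:///img/dahüt.svg",
        "file:///C|/img/dahüt.svg",
        r"\\myphone\img\dahüt.svg",
        "http:///img/dah%FCt.svg",
        "cool-product:/img/dah%u0055%u0308t.svg",
        "inode:DEADBABEC0EDBEEF",
    ]
    for sample in samples:
        print(sample, "->", path_of(sample))
```

Under these assumptions the non-URI and non-hierarchical samples yield
None, and the remaining ones still disagree on the path itself (the drive
prefix in D, the differing escapes in G and H).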
>
> Sure, but in the absence of any indication from the specification, why
> should implementers use a URI there? In fact, one could make the case that
> it makes better sense to pick something that cannot

I guess I missed your qualification; "Let us assume that we don't at
all say what is returned by the many attributes that normally expose
URIs".  I don't understand why you'd want to do that.  I thought we
just finished agreeing that we need to identify things with URIs.

> I'll note in passing that your regex doesn't take into account cases that
> might expose the query string in some implementations and not in others.
> Would you consider it to be "really bad code"? Certainly one could construct
> a more robust regex than yours, but it's a lot better to provide the means
> for implementations to be interoperable from the start rather than having to
> document which hacks work everywhere.
[..]
>
>> But if it would simplify things, I wouldn't be averse to a getBaseURI()
>> call.
>
> I'm not sure what exactly that would cover, and how it would help.

It would permit a developer to easily turn a URI into a relative path.

>>> Let's imagine we say nothing and you're an implementer: what would you
>>> do? Everyone in this discussion understands that introducing new
>>> schemes should be done with caution — what I don't understand is what
>>> architectural value you are seeing in not using URIs to identify
>>> resources, encouraging non-interoperable solutions, or sweeping the
>>> issue under the rug by delegating to a special name instead of a scheme.
>>
>> I'm not doing any of those things AFAICT.  I encourage resources to be
>> identified by URIs.  I just don't see a need to tell implementations
>> what their URIs should look like, other than to say they should be
>> hierarchical for obvious reasons.
>
> Should they include query strings?

"Should"?  No.  But if there's a good reason to use them I don't see
why it should be prohibited.

> Fragments?

Ditto.
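
As a rough illustration of the getBaseURI() idea raised above, the Python
sketch below strips a base URI from a full URI to leave a relative path.
Nothing here is a proposed API: relative_path and the sample base URIs are
hypothetical, and the sketch assumes the base URI identifies the root of
the widget package.

```python
def relative_path(uri, base_uri):
    """Return uri relative to base_uri (hypothetical helper).

    Assumes base_uri identifies the package root; a trailing "/" is added
    if it is missing so that only whole path segments are stripped.
    """
    prefix = base_uri if base_uri.endswith("/") else base_uri + "/"
    if not uri.startswith(prefix):
        raise ValueError("%r is not under base %r" % (uri, base_uri))
    return uri[len(prefix):]


if __name__ == "__main__":
    # Different implementations, different URIs, same relative path.
    pairs = [
        ("http://magic.local/img/dahüt.svg", "http://magic.local/"),
        ("file:///img/dahüt.svg", "file:///"),
        ("cool-product:/img/dahüt.svg", "cool-product:/"),
    ]
    for uri, base in pairs:
        print(relative_path(uri, base))
```

With a base available, the developer's code never has to inspect the
scheme or authority an implementation happened to choose.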
> Can they contain UTF8
> characters?

URIs can't contain non-ASCII UTF8 of course, but escaping issues are
well understood (if inconsistently implemented).

> What are the security implications of reusing an existing scheme
> with a magic name (given that it could be highjacked)?

The same implications there's always been.

> This is a case where saying less only bring more problems. If we were to go
> your suggested route of telling implementations everything about what the
> URIs should look like except what the scheme would be, what we'll end up
> specifying is a URI scheme without a scheme name

You say that saying less can bring problems, and I agree that what I'm
proposing is to say very little, but then you say I'm "telling
implementations everything about what the URIs should look like"?  I'm
a bit confused about your complaint, as those sound contradictory to
me.

>. Apart from scoring a
> perfect Montesquieu on the Mint New URI Schemes With A Trembling Hand, I'm
> not sure that it buys us much that simply providing implementers with
> information that they've been asking for doesn't.

Well, there are good reasons to avoid new URI schemes as RFC 4395
describes.  Minting a new one is *always* the most expedient thing to
do - which is why implementers are often the ones to create them - but
rarely is it the right thing to do by the Web at large.

Mark.
Received on Friday, 25 September 2009 04:21:03 UTC