Re: [widgets] Widgets URI scheme... it's baaaack! from Mark Baker on 2009-09-25 (public-webapps@w3.org from July to September 2009)

From: Mark Baker <distobj@acm.org>
Date: Fri, 25 Sep 2009 00:20:22 -0400
To: Robin Berjon <robin@berjon.com>
Cc: public-webapps <public-webapps@w3.org>
Message-ID: <e9dffd640909242120t16006322xbb0cb12d01efa698@mail.gmail.com>
On Tue, Sep 15, 2009 at 7:26 AM, Robin Berjon <robin@berjon.com> wrote:
>>>> The regex could just as easily have been written to exclude the
>>>> authority component of the URI.  Do you have a better example?
>>>
>>> It could have, but it wasn't — interoperability isn't what happens when
>>> people write to a W3C working group to get their code debugged, it's what
>>> happens when real people write code on their own.
>>
>> Sure, some people will write really bad code.  I just don't think we
>> have to accommodate all of them.
>
> Of course, but the above piece of code isn't bad at all. It gets the job
> done, and it's not more generic than one has reason to expect it to be,
> especially with a web development background. What we're accommodating are
> expectations of interoperability and least surprise; in my book that hardly
> qualifies as catering to "really bad code".

Am I missing something?  The code will only work on implementations
that use http URIs with a local domain.  How many of them do
that?  If not all, then that is bad code.

>>> Let us assume that we don't at all say what is returned by the many
>>> attributes that normally expose URIs. What regex would you "just as easily
>>> have written" to match an unspecified value? Here are some samples from
>>> several implementations given an image linked to as /img/dahüt.svg:
>>>
>>>  A: http://magic.local/img/dahüt.svg
>>>  B: file://mushroom.local/img/dahüt.svg
>>>  C: file:///img/dahüt.svg
>>>  D: file:///C|/img/dahüt.svg
>>>  E: \\myphone\img\dahüt.svg
>>>  F: C:\MY DOCUMENTS AND SETTING\MY USERS\MY MARKB\MY DOCUMENTS\MY WIDGETS\MY ARSE\DAH~1.VML
>>>  G: http:///img/dah%FCt.svg
>>>  H: cool-product:/img/dah%u0055%u0308t.svg
>>>  I: inode:DEADBABEC0EDBEEF
>>>  J: many more things...
>>
>> Some of those aren't URIs, and some aren't hierarchical.  Of the
>> others, "[:/]//?*/(.*$)" should cover it.
>
> Sure, but in the absence of any indication from the specification, why
> should implementers use a URI there? In fact, one could make the case that
> it makes better sense to pick something that cannot

I guess I missed your qualification; "Let us assume that we don't at
all say what is returned by the many attributes that normally expose
URIs".  I don't understand why you'd want to do that.  I thought we
just finished agreeing that we need to identify things with URIs.
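For illustration only: as typed above, "[:/]//?*/(.*$)" does not compile in Python's re module, because "?*" stacks two quantifiers. The sketch below assumes the intended reading was "a scheme, a colon, an optional //authority, then a slash" and captures the rest as a relative path. The pattern and the relative_path() helper are hypothetical, not taken from any widget specification; sample F is left out.

```python
import re
from typing import Optional

# As typed in the thread; "?*" is two quantifiers in a row, which re rejects.
try:
    re.compile("[:/]//?*/(.*$)")
except re.error as exc:
    print("original pattern does not compile:", exc)

# Hypothetical repair: scheme ":" [ "//" authority ] "/" rest-of-path.
HIERARCHICAL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:(?://[^/]*)?/(.*)$")


def relative_path(uri: str) -> Optional[str]:
    """Return everything after the first path slash, or None if no match."""
    match = HIERARCHICAL.match(uri)
    return match.group(1) if match else None


samples = {
    "A": "http://magic.local/img/dahüt.svg",
    "B": "file://mushroom.local/img/dahüt.svg",
    "C": "file:///img/dahüt.svg",
    "D": "file:///C|/img/dahüt.svg",
    "E": r"\\myphone\img\dahüt.svg",
    "G": "http:///img/dah%FCt.svg",
    "H": "cool-product:/img/dah%u0055%u0308t.svg",
    "I": "inode:DEADBABEC0EDBEEF",
}

for label, uri in samples.items():
    print(label, relative_path(uri))
```

Note that D comes back as "C|/img/dahüt.svg" and G and H keep their own escaping, so the extracted paths still differ between implementations even where the pattern matches.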

> I'll note in passing that your regex doesn't take into account cases that
> might expose the query string in some implementations and not in others.
> Would you consider it to be "really bad code"? Certainly one could construct
> a more robust regex than yours, but it's a lot better to provide the means
> for implementations to be interoperable from the start rather than having to
> document which hacks work everywhere.
[..]
>
>> But if it would simplify things, I wouldn't be averse to a getBaseURI()
>> call.
>
> I'm not sure what exactly that would cover, and how it would help.

It would permit a developer to easily turn a URI into a relative path.
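A minimal sketch of what the suggested getBaseURI() call would enable, assuming it returns the URI the runtime assigned to the widget's root and that this value ends in "/". getBaseURI() was only proposed in this thread; get_base_uri() and the base value below are hypothetical placeholders.

```python
from typing import Optional


def get_base_uri() -> str:
    # Hypothetical stand-in for the proposed getBaseURI(); a real runtime
    # would report whatever root URI it chose for the widget package.
    return "http://magic.local/"


def to_relative(uri: str, base: Optional[str] = None) -> Optional[str]:
    """Strip the widget's base URI from uri; None if uri lies outside it."""
    if base is None:
        base = get_base_uri()
    # A plain prefix test assumes the base ends in "/" and that both
    # strings use the same percent-escaping.
    if uri.startswith(base):
        return uri[len(base):]
    return None


print(to_relative("http://magic.local/img/dahüt.svg"))
print(to_relative("file:///img/dahüt.svg", base="file:///"))
print(to_relative("http://elsewhere.example/img/dahüt.svg"))
```

Because the prefix comes from the runtime, the same developer code works whichever scheme or host an implementation picks.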

>>> Let's imagine we say nothing and you're an implementer: what would you do?
>>> Everyone in this discussion understands that introducing new schemes should
>>> be done with caution; what I don't understand is what architectural value
>>> you are seeing in not using URIs to identify resources, encouraging
>>> non-interoperable solutions, or sweeping the issue under the rug by
>>> delegating to a special name instead of a scheme.
>>
>> I'm not doing any of those things AFAICT.  I encourage resources to be
>> identified by URIs.  I just don't see a need to tell implementations
>> what their URIs should look like, other than to say they should be
>> hierarchical for obvious reasons.
>
> Should they include query strings?

"Should"?  No.  But if there's a good reason to use them I don't see
why it should be prohibited.

> Fragments?

Ditto.
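If some implementations expose query strings or fragments and others do not, extracting the path with a generic URI parser rather than a hand-written pattern keeps the result stable. A sketch using Python's urllib.parse.urlsplit (a stand-in here, not something the widget specifications mandate); the three exposed URIs are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical values implementations might expose for the same image:
# one bare, one with a query string, one with a fragment.
exposed = [
    "http://magic.local/img/dahüt.svg",
    "http://magic.local/img/dahüt.svg?rev=2",
    "file:///img/dahüt.svg#top",
]

for uri in exposed:
    parts = urlsplit(uri)
    # path never includes the query (parts.query) or fragment (parts.fragment)
    print(parts.scheme, parts.path, repr(parts.query), repr(parts.fragment))
```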

> Can they contain UTF-8 characters?

URIs can't contain non-ASCII UTF-8 characters directly, of course, but
escaping issues are well understood (if inconsistently implemented).
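The usual escaping is percent-encoding of the character's UTF-8 bytes (RFC 3986 recommends UTF-8 for textual data in new schemes). A short Python sketch with urllib.parse.quote and unquote; it also shows why sample G's "%FC", a single Latin-1 byte, is one of the inconsistencies: decoded as UTF-8 it is invalid.

```python
from urllib.parse import quote, unquote

name = "dahüt.svg"

# "ü" (U+00FC) is two bytes in UTF-8, so it becomes two percent-escapes.
escaped = quote(name)
print(escaped)  # dah%C3%BCt.svg
assert unquote(escaped) == name

# Sample G escaped the same character as a single Latin-1 byte. Decoding that
# as UTF-8 fails, and unquote() substitutes U+FFFD by default.
print(unquote("dah%FCt.svg"))
print(unquote("dah%FCt.svg", encoding="latin-1"))  # dahüt.svg
```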

> What are the security implications of reusing an existing scheme
> with a magic name (given that it could be hijacked)?

The same implications there's always been.

> This is a case where saying less only brings more problems. If we were to go
> your suggested route of telling implementations everything about what the
> URIs should look like except what the scheme would be, what we'll end up
> specifying is a URI scheme without a scheme name.

You say that saying less can bring problems, and I agree that what I'm
proposing is to say very little, but then you say I'm "telling
implementations everything about what the URIs should look like"?  I'm
a bit confused about your complaint, as those sound contradictory to
me.

> Apart from scoring a
> perfect Montesquieu on the Mint New URI Schemes With A Trembling Hand, I'm
> not sure that it buys us much that simply providing implementers with
> information that they've been asking for doesn't.

Well, there are good reasons to avoid new URI schemes as RFC 4395
describes.  Minting a new one is *always* the most expedient thing to
do - which is why implementers are often the ones to create them - but
rarely is it the right thing to do by the Web at large.
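One small, concrete illustration of why existing software may not treat a new scheme the way it treats registered ones, sketched in Python: urllib.parse.urljoin resolves a relative reference against a base only for schemes listed in urllib.parse.uses_relative, so for an unknown scheme it returns the reference unresolved. The scheme name "widget" below is used purely for illustration.

```python
from urllib.parse import urljoin, uses_relative

# A registered hierarchical scheme: relative references resolve as expected.
print(urljoin("http://magic.local/index.html", "img/dahüt.svg"))

# A scheme the library does not know ("widget" is illustrative only).
print("widget" in uses_relative)  # False
# urljoin() gives back the reference untouched instead of resolving it.
print(urljoin("widget://pkg/index.html", "img/dahüt.svg"))  # img/dahüt.svg
```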

Mark.
Received on Friday, 25 September 2009 04:21:03 UTC