RE: Status of RFC 1738 -- 'ftp' URI scheme from Cheney, Austin on 2011-01-12 (uri@w3.org from January 2011)

From: Cheney, Austin <Austin.Cheney@travelocity.com>
Date: Wed, 12 Jan 2011 13:10:18 -0600
To: URI <uri@w3.org>
Message-ID: <9FB4E1C2C67D214BAF184CE12F7DF4DB29139D2348@SGTULMMP005.Global.ad.sabre.com>
> I don't follow. You can use an HTTP URI as identifier in RDF. How it 
> works depends on the context; it's not a quality of the URI scheme.

What is an HTTP URI?

>> No, RFC 3986 is pretty clear about differentiated intention.
>> Intention is important to every technology specification and when
>> ignored security abuses arise.
>
> I really have trouble following.

As for the difference of intention, RFC 3986 is pretty clear:

>> If I understand your concern correctly then I would respond that RFC
>> 3986 has section 1.1.3.  URI, URL, and URN, which clarifies the
>> distinction.
>
> RFC 3986:
> A URI can be further classified as a locator, a name, or both.  The
> term "Uniform Resource Locator" (URL) refers to the subset of URIs
> that, in addition to identifying a resource, provide a means of
> locating the resource by describing its primary access mechanism
> (e.g., its network "location").

> A URL *is* a URI. Do we agree on this?

This is the position that I emailed to the list yesterday:
> It can be said that URL is a class of URI.  URI defines a syntax for
> addressing resources, and URL consumes that syntax.  

> What aspect of URLs isn't defined by RFC 3986, and *also* is 
> scheme-independent?

The question is not entirely relevant considering how RFC 1738 is
written.  The document is built largely on examples of various schemes,
so RFC 1738 is not scheme independent.  Setting aside the scheme
examples referenced by RFC 1738, the difference between URL and URI can
be summarized as any transmission-based action that does occur, or is
expected to occur, with regard to an address.

> I don't understand what you're talking about. I made a statement about 
> the obsoletes/updates relations between RFCs which do not work well
> when the first RFC combines too many things in the same document; which
> results in it being potentially *partly* obsolete.

A reply to my email last night indicated that my position lacked
precision.  When I was more precise in a reply, you claimed with regard
to the word "obsolete":
> That the standards track classification system is too coarse-grained
> when an old RFC does too many things at once is a known issue and
> entirely orthogonal :-).

If the word "obsolete" does not accurately describe the document, and
that is a primary point of this discussion, then it is hardly orthogonal.

>> As long as there exists a mandatory functional requirement that a
>> class of URI return resources at the specified address the intention
>> of URL, and thus RFC 1738 remains valid.  If then RFC 1738 remains
>> valid, and its language and its examples are outdated then the only
>> responsible course of action is to submit an internet draft with
>> revised language.
>
> I still don't understand what you would want that document to say. 
> Please elaborate.

What is the standard action for a user agent when it is given an
address for a resource that is not present at that address?  Should the
response be entirely scheme dependent?  A point raised far earlier is
that there is no standard for the file scheme, and so some user agents
behave in a nonstandard way.  In the absence of a standard, what is the
correct response?  The answer to this question doesn't matter, because
user agents can handle it any way they want if there is no standard.

Hypothetical 1:
The prior reasoning only holds true if address-reliant transmissions are
statically directed.  If address-reliant transmissions are directed to
pass through an intermediary, for example, then a uniform and
scheme-independent response to such problems becomes extremely
convenient.  For example, consider a document wrapped in a SOAP envelope
and transmitted over email.  A mail server could be configured to
interpret the SOAP header while the document is in transit and send a
transmission by unrelated means to a choice of different agents, such as
a logging system, a customized auto-responder, or an HTTP load balancer.
If the SOAP envelope becomes corrupt, email will continue to do its job
and pass the document along; however, the email server can no longer
dynamically choose a scheme for its alternate transmission, and it
returns an error to some other party.  There is no scheme-independent
method to account for such responses, so an arbitrary choice must be
made.

In this case the only resolution is to pick some arbitrary scheme to
send to, pick the protocol by which it will be sent, and pick some
method of formatting.  Ultimately this choice is irrelevant, so when
this situation is encountered people install additional support services
and force a decision upon a specifically chosen audience.  This is a
potential security problem if any of those unrelated components can be
compromised and there is not adequate security dedicated to each.

Hypothetical 2:
If you wanted to open an email repository to open consumption over
HTTP, such as these emails that are available from the W3C, you could
easily address the emails as web pages and allow HTTP to provide all the
transmission work and error handling.  The emails are auto-populated
HTML documents on the web, after all.  What if you wanted to point to a
resource residing in that archive that is not text?  Could you do it
with HTTP if you were provided authentication?  Probably not.  You know
where the resource is and how to address it, which completes the
requirements for URI.  If the resource does not reside in a path under
the web root of a web service then you cannot access it by HTTP.  If you
are not granted remote access to the operating system of the machine on
which that resource resides, you cannot access it by the file scheme
either, no matter how much access you have on the file system.  You
could address the resource as a path relative to the remote email
address, and such an address would be valid, unique, and universal in
accordance with URI because email addresses have domains, but it would
still not return the resource.  There is neither a scheme-independent
method nor a scheme to access that resource, even though a valid URI can
be supplied.

At the very least, is there a standard to help define how to create
new URI scheme handling so I could write a solution for my own use?
Yes, the scheme literal is merely an instance of URI.  Is there standard
guidance for sending, encoding, and packaging automated responses or
error handling for a new or unique scheme?  No.  If I reconfigured my
application to accept a scheme that I wish to use privately and
internally, is there guidance for configuring some generic,
language-agnostic acceptance of that traffic and sending responses?  No.
Will it work if I build all this from the ground up?  Maybe not, because
I accounted for addressing and only half of the transmission
requirements.  Is there application-agnostic guidance, then, for how to
supply a unique scheme as a means of addressing on top of an existing
transmission protocol?  No.  What else is there to do?  Come to the URI
list and complain that URL needs to be updated to address these
concerns.
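
To make the gap concrete, here is a minimal Python sketch of what a
private, hand-rolled answer to those questions might look like: a
registry that maps scheme names to handlers and returns one uniform
result shape no matter which scheme was used.  Everything in it is an
assumption for illustration only: the "mailarchive" scheme, the
register/dispatch names, the status strings, and the in-memory store
are hypothetical and come from no standard.

```python
from urllib.parse import urlsplit

# Hypothetical registry: scheme name -> handler callable.
HANDLERS = {}


def register(scheme, handler):
    """Associate a scheme name with a handler chosen by the application."""
    HANDLERS[scheme.lower()] = handler


def dispatch(uri):
    """Resolve a URI and return (status, payload) in one uniform shape.

    The status strings are invented for this sketch; no specification
    defines a scheme-independent response format.
    """
    scheme = urlsplit(uri).scheme.lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        return ("unsupported-scheme", scheme)
    try:
        return ("ok", handler(uri))
    except LookupError as exc:
        return ("not-found", str(exc))


def archive_handler(uri):
    """Hypothetical handler for a private 'mailarchive' scheme.

    A real handler could pick any transport it likes; this one reads
    from an in-memory dict so the sketch stays self-contained.
    """
    store = {"/2011/01/0042": "message body"}
    return store[urlsplit(uri).path]  # a missing key raises KeyError


register("mailarchive", archive_handler)

print(dispatch("mailarchive://lists.example.org/2011/01/0042"))
print(dispatch("mailarchive://lists.example.org/missing"))
print(dispatch("gopher://example.org/x"))
```

The point of the sketch is only that the binding between a scheme and
its means of transmission lives in application code such as
archive_handler, which is exactly the part no existing guidance covers.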

The point I am trying to make with hypothetical 2 is that prior practice
seems to indicate that URI schemes are created using new vocabulary and
URI syntax, but are inherently bound to a single specific transmission
protocol.  There is no language defining what a URI scheme is, aside
from a dedicated point in a URI string, or how it must work.  Why can't
URI schemes be dynamic about which transmission protocol they will
transmit against?  This is an application preference issue and is
completely removed from and not specified by RFC 3986.  Data URIs, for
example, do not transmit anywhere.  They store base64-encoded strings
from binary, or text that is URI encoded.  If JavaScript code were
supplied as a data URI fragment that is interpreted and executed, could a
means of transmission be returned natively to the data scheme identifier
to prevent the malicious JavaScript from making transmission requests of
its own?  Yes, because data URIs, like every other URI scheme I have
seen, are bound to a dedicated means of transmission, which is to not
transmit.  Since there is no guidance for altering the transmission of a
given scheme, I could not redefine how a given scheme transmits in my own
user agent.  So, why must an instance of a URI scheme be associated with
a means of transmission, and why must that association be completely
dedicated and beyond user interference?
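
For what it is worth, the "does not transmit" behavior of data URIs can
be shown in a few lines of Python.  This is a simplified sketch of the
data:[<mediatype>][;base64],<data> form from RFC 2397, not a complete
parser; the function name decode_data_uri is my own, and the sketch
assumes well-formed input.

```python
import base64
from urllib.parse import unquote_to_bytes

# Default media type that RFC 2397 assigns when none is given.
DEFAULT_MEDIA_TYPE = "text/plain;charset=US-ASCII"


def decode_data_uri(uri):
    """Return (media_type, payload_bytes) for a data URI.

    The payload is recovered from the URI string itself, so resolving
    the URI involves no network transmission at all.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "data":
        raise ValueError("not a data URI")
    header, comma, payload = rest.partition(",")
    if not comma:
        raise ValueError("data URI is missing the ',' separator")
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")]
        data = base64.b64decode(payload)
    else:
        media_type = header
        data = unquote_to_bytes(payload)  # percent-decoding only
    return (media_type or DEFAULT_MEDIA_TYPE, data)


print(decode_data_uri("data:text/plain;base64,SGVsbG8="))
print(decode_data_uri("data:,Hello%20World"))
```

Whether the decoded bytes are then executed, and whether that code may
make requests of its own, is left entirely to the user agent.
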
Thanks,

Austin Cheney, Travelocity User Experience
CISSP TS/SCI
Received on Wednesday, 12 January 2011 19:22:47 UTC