
Re: [whatwg] URL: file: URLs

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Sun, 28 Oct 2012 13:51:38 -0400
Message-ID: <508D70AA.4020802@mit.edu>
To: Anne van Kesteren <annevk@annevk.nl>
Cc: whatwg@lists.whatwg.org
On 10/27/12 3:35 PM, Anne van Kesteren wrote:
> This is covered as we do this for all URLs currently with a "relative
> scheme" (http/ws/...). I know you indicated this as potentially
> problematic

Let's have that fight separately.  ;)

>> 2)  file:// URIs are parsed as a "no authority" URL in Gecko.  Quoting the
>> IDL comment:
...
> The parser in the specification should handle these in the same way.

Same as the comment I quoted?  Or the same as something else?

> I have not introduced a "no authority" concept however. The parser in
> the specification also preserves the host as other user agents seem to
> preserve it.

Well, the Gecko parser preserves the host at this stage assuming the URI 
was correctly formatted with a host.  Again:

   blah://foo/bar => blah://foo/bar

The interesting things happen when you have 0, 1, or 3 slashes between 
':' and "foo".  The handling of "foo" after this point is a separate issue.

>> 4)  For "no authority" URLs, including file://, on Windows and OS/2 only, if
>> what looks like authority section looks like a drive letter, it's treated as
>> part of the path.  For example, "file://c:/" is treated as the filename
>> "c:\".  "Looks like a drive letter" is defined as "ASCII letter (any case),
>> followed by a ':' or '|' and then followed by end of string or '/' or '\\'".
>> I'm not sure why this is checking for '\\' again, honestly.  ;)
>
> Is this part of URL parsing or part of doing something with the
> resulting URL?

In Gecko, it's part of URL parsing.  More precisely, it's part of the 
normalization performed as part of constructing a "URL" object from a 
string.  Since this is also how we parse URLs, it's effectively all part 
of the package.

But note that it would be a bit odd if file://c:/ claimed to have a host 
of "c" with a default port or some such...
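For illustration, here is a minimal Python sketch of the drive-letter test from point 4. It assumes the test is applied to the text that follows "file://". The function name is hypothetical, this is not Gecko's code, and the Windows/OS/2-only gating is left to the caller:

```python
import string


def looks_like_drive_letter(s: str) -> bool:
    """Return True if `s` starts with what "looks like a drive letter".

    Hypothetical helper following the rule quoted above: an ASCII letter
    (either case), then ':' or '|', then end of string, '/' or '\\'.
    """
    if len(s) < 2:
        return False
    # ascii_letters rather than str.isalpha(), so non-ASCII letters are rejected.
    if s[0] not in string.ascii_letters or s[1] not in ":|":
        return False
    # The drive letter must be followed by end of string or a separator.
    return len(s) == 2 or s[2] in "/\\"


# Text after "file://" in a few hypothetical inputs.
for text in ["c:/", "C|", "c:\\dir", "c:x", "cd:/", "foo/bar"]:
    print(repr(text), looks_like_drive_letter(text))
```

Under this rule "file://c:/" yields the path "c:/" rather than a host of "c", while something like "cd:/" or "c:x" would not qualify.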

>> 5)  When parsing a "no authority" URL (including file://), and when item 4
>> above does not apply, it looks like Gecko skips everything after "file://"
>> up until the next '/', '?', or '#' char before parsing path stuff.
>
> So the host is dropped?

In Gecko, I believe so, yes.  I'm not saying this is desirable; just 
what Gecko does.
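A rough Python sketch of the skipping described in point 5, assuming it runs on the text after "file://" only when the drive-letter check from point 4 did not match. The function name is hypothetical, and returning an empty path when none of '/', '?' or '#' occurs is an assumption about the no-delimiter case:

```python
def skip_authority(rest: str) -> str:
    """Drop everything up to the first '/', '?' or '#'.

    `rest` is the text after "file://"; the result is what path parsing
    would start from. Hypothetical helper: if no delimiter is found, the
    whole string is assumed to be skipped.
    """
    for i, ch in enumerate(rest):
        if ch in "/?#":
            return rest[i:]
    return ""


for rest in ["foo/bar", "host?q=1", "host#frag", "/etc/hosts", "host"]:
    print(repr(rest), "->", repr(skip_authority(rest)))
```

So "file://foo/bar" would be parsed with the path "/bar", and the "foo" host would simply be lost.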

>> 6)  On Windows and OS/2, when dynamically parsing a path for a "no
>> authority" URL (not sure whether this is actually web-exposed, fwiw...)
>> Gecko will do something involving looking for a path that's only an ASCII
>> letter followed by ':' or '|' followed by end of string.
...
>> 7)  When doing URI equality comparisons
...
>> 8)  When actually resolving a file:// URL
> These points do not seem to be about parsing, correct?

Well, point 6 is about parsing, sort of.
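As described, the point 6 check is narrower than the point 4 one: the entire path must be the letter plus ':' or '|', with nothing after it. A minimal Python sketch, with a hypothetical function name, that only detects the pattern and does not model whatever Gecko then does with such a path:

```python
import string


def is_bare_drive_path(path: str) -> bool:
    """True if `path` is exactly an ASCII letter followed by ':' or '|'.

    Hypothetical helper; detection only, no handling of the match.
    """
    return len(path) == 2 and path[0] in string.ascii_letters and path[1] in ":|"


for p in ["c:", "Z|", "c:/", "/c:", "1:"]:
    print(repr(p), is_bare_drive_path(p))
```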

7 and 8 are not, though at some point we'll need to define equality 
comparisons anyway.

-Boris
Received on Sunday, 28 October 2012 17:52:22 GMT