[Bug 15254] New: Don't forbid underscore in host names in URLs

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15254

           Summary: Don't forbid underscore in host names in URLs
           Product: HTML WG
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: lambda@continuation.org
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


Step 6 of section 2.6.3, resolving URLs
<http://www.w3.org/TR/html5/urls.html#resolving-urls>, requires that the ToASCII
algorithm of IDNA 2003 (RFC 3490, http://tools.ietf.org/html/rfc3490) be called
with the UseSTD3ASCIIRules flag set. That flag says that the host name rules
specified in STD 3 (RFC 1122 and RFC 1123) should be enforced. This means that
each host name label may contain only ASCII letters, digits, and hyphens, and
must begin and end with a letter or digit.
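
To make the restriction concrete, here is a minimal sketch of the per-label
check as I understand it. The function name is hypothetical, and this is not
the full ToASCII algorithm, only the letter/digit/hyphen rule plus the usual
63-character label limit.

```python
import re

# Letters, digits and hyphens only; the first and last character must be a
# letter or digit. A single-character label is allowed.
_STD3_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_std3_label(label: str) -> bool:
    """Return True if `label` passes the STD 3 style host name label check."""
    # Labels are limited to 63 characters; empty labels never match the regex.
    return len(label) <= 63 and _STD3_LABEL.fullmatch(label) is not None


print(is_std3_label("www"))       # True
print(is_std3_label("my-host"))   # True
print(is_std3_label("my_host"))   # False: underscore is rejected
print(is_std3_label("-host"))     # False: leading hyphen is rejected
```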

Host names in the wild do contain underscores, and most software seems to cope
with them just fine. I ran into this when someone had trouble submitting such
a URL to Reddit
<http://www.reddit.com/r/boston/comments/neb4h/boston_hockey_player_didnt_get_kicked_from_the/c38emxu>,
which enforces the host name restriction. However, none of the browsers I tried
(Firefox, Chrome, Safari, and Opera, all on Mac OS X 10.7.2) implemented this
restriction; that host name works fine in all of them. I've checked the Alexa
Top Million Sites <http://s3.amazonaws.com/alexa-static/top-1m.csv.zip>, and
found over a dozen hosts that contain underscores in their names.
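
A check along these lines can be sketched as follows. It assumes the CSV has
one "rank,host" pair per line; the function name and the sample rows are
hypothetical and are not taken from the real list.

```python
import csv
from typing import Iterable, List


def hosts_with_underscores(rows: Iterable[str]) -> List[str]:
    """Return the host names containing '_' from 'rank,host' CSV lines."""
    found = []
    for record in csv.reader(rows):
        # Skip blank or malformed lines rather than failing on them.
        if len(record) < 2:
            continue
        host = record[1].strip()
        if "_" in host:
            found.append(host)
    return found


# Hypothetical sample rows, for illustration only.
sample = ["1,example.com", "2,my_shop.example.net", "3,a-b.example.org"]
print(hosts_with_underscores(sample))  # ['my_shop.example.net']
```

For the real file, passing an open file object (opened with newline="") in
place of the sample list should work the same way.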

I would recommend relaxing the UseSTD3ASCIIRules restriction, by a willful
violation of RFC 3490 (or its successor, RFC 5891
<http://tools.ietf.org/html/rfc5891>, if that is ever used), to allow the
underscore in the same places that a hyphen is allowed.
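
As a rough sketch of the relaxed rule proposed here (the function names are
hypothetical, and this is only the label check, not a patch to ToASCII), the
underscore is simply added to the set of characters permitted between the
first and last character of a label:

```python
import re

# Same shape as the STD 3 rule, but '_' is accepted wherever '-' is:
# inside a label, never as its first or last character.
_RELAXED_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_relaxed_label(label: str) -> bool:
    """Return True if `label` passes the relaxed (underscore-tolerant) check."""
    return len(label) <= 63 and _RELAXED_LABEL.fullmatch(label) is not None


def is_relaxed_hostname(host: str) -> bool:
    """Apply the relaxed label check to every dot-separated label."""
    return all(is_relaxed_label(label) for label in host.split("."))


print(is_relaxed_hostname("my_host.example.com"))  # True
print(is_relaxed_hostname("my-host.example.com"))  # True
print(is_relaxed_hostname("_host.example.com"))    # False: leading underscore
```

Note that under this reading a leading or trailing underscore is still
rejected, just as a leading or trailing hyphen is.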


Received on Saturday, 17 December 2011 09:58:33 UTC