[web-nfc] reg-name ABNF is unspecified (#350)

sleevi has just created a new issue for https://github.com/w3c/web-nfc:

== reg-name ABNF is unspecified ==
#278 introduced `reg-name` into the ABNF, and describes it in prose as
> a [registrable domain](https://url.spec.whatwg.org/#host-registrable-domain) owned by the issuing organization

However, the actual ABNF for `reg-name` is unspecified, and this can lead to some ambiguity. For example, the domain name system is 8-bit clean; `\x00.domain.example` is a valid domain name, made up of the labels [(1 byte, `\x00`), (6 bytes, `domain`), (7 bytes, `example`)] when transmitted in DNS wire format.
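To make the 8-bit-clean point concrete, here is a minimal sketch of how those three labels would look in wire format, where each label is preceded by a one-byte length and the name ends with the zero-length root label. The helper name `to_wire` is made up for illustration and is not part of any specification or library.

```python
def to_wire(labels: list[bytes]) -> bytes:
    """Encode labels as length-prefixed octets, ending with the root label."""
    out = bytearray()
    for label in labels:
        # A label is 1..63 octets; any octet value is permitted on the wire.
        if not 1 <= len(label) <= 63:
            raise ValueError("label must be 1..63 bytes long")
        out.append(len(label))
        out += label
    out.append(0)  # zero-length root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return bytes(out)


wire = to_wire([b"\x00", b"domain", b"example"])
print(wire)
```

Nothing in the encoding itself rejects the `\x00` label, which is why the character space has to be restricted by the specification rather than assumed.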

It's true that the host record types (A and AAAA) use the "preferred name syntax" of letters, digits, and hyphens, but even that has caveats. For example, the URL Standard (and its predecessors) allowed underscores (`_`) within host names, unescaped. Many user agents would pass these on to their name resolution libraries (like `gethostbyname`), which would not enforce that the input is in preferred name syntax, and would wire-encode such names when looking up A and AAAA records.
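As a rough illustration of the check that such libraries skip, the sketch below tests a name against the letters-digits-hyphen rule. It assumes the relaxed form in which a label may start with a digit, ignores the overall name length limit, and uses the hypothetical helper name `is_ldh_hostname`; it is not a proposal for the `reg-name` grammar.

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen;
# at most 63 characters. (Assumed relaxed LDH rule, for illustration only.)
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label is a plain LDH label."""
    return all(LDH_LABEL.fullmatch(label) is not None
               for label in name.split("."))


print(is_ldh_hostname("www.example.com"))       # plain LDH name
print(is_ldh_hostname("foo_bar.example.com"))   # underscore in a label
print(is_ldh_hostname("\x00.domain.example"))   # NUL byte in a label
```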

Unfortunately, it's still not uncommon to see underscores in hostnames, and [only recently](https://cabforum.org/2018/11/12/ballot-sc-12-sunset-of-underscores-in-dnsnames/) (and with some disruption: [1](https://bugzilla.mozilla.org/show_bug.cgi?id=1516561), [2](https://bugzilla.mozilla.org/show_bug.cgi?id=1517617), [3](https://bugzilla.mozilla.org/show_bug.cgi?id=1515788), [4](https://bugzilla.mozilla.org/show_bug.cgi?id=1516453), [5](https://bugzilla.mozilla.org/show_bug.cgi?id=1516599)) have browsers reiterated that certificates for those domains should never exist.

Thus, it would be better to precisely describe the character space for `reg-name` in unambiguous terms. If `reg-name` is meant to be imported from RFC 3986, stating that would be clearer. However, the ABNF and algorithm described there are known to cause significant pain for consistent, cross-browser implementation. ABNF-style approaches, and their failure to adequately describe processing models and error handling, are why things like the [URL Standard](https://url.spec.whatwg.org/) exist.

To better ensure interoperability, please consider writing out a clear processing model for handling these sorts of tags, and a clear processing model for interoperably encoding them. Of particular concern is the need to make clear the rules for distinguishing the "domain" portion from any extended attributes.

As a concrete example, one implementation issue I could foresee is the use of this API on IPv6 addresses, which, in URL presentation form, are encoded such as `[::1]`. It's subtle, but if we assume `reg-name` is meant to refer to the domain name, then this feature should not be allowed to work on such hosts. However, if a user agent doesn't prohibit this with an explicit check, it might permit a tag of `[::1]:foo` to be created. Is that valid or not? When another user agent encounters that tag, if it merely scans for `:`, it may parse the tag as (`[`, `:1]:foo`) if it looks for the first `:`, or as (`[::1]`, `foo`) if it looks for the last `:`. Moreover, since `:` is allowed in the `other` part of the NFC tag, a tag like `[::1]:foo:bar` is even more confusing to parsers!
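The divergence is easy to reproduce. The sketch below shows two naive splitting strategies that hypothetical implementations might pick in the absence of a defined processing model; the function names `split_first` and `split_last` are made up for illustration.

```python
def split_first(tag: str) -> tuple[str, str]:
    """Split at the first ':' in the tag."""
    head, _, tail = tag.partition(":")
    return head, tail


def split_last(tag: str) -> tuple[str, str]:
    """Split at the last ':' in the tag."""
    head, _, tail = tag.rpartition(":")
    return head, tail


for tag in ("[::1]:foo", "[::1]:foo:bar"):
    # Two parsers following different strategies disagree on both parts.
    print(tag, split_first(tag), split_last(tag))
```

For `[::1]:foo:bar`, neither strategy recovers (`[::1]`, `foo:bar`), which is presumably what the tag's author intended.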

Hopefully, the precision of the URL Standard provides a good model to concretely provide interoperable algorithms and encoding rules, to avoid many of the issues I highlighted above.

Please view or discuss this issue at https://github.com/w3c/web-nfc/issues/350 using your GitHub account

Received on Thursday, 12 September 2019 14:13:55 UTC