[i18n-drafts] [articles/idn-and-iri/index] Information in the multilingual web addresses article needs to be updated (#564)

xfq has just created a new issue for https://github.com/w3c/i18n-drafts:

== [articles/idn-and-iri/index]  Information in the multilingual web addresses article needs to be updated ==
[source] (https://www.w3.org/International/articles/idn-and-iri/) [en]

Some information in this article needs to be updated, like:

1. Only URI (RFC3986) and IRI (RFC3987) are mentioned in the article. We might want to add information about the WHATWG URL Standard.
2. We should update the HTML 4.0 example to "HTML".
3. We should update the links to the RFC specifications to point to https://www.rfc-editor.org/
4. "top level domains" should be "top-level domains"

>  In this case, if we were to use percent-escaping to transform the (same) characters in the address so that they to conform to the URI requirements, we would base the escapes on the bytes that represent 引き割り.html in Shift-JIS.

5. "they to conform to" above should be "they conform to".
6. `mod_fileiri` looks unmaintained. Should we keep the reference to it?
7. The reference to Internet Explorer and Netscape should probably be removed.
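
For reference, the quoted passage about percent-escaping (item 5) can be illustrated with a small sketch. It assumes Python's standard `urllib.parse` module, which is my choice for illustration and not something the article uses. It shows that the same characters produce different escapes depending on whether the bytes come from UTF-8 or Shift-JIS:

```python
from urllib.parse import quote, unquote

filename = "引き割り.html"

# Escapes derived from the UTF-8 bytes of the characters.
utf8_escaped = quote(filename, encoding="utf-8")

# Escapes derived from the Shift-JIS bytes of the same characters.
sjis_escaped = quote(filename, encoding="shift_jis")

print(utf8_escaped)
print(sjis_escaped)

# Same characters, different escapes: a receiver can only recover the
# original text if it decodes with the encoding that was used to escape.
assert utf8_escaped != sjis_escaped
assert unquote(utf8_escaped, encoding="utf-8") == filename
assert unquote(sjis_escaped, encoding="shift_jis") == filename
```

The round-trip assertions only hold because the decoder is told which encoding was used, which is the ambiguity the article is describing.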

> You can run a basic check to see whether IDNs work on your system using this simple test.

8. ^ There should be a more up-to-date test.

> Different browsers use different strategies to determine whether the URI should be shown in Unicode or punycode.

9. Should "URI" be "IRI" in the sentence above?
10. The handling of IDNs by different browsers is mentioned, but we should link to some more up-to-date resources like https://chromium.googlesource.com/chromium/src/+/main/docs/idn.md and https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm

> There is a similar issue with the use of simplified vs. traditional characters in the Chinese Han script.

11. This isn't a huge problem, because if a character isn't unified, most people who know Simplified Chinese or Traditional Chinese can easily see the difference. The bigger problem is cases like Kangxi radicals (such as ⼄ U+2F04 and 乙 U+4E59) and duplicate encoded characters (such as 㘽 U+363D and 㦳 U+39B3), because the glyphs are often the same.
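
To make item 11 concrete, here is a minimal sketch using Python's standard `unicodedata` module. Using NFKC normalization as the comparison is my own assumption for illustration, since the article does not discuss it. It suggests that the Kangxi radical and the unified ideograph are distinct code points that compatibility normalization folds together, while the duplicate pair is left untouched:

```python
import unicodedata

radical = "\u2F04"    # Kangxi radical
ideograph = "\u4E59"  # CJK unified ideograph with the same shape

# Print the formal character names to show these are separate characters.
print(unicodedata.name(radical))
print(unicodedata.name(ideograph))

# Distinct code points, but NFKC maps the radical onto the ideograph.
assert radical != ideograph
assert unicodedata.normalize("NFKC", radical) == ideograph

# The duplicate pair from item 11 has no such mapping, so the two
# characters stay distinct even after normalization.
dup_a, dup_b = "\u363D", "\u39B3"
assert unicodedata.normalize("NFKC", dup_a) != unicodedata.normalize("NFKC", dup_b)
```

So normalization-based mappings may catch the radical case but not the duplicate-encoding case, which would still need separate handling (for example, registry-level variant rules).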

> There are some improvements needed to the specifications for IDN and IRIs, and these are currently being discussed. For example, there is a need to extend the range of Unicode characters that can be used in domain names to cover later versions of Unicode, and to allow combining characters at the end of labels in right to left scripts.

12. What's the status of this? ^
13. "ICANN Guidelines for the Implementation of Internationalized Domain Names Version 2.1" should be updated. There is now a [newer version](https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en).
14. The "IDN and IRI test pages" have moved, so the link to them should be updated.
15. The link to "IDN-enabled TLDs supported by Mozilla.org" should be updated.
16. It might be useful to add or link to related information about the differences between IDNA2003, IDNA2008, and `UTS #46`. For example, IDNA2003 maps ß to ss, while IDNA2008 treats ß as a valid character.
17. A link to `UTS #46` should be added in the Further Reading section.
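
The ß difference in item 16 can be sketched with Python's standard library alone. Two assumptions to flag: the built-in `idna` codec implements IDNA2003 only, and the second half is not a real IDNA2008 implementation. It just Punycode-encodes the unmapped label to show what form the name takes when ß is kept, with none of the validation IDNA2008 requires. The `straße.de` name is a hypothetical example.

```python
label = "straße"

# Python's built-in "idna" codec follows IDNA2003 (RFC 3490): the
# Nameprep step maps ß to "ss" before the name is encoded.
idna2003 = (label + ".de").encode("idna")
print(idna2003)

# Under IDNA2008, ß is a valid character and is not mapped, so the label
# is Punycode-encoded as is. This line skips all IDNA2008 validity
# checks; it only illustrates the shape of the resulting A-label.
idna2008_like = b"xn--" + label.encode("punycode")
print(idna2008_like)

# The two protocols therefore resolve the same input to different names.
assert idna2003 != idna2008_like + b".de"
```

Because the two results are different domain names, a link or registration that works under one protocol may point somewhere else under the other, which is the transition problem `UTS #46` was written to address.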

> Examples of registered IDNs

> IDN and URI [PDF], Michel Suignard

> Opera International Domain Name support

> Safari International Domain Name support

18. These four links are broken. ^

I can raise a PR to fix some of the issues above.

Please view or discuss this issue at https://github.com/w3c/i18n-drafts/issues/564 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 4 December 2023 06:46:40 UTC