[i18n-discuss] Further comments on Unicode FAQ: Unicode and the Web (#26)

r12a has just created a new issue for https://github.com/w3c/i18n-discuss:

== Further comments on Unicode FAQ: Unicode and the Web ==
Some comments on Unicode and the Web https://corp.unicode.org/%7Easmus/proposed_faq/unicode_web

[1] In https://corp.unicode.org/%7Easmus/proposed_faq/unicode_web.html the links to W3C content should not include .en.html or index.en.html. For example, https://www.w3.org/International/questions/qa-choosing-encodings has German, Spanish, Brazilian Portuguese and Swedish translations. We use content negotiation to serve those translations where available, but including the language extension in the link bypasses that.

Also, we usually prefer not to show naked URLs in text.  I suggest replacing

See https://www.w3.org/International/questions/qa-choosing-encodings.en.html.

with

See the W3C article `<a href="https://www.w3.org/International/questions/qa-choosing-encodings">Choosing & applying a character encoding</a>`.

and so on.
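For anyone updating many links at once, here is a minimal Python sketch of both changes: dropping the language extension so content negotiation can work, and emitting a titled link instead of a naked URL. The helper names `negotiable_url` and `w3c_link` are hypothetical, and the regex assumes W3C article URLs follow the `<name>.<lang>.html` / `index.<lang>.html` pattern described above.

```python
import html
import re

# Hypothetical pattern: a trailing ".<lang>.html" (optionally "index.<lang>.html"),
# where <lang> is e.g. "en" or "pt-br". Assumes W3C URLs follow this convention.
_LANG_EXT = re.compile(r"(?:index)?\.[a-z]{2}(?:-[A-Za-z]{2,4})?\.html$")

def negotiable_url(url: str) -> str:
    """Drop a language-specific extension so the server can negotiate a translation."""
    return _LANG_EXT.sub("", url)

def w3c_link(url: str, title: str) -> str:
    """Return an HTML link with escaped link text, instead of a naked URL."""
    href = html.escape(negotiable_url(url), quote=True)
    return f'<a href="{href}">{html.escape(title, quote=False)}</a>'
```

Escaping the title turns the `&` in "Choosing & applying" into `&amp;`, which is the safer form to put in markup.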

---

[2] Q: We are setting up a database for use with our web server. Does Unicode cover all the character sets we need for a web server?

I feel like this answer should start with "Yes."

Perhaps it would also be worth describing how Unicode greatly simplifies the storage of multilingual data. In non-Unicode databases, multilingual data is typically spread across several code pages, and managing or extending that is a pain in the neck. That all goes away with Unicode-encoded databases.
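A small Python sketch of the point, using hypothetical rows: with legacy code pages the encoding has to be tracked per row (the same byte means different characters in different code pages), whereas after conversion every row is plain UTF-8 and can live in one column. The row data and the `to_utf8` helper are illustrative assumptions, not a real migration tool.

```python
# Hypothetical legacy rows: (language, code page, raw bytes as stored).
legacy_rows = [
    ("fr", "cp1252", "Café crème".encode("cp1252")),
    ("ru", "koi8_r", "Привет".encode("koi8_r")),
    ("ja", "shift_jis", "こんにちは".encode("shift_jis")),
]

def to_utf8(rows):
    """Decode each row with its recorded code page and re-encode it as UTF-8."""
    return [(lang, raw.decode(codec).encode("utf-8")) for lang, codec, raw in rows]

unified = to_utf8(legacy_rows)

# Without the per-row code page, bytes are misread: cp1252 "é" is not "é" in KOI8-R.
misread = "é".encode("cp1252").decode("koi8_r")
```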

---


[3] Q: What are Numerical and Named Character References?

The main purpose of character references is not really to handle non-ASCII characters (especially for people working with non-ASCII keyboards). For example, many keyboards have keys for non-ASCII characters such as § and ±, which don't need to be escaped because you can type them directly. Rather, references let you add the odd character to the text when you have no way to input it directly from the keyboard, or make invisible or ambiguous characters clearly visible in the source.
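To illustrate the "make invisible characters visible" use, here is a rough Python sketch that escapes only characters a reader could not see or could easily confuse, and leaves typable characters such as § and ± alone. The `escape_invisibles` name is hypothetical, and treating "invisible or ambiguous" as format characters (category Cf) plus non-ASCII spaces (category Zs other than U+0020) is an approximation.

```python
import unicodedata

def escape_invisibles(text: str) -> str:
    """Replace invisible or easily confused characters with hex NCRs.

    Approximation: escapes format characters (Cf, e.g. U+200F RLM) and any
    space separator (Zs) other than the ordinary space U+0020.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)
```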

I'm dubious about the claim that they are "not handled well by many search engines". Is that true?? I'm also not particularly impressed by the other cons listed.

So here's a suggestion for a rewrite of those 3 Q&As, as a single Q&A:


> Q: What are Numerical and Named Character References?
> 
> Instead of simply including a character such as “a” in a file, you can write it using its Unicode code point value as a Numerical Character Reference (NCR), such as “`&#x61;`” (using the hexadecimal code point value) or “`&#97;`” (using the decimal value). For help with calculating hexadecimal and decimal NCRs, see the `<a href="https://r12a.github.io/app-conversion/">Unicode code converter</a>` page.
> 
> Named character references are similar, except that they use abbreviations, such as “`&eacute;`” instead of numbers.
> 
> This can be useful when you don't have a character on your keyboard, such as a trademark sign (™) or alpha (α). It can also make invisible or visually ambiguous characters obvious in your source code, for example distinguishing a no-break space (`&#xA0;` or `&nbsp;`) from an ordinary space, or revealing an invisible right-to-left mark (`&#x200F;` or `&rlm;`).
> 
> You should avoid overusing NCRs, because they make source text harder to read when direct character input would suffice, and they take longer to create.
> 
> A similar character escape mechanism can be used in CSS, but the format is slightly different: a backslash followed by the hexadecimal code point, such as `\E9` for é.
> 
> For more information about character escapes on the Web see the W3C page `<a href="https://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>`.
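As a quick way to check the forms mentioned in the suggested Q&A, here is a Python sketch that produces the hexadecimal, decimal and (where one exists) named reference for a character, plus the CSS escape. The helper names `ncrs` and `css_escape` are hypothetical. Note that `html.entities.codepoint2name` only covers the HTML 4 entity names, so it is not a complete list of HTML5 named references.

```python
import html
from html.entities import codepoint2name

def ncrs(ch: str) -> dict:
    """Return the hex and decimal NCRs for one character, plus a named one if known."""
    cp = ord(ch)
    refs = {"hex": f"&#x{cp:X};", "dec": f"&#{cp};"}
    name = codepoint2name.get(cp)  # HTML 4 entity names only
    if name:
        refs["named"] = f"&{name};"
    return refs

def css_escape(ch: str) -> str:
    """CSS escape: backslash + hex code point; the trailing space ends the escape."""
    return f"\\{ord(ch):X} "

# Round-trip check: all three forms decode back to the original characters.
decoded = html.unescape("&#x61;&#97;&eacute;")
```

The trailing space in the CSS form is consumed by CSS as the terminator of the escape, so it does not appear in the rendered text.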


By the way, one of the main reasons I use NCRs is to prevent normalisation in example text. For example, to produce an NFD e-acute (e followed by U+0301 COMBINING ACUTE ACCENT) in an editor that automatically NFC-normalises your text, you can use `e&#x301;`. It's particularly useful for examples involving nuktas and such. But I suspect that use case may be a little esoteric for inclusion here(?)
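Whether a given reference really yields decomposed text can be checked with Python's `html` and `unicodedata` modules. A decomposed é needs the combining acute accent U+0301 after the `e` (`e&#x301;`). By contrast, `e&eacute;` yields an `e` followed by a precomposed é, i.e. two letters. The variable names below are only for illustration.

```python
import html
import unicodedata

nfd_source = "e&#x301;"     # e + U+0301 COMBINING ACUTE ACCENT
wrong_source = "e&eacute;"  # e + precomposed U+00E9

nfd = html.unescape(nfd_source)
wrong = html.unescape(wrong_source)

# The first is a decomposed é that NFC would compose into U+00E9;
# the second is two separate letters, "e" then "é".
is_nfd = unicodedata.is_normalized("NFD", nfd)   # requires Python 3.8+
composed = unicodedata.normalize("NFC", nfd)
```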

---

[4] And finally, does the Q&A about email really belong in an FAQ about the Web? Don't we have an FAQ about email?

Hope that helps.

Please view or discuss this issue at https://github.com/w3c/i18n-discuss/issues/26 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Thursday, 18 August 2022 10:51:22 UTC