Re: Embedded Content

On Wed, Oct 15, 2014, at 07:07, Ivan Herman wrote:
> Thanks for the pointer, Nick. I didn't realize it was that messy...
> 

It was Randall who pointed out the mess, not me!

That said, the article Randall linked to is about JavaScript's internal
string encoding, which is -- as the article discusses -- a bizarre
halfway house between UCS-2 and UTF-16.

That shouldn't (AFAIK) affect the issue of mandated encodings for
embedded content. User agents can still write Unicode text from
JavaScript onto the wire as UTF-8.
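
To illustrate, here is a minimal sketch in TypeScript, assuming a runtime
that provides the standard TextEncoder and TextDecoder globals (a modern
browser or Node.js). The sample string and variable names are made up for
the example.

```typescript
// A JavaScript string is exposed as a sequence of UTF-16 code units,
// but TextEncoder always serializes it as UTF-8.
// "\uD83D\uDE00" is the surrogate pair for U+1F600, a character outside the BMP.
const annotationText: string = "注釈 \uD83D\uDE00";

// .length counts UTF-16 code units, so the emoji counts as two.
const codeUnits: number = annotationText.length;

// Bytes as they could be written onto the wire.
const utf8Bytes: Uint8Array = new TextEncoder().encode(annotationText);

// Decoding the UTF-8 bytes gives back the original string.
const roundTripped: string = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(codeUnits, utf8Bytes.length, roundTripped === annotationText);
```

The surrogate pair is encoded as a single four-byte UTF-8 sequence, so the
internal representation does not leak into the serialized form.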

As I understand it, the use case for embedding is as follows:

"For small annotation bodies, the overhead associated with creating a
concrete resource elsewhere on the web is unacceptable, so we want a way
to embed sufficiently small bodies in the Annotation resource itself."

If embedded bodies will be small, the advantages of UTF-16 over UTF-8
for Asian texts will be minimal, and thus I'd be in favour of omitting
the character encoding and mandating UTF-8.
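
A rough way to see the size difference for a short body, again as a
TypeScript sketch assuming the standard TextEncoder global. The helper names
and the sample sentence are hypothetical, and the UTF-16 figure ignores any
byte order mark.

```typescript
// Hypothetical helpers; the sample body is invented for illustration.
function utf8ByteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

function utf16ByteLength(s: string): number {
  // Each UTF-16 code unit occupies 2 bytes (no byte order mark counted).
  return s.length * 2;
}

// Ten CJK characters, all in the Basic Multilingual Plane.
const sampleBody: string = "这是一个简短的注释。";

const utf8Size: number = utf8ByteLength(sampleBody);   // 3 bytes per character here
const utf16Size: number = utf16ByteLength(sampleBody); // 2 bytes per character here

console.log(`UTF-8: ${utf8Size} bytes, UTF-16: ${utf16Size} bytes`);
```

For this sample the difference is 10 bytes, which is the kind of saving at
stake when bodies are small.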

That said, I will happily reverse my position if someone has evidence
that omitting support for UTF-16 in embedded bodies will negatively
affect adoption of our standard in China/Thailand/etc.

-N

Received on Wednesday, 15 October 2014 11:37:29 UTC