Re: Support for XHTML5 from Johnston, Patrick - Hoboken on 2015-12-04 (public-scholarlyhtml@w3.org from December 2015)

From: Johnston, Patrick - Hoboken <pjohnston@wiley.com>
Date: Fri, 4 Dec 2015 18:38:29 +0000
To: Sebastian Heath <sebastian.heath@gmail.com>, W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Message-ID: <463F4610-7948-4193-8DE5-5654C94D20D0@wiley.com>

I am not a big fan of SHOULDs, but I agree we should consider that Unicode (and hence UTF-8) perhaps doesn’t cover the breadth of scholarly research, in particular in the case of ancient or fictional languages (though Klingon is apparently unofficially supported).
Rather than making it a SHOULD, I would say MUST unless no UTF-8 encoding is openly available.

An ancillary issue is that even though a lot of ancient scripts can be encoded in UTF-8, browsers don’t do much of a job of rendering them (see http://www.fileformat.info/info/unicode/block/egyptian_hieroglyphs/utf8test.htm), so I assume that some consideration of polyfills is needed. (What is surprising, considering the geek chic factor, is that Klingon doesn’t get much traction either: http://www.wazu.jp/gallery/Test_Klingon.html.)
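
To make the distinction concrete, here is a minimal Python sketch, not a definitive treatment. It assumes the ConScript Unicode Registry convention of placing Klingon pIqaD in the Private Use Area (U+F8D0 to U+F8FF), which is unofficial and not part of the Unicode Standard; the helper name describe is hypothetical. Both characters serialize to valid UTF-8, so whether a browser can display them is a font question, not an encoding one.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarise how a single character is represented in Unicode and UTF-8."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # General category: 'Lo' = other letter, 'Co' = private use.
        "category": unicodedata.category(ch),
        # Private-use code points have no standard name, so supply a default.
        "name": unicodedata.name(ch, "<no standard name>"),
        # The UTF-8 byte sequence, as space-separated hex.
        "utf8_hex": " ".join(f"{b:02x}" for b in ch.encode("utf-8")),
    }


# First code point of the Egyptian Hieroglyphs block (officially encoded).
hieroglyph = describe("\U00013000")

# First code point of the unofficial ConScript range used for Klingon pIqaD
# (assumption: CSUR assignment, Private Use Area).
pua = describe("\uF8D0")

for info in (hieroglyph, pua):
    print(info)
```

The hieroglyph has an official name and a four-byte UTF-8 form, while the private-use code point has a three-byte form but no standard meaning, so a document relying on it also has to say which private agreement and font it assumes.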

p

From: Sebastian Heath <sebastian.heath@gmail.com>
Date: Friday, December 4, 2015 at 11:53 AM
To: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Subject: Re: Support for XHTML5
Resent-From: <public-scholarlyhtml@w3.org>
Resent-Date: Friday, December 4, 2015 at 11:53 AM

My only further comment on the encoding issue is that I work in and with scholarly communities that have had trouble getting their glyphs into the Unicode Standard. Those are long stories with legit concerns on both sides, with the true obscurity of long "dead" alphabets being a factor, of course.

  Meaning, "SH SHOULD be UTF-8 and this document assumes that is the case in its examples and discussion" is a more welcoming approach than MUST.

 -Sebastian

On Fri, Dec 4, 2015 at 11:47 AM, Silvio Peroni <silvio.peroni@unibo.it> wrote:
Hi Ivan,

But I seem to be the only one worrying about that, so I don't mind
backing away from it if it means we can make progress on the rest.

No, you are not the only one worrying about that. I think it is perfectly fine to require that an SH document be in Unicode, and UTF-8 is probably the right way to go due to its widespread use.

Yes, please! I don’t really care about HTML syntax vs. XHTML syntax compared with the encoding issue…

A mandatory encoding like UTF-8 is a very good requirement for keeping trouble to a minimum when processing SH documents; handling different encodings is a real nightmare. Brrr…

Have a nice day :-)

S.

----------------------------------------------------------------------------
Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.peroni@unibo.it
Web: http://www.essepuntato.it

Twitter: essepuntato

Received on Friday, 4 December 2015 18:39:05 UTC