Re: XHTML is no longer being maintained

We faced a similar issue with our CP/LD standard for scholarly content (https://www.niso.org/publications/z39105-2023-cpld).

Originally, we settled on XHTML to ensure that we could easily process the content using XML processors, but we received significant pushback from various sides (especially the user-facing community). Given that there are several reliable ways to process HTML into a DOM, we settled on “just” HTML5.
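
A minimal sketch of the point about tolerant HTML processing, assuming Python and only its standard library. The sample markup and the names TagCollector, parse_html and is_well_formed_xml are made up for illustration and are not part of CP/LD. Note that html.parser is an event-based tokenizer rather than a spec-conformant HTML5 tree builder, so it only stands in for the DOM-producing parsers mentioned above:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: acceptable as HTML, but not well-formed XML
# (unquoted attribute, unclosed <p>, void <br>, several top-level elements).
SAMPLE = (
    "<!DOCTYPE html><title>Demo</title>"
    "<p class=intro>First line<br>second line<p>Next paragraph"
)


class TagCollector(HTMLParser):
    """Records start tags and non-blank text runs as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def parse_html(source):
    """Tokenize HTML leniently and return (start tags, text runs)."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()  # flush any buffered trailing text
    return collector.tags, collector.text


def is_well_formed_xml(source):
    """Return True only if a strict XML parser accepts the input."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    tags, text = parse_html(SAMPLE)
    print("start tags:", tags)
    print("text runs:", text)
    print("well-formed XML:", is_well_formed_xml(SAMPLE))
```

The same input that the lenient HTML parser walks through without complaint is rejected by the strict XML parser, which is the trade-off discussed in this thread.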

An additional benefit (for us) was that, by dropping XML, we have now almost entirely separated the content layer (HTML) from the data layer (RDF). XML offers too many affordances for overloading content with information that is really just (meta)data and should be treated as such.

-Rinke

--
Rinke Hoekstra
Sr. Director Architecture – Knowledge
Industry Director of Elsevier’s Discovery Lab
ELSEVIER - Amsterdam
r.hoekstra@elsevier.com


From: Ivan Herman <ivan@w3.org>
Date: Saturday, 3 August 2024 at 11:11
To: Alyssa Riceman <alyssaricemanepub0@mailbox.org>, Brady Duga <duga@ljug.com>
Cc: public-publishingcg@w3.org <public-publishingcg@w3.org>
Subject: Re: XHTML is no longer being maintained



On Aug 2, 2024 at 21:34 +0200, Brady Duga <duga@ljug.com> wrote:
Switching away from XHTML to HTML has been a topic for years in the various EPUB-related groups. From a reading system perspective, most RSes load their content into webviews or the browser as HTML anyway, since XHTML has been finicky since ... well, since forever. But there are parts of the pipeline in almost all RSes that assume they are getting well-formed XML, so we have never gone the route of allowing HTML as a core media type. Every once in a while there is a bit of a push when something breaks (e.g. scripting has some XHTML issues), but the cure has always been worse than the disease. Maybe the time has come (or is coming) to bring it up again.

There have been extensive discussions around EPUB + HTML over the years, see, for example:

https://github.com/w3c/epub-specs/issues/636

https://github.com/w3c/epub-specs/issues/2259


The fear has always been of breaking the existing infrastructure, which involves not only Reading Systems but also the full production line, EPUB checkers, etc.

That being said, spinning XHTML off into a separate group does not seem realistic. With the complexity of HTML today, that would be an impossible task, and it would almost surely lead to an incompatible branch of HTML. Eventually, the EPUB community at large may have to finally bite the bullet and open up to HTML but, so far, this has not been the case...

Cheers

Ivan

On Fri, Aug 2, 2024 at 12:14 PM Alyssa Riceman <alyssaricemanepub0@mailbox.org> wrote:
Hi!

According to modern editions of the HTML Living Standard (https://html.spec.whatwg.org/multipage/xhtml.html):

> the XML syntax is essentially unmaintained — in that, it’s not expected that any further features will ever be added to the XML syntax (even when such features have been added to the HTML syntax).

(Where 'the XML syntax' is XHTML.)

This seems worrying! One of the great advances of modern EPUB over the format's earlier versions was unpinning the versions of its core media types, allowing use of—among other things—arbitrarily-modern HTML in our XHTML content documents (within the limits of what readers will realistically be able to handle); but now here we are getting stuck on the path to outdatedness anyway, for our XHTML content documents, on the basis that they no longer will be modern HTML.

(Indeed, I've already personally run afoul of a case where this is relevant: XHTML, unlike modern HTML, lacks support for Declarative Shadow DOM, which I'd been hoping I might be able to make use of in a currently-ongoing EPUB-related project of mine.)

I don't know what, if anything, it would make sense to do about this. The ideal, of course, would be to magically produce some new maintainers for HTML's XML syntax so it can be returned to consistent up-to-date-ness with non-X HTML; but that seems likely to be difficult, potentially to the point of logistical infeasibility, and no other possible solutions have yet occurred to me which seem any more feasible than that one. (Likely-less-feasibly, of course, there's some temptation towards allowing use of non-XML-syntax HTML within EPUB; but that seems, from my admittedly-limited knowledge, likely to be an impractical path to go down which would inflict large amounts of difficulty on developers of reader software.)

Still, even absent immediate knowledge of a solution, it seems like a concern worth raising to the group's attention, and (as far as I can tell from skimming the group archives) not one which has already been raised by anyone else. So here it is. Does this appear to be a real problem to others here as it does to me? And, if so, are there any potential solutions I've missed which are apparent to others here and worth pursuing in more depth?

Thanks,
Alyssa Riceman






Received on Monday, 19 August 2024 09:47:12 UTC