Re: XHTML is no longer being maintained from Ivan Herman on 2024-08-20 (public-publishingcg@w3.org from August 2024)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 20 Aug 2024 12:09:36 +0200
To: "Hoekstra, Rinke (ELS-AMS)" <r.hoekstra@elsevier.com>, MURATA <eb2mmrt@gmail.com>
Cc: Alyssa Riceman <alyssaricemanepub0@mailbox.org>, Brady Duga <duga@ljug.com>, "public-publishingcg@w3.org" <public-publishingcg@w3.org>
Message-ID: <7312a587-dd87-42a1-98a9-ab2206b01a7f@Spark>
Hi Makoto,

We may have to be more precise in what we are saying...

If "XHTML" syntax is all we are saying, i.e., essentially that we put closing tags everywhere, I do not think that is a problem; no browser or webview-based application would refuse or misinterpret it. I think the main issue is how the file is declared: is it declared as XML (through its media type, or through its first few lines with the DTD reference), or as HTML5? How the browser/webview processes the file, in particular the way scripts operate, depends on that declaration.
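
The asymmetry described above can be illustrated with a minimal sketch, assuming Python's standard library as a stand-in for real pipelines: markup written with XML conventions (such as `<br/>`) is accepted by both an XML parser and an HTML tokenizer, whereas HTML-only conventions (such as a bare `<br>`) are rejected by an XML parser. Real browsers choose the parser from the media type (`application/xhtml+xml` versus `text/html`); `xml.etree.ElementTree` and `html.parser` are only illustrative substitutes, and `html.parser` is a tokenizer rather than a full HTML5 tree builder. The sample strings and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample fragments: same content, two serialization conventions.
XHTML_STYLE = "<p>First line<br/>second line</p>"  # XML conventions (self-closed void element)
HTML_STYLE = "<p>First line<br>second line</p>"    # HTML-only convention (bare void element)


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, as an XHTML pipeline expects."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start tags; the default handle_startendtag also routes <br/> here."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup: str) -> list[str]:
    """Tokenize the markup with the (lenient) HTML tokenizer and list its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    for name, sample in [("XHTML-style", XHTML_STYLE), ("HTML-style", HTML_STYLE)]:
        print(name, "well-formed XML:", parses_as_xml(sample), "HTML tags:", html_tags(sample))
```

In this sketch both fragments yield the same tags through the HTML tokenizer, but only the XHTML-style one survives the XML parser, which is why closing every tag is harmless for HTML consumers while the reverse is not true for XML-based tooling.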

So what does it mean when you say "QTI3 is unlikely to adopt the HTML syntax"? Does it mean that they will generate those narrative subtrees in HTML but using the XML conventions, or that they will rely on XHTML processing of, say, scripts, script tags, and things like that?

Ivan


----
Ivan Herman, W3C
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
On Aug 19, 2024 at 12:20 +0200, MURATA <eb2mmrt@gmail.com>, wrote:
> I am now involved in QTI3, which is an XML-based language for representing assessments or tests.  Basically, the top-level structure is represented by QTI-specific tags while narrative subtrees are represented by XHTML.  QTI3 is unlikely to adopt the HTML syntax.
>
>  --
> Regards,
> Makoto
>
>
> > On Mon, Aug 19, 2024 at 6:47 PM Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com> wrote:
> > > We faced a similar issue with our CP/LD standard for scholarly content (https://www.niso.org/publications/z39105-2023-cpld).
> > >
> > > Originally, we settled on XHTML to ensure that we could easily process the content using XML processors, but we received significant pushback from various sides (especially the user-facing community). Given that there are several reliable ways to process HTML into a DOM, we settled on “just” HTML5.
> > >
> > > An additional benefit (for us) was that by dropping the XML, we now have almost entirely stratified the content layer (HTML) from the data layer (RDF). XML has too many affordances to overload content with information that is really just (meta)data and should be treated as such.
> > >
> > > -Rinke
> > >
> > > --
> > > Rinke Hoekstra
> > > Sr. Director Architecture – Knowledge
> > > Industry Director of Elsevier’s Discovery Lab
> > > ELSEVIER - Amsterdam
> > > r.hoekstra@elsevier.com
> > >
> > > From: Ivan Herman <ivan@w3.org>
> > > Date: Saturday, 3 August 2024 at 11:11
> > > To: Alyssa Riceman <alyssaricemanepub0@mailbox.org>, Brady Duga <duga@ljug.com>
> > > Cc: public-publishingcg@w3.org <public-publishingcg@w3.org>
> > > Subject: Re: XHTML is no longer being maintained
> > >
> > > On Aug 2, 2024 at 21:34 +0200, Brady Duga <duga@ljug.com>, wrote:
> > > > Switching away from XHTML to HTML has been a topic for years in the various EPUB-related groups. From a reading system perspective, most RSes load their content into webviews or the browser as HTML anyway, since XHTML has been finicky since ... well, since forever. But there are parts of the pipeline in almost all RSes that assume they are getting well-formed XML, so we have never gone the route of allowing HTML as a core media type. Every once in a while there is a bit of a push when something breaks (e.g. scripting has some XHTML issues), but the cure has always been worse than the disease. Maybe the time has come (or is coming) to bring it up again.
> > >
> > > There have been extensive discussions around EPUB + HTML over the years, see, for example:
> > >
> > > https://github.com/w3c/epub-specs/issues/636
> > > https://github.com/w3c/epub-specs/issues/2259
> > >
> > > The fear has always been of breaking the existing infrastructure, which involves not only Reading Systems but also the full production line, EPUB checkers, etc.
> > >
> > > That being said, spinning XHTML off into a separate group does not seem realistic. With the complexity of HTML today, that would be an impossible task, and it would almost surely lead to an incompatible fork of HTML. Eventually, the EPUB community at large may have to finally bite the bullet and open up to HTML but, so far, this has not been the case...
> > >
> > > Cheers
> > >
> > > Ivan
> > > >
> > > > On Fri, Aug 2, 2024 at 12:14 PM Alyssa Riceman <alyssaricemanepub0@mailbox.org> wrote:
> > > > > Hi!
> > > > >
> > > > > According to modern editions of the HTML Living Standard (https://html.spec.whatwg.org/multipage/xhtml.html):
> > > > >
> > > > > > the XML syntax is essentially unmaintained — in that, it’s not expected that any further features will ever be added to the XML syntax (even when such features have been added to the HTML syntax).
> > > > >
> > > > > (Where 'the XML syntax' is XHTML.)
> > > > >
> > > > > This seems worrying! One of the great advances of modern EPUB over the format's earlier versions was unpinning the versions of its core media types, which allows, among other things, arbitrarily modern HTML in our XHTML content documents (within the limits of what readers can realistically handle). But now our XHTML content documents are headed for obsolescence anyway, because they will no longer be able to use modern HTML.
> > > > >
> > > > > (Indeed, I've already personally run afoul of a case where this is relevant: XHTML, unlike modern HTML, lacks support for Declarative Shadow DOM, which I'd been hoping I might be able to make use of in a currently-ongoing EPUB-related project of mine.)
> > > > >
> > > > > I don't know what, if anything, it would make sense to do about this. The ideal, of course, would be to magically produce some new maintainers for HTML's XML syntax so that it can be brought back into step with non-X HTML; but that seems likely to be difficult, potentially to the point of logistical infeasibility, and no other solution has yet occurred to me that seems any more feasible. (Probably less feasibly, there is some temptation to allow non-XML-syntax HTML within EPUB; but from my admittedly limited knowledge, that seems likely to be an impractical path that would cause a great deal of difficulty for developers of reader software.)
> > > > >
> > > > > Still, even absent immediate knowledge of a solution, it seems like a concern worth raising to the group's attention, and (as far as I can tell from skimming the group archives) not one which has already been raised by anyone else. So here it is. Does this appear to be a real problem to others here as it does to me? And, if so, are there any potential solutions I've missed which are apparent to others here and worth pursuing in more depth?
> > > > >
> > > > > Thanks,
> > > > > Alyssa Riceman
> > > > >
> > >
Received on Tuesday, 20 August 2024 10:09:45 UTC