- From: Aidan Hogan <aidhog@gmail.com>
- Date: Thu, 23 Oct 2025 21:20:10 -0300
- To: Sarven Capadisli <info@csarven.ca>
- Cc: "tgdk@dagstuhl.de" <tgdk@dagstuhl.de>, semantic-web <semantic-web@w3.org>
Hi Sarven,

Thanks for the feedback! Great to have these comments on this initiative from someone with so much experience on the topic. :)

My understanding is that for the foreseeable future, Dagstuhl Publishing (in copy) will be following a TeX-to-HTML pipeline, as their extensive back catalogue has TeX sources, and indeed, as you mention, the goal is to make the final publication available in HTML. From conversations with them, the TeX-to-HTML pipeline is not a trivial task to implement, especially since they have a small developer team, so this is already a big achievement that we are very pleased with. I understand that a native HTML pipeline is not planned at the moment; rather, the plan is to consolidate the HTML exports from the TeX-based pipeline. (Of course the features you mention would be great to have in the future, but I understand that these are not in the current plans.)

Having more machine-readable metadata, as you mention, might be more feasible in the short term, however.

Best,
Aidan

On 2025-10-19 06:14, Sarven Capadisli wrote:
> On 2025-10-16 19:01, Aidan Hogan wrote:
>
>> As always, TGDK articles are published under Diamond Open Access (no
>> fees for authors or readers)
>
> +1
>
>> Secondly, thanks to tireless work by the Dagstuhl Publishing team, we
>> are pleased to announce the publication of HTML versions of all of the
>> past TGDK articles:
>>
>> https://drops.dagstuhl.de/search?term=TGDK&type=Document/HTML
>>
>> This initiative is part of a year-long pilot to study the feasibility
>> for Dagstuhl Publishing of long-term HTML support in order to improve
>> accessibility to the research results they publish.
>
> +1
>
> Congrats on the initiative and looking forward to the advancements!
>
>
> Some comments about the HTML that may interest you:
>
> There are quite a few accessibility issues (see WCAG).
>
> The HTML uses arbitrary tags to encapsulate some content, but there are
> specific and appropriate HTML elements that should be used instead. For
> example, a reader cannot tell that some content is a list item or inline
> code if their user agent does not handle CSS or if they cannot visually
> perceive it.
>
> Some content in articles appears in three places, e.g.:
>
> * Inline in the HTML (the content made human-visible when rendered by
>   the user agent)
> * In HTML meta tags (machine-readable only, presumably for SEO or
>   specific crawlers)
> * In HTML script blocks (machine-readable only)
>
> On the other hand, when the primary source, the human-visible content,
> is marked up with RDFa, it also becomes machine-readable without
> duplication.
>
> The HTML output seems to be derived from a LaTeX source:
>
> * The accuracy and richness of the output will be limited by the
>   converter (e.g., all of the above issues).
> * This may not be an issue if the goal is only to make the "final"
>   publication available as HTML, but the only way to update the HTML
>   output is to go back to the source format and re-transform it.
> * Using HTML as the source format may greatly simplify the publication
>   process.
>
> Are there plans to have the HTML express, e.g., problem statements,
> motivation, hypotheses, arguments, workflow steps, methodology, design,
> results, evaluation, conclusions, and future challenges, as well as all
> inline semantic citations (i.e., typed, beyond doc-cites-doc), so that
> more machine-processable information can be gathered from scientific
> articles that are advancing our understanding of graph data and knowledge?
>
> -Sarven
> https://csarven.ca/#i
Received on Friday, 24 October 2025 00:20:18 UTC