- From: Deyan Ginev <deyan.ginev@gmail.com>
- Date: Tue, 30 Apr 2024 11:31:38 -0400
- To: www-math@w3.org, LaTeXML project <latexml@lists.informatik.uni-erlangen.de>
Received on Tuesday, 30 April 2024 15:32:09 UTC
Hi everyone,

I am happy to announce that the latest ar5iv collection of HTML+MathML documents is now freely available for reuse as a dataset. The release contains 2.1 million HTML documents, and over 1 billion MathML expressions, generated by latexml v0.8.8.

More details and download at:
https://sigmathling.kwarc.info/resources/ar5iv-dataset-2024/

As a reminder, the "ar5iv Lab" is an HTML preview site for arXiv.org. As of late 2023, ar5iv is in the process of being phased out, as arXiv's official HTML coverage gradually reaches parity. Until then, it continues to be available at:
https://ar5iv.labs.arxiv.org/

Best regards,
Deyan