- From: Sarven Capadisli <info@csarven.ca>
- Date: Wed, 20 Mar 2024 11:54:28 +0100
- To: public-solid@w3.org
On 2024-03-19 14:35, Leonard Rosenthol wrote:
> I wasn’t trying to say that PDF is better than your format – I was
> simply trying to make sure that no misinformation was spread.

Nothing more. :) The subject line is intended as a lighthearted joke because I'm aware of where these kinds of discussions tend to go.

>>I presume you're referring to extracting / mapping XMP
>
> No, that is not what I am referring to at all. I am referring the
> feature of PDF called “Tagged PDF”, which has been part of the standard
> for 25 years. And don’t forget that PDF has been an ISO standard (ISO
> 32000) since 2008 and is a normative reference in the HTML5
> specification (which is the basis for the “open web”).

As in
https://html.spec.whatwg.org/multipage/system-state.html#dom-navigator-pdfviewerenabled
that's part of NavigatorPlugins (non-normative)? Either way, my point wasn't that there are no references to PDF or that it is entirely unsupported in any way. I find working with HTML (over PDF) as source to be

> In PDF 2.0 (ISO 32000-2), we added support for RDFa as part of that.

Thanks for bringing that to my attention. I stand corrected. As mentioned in my earlier email, I was running on knowledge prior to ISO 32000-2. So, I have to say that I'm amazed that RDFa even made its way into PDF!

> Here is a picture from a presentation that I give on the topic showing
> the tagging with RDFa semantics and the associated derived HTML.
>
> [image: a screenshot of a computer]

If you have a reference to an example PDF+RDFa document that you can link to - private to me is also okay - as well as an HTML+RDFa serialization, I'd love to inspect them.

Reading ISO 32000-2:2020 (with errata), since `/O` allows RDFa attributes, and presumably any conforming value, it doesn't have the same limitations as XMP (e.g., any subject given with `about=` could be described, I take it?)

> There are numerous open source tools & libraries that give you access to
> this information when present in a PDF, as well as tools for creating it
> in the first place. Even common publishing solutions such as
> Open/LibreOffice, various (La)TeX implementations and even commercial
> solutions also support creation of Tagged PDFs.

I acknowledge that this is useful in a pipeline where the "graph" inside those documents - or as an alternative representation for the PDF, whether in HTML+RDFa, Turtle, or something else - can be extracted and be accessible from a Solid storage.

I find HTML(+RDFa) to be the most frictionless and lossless to work with for a wide range of information as the source format, especially when a human- and machine-readable view needs to end up in the browser. JavaScript doesn't even need to enter the picture until HTTP write-operations are needed or for the behaviour layer on the document.

> Personally, I don’t think there is a “best solution for all cases of
> information sharing”. It is entirely dependent on whether the goal is
> to share information with humans, with machines or with both. It is
> also important to consider additional requirements such as
> longevity/stability of the information. And so folks should always
> choose what works best for them and their use cases. (and with that
> said, I think the Solid platform and its technologies bring some
> excellent pieces to the world – which is why I am here in this group!)

I agree on all points. Which brings me back to the subject line of this email... where we tend to get into the weeds about formats/serializations.. tabs vs. spaces.. as you know, it is pretty easy to run into why Turtle can beat up JSON-LD or vice-versa... (until of course HTML enters chat). Meanwhile none of the plumbing matters to the end-user.

-Sarven
https://csarven.ca/#i
Received on Wednesday, 20 March 2024 10:54:36 UTC