- From: Dan Bolser <dan.bolser@gmail.com>
- Date: Thu, 5 Jan 2023 09:39:00 +0000
- To: Franck Michel <fmichel@i3s.unice.fr>
- Cc: public-bioschemas@w3.org, Fabien Gandon <fabien.gandon@inria.fr>
- Message-ID: <CAPBO=2nw=7nCr71Hsgy0+MdP4Moie3yUp47P15PW1wnN=KHhiQ@mail.gmail.com>
https://www.tomforth.co.uk/scienceandpdfs/

Looks useful

On Wed, Jan 4, 2023, 5:50 PM Franck Michel <fmichel@i3s.unice.fr> wrote:

> Dear community,
>
> First of all, let me wish you all a happy, richly marked up new year ;).
>
> Schema.org is meant to mark up resources of any kind on the internet, not
> just web pages. While presenting Bioschemas, I once had this question: how
> do I mark up a pdf file? More generally, how to mark up any resource other
> than html or xml-based content, like pdf, image, csv, Excel sheet, zip
> archive etc.?
>
> I recently asked this during a BSC meeting but it seemed that nobody had
> really faced this use case yet. And I did a quick Google search but nothing
> came up. So I'd be interested in having your thoughts on this.
>
> A basic solution would be to insert markup in the web page that provides
> the download link. Not so satisfying since, when an application downloads
> the file using its direct URL, there is no more markup.
>
> I could think of a simple solution that uses the HTTP Link header to point
> to a file containing the markup data (similarly to what's been done in
> JSON-LD <https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld> or
> CSVW <https://www.w3.org/TR/tabular-data-model/#link-header>). The
> exchange would look like this:
>
> GET /document.pdf HTTP/1.1
> Host: example.com
>
> ====================================
>
> HTTP/1.1 200 OK
> Content-Type: application/pdf
> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
> ...
>
> Where document_metadata.json is a JSON-LD description of the file and its
> topic (written with Schema.org and Bioschemas of course). I'm not sure
> whether rel="meta" is the best choice here, but that's just an example.
>
> Note that some metadata may already be embedded in pdf and image files by
> means of XMP <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>,
> where Schema.org types and properties could be used. But this does not work
> with any type of file, plus applications may want to use only HTTP-based
> mechanisms to get the markup data, rather than have to read the content of
> binary files.
>
> Have you seen this kind of use case and usage somewhere? Any other
> solution you could think of? Do search engines expect this kind of linking
> to external markup files?
>
> Thx in advance. Regards,
> Franck.
>
> --
> Franck MICHEL, CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)
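
For illustration only, the client side of the Link-header exchange proposed in the quoted message could be sketched roughly as below. This is a minimal sketch, not an established mechanism: the function name find_metadata_url is hypothetical, rel="meta" is only the example value from the message (which itself questions that choice), and the parser is deliberately simplified and does not handle commas or semicolons inside quoted parameter values.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by its ";name=value" parameters.
# Simplification (assumption): quoted values contain no "," or ";".
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def find_metadata_url(resource_url, link_header,
                      rel="meta", media_type="application/ld+json"):
    """Return the absolute URL of the first matching link, or None.

    resource_url is the URL that was requested (e.g. the PDF); the link
    target is resolved against it, since it may be a relative reference.
    """
    for target, raw_params in _LINK_RE.findall(link_header):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        # "rel" may carry several space-separated relation types.
        rels = params.get("rel", "").split()
        if rel in rels and params.get("type") == media_type:
            return urljoin(resource_url, target)
    return None


if __name__ == "__main__":
    # Header value taken from the example response in the message.
    header = '<document_metadata.json>; rel="meta"; type="application/ld+json"'
    print(find_metadata_url("https://example.com/document.pdf", header))
```

A client would read the Link header from the response to GET /document.pdf, pass it to this function together with the request URL, and then fetch the returned URL to obtain the JSON-LD description. The relation type is a parameter precisely because the message leaves that choice open.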
Received on Thursday, 5 January 2023 09:39:26 UTC