Re: How to mark up a document other than a web page? from Dan Bolser on 2023-01-05 (public-bioschemas@w3.org from January 2023)

From: Dan Bolser <dan.bolser@gmail.com>
Date: Thu, 5 Jan 2023 09:39:00 +0000
To: Franck Michel <fmichel@i3s.unice.fr>
Cc: public-bioschemas@w3.org, Fabien Gandon <fabien.gandon@inria.fr>
Message-ID: <CAPBO=2nw=7nCr71Hsgy0+MdP4Moie3yUp47P15PW1wnN=KHhiQ@mail.gmail.com>

https://www.tomforth.co.uk/scienceandpdfs/

Looks useful


On Wed, Jan 4, 2023, 5:50 PM Franck Michel <fmichel@i3s.unice.fr> wrote:

> Dear community,
>
> First of all, let me wish you all a happy, richly marked up new year ;).
>
> Schema.org is meant to mark up resources of any kind on the internet, not
> just web pages. While presenting Bioschemas, I once had this question: how
> do I mark up a PDF file? More generally, how does one mark up any resource
> that is not HTML- or XML-based content, such as a PDF, image, CSV file,
> Excel sheet, zip archive, etc.?
>
> I recently asked this during a BSC meeting but it seemed that nobody had
> really faced this use case yet. And I did a quick Google search but nothing
> came up. So I'd be interested in having your thoughts on this.
>
> A basic solution would be to insert markup in the web page that provides
> the download link. This is not very satisfying, since an application that
> downloads the file through its direct URL never sees the markup.
>
> I could think of a simple solution that uses the HTTP Link header to point
> to a file containing the markup data (similarly to what's been done in
> JSON-LD <https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld> or
> CSVW <https://www.w3.org/TR/tabular-data-model/#link-header>). The
> exchange would look like this:
>
> GET /document.pdf HTTP/1.1
> Host: example.com
>
> ====================================
>
> HTTP/1.1 200 OK
> Content-Type: application/pdf
> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
> ...
>
> Where document_metadata.json is a JSON-LD description of the file and its
> topic (written with Schema.org and Bioschemas of course). I'm not sure
> whether rel="meta" is the best choice here, but that's just an example.
>
> Note that some metadata may already be embedded in PDF and image files by
> means of XMP <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>,
> where Schema.org types and properties could be used. But this does not work
> with every type of file, and applications may want to use only HTTP-based
> mechanisms to get the markup data rather than having to read the content of
> binary files.
>
> Have you seen this kind of use case and usage somewhere? Any other
> solution you could think of? Do search engines expect this kind of linking
> to external markup files?
>
> Thx in advance. Regards,
>    Franck.
>
> --
> Franck MICHEL, CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)
>
>
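
For illustration only, here is a minimal Python sketch of how a client might
handle the Link header from the quoted exchange: it parses the header, looks
for a link with the proposed rel="meta" and type="application/ld+json", and
resolves the target against the URL of the downloaded file. The function names
(parse_link_header, find_metadata_url) are hypothetical, the parser is
deliberately simplified (it does not handle every corner of the Link header
grammar, such as "<" inside quoted parameter values), and rel="meta" is only
the example value from the message, not an established convention.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value):
    """Split a Link header value into (target, params) pairs.

    Simplified parser (an assumption of this sketch): each link is taken to be
    "<target>" followed by ";key=value" parameters, up to the next "<".
    """
    links = []
    for match in re.finditer(r'<([^>]*)>([^<]*)', value):
        target, raw_params = match.groups()
        params = {}
        for part in raw_params.split(';'):
            if '=' in part:
                key, _, val = part.partition('=')
                # Drop the comma separating links, then surrounding quotes.
                val = val.strip().strip(',').strip().strip('"')
                params[key.strip().lower()] = val
        links.append((target, params))
    return links


def find_metadata_url(resource_url, link_header,
                      rel="meta", media_type="application/ld+json"):
    """Return the absolute URL of the linked markup file, or None."""
    for target, params in parse_link_header(link_header):
        # A rel parameter may carry several space-separated relation types.
        rels = params.get('rel', '').split()
        if rel in rels and params.get('type') == media_type:
            # Relative targets are resolved against the resource's own URL.
            return urljoin(resource_url, target)
    return None


header = '<document_metadata.json>; rel="meta"; type="application/ld+json"'
print(find_metadata_url("http://example.com/document.pdf", header))
```

With the header from the quoted message, the relative target resolves next to
the PDF itself; a client would then fetch that URL and read it as JSON-LD.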

Received on Thursday, 5 January 2023 09:39:26 UTC