RE: How to mark up a document other than a web page? from Carole Goble on 2023-01-05 (public-bioschemas@w3.org from January 2023)

From: Carole Goble <carole.goble@manchester.ac.uk>
Date: Thu, 5 Jan 2023 10:09:11 +0000
To: Dan Bolser <dan.bolser@gmail.com>, Franck Michel <fmichel@i3s.unice.fr>
CC: "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Fabien Gandon <fabien.gandon@inria.fr>
Message-ID: <LO2P265MB365908D6CD42542B47D3B7E9A1FA9@LO2P265MB3659.GBRP265.PROD.OUTLOOK.COM>
https://zenodo.org/record/7147703#.Y7agoxXP2F4 is a longer talk that sets up the RO-Crate vision

Carole


Professor Carole Goble CBE FREng FBCS CITP
Department of Computer Science
The University of Manchester,
Manchester, M13 9PL, UK

Head of Node ELIXIR-UK<https://elixiruknode.org/>

PLEASE Do not send me a calendar invite and expect me to see it. (i) Invites only work 50% of the time (ii) if they do work they do not appear as email so I don’t know they are there until it is too late.
Want me at a meeting? Email me. Don’t just silently sneak into a diary I do not use.

From: Carole Goble <carole.goble@manchester.ac.uk>
Sent: 05 January 2023 09:54
To: Dan Bolser <dan.bolser@gmail.com>; Franck Michel <fmichel@i3s.unice.fr>
Cc: public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>; Carole Goble <carole.goble@manchester.ac.uk>
Subject: RE: How to mark up a document other than a web page?

I have forwarded this thread to RO-Crate folks to pitch in

RO-Crate https://www.researchobject.org/ro-crate/ packages files and annotates them with rich metadata (using BagIt). It uses JSON-LD and schema.org. It’s an example of using schema.org to describe multiple files rather than web pages.
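
As a rough illustration of what such a package description looks like, here is a minimal Python sketch that builds the JSON-LD metadata file for a crate containing two files. It follows my reading of the RO-Crate 1.1 layout (a metadata descriptor plus a root Dataset that lists its files); the function name, the input shape and the example file names are assumptions made for illustration, and the sketch covers only the metadata file, not the packaging itself. Check the RO-Crate specification before relying on any detail.

```python
import json

# JSON-LD context published for RO-Crate 1.1.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def build_ro_crate_metadata(name, date_published, files):
    """Build the content of ro-crate-metadata.json as a Python dict.

    `files` is an iterable of (relative_path, media_type) pairs; this input
    shape is a hypothetical convenience, not part of RO-Crate.
    """
    # One schema.org-described entity per packaged file.
    file_entities = [
        {"@id": path, "@type": "File", "encodingFormat": media_type}
        for path, media_type in files
    ]
    # The metadata descriptor points at the root dataset it describes.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # The root dataset represents the crate and references its files.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "datePublished": date_published,
        "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
    }
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *file_entities]}


crate = build_ro_crate_metadata(
    "Example crate",  # hypothetical name
    "2023-01-05",
    [("document.pdf", "application/pdf"), ("data/table.csv", "text/csv")],
)
print(json.dumps(crate, indent=2))
```

The point for this thread is that the PDF and the CSV file are each described with schema.org terms without either of them being a web page.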

RO-Crate has gained a lot of traction in organisations needing to exchange digital objects with structured, machine-readable metadata, and is designed to be repository neutral – that is, to enable inter-repository exchange. Zenodo and Dataverse have work ongoing to build compliance.
https://zenodo.org/record/7376356#.Y7adghXP2F4 is a talk about the repository overlay aspect of RO-Crate

Carole



From: Dan Bolser <dan.bolser@gmail.com>
Sent: 05 January 2023 09:39
To: Franck Michel <fmichel@i3s.unice.fr>
Cc: public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>
Subject: Re: How to mark up a document other than a web page?

https://www.tomforth.co.uk/scienceandpdfs/


Looks useful


On Wed, Jan 4, 2023, 5:50 PM Franck Michel <fmichel@i3s.unice.fr> wrote:
Dear community,

First of all, let me wish you all a happy, richly marked up new year ;).

Schema.org is meant to mark up resources of any kind on the internet, not just web pages. While presenting Bioschemas, I was once asked: how do I mark up a PDF file? More generally, how does one mark up any resource that is not HTML- or XML-based content, such as a PDF, image, CSV file, Excel sheet or zip archive?

I recently asked this during a BSC meeting but it seemed that nobody had really faced this use case yet. And I did a quick Google search but nothing came up. So I'd be interested in having your thoughts on this.

A basic solution would be to insert markup in the web page that provides the download link. This is not very satisfying, though: when an application downloads the file through its direct URL, it never sees the markup.
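
For concreteness, the landing-page approach could look like the following Python sketch, which generates the script element to embed in the download page. The choice of DigitalDocument with an `encoding` MediaObject is one plausible schema.org modelling, not an established Bioschemas recommendation, and the function name and URLs are hypothetical.

```python
import json


def landing_page_markup(name, download_url, media_type):
    """Return a JSON-LD <script> element describing a downloadable file."""
    description = {
        "@context": "https://schema.org",
        "@type": "DigitalDocument",
        "name": name,
        # The file itself is modelled as a MediaObject reachable at contentUrl.
        "encoding": {
            "@type": "MediaObject",
            "contentUrl": download_url,
            "encodingFormat": media_type,
        },
    }
    body = json.dumps(description, indent=2)
    # Escape "</" so the JSON can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(landing_page_markup(
    "Example document",                 # hypothetical title
    "http://example.com/document.pdf",  # hypothetical direct download URL
    "application/pdf",
))
```

The limitation is visible here: the description lives only in the HTML page, so a client that fetches document.pdf directly receives none of it.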

I could think of a simple solution that uses the HTTP Link header to point to a file containing the markup data (similarly to what's been done in JSON-LD<https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld> or CSVW<https://www.w3.org/TR/tabular-data-model/#link-header>). The exchange would look like this:

GET /document.pdf HTTP/1.1
Host: example.com

====================================

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
...

Where document_metadata.json is a JSON-LD description of the file and its topic (written with Schema.org and Bioschemas of course). I'm not sure whether rel="meta" is the best choice here, but that's just an example.
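
On the client side, an application would read the Link header of the response and resolve the target against the URL of the file it requested. The Python sketch below shows one way to do that with the standard library only. The function names are hypothetical, rel="meta" is kept from the example above without any claim that it is the right relation type, and the parser is deliberately simplified: it does not handle quoted parameter values that contain commas or semicolons.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ";name=value" parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One parameter, with either a quoted or a bare value.
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Split a Link header value into (target, params) pairs (simplified)."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def find_metadata_url(link_header, document_url,
                      rel="meta", media_type="application/ld+json"):
    """Return the absolute URL of the first matching metadata link, or None."""
    for target, params in parse_link_header(link_header):
        # A rel parameter may carry several space-separated relation types.
        rels = params.get("rel", "").split()
        if rel in rels and params.get("type") == media_type:
            # Relative targets are resolved against the requested document.
            return urljoin(document_url, target)
    return None


header = '<document_metadata.json>; rel="meta"; type="application/ld+json"'
print(find_metadata_url(header, "http://example.com/document.pdf"))
```

With the header from the exchange above, the relative target resolves to http://example.com/document_metadata.json, which the application can then fetch and parse as JSON-LD.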

Note that some metadata may already be embedded in PDF and image files by means of XMP<https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>, where Schema.org types and properties could be used. But this does not work with every type of file, and applications may want to use only HTTP-based mechanisms to get the markup data, rather than having to read the content of binary files.

Have you seen this kind of use case and usage somewhere? Any other solution you could think of? Do search engines expect this kind of linking to external markup files?

Thx in advance. Regards,
   Franck.

--

Franck MICHEL, CNRS research engineer

Université Côte d’Azur, CNRS, Inria

I3S laboratory (UMR 7271)
Received on Thursday, 5 January 2023 10:09:26 UTC