Re: How to mark up a document other than a web page?

Hi Steffen,

Thx for your answer.

Actually I was not assuming any particular infrastructure to enable the 
annotation. The solution you mention could work for the use case I was 
describing, but my concern is how generally applicable it would be. 
Imagine a website based on a CMS that allows users to download binary 
files. Migrating to BagIt would probably conflict with the CMS, which 
has its own way of organizing files, unless there are ways of 
connecting BagIt with common CMSs.

Also, my point with using the Link header is that it relies only on the 
HTTP protocol: clients able to consume markup would look for the Link 
header and fetch the JSON-LD description, whereas other clients would 
simply ignore the Link header and keep working as usual.
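
A minimal sketch of what such a client could do, in Python with only the 
standard library. The function names are hypothetical, rel="meta" is just 
the value from the example in my first message (not an agreed convention), 
and the Link parsing is deliberately simplified:

```python
import json
import re
import urllib.request
from urllib.parse import urljoin

# One link-value looks like: <target>; name="value"; name=value
LINK_RE = re.compile(r'\s*<([^>]*)>(.*)', re.S)
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))')


def find_markup_link(link_header, base_url, rel="meta"):
    """Return the absolute URL of the first JSON-LD link, or None.

    Simplified parsing: assumes no comma followed by '<' inside a value.
    """
    if not link_header:
        return None
    # Split "<a>; rel=x, <b>; rel=y" into one string per link-value.
    for part in re.split(r',\s*(?=<)', link_header):
        match = LINK_RE.match(part)
        if not match:
            continue
        target, rest = match.groups()
        params = {name.lower(): quoted or bare
                  for name, quoted, bare in PARAM_RE.findall(rest)}
        rels = params.get("rel", "").lower().split()
        if rel in rels and params.get("type") == "application/ld+json":
            # The target may be relative to the URL of the downloaded file.
            return urljoin(base_url, target)
    return None


def fetch_markup(url):
    """Return the JSON-LD description advertised for url, or None."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        link_header = ", ".join(resp.headers.get_all("Link") or [])
        final_url = resp.geturl()
    markup_url = find_markup_link(link_header, final_url)
    if markup_url is None:
        return None  # no markup advertised: behave as any other client
    with urllib.request.urlopen(markup_url) as resp:
        return json.load(resp)
```

A client unaware of the convention never calls anything like 
fetch_markup() and downloads the file exactly as before.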

The whole idea is to be able to mark up the content of a PDF or image, 
just like we mark up the content of a web page. For instance, the web page 
about a protein has markup data with type schema:Protein that refers 
to profile https://bioschemas.org/profiles/Protein/0.11-RELEASE. If you 
download the image of the protein within that page, you could get more 
or less the same markup data just by following the Link header. That 
could help search engines better index any content, not just web pages.
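
To make this concrete, here is a rough Python sketch of the JSON-LD 
description a server might return for such an image. Only the profile URL 
comes from the example above; the function name, the choice of ImageObject 
with an "about" property, the dct:conformsTo property, the PNG format and 
the example URLs are my assumptions, not an agreed Bioschemas pattern:

```python
import json

# Profile URL from the protein example in this message.
PROTEIN_PROFILE = "https://bioschemas.org/profiles/Protein/0.11-RELEASE"


def describe_protein_image(image_url, protein_name, protein_page_url):
    """Build a JSON-LD description of an image depicting a protein.

    The structure is illustrative only (hypothetical modelling choice).
    """
    return {
        "@context": ["https://schema.org",
                     {"dct": "http://purl.org/dc/terms/"}],
        "@type": "ImageObject",
        "@id": image_url,
        "contentUrl": image_url,
        "encodingFormat": "image/png",  # assumption: the image is a PNG
        "about": {
            "@type": "Protein",
            "name": protein_name,
            "url": protein_page_url,
            "dct:conformsTo": {"@id": PROTEIN_PROFILE},
        },
    }


if __name__ == "__main__":
    # Hypothetical URLs and protein name, for illustration only.
    description = describe_protein_image(
        "https://example.org/proteins/P12345.png",
        "Example protein",
        "https://example.org/proteins/P12345",
    )
    print(json.dumps(description, indent=2))
```

The document printed here is what the Link header of the image response 
would point to.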

Franck.

On 04/01/2023 at 19:33, Neumann, Steffen wrote:
> Hi Franck,
>
> my 2c here: this sounds like the task of annotating a bunch of files 
> in a generic (file-based) repository.
> So unless you embed the metadata into the files themselves, the most 
> generic approach is to have a good place to keep
> the metadata / Bioschemas JSON-LD files.
>
> If you download https://dx.doi.org/10.22000/451,
> they use bagit https://en.wikipedia.org/wiki/BagIt
> with folders for descriptive-md/ and technical-md/.
> AFAIK there is no example for (bio)schemas markup in RADAR yet,
> but we intend to give it a try one day.
>
> Is that what you had in mind ?
> Yours,
> Steffen
>
>
>
>
> Steffen Neumann
> Tel: +49 345 5582 1470
> E-Mail: sneumann@ipb-halle.de <mailto:sneumann@ipb-halle.de>
>  Leibniz-Institut für Pflanzenbiochemie
> Weinberg 3 | 06120 Halle | Deutschland
> Tel. +49 345 5582 0 | www.ipb-halle.de <http://www.ipb-halle.de/>
>
>
> ------------------------------------------------------------------------
> *From:* Franck Michel <fmichel@i3s.unice.fr>
> *Sent:* Wednesday, January 4, 2023 18:49
> *To:* public-bioschemas@w3.org <public-bioschemas@w3.org>; Fabien 
> Gandon <fabien.gandon@inria.fr>
> *Subject:* How to mark up a document other than a web page?
> Dear community,
>
> First of all, let me wish you all a happy, richly marked up new year ;).
>
> Schema.org is meant to mark up resources of any kind on the internet, 
> not just web pages. While presenting Bioschemas, I once had this 
> question: how do I mark up a PDF file? More generally, how to mark up 
> any resource other than HTML or XML-based content, like PDF, image, 
> CSV, Excel sheet, zip archive, etc.?
>
> I recently asked this during a BSC meeting but it seemed that nobody 
> had really faced this use case yet. And I did a quick Google search 
> but nothing came up. So I'd be interested in having your thoughts on this.
>
> A basic solution would be to insert markup in the web page that 
> provides the download link. Not so satisfying since, when an 
> application downloads the file using its direct URL, there is no more 
> markup.
>
> I could think of a simple solution that uses the HTTP Link header to 
> point to a file containing the markup data (similarly to what's been 
> done in JSON-LD 
> <https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld> 
> or CSVW 
> <https://www.w3.org/TR/tabular-data-model/#link-header>). 
> The exchange would look like this:
>
> GET /document.pdf HTTP/1.1
> Host: example.com
>
> ====================================
>
> HTTP/1.1 200 OK
> Content-Type: application/pdf
> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
> ...
>
> Where document_metadata.json is a JSON-LD description of the file and 
> its topic (written with Schema.org and Bioschemas of course). I'm not 
> sure whether rel="meta" is the best choice here, but that's just an 
> example.
>
> Note that some metadata may already be embedded in PDF and image files 
> by means of XMP 
> <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>, 
> where Schema.org types and properties could be used. But this does not 
> work with every type of file, plus applications may want to use only 
> HTTP-based mechanisms to get the markup data, rather than have to read 
> the content of binary files.
>
> Have you seen this kind of use case and usage somewhere? Any other 
> solution you could think of? Do search engines expect this kind of 
> linking to external markup files?
>
> Thx in advance. Regards,
>    Franck.
> -- 
> Franck MICHEL, CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)

Received on Thursday, 5 January 2023 09:09:05 UTC