- From: Herbert Van de Sompel <hvdsomp@gmail.com>
- Date: Sat, 21 Jan 2023 18:27:11 +0100
- To: Franck Michel <fmichel@i3s.unice.fr>
- Cc: Carole Goble <carole.goble@manchester.ac.uk>, "LJ.Garcia" <lj.garcia.co@gmail.com>, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>, Yvan Le Bras <yvan.le-bras@mnhn.fr>, Dan Bolser <dan.bolser@gmail.com>, public-bioschemas <public-bioschemas@w3.org>, Fabien Gandon <fabien.gandon@inria.fr>, Pierre-Antoine Champin <pierre-antoine@w3.org>
- Message-ID: <CAOywMHeOFkSF1zLnSfy4EbfBCYTLJcoR7RQ3qZQ_Fnf+ukXUjg@mail.gmail.com>
hi Franck,

I forgot to mention that one can also use a "profile" attribute (with a URI
as value) on a link to convey additional information about the format of the
linked document. That's handy, e.g. when pointing to JSON(-LD), because JSON
is all over the place. The "profile" attribute lets you be more expressive
than the MIME type alone, i.e. it says which kind of JSON(-LD) format is
being used.

Greetings

Herbert

On Sat, Jan 21, 2023 at 6:22 PM Herbert Van de Sompel <hvdsomp@gmail.com>
wrote:

> hi Franck,
>
> Yes, what you describe totally aligns with the Signposting approach. In
> the spec, you will see that it also talks about using HTTP Link headers
> and/or Link Sets because these can convey information (e.g. metadata)
> for resources that are not HTML and hence cannot have embedded metadata.
>
> The "describedby" relationship, as all other relationships that are used
> in Signposting, is listed in the IANA Link Relations registry at
> https://www.iana.org/assignments/link-relations/link-relations.xhtml .
> All these relationships have been defined in formal
> specifications/standards.
>
> Typed links, like the ones used by Signposting, are commonly used in
> RESTful interfaces. They're really mainstream in that way but don't seem to
> be well known in the scholarly communication landscape. With Signposting we
> want to try and change that by promoting this very low barrier
> interoperability approach.
>
> Greetings
>
> Herbert
>
> On Fri, Jan 20, 2023 at 5:08 PM Franck Michel <fmichel@i3s.unice.fr>
> wrote:
>
>> Dear all,
>>
>> Thx Leyla for pointing to Signposting, and thx Herbert for the links. I
>> did not know about this project but this is indeed very similar to what I
>> propose, yet with a different scope.
>>
>> Signposting suggests using the HTTP Link header to link scholarly
>> resources on the web with metadata about them, in various formats, like
>> authoring information, bibliographic information in BibTeX, RIS etc.,
>> together with content negotiation.
>>
>> My proposition can be fully complementary to this. It seeks the
>> generalized use of Schema.org (+ extensions such as Bioschemas of course)
>> to mark up any resource. The HTTP Link header is used to point to the
>> markup data.
>> The "describedby" relation is probably better suited than the "meta" that
>> I used in my example. But anyway, the idea is that you can benefit from the
>> large scope of Schema.org, Bioschemas and other extensions, to describe all
>> your resources that are not webpages.
>> Content negotiation can be used as shown in the examples of Signposting,
>> so if I extend my earlier example, that would give something like this:
>>
>> curl -I -H "Accept: application/ld+json" https://domain.org/myImage.jpg
>>
>> Link:
>> <https://domain.org/myImage.jpg?q=markup&format=application/ld+json>
>> ; rel="describedby"
>> ; type="application/ld+json"
>>
>> Note that the query part of the URL "q=markup&format=application/ld+json"
>> can be anything you like, it's just a trick to be used by the URL rewriting
>> module of the web server, to point to the markup data for that specific
>> resource.
>>
>> Regarding Jerven's remark about XMP, indeed like I said in my previous
>> email: "XMP (Extensible Metadata Platform) allows to embed metadata in
>> binary files. That's (...) limited to a few file types and this requires to
>> parse the content of the file itself."
>> By contrast, as explained by the authors of Signposting, using the HTTP
>> Link header lets you query the headers only (with the HTTP method HEAD), so
>> you can get the metadata without even having to download the resource
>> itself, which may be big.
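
The header-only lookup described above can be sketched in Python. This is a
minimal illustration under stated assumptions, not part of Signposting or of
any library: the helper names parse_link_header, described_by and
fetch_link_header are hypothetical, the parser is deliberately simplified (it
assumes no commas or semicolons inside quoted parameter values), and the
sample header is the one from the curl example above.

```python
import re
import urllib.request

# One link-value: <target> followed by zero or more ";name=value" parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,<]+)*)')
# One parameter, with either a quoted or a bare value.
_PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in _LINK.findall(value):
        params = {}
        for name, quoted, bare in _PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def described_by(value, media_type=None):
    """Return the targets whose rel contains "describedby".

    If media_type is given, keep only links whose "type" parameter matches.
    """
    targets = []
    for target, params in parse_link_header(value):
        rels = params.get("rel", "").lower().split()
        if "describedby" not in rels:
            continue
        if media_type is not None and params.get("type") != media_type:
            continue
        targets.append(target)
    return targets


def fetch_link_header(url, accept="application/ld+json"):
    """Send a HEAD request and return the combined Link header ('' if none).

    Defined for illustration only; it is not called in this sketch.
    """
    req = urllib.request.Request(url, method="HEAD", headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return ", ".join(resp.headers.get_all("Link") or [])


# The header from the example above, written on a single line.
header = ('<https://domain.org/myImage.jpg?q=markup&format=application/ld+json>'
          '; rel="describedby"; type="application/ld+json"')
print(described_by(header, "application/ld+json"))
```

Any other link parameter, such as "profile", ends up in the same params
dictionary, so a client could filter on it in the same way as on "type".
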
>> Plus, this relies on native HTTP mechanisms only, so that you don't need
>> a specific library to parse the header of pdfs, another one for images and
>> so on.
>>
>> Franck.
>>
>> On 20/01/2023 at 10:42, Herbert Van de Sompel wrote:
>>
>> hi all,
>>
>> Thanks Carole for adding me to the conversation.
>>
>> Yes, indeed, Signposting in general, and the FAIR Signposting Profile
>> specifically, were introduced as a lightweight mechanism to address the
>> issue at hand:
>> * https://signposting.org/
>> * https://signposting.org/FAIR/
>>
>> Since Dataverse was mentioned in the email exchange, I can report that
>> support for the FAIR Signposting Profile was implemented for Dataverse and
>> should come with the next release, see
>> https://github.com/IQSS/dataverse/issues/5962
>>
>> I am happy to answer any questions.
>>
>> Greetings
>>
>> Herbert
>>
>> On Fri, Jan 20, 2023 at 10:20 AM Carole Goble <
>> carole.goble@manchester.ac.uk> wrote:
>>
>>> Looping in Herbert Van de Sompel, worldwide Signposting expert
>>>
>>> Carole
>>>
>>> Professor Carole Goble CBE FREng FBCS CITP
>>> Department of Computer Science
>>> The University of Manchester,
>>> Manchester, M13 9PL, UK
>>>
>>> Head of Node ELIXIR-UK <https://elixiruknode.org/>
>>>
>>> PLEASE Do not send me a calendar invite and expect me to see it. (i)
>>> Invites only work 50% of the time (ii) if they do work they do not appear
>>> as email so I don’t know they are there until it is too late.
>>>
>>> Want me at a meeting? Email me. Don’t just silently sneak into a diary I
>>> do not use.
>>>
>>> *From:* LJ.Garcia <lj.garcia.co@gmail.com>
>>> *Sent:* 19 January 2023 18:41
>>> *To:* Franck Michel <fmichel@i3s.unice.fr>; Stian Soiland-Reyes <
>>> soiland-reyes@manchester.ac.uk>
>>> *Cc:* Yvan Le Bras <yvan.le-bras@mnhn.fr>; Carole Goble <
>>> carole.goble@manchester.ac.uk>; Dan Bolser <dan.bolser@gmail.com>;
>>> public-bioschemas <public-bioschemas@w3.org>; Fabien Gandon <
>>> fabien.gandon@inria.fr>
>>> *Subject:* Re: How to mark up a document other than a web page?
>>>
>>> Hi Franck,
>>>
>>> What you mention about using the HTTP header reminds me of Signposting (
>>> https://signposting.org/). Have you seen this approach? I still have
>>> to catch up with this subject, so I am adding more people with better
>>> knowledge of it to the loop.
>>>
>>> Kind regards,
>>>
>>> On Thu, Jan 5, 2023 at 5:48 PM Franck Michel <fmichel@i3s.unice.fr>
>>> wrote:
>>>
>>> Dear all,
>>>
>>> Thank you for your remarks and comments. Actually I feel like the
>>> discussion has already gone way beyond my initial question and proposition.
>>>
>>> My point was to figure out a simple way to provide metadata about any
>>> kind of resource on the web, not only web pages, in the form of Schema.org
>>> markup.
>>>
>>> RO-Crate is definitely a very interesting initiative but it primarily
>>> concerns communities used to dealing with large data repositories like
>>> Zenodo or Dataverse. Besides, it requires encapsulating the produced
>>> objects within a package (archive) that contains all necessary additional
>>> metadata. This is great for enforcing FAIR ROs, but apart from such
>>> specific needs, an image on the web will remain available as a raw jpg or
>>> png file, same thing for a pdf, music, spreadsheet etc. We cannot expect
>>> each web master to encapsulate those objects in RO-Crate packages.
>>>
>>> A way to mark up an object is to create a web page that links to this
>>> object, and add markup on that page.
>>> But whenever the object is accessed
>>> directly by its URL, it has no more markup data. As a result, SEO practices
>>> have terrible recommendations like naming image files with a super long
>>> name containing the name of the thing being represented, its description,
>>> the image resolution etc. Ugly, right? XMP (Extensible Metadata Platform)
>>> allows embedding metadata in binary files. That's much better but it is
>>> limited to a few file types and it requires parsing the content of the
>>> file itself.
>>>
>>> So my point is: we can link objects on the web to their metadata with a
>>> mechanism that has been there since HTTP 1.0 (RFC1945
>>> <https://datatracker.ietf.org/doc/html/rfc1945#page-59>, 1996!), that
>>> is almost the beginning of the web: the HTTP Link header. Hence the example
>>> of a web server that returns a pdf document along with this header:
>>>
>>> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
>>>
>>> Upside: it neither breaks nor imposes anything. HTTP clients that don't
>>> care about or understand JSON-LD will just ignore it. Those that can consume
>>> JSON-LD will fetch the metadata and use the Schema.org annotations to do
>>> whatever they want. This way, search engines will know precisely what's in
>>> the object, making tools like Google Images able to index images much more
>>> effectively.
>>> Downside: there has to be a second HTTP GET request to retrieve the
>>> JSON-LD metadata. No big deal.
>>>
>>> Does it make sense or is it just totally obvious?
>>>
>>> Franck.
>>>
>>> On 05/01/2023 at 11:55, Yvan Le Bras wrote:
>>>
>>> Hi Franck, Carole, hi everyone,
>>>
>>> Let me first wish you all a happy new year!
>>>
>>> Sorry if I misunderstood or if I am totally wrong, but it seems important
>>> to me to try to expose my point of view ;)
>>>
>>> Looking at your question Franck, and notably at Carole's answer, it
>>> seems to me that 1/ schema.org is made to mark up web pages and e-mail
>>> messages 2/ using an intermediate "metadata layer", which can be RDFa or
>>> JSON-LD for example.
>>>
>>> Thus, to add schema.org vocabulary to "files", it appears to me that the
>>> best option is to use a metadata standard that describes the data,
>>> including for example URLs to download the data files, and that can then be
>>> exposed in RDFa or JSON-LD, for example through web pages where the
>>> schema.org vocabulary is used... So in structured data accessible on the
>>> internet.
>>>
>>> Thus, we can use RO-Crate or another standardized way to produce RO
>>> metadata using schema.org on JSON-LD web pages (for example we do so
>>> in Ecology using the "Ecological Metadata Language" standard, and we can
>>> look at the structured data in the data catalog, like here
>>> https://data.pndb.fr/view/urn:uuid:99abf52c-b271-4b66-ae50-c504e492bc4c
>>> where we notably use the "schemaVersion", "url", "datePublished",
>>> "dateModified", "description", "keywords", "creator", "temporalCoverage",
>>> "subjectOf", "fileFormat", "spatialCoverage", "geo", "latitude",
>>> "longitude", "variableMeasured" schema.org terms)
>>>
>>> => Here I give the EML-oriented example because it allows us to have
>>> detailed metadata, notably with "variableMeasured", which gives our
>>> datasets a particularly high FAIRness.
>>>
>>> Please, don't hesitate to comment!
>>>
>>> Wishing you a very good end of week,
>>>
>>> Best,
>>>
>>> Yvan
>>>
>>> ------------------------------
>>>
>>> *From:* "Carole Goble" <carole.goble@manchester.ac.uk>
>>> *To:* "Dan Bolser" <dan.bolser@gmail.com>,
>>> "Franck Michel" <fmichel@i3s.unice.fr>
>>> *Cc:* "public-bioschemas" <public-bioschemas@w3.org>,
>>> "Fabien Gandon" <fabien.gandon@inria.fr>
>>> *Sent:* Thursday 5 January 2023 11:09:11
>>> *Subject:* RE: How to mark up a document other than a web page?
>>>
>>> https://zenodo.org/record/7147703#.Y7agoxXP2F4 is a longer talk that
>>> sets up the RO-Crate vision
>>>
>>> Carole
>>>
>>> *From:* Carole Goble <carole.goble@manchester.ac.uk>
>>> *Sent:* 05 January 2023 09:54
>>> *To:* Dan Bolser <dan.bolser@gmail.com>; Franck
>>> Michel <fmichel@i3s.unice.fr>
>>> *Cc:* public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>;
>>> Carole Goble <carole.goble@manchester.ac.uk>
>>> *Subject:* RE: How to mark up a document other than a web page?
>>>
>>> I have forwarded this thread to RO-Crate folks to pitch in
>>>
>>> RO-Crate https://www.researchobject.org/ro-crate/ packages files and
>>> annotates them with rich metadata (using BagIt). It uses JSON-LD and
>>> schema.org. It’s an example of using schema.org for multiple files not
>>> web pages.
>>>
>>> RO-Crate has gained a lot of traction in organisations needing to
>>> exchange digital objects with structured machine readable metadata, and is
>>> designed to be repository neutral, that is, enable inter-repo exchange.
>>> Zenodo and DataVerse have work ongoing to build compliance.
>>>
>>> https://zenodo.org/record/7376356#.Y7adghXP2F4 is a talk about the
>>> repository overlay aspect of RO-Crate
>>>
>>> Carole
>>>
>>> *From:* Dan Bolser <dan.bolser@gmail.com>
>>> *Sent:* 05 January 2023 09:39
>>> *To:* Franck Michel <fmichel@i3s.unice.fr>
>>> *Cc:* public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>
>>> *Subject:* Re: How to mark up a document other than a web page?
>>>
>>> https://www.tomforth.co.uk/scienceandpdfs/
>>>
>>> Looks useful
>>>
>>> On Wed, Jan 4, 2023, 5:50 PM Franck Michel <fmichel@i3s.unice.fr> wrote:
>>>
>>> Dear community,
>>>
>>> First of all, let me wish you all a happy, richly marked up new year ;).
>>>
>>> Schema.org is meant to mark up resources of any kind on the internet,
>>> not just web pages. While presenting Bioschemas, I once had this question:
>>> how do I mark up a pdf file? More generally, how to mark up any resource
>>> other than an html or xml-based content, like pdf, image, csv, Excel sheet,
>>> zip archive etc.?
>>>
>>> I recently asked this during a BSC meeting but it seemed that nobody had
>>> really faced this use case yet. And I did a quick Google search but nothing
>>> came up. So I'd be interested in having your thoughts on this.
>>>
>>> A basic solution would be to insert markup in the web page that provides
>>> the download link. Not so satisfying since, when an application downloads
>>> the file using its direct URL, there is no more markup.
>>>
>>> I could think of a simple solution that uses the HTTP Link header to
>>> point to a file containing the markup data (similarly to what's been done
>>> in JSON-LD <https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld>
>>> or CSVW <https://www.w3.org/TR/tabular-data-model/#link-header>). The
>>> exchange would look like this:
>>>
>>> GET /document.pdf HTTP/1.1
>>> Host: example.com
>>>
>>> ====================================
>>>
>>> HTTP/1.1 200 OK
>>> Content-Type: application/pdf
>>> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
>>> ...
>>>
>>> Where document_metadata.json is a JSON-LD description of the file and
>>> its topic (written with Schema.org and Bioschemas of course). I'm not sure
>>> whether rel="meta" is the best choice here, but that's just an example.
>>>
>>> Note that some metadata may already be embedded in pdf and image files
>>> by means of XMP
>>> <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>, where
>>> Schema.org types and properties could be used.
>>> But this does not work with
>>> every type of file, plus applications may want to use only HTTP-based
>>> mechanisms to get the markup data, rather than have to read the content of
>>> binary files.
>>>
>>> Have you seen this kind of use case and usage somewhere? Any other
>>> solution you could think of? Do search engines expect this kind of linking
>>> to external markup files?
>>>
>>> Thx in advance. Regards,
>>> Franck.
>>>
>>> --
>>>
>>> Franck MICHEL, CNRS research engineer
>>> Université Côte d’Azur, CNRS, Inria
>>> I3S laboratory (UMR 7271)
>>>
>>> --
>>>
>>> ------------------------------
>>> Yvan Le Bras, PhD
>>> @Yvan2935
>>>
>>> <°))))><
>>>
>>> Scientific and technical lead, "Pôle National de Données de
>>> Biodiversité" https://www.pndb.fr/
>>>
>>> Bureau 34, Station marine de Concarneau BP 225, 29182 Concarneau CEDEX ---
>>> MNHN Unité de service PatriNat Paris
>>>
>>> tel.: +33 (0) 2 98 50 99 35
>>> / +33 (0) 6.10.43.96.51
>>>
>>> yvan.le-bras@mnhn.fr
>>>
>>
>> --
>> ==================
>> Herbert Van de Sompel
>> https://hvdsomp.info
>> https://orcid.org/0000-0002-0715-6126
>>
>
> --
> ==================
> Herbert Van de Sompel
> https://hvdsomp.info
> https://orcid.org/0000-0002-0715-6126
>

--
==================
Herbert Van de Sompel
https://hvdsomp.info
https://orcid.org/0000-0002-0715-6126
Received on Saturday, 21 January 2023 17:27:36 UTC