- From: LJ.Garcia <lj.garcia.co@gmail.com>
- Date: Thu, 19 Jan 2023 19:41:15 +0100
- To: Franck Michel <fmichel@i3s.unice.fr>, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
- Cc: Yvan Le Bras <yvan.le-bras@mnhn.fr>, Carole Goble <carole.goble@manchester.ac.uk>, Dan Bolser <dan.bolser@gmail.com>, public-bioschemas <public-bioschemas@w3.org>, Fabien Gandon <fabien.gandon@inria.fr>
- Message-ID: <CAPZUG=BHK6MtdEc-5--+p2D+07cr5BCSMAAb8eQXy1G_GPbeVQ@mail.gmail.com>
Hi Franck,

What you mention about using the HTTP header reminds me of Signposting (https://signposting.org/). Have you seen this approach? I still have to catch up on this subject, so I am adding more people with better knowledge of it to the loop.

Kind regards,

On Thu, Jan 5, 2023 at 5:48 PM Franck Michel <fmichel@i3s.unice.fr> wrote:

> Dear all,
>
> Thank you for your remarks and comments. Actually I feel like the
> discussion has already gone way beyond my initial question and proposition.
>
> My point was to figure out a simple way to provide metadata about any kind
> of resource on the web, not only web pages, in the form of Schema.org
> markup.
>
> RO-Crate is definitely a very interesting initiative but it primarily
> concerns communities used to dealing with large data repositories like
> Zenodo or Dataverse. Besides, it requires encapsulating the produced
> objects within a package (archive) that contains all necessary additional
> metadata. This is great for enforcing FAIR ROs, but apart from such
> specific needs, an image on the web will remain available as a raw jpg or
> png file, same thing for a pdf, music, spreadsheet etc. We cannot expect
> each webmaster to encapsulate those objects in RO-Crate packages.
>
> A way to mark up an object is to create a web page that links to this
> object, and add markup on that page. But whenever the object is accessed
> directly by its URL, the markup is no longer available. As a result, SEO
> practices have terrible recommendations like naming image files with a
> super long name containing the name of the thing being represented, its
> description, the image resolution etc. Ugly, right? XMP (Extensible
> Metadata Platform) allows embedding metadata in binary files. That's much
> better but this is limited to a few file types and it requires parsing the
> content of the file itself.
>
> So my point is: we can link objects on the web to their metadata with a
> mechanism that has been there since HTTP 1.0 (RFC1945
> <https://datatracker.ietf.org/doc/html/rfc1945#page-59>, 1996!), that is,
> almost since the beginning of the web: the HTTP Link header. Hence the
> example of a web server that returns a pdf document along with this header:
>
> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
>
> Upside: it neither breaks nor imposes anything. HTTP clients that don't
> care about or understand JSON-LD will just ignore it. Those that can consume
> JSON-LD will fetch the metadata and use the Schema.org annotations to do
> whatever they want. This way, search engines will know precisely what's in
> the object, making tools like Google Images able to index images much more
> effectively.
>
> Downside: a second HTTP GET request is needed to retrieve the JSON-LD
> metadata. No big deal.
>
> Does it make sense or is it just totally obvious?
>
> Franck.
>
> On 05/01/2023 at 11:55, Yvan Le Bras wrote:
>
> Hi Franck, Carole, hi everyone,
>
> Let me first wish you all a happy new year!
>
> Sorry if I misunderstood or if I am totally wrong, but it seems
> important to me to try to explain my point of view ;)
>
> Looking at your question, Franck, and notably at Carole's answer, it
> seems to me that 1/ schema.org is made to mark up web pages and e-mail
> messages 2/ using an intermediate "metadata layer", which can be RDFa or
> JSON-LD for example.
>
> Thus, to add schema.org vocabulary to "files", it appears to me that the
> best approach is to use a metadata standard that describes the data,
> including for example the URLs to download the data files, and that can
> then be exposed in RDFa or JSON-LD through web pages where the schema.org
> vocabulary is used... So, as structured data accessible on the internet.
>
> Thus, we can use RO-Crate or another standardized way to produce RO metadata
> using schema.org in JSON-LD on web pages (for example, we do so in ecology
> using the "Ecological Metadata Language" standard, and the structured data
> can be seen in the data catalog, like here:
> https://data.pndb.fr/view/urn:uuid:99abf52c-b271-4b66-ae50-c504e492bc4c
> where we notably use the "schemaVersion", "url", "datePublished",
> "dateModified", "description", "keywords", "creator", "temporalCoverage",
> "subjectOf", "fileFormat", "spatialCoverage", "geo", "latitude",
> "longitude", "variableMeasured" schema.org terms)
>
> => Here I give the EML-oriented example because it allows us to have
> detailed metadata, notably with "variableMeasured", which gives our
> datasets a notably higher FAIRness.
>
> Please don't hesitate to comment!
>
> Wishing you a very good end of the week,
>
> Best,
>
> Yvan
>
> ------------------------------
> *From:* "Carole Goble" <carole.goble@manchester.ac.uk>
> *To:* "Dan Bolser" <dan.bolser@gmail.com>, "Franck Michel" <fmichel@i3s.unice.fr>
> *Cc:* "public-bioschemas" <public-bioschemas@w3.org>, "Fabien Gandon" <fabien.gandon@inria.fr>
> *Sent:* Thursday, 5 January 2023 11:09:11
> *Subject:* RE: How to mark up a document other than a web page?
>
> https://zenodo.org/record/7147703#.Y7agoxXP2F4 is a longer talk that
> sets up the RO-Crate vision
>
> Carole
>
> Professor Carole Goble CBE FREng FBCS CITP
> Department of Computer Science
> The University of Manchester,
> Manchester, M13 9PL, UK
>
> Head of Node ELIXIR-UK <https://elixiruknode.org/>
>
> PLEASE Do not send me a calendar invite and expect me to see it. (i)
> Invites only work 50% of the time (ii) if they do work they do not appear
> as email so I don’t know they are there until it is too late.
>
> Want me at a meeting? Email me. Don’t just silently sneak into a diary I
> do not use.
>
> *From:* Carole Goble <carole.goble@manchester.ac.uk>
> *Sent:* 05 January 2023 09:54
> *To:* Dan Bolser <dan.bolser@gmail.com>; Franck Michel <fmichel@i3s.unice.fr>
> *Cc:* public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>; Carole Goble <carole.goble@manchester.ac.uk>
> *Subject:* RE: How to mark up a document other than a web page?
>
> I have forwarded this thread to RO-Crate folks to pitch in
>
> RO-Crate https://www.researchobject.org/ro-crate/ packages files and
> annotates them with rich metadata (using BagIt). It uses JSON-LD and
> schema.org. It’s an example of using schema.org for multiple files, not
> web pages.
>
> RO-Crate has gained a lot of traction in organisations needing to exchange
> digital objects with structured machine-readable metadata, and is designed
> to be repository neutral – that is, to enable inter-repo exchange. Zenodo
> and Dataverse have work ongoing to build compliance.
>
> https://zenodo.org/record/7376356#.Y7adghXP2F4 is a talk about the
> repository overlay aspect of RO-Crate
>
> Carole
>
> *From:* Dan Bolser <dan.bolser@gmail.com>
> *Sent:* 05 January 2023 09:39
> *To:* Franck Michel <fmichel@i3s.unice.fr>
> *Cc:* public-bioschemas@w3.org; Fabien Gandon <fabien.gandon@inria.fr>
> *Subject:* Re: How to mark up a document other than a web page?
>
> https://www.tomforth.co.uk/scienceandpdfs/
>
> Looks useful
>
> On Wed, Jan 4, 2023, 5:50 PM Franck Michel <fmichel@i3s.unice.fr> wrote:
>
> Dear community,
>
> First of all, let me wish you all a happy, richly marked up new year ;).
>
> Schema.org is meant to mark up resources of any kind on the internet, not
> just web pages. While presenting Bioschemas, I once had this question: how
> do I mark up a pdf file? More generally, how to mark up any resource other
> than html or xml-based content, like a pdf, image, csv, Excel sheet, zip
> archive etc.?
>
> I recently asked this during a BSC meeting but it seemed that nobody had
> really faced this use case yet. And I did a quick Google search but nothing
> came up. So I'd be interested in having your thoughts on this.
>
> A basic solution would be to insert markup in the web page that provides
> the download link. Not so satisfying since, when an application downloads
> the file using its direct URL, the markup is no longer available.
>
> I could think of a simple solution that uses the HTTP Link header to point
> to a file containing the markup data (similarly to what's been done in
> JSON-LD <https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld> or
> CSVW <https://www.w3.org/TR/tabular-data-model/#link-header>). The
> exchange would look like this:
>
> GET /document.pdf HTTP/1.1
> Host: example.com
>
> ====================================
>
> HTTP/1.1 200 OK
> Content-Type: application/pdf
> Link: <document_metadata.json>; rel="meta"; type="application/ld+json"
> ...
>
> Where document_metadata.json is a JSON-LD description of the file and its
> topic (written with Schema.org and Bioschemas of course).
> I'm not sure whether rel="meta" is the best choice here, but that's just
> an example.
>
> Note that some metadata may already be embedded in pdf and image files by
> means of XMP <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform>,
> where Schema.org types and properties could be used. But this does not work
> with every type of file, plus applications may want to use only HTTP-based
> mechanisms to get the markup data, rather than having to read the content
> of binary files.
>
> Have you seen this kind of use case and usage somewhere? Any other
> solution you could think of? Do search engines expect this kind of linking
> to external markup files?
>
> Thx in advance. Regards,
> Franck.
>
> --
> Franck MICHEL, CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)
>
> --
> Yvan Le Bras, PhD
> @Yvan2935
>
> <°))))><
>
> Scientific and technical lead, "Pole National de Données de
> Biodiversité" https://www.pndb.fr/
>
> Office 34, Station marine de Concarneau BP 225, 29182 Concarneau CEDEX --- MNHN
> Unité de service PatriNat Paris
>
> Tel.: +33 (0) 2 98 50 99 35 / +33 (0) 6.10.43.96.51
>
> yvan.le-bras@mnhn.fr
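
For readers who want to try the Link-header idea discussed in this thread, here is a minimal client-side sketch in Python. It only shows how a client might locate the metadata URL from a response's Link header; it is not taken from any of the messages above. The function names (parse_link_header, metadata_url) are hypothetical, the parser is deliberately simplified (it assumes no "<", ";" or "," inside quoted parameter values), and rel="meta" is just the example value used in the thread, so it is left as a parameter.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value):
    """Split a Link header value into (target, params) pairs.

    Simplified on purpose: assumes quoted parameter values contain
    no '<', ';' or ',' characters.
    """
    links = []
    # Each link-value is "<target>" followed by its ";name=value" parameters.
    for match in re.finditer(r'<([^>]*)>([^<]*)', value):
        target, raw_params = match.groups()
        params = {}
        for part in raw_params.split(';'):
            name, sep, val = part.strip().rstrip(',').partition('=')
            if sep:
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def metadata_url(link_header, resource_url,
                 rel="meta", media_type="application/ld+json"):
    """Return the absolute URL of the first matching metadata link, or None.

    rel="meta" follows the example in the thread; it is an assumption,
    not an established convention.
    """
    for target, params in parse_link_header(link_header):
        rels = params.get("rel", "").split()
        if rel in rels and params.get("type") == media_type:
            # The link target may be relative to the resource that was fetched.
            return urljoin(resource_url, target)
    return None


header = '<document_metadata.json>; rel="meta"; type="application/ld+json"'
print(metadata_url(header, "https://example.com/document.pdf"))
# prints https://example.com/document_metadata.json
```

A client that understands JSON-LD would then issue the second GET request on the returned URL; a client that does not can ignore the header entirely.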
Received on Thursday, 19 January 2023 18:41:40 UTC