FW: a Schema Markup Validator: adopting SDTT for validator.schema.org

This will be of interest to our community.

Alasdair

--
Alasdair J G Gray
Associate Professor in Computer Science,
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair


From: "danbri@google.com" <danbri@google.com>
Date: Tuesday, 15 December 2020 at 15:50
To: "public-schemaorg@w3.org" <public-schemaorg@w3.org>, Tom Marsh <tmarsh@exchange.microsoft.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, Yuliya Tihohod <tilid@yandex-team.ru>, "R.V. Guha" <guha@google.com>, Nicolas Torzec <torzecn@oath.com>
Subject: a Schema Markup Validator: adopting SDTT for validator.schema.org
Resent from: "public-schemaorg@w3.org" <public-schemaorg@w3.org>
Resent date: Tuesday, 15 December 2020 at 15:46


Schema.org folks (steering group, community group, everyone...),

https://github.com/schemaorg/schemaorg/issues/2790 tracks a proposal for a validator.schema.org tool, to be based on Google SDTT, and to be accompanied by open source collaboration on data shape validation and parser interoperability.

Today my Google colleagues are sharing Google's plans for the future of the Google Structured Data Testing Tool (SDTT) - see https://developers.google.com/search/blog/2020/12/structured-data-testing-tool-update. The intent is to rework it into a vendor-neutral tool that can continue to serve as a markup syntax checker for JSON-LD, Microdata and RDFa as used by the communities around Schema.org. Although it could live on its own independent domain, it would make a great addition to the Schema.org site, and I would like to proceed in that direction in 2021, as part of Google's long-term commitment to hosting the Schema.org site and keeping it relevant for schema.org users.

The basic idea is that the service now known as "Google Structured Data Testing Tool" would stop making Google-product-specific data checks, but continue - as "Schema Markup Validator" - to serve as a robust tool for checking JSON-LD, Microdata and RDFa schema markup. No validator (or schema.org parser) is perfect, so part of this work will involve documenting any shortcomings in the parsers/validators, and collaborating with open source implementers and standards makers towards improving the ecosystem for everyone.

In addition to syntax validation, there is also the more futuristic topic of "shape validation". For those unfamiliar with this distinction, syntax validation is about helping publishers get the basic structure of JSON-LD, Microdata and RDFa correct, whereas shape validation is about looking at the extracted structured data and comparing it to the documented needs of various online services, to see which features or tools it might be eligible for. SDTT currently performs its own version of "shape checking" to identify markup that matches the shapes needed by Google features, as listed in https://developers.google.com/search/docs/guides/search-gallery. However, the intent is to turn this functionality off, so that the testing tool becomes a simpler vendor-neutral offering focussed on correctness of markup syntax.
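To make the syntax-level half of that distinction concrete, here is a minimal Python sketch of a syntax check for JSON-LD only. It is an illustration under stated assumptions, not how SDTT or the planned validator works: the names JsonLdExtractor and check_jsonld_syntax are hypothetical, only the standard library is used, and Microdata and RDFa are not handled at all.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld_syntax(html):
    """Return one (ok, detail) pair per JSON-LD block found in the page.

    Hypothetical helper: it only checks that each block is well-formed JSON.
    It says nothing about whether the data suits any particular service.
    """
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError as err:
            results.append((False, f"JSON syntax error: {err.msg} at line {err.lineno}"))
            continue
        results.append((True, data))
    return results
```

A block with a trailing comma, for example, would be reported as a syntax error here, regardless of which properties it describes. Whether well-formed markup is also useful to a given service is the separate, shape-level question.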

In addition to adopting a "degooglified" SDTT as a syntax-level "Schema Markup Validator", I would also like in 2021 to continue some collaboration around shape validation. This is the idea of using relatively new web standards (SHACL, ShEx) to check structured data for matching specific data patterns or "shapes". See https://en.wikipedia.org/wiki/SHACL and https://en.wikipedia.org/wiki/ShEx, or the free online book "Validating RDF Data", https://book.validatingrdf.com/. Google recently open-sourced some JavaScript software (https://github.com/google/schemarama/) in this area, which brings together other open source tooling to create a shape validation system using both ShEx and SHACL. While it looks superficially like SDTT, the focus is different: there is no syntax-level validation (which is why the plan outlined above for SDTT is useful). Over time, we can explore ways of integrating these different kinds of validation, but we can make some very useful, simpler steps first by giving a reworked SDTT a home under Schema.org.
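As a rough illustration of what "matching a shape" means, the Python sketch below checks already-extracted data against a made-up shape. Everything in it is an assumption for illustration: RECIPE_SHAPE, values_of and check_shape are hypothetical names, the shape is not taken from any real service, and the code is neither SHACL, ShEx nor the schemarama API. It only mimics the idea of a target type plus minimum-count constraints.

```python
# Hypothetical shape: which node type it targets, and the minimum number of
# values each property must have. Loosely analogous to a target class plus
# minimum-count constraints; not taken from any real service.
RECIPE_SHAPE = {
    "target_type": "Recipe",
    "min_count": {"name": 1, "recipeIngredient": 2},
}


def values_of(node, prop):
    """Return a property's values as a list (JSON-LD allows one value or an array)."""
    if prop not in node:
        return []
    value = node[prop]
    return value if isinstance(value, list) else [value]


def check_shape(node, shape):
    """Return a list of violations; an empty list means nothing to report."""
    if shape["target_type"] not in values_of(node, "@type"):
        return []  # the shape does not target this node
    violations = []
    for prop, minimum in shape["min_count"].items():
        found = len(values_of(node, prop))
        if found < minimum:
            violations.append(f"{prop}: expected at least {minimum}, found {found}")
    return violations


if __name__ == "__main__":
    node = {"@type": "Recipe", "recipeIngredient": "water"}
    for violation in check_shape(node, RECIPE_SHAPE):
        print(violation)
```

Note that the input here is syntactically fine; the violations are about which properties are present, which is exactly the kind of check a syntax-level validator would not make.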

I've linked some more detailed notes on SDTT from the issue at https://github.com/schemaorg/schemaorg/issues/2790 - or see https://docs.google.com/document/d/1q8z_rRJepiz4Os_KcEs3NaCVEm3US5l-qYL14JmE0To/edit directly. Feel free to follow up here, in GitHub or the doc, ...

cheers,

Dan

Received on Tuesday, 15 December 2020 16:00:13 UTC