- From: Dan Brickley <danbri@google.com>
- Date: Tue, 15 Dec 2020 15:46:06 +0000
- To: "schema.org Mailing List" <public-schemaorg@w3.org>, Tom Marsh <tmarsh@exchange.microsoft.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, Yuliya Tihohod <tilid@yandex-team.ru>, "R.V. Guha" <guha@google.com>, Nicolas Torzec <torzecn@oath.com>
- Message-ID: <CAK-qy=7iijqFX400R3u_YRYU5yzzAahv4VpLdeSzb+SQQ_G-Nw@mail.gmail.com>
Schema.org folks (steering group, community group, everyone...),

https://github.com/schemaorg/schemaorg/issues/2790 tracks a proposal for a validator.schema.org tool, to be based on Google SDTT, and to be accompanied by open source collaboration on data shape validation and parser interoperability.

Today my Google colleagues are sharing Google's plans for the future of the Google Structured Data Testing Tool (SDTT) - see https://developers.google.com/search/blog/2020/12/structured-data-testing-tool-update. The intent is to rework it into a vendor-neutral tool that can continue to serve as a markup syntax checker for JSON-LD, Microdata and RDFa as used by the communities around Schema.org. Although it could live on its own independent domain, it would make a great addition to the Schema.org site, and I would like to proceed in that direction in 2021, as part of Google's long-term commitment to hosting the Schema.org site and keeping it relevant for Schema.org users.

The basic idea is that the service now known as "Google Structured Data Testing Tool" would stop making Google-product-specific data checks, but continue - as "Schema Markup Validator" - to serve as a robust tool for checking JSON-LD, Microdata and RDFa schema markup. No validator (or Schema.org parser) is perfect, so part of this work will involve documenting any shortcomings in the parsers/validators, and collaborating with open source implementers and standards makers towards improving the ecosystem for everyone.

In addition to syntax validation, there is also the more futuristic topic of "shape validation". For those unfamiliar with this distinction, syntax validation is about helping publishers get the basic structure of JSON-LD, Microdata and RDFa correct, whereas shape validation is about looking at the extracted structured data and comparing it to the documented needs of various online services, to see which features or tools it might be eligible for.
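
To make the syntax/shape distinction concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect how SDTT or any real validator is implemented: the function names `check_syntax` and `check_shape`, and the property requirements in `RECIPE_SHAPE`, are hypothetical, and a real shape validator would use a dedicated shape language rather than a hand-written dictionary.

```python
import json

def check_syntax(raw):
    """Syntax-level check: is the markup well-formed, with the basic JSON-LD keys?"""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed input.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    for key in ("@context", "@type"):
        if key not in data:
            raise ValueError("missing " + key)
    return data

# Hypothetical shape: the properties some imaginary service needs on a Recipe.
# These requirements are assumptions made up for this example.
RECIPE_SHAPE = {"type": "Recipe", "required": ["name", "image", "recipeIngredient"]}

def check_shape(data, shape):
    """Shape-level check: does the extracted data meet one consumer's documented needs?"""
    if data.get("@type") != shape["type"]:
        return ["expected @type " + shape["type"]]
    return ["missing required property: " + prop
            for prop in shape["required"] if prop not in data]

# This markup is syntactically fine, yet incomplete for the hypothetical service.
markup = '{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes"}'
data = check_syntax(markup)
problems = check_shape(data, RECIPE_SHAPE)
print(problems)
```

The point of the sketch is that the two checks answer different questions: the markup above passes the syntax check but is still reported as incomplete by the shape check, because it lacks properties that one particular consumer asks for.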
SDTT currently performs its own version of "shape checking" to identify markup that matches the shapes needed by Google features, as listed in https://developers.google.com/search/docs/guides/search-gallery. However, the intent is to turn this functionality *off*, so that the testing tool becomes a simpler vendor-neutral offering focussed on correctness of markup *syntax*.

In addition to adopting a "degooglified" SDTT as a syntax-level "Schema Markup Validator", I would also like in 2021 to continue some collaboration around shape validation. This is the idea of using relatively new web standards (SHACL, ShEx) to check whether structured data matches specific data patterns or "shapes". See https://en.wikipedia.org/wiki/SHACL and https://en.wikipedia.org/wiki/ShEx, or the free online book "Validating RDF Data", https://book.validatingrdf.com/.

Google recently open-sourced some JavaScript software <https://github.com/google/schemarama/> in this area, which brings together other open source tooling to create a shape validation system using both ShEx and SHACL. While it looks superficially like SDTT, the focus is different: there is no syntax-level validation (which is why the plan outlined above for SDTT is useful).

Over time, we can explore ways of integrating these different kinds of validation, but we can take some very useful, simpler steps first by giving a reworked SDTT a home under Schema.org.

I've linked some more detailed notes on SDTT from the issue at https://github.com/schemaorg/schemaorg/issues/2790 - or see https://docs.google.com/document/d/1q8z_rRJepiz4Os_KcEs3NaCVEm3US5l-qYL14JmE0To/edit# directly.

Feel free to follow up here, in GitHub or the doc, ...

cheers,

Dan
Received on Tuesday, 15 December 2020 15:46:40 UTC