Re: Schema addition request from Dan Brickley on 2017-03-20 (public-schemaorg@w3.org from March 2017)

From: Dan Brickley <danbri@google.com>
Date: Mon, 20 Mar 2017 15:02:39 +0000
To: "public-schemaorg@w3.org" <public-schemaorg@w3.org>
Message-ID: <CAK-qy=6a94LrSUPYU-B+kx3=jK92zU_v8PMa=yfV-VFTRM5WLw@mail.gmail.com>

On 20 March 2017 at 01:24, Marijane White <whimar@ohsu.edu> wrote:
> I don’t know what the thoughts/opinions on Moz are around these parts, but I’d like to note that this term is included in their local category listings, which would seem to imply that their research indicates it is a category known to at least some search engines.
>
> https://moz.com/local/categories/category/Medical%20Spa

(to everyone on this thread)

May I suggest that speculating on the exact behaviour and internal
structure of search engines (whether my employer or any other) is
unlikely to be a productive use of this mailing list, or the inboxes
of the several hundred people on it.

Those who hope for an explicit list of exactly how each search engine
handles structured data should turn to the documentation sites
published by that search engine. You are unlikely to get a lot more
detail here or on the schema.org site. In the case of Google,
everything Google has to say officially on the topic is at
https://developers.google.com/search/docs/guides/intro-structured-data
or nearby.

Schema.org was founded by search engines and remains explicitly
responsive to suggestions from any/all large scale consumers of
markup, as well as to a wider community of participants in these
discussions. This is not always easy to balance and I can see that it
can be frustrating sometimes not to have an explicit recipe for
figuring out the best areas to focus on for new schema.org vocabulary.
Discussion here has come back (again) to questions around whether
Google and others will "support" the markup, so I wanted to comment a
little on that aspect.

There are several senses in which a search engine such as Google might
"support" or "use" schema.org. I'll comment here only from an
informally Google-oriented perspective. This is not any kind of
serious taxonomy, just some informal notes to make clear that
"supports" is not a simple binary yes/no thing:

1) A search engine might (or might not) be generally supportive of the
project, initiative, approach, as being good for the Web, for the
structured data ecosystem, as a foundation for new developments, and
so on.

2) A search engine might (or might not) make use of some specific
schema.org term(s) to support a particular user-visible feature such
as various kinds of snippets, summary panels, carousels and so on.

3) A search engine might (or might not) use any-or-all schema.org
markup as background knowledge to improve products and their various
features, and to get better at understanding, summarizing and
representing the real world meaning of various kinds of online
content.

4) A search engine might (or might not) have products and features
where particular interactions (e.g. matching of certain queries) take
structured data into account - e.g. see Matt Cutts' observations in
https://www.youtube.com/watch?v=OolDzztYwtQ - even if the UI doesn't
make a big explicit fuss about it.

5) A search engine might (or might not) have products/features e.g.
cloud stuff, custom search, analytics, or be deploying new
technologies like Web components (see e.g.
https://developers.google.com/web/updates/2015/03/creating-semantic-sites-with-web-components-and-jsonld)
... which make it easier for sites/publishers to themselves make
better use of their own structured data using whichever schema terms
make sense to their own applications.

6) A search engine might (or might not) make use of schema.org's
vocabulary when dealing with information coming from sources other
than the public Web, or in other kinds of product and service.

7) I could go on...

I won't go into the specifics of any of these, except to say that
Google's public documentation (and testing tool) focusses primarily on
(2) because it is the most tangible and practical, but those
recommendations are set against a backdrop of wider and growing
support for schema.org structured data in the various broader senses
sketched above. It's nearly 6 years since Google announced support for
schema.org, and the range of uses to which schema.org structured data
is put has grown very substantially. Specific products and features
might come and go, particular encodings (e.g. microdata vs JSON-LD)
might change, but the general direction of using this stuff in more
and more ways has been pretty clear.

All of this doesn't give a clear or automatic answer to specific
questions like "should we add some new term x to schema.org?".
Sometimes we (the schema.org community "we") have added small things
speculatively and it has turned out to be useful and later picked up
in user-visible product features, e.g. in the (2) sense above; other
times, for a huge range of reasons, schema.org additions may have been
less successful. There are a variety of considerations including ease
of adding the markup (e.g. does it match what a lot of major sites
have in their databases already), and so on. It would be reasonable to
expect a few more words in this direction on the schema.org site or
its GitHub to help guide discussions, but we really can't keep
having the "but does/will Google/Bing/Yahoo/Yandex/etc explicitly use
it?" thread here every 3 weeks, and it is not useful to speculate on
the internal design of search engines on this mailing list. There are
many other places on the Web devoted to speculating on how search
engines might work internally. For our discussions here, it is best to
focus more on the public contents of the Web. Given the success of
schema.org, there is value in "rounding out" various areas of the
vocabulary where there are simple fixable gaps in vocabulary coverage,
regardless of whether they're expected to turn up explicitly used in
product features in the short term. As we have attempted to do so,
we've also run into situations where there is risk of massive
redundancy and complication, which is why in 2017 we'll need to give
attention to issues around compositional terms
(https://github.com/schemaorg/schemaorg/issues/1493) and to bridging
with long-tail resources like Wikidata (e.g.
https://github.com/schemaorg/schemaorg/issues/1186
https://github.com/schemaorg/schemaorg/issues/280). As those efforts
mature, the various notions of "supports"/"understands"/"uses" floating
around will continue to evolve too.

Hope this helps a little. The other thing I wanted to note is that the
new "pending" area (see pending.schema.org) gives us an intermediate
zone where we can park proposed terms, with a lower barrier to entry,
alongside a slightly weaker sense that the terms there are broadly
"supported" by consuming applications. For example, when ClaimReview
was added there it was just an idea; a year later it is widely used on
high profile fact-checking sites, as well as in consuming products...
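
For anyone who has not looked at the markup itself, here is a rough
sketch of what a ClaimReview description might look like as JSON-LD,
built with a few lines of Python. The type and property names
(ClaimReview, claimReviewed, reviewRating, author, url) are schema.org
terms; the helper function, the example.org URL, the organisation name
and the rating values are all made up for illustration, and nothing
here says anything about how any particular consumer treats it.

```python
import json


def build_claim_review(claim_text, review_url, reviewer_name,
                       rating_value, rating_label):
    """Return a schema.org ClaimReview description as a plain dict.

    Hypothetical helper for illustration only; the arguments are
    placeholders, not a recommended minimum set of properties.
    """
    return {
        "@context": "http://schema.org",
        "@type": "ClaimReview",
        "url": review_url,
        "claimReviewed": claim_text,
        "author": {"@type": "Organization", "name": reviewer_name},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating_value,
            "bestRating": 5,
            "worstRating": 1,
            "alternateName": rating_label,
        },
    }


# Invented example values; example.org is a reserved placeholder domain.
review = build_claim_review(
    claim_text="The moon is made of cheese",
    review_url="https://example.org/fact-checks/moon-cheese",
    reviewer_name="Example Fact Checkers",
    rating_value=1,
    rating_label="False",
)

# Serialise for embedding in a <script type="application/ld+json"> element.
print(json.dumps(review, indent=2))
```

The same description could equally be written in microdata or RDFa;
JSON-LD is used here only because it is easy to generate from a
script.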

cheers,

Dan

Received on Monday, 20 March 2017 15:03:14 UTC