Re: Clarify contribution process to "suggestions-questions-brainstorming" repo / Legislation from Dan Brickley on 2020-09-08 (public-schemaorg@w3.org from September 2020)

From: Dan Brickley <danbri@google.com>
Date: Tue, 8 Sep 2020 13:15:31 +0100
To: Thomas Francart <thomas.francart@sparna.fr>
Cc: "schema.org Mailing List" <public-schemaorg@w3.org>, "R.V. Guha" <guha@google.com>
Message-ID: <CAK-qy=6AgzLDnC4_7pexGm2TzX3UFbkrJZux+TNrG4Tg36eLCA@mail.gmail.com>
On Tue, 8 Sep 2020 at 10:24, Thomas Francart <thomas.francart@sparna.fr>
wrote:

> Hello
>
> Recently some discussion items were moved from the original schemaorg
> GitHub repo to the new "suggestions-questions-brainstorming" repo at
> https://github.com/schemaorg/suggestions-questions-brainstorming.
>
> I would like the contribution process for this new repository to be
> made explicit, especially regarding the proposal to refine the description
> of Legislation (
> https://github.com/schemaorg/suggestions-questions-brainstorming/issues/24);
> feedback is welcome on this issue.
> Once sufficient discussion has taken place and we come to an agreement, what
> is the process? Should I create a PR? Or open another issue in the original
> GitHub repo? Or is this other repo only meant for discussions that are not
> considered mature enough (in that case, I gave some justifications on the
> issue as the possible implementers of this)?
>

hi Thomas,

Thanks for raising this - you ask very fair questions. The basic idea
behind this new repo (essentially just a new issue list) is that the main
Schema.org issue tracker has, as the years have gone by, become rather full
of hundreds and hundreds of perfectly sensible suggestions for possible
new schemas. It always seemed rather awkward to close such issues, but as
the issue list has grown it has become harder for anyone (including me in my
editorial capacity) to keep an overview of the state of schema discussions.
For these reasons I investigated how other high-profile projects deal with
related problems (https://github.com/schemaorg/schemaorg/issues/2573) and,
amongst other approaches, proceeded to create a second repository for
issue discussions that can stay open and collect as many ideas as possible.

I think in the specific case of the Legislation vocabulary, you've
identified a principle we should bear in mind more generally as a working
pattern: your issue relates to work that is already agreed and underway
within Schema.org (i.e. the Pending terms around legislation markup), and
it is a case where feedback, discussion, improvements etc. should very
naturally live in the main repository. Intuitively, at some point there is a
cut-off where sophisticated and complex improvement ideas for a Pending
term become something new, something we might call "brainstorming"
of new work ideas. However, looking at your proposal in
https://github.com/schemaorg/suggestions-questions-brainstorming/issues/24, it
is very solidly focussed on the existing work, and on keeping it in sync
with upstream design improvements at ELI. Since the purpose of having a
Pending stage is to allow time for designs to "settle in", be tweaked, and
have discussion about their relationship to other terms (within schema.org
and elsewhere), I think we should have left the conversation in the main
repo's issue list, and I will fix that.

Your issue also raises another matter, which is that in
https://github.com/schemaorg/suggestions-questions-brainstorming/issues/7 I
wrote the following:

"""If your issue is associated with a serious intent to implement new
schema designs in a major user-facing platform (regardless of whether
opensource / public sector or commercial) please make this clear in the
discussions. Schema.org has a stated preference for schemas that get *used* in
the sense of consumed by applications that explicitly use particular schema
features. We also continue to improve schemas in general, but can't take on
every suggestion (however insightful). This repository provides a place for
that kind of ongoing brainstorming and discussion, i.e. to inform the
general evolution of our schemas."""

This reflects schema.org's founding and ongoing concern to be a host for
schemas that are *used* and *useful*, rather than a hosting service for all
possible schemas. The phrase you highlighted, "*a serious
intent to implement new schema designs in major user-facing platform(s)*",
was oriented more towards usage in the sense of data consumption
(harvesting / extraction / interpretation / parsing / etc.) than
publication; specifically, applications that take schema.org data and then
do something with it. Publishing data using the schemas is
something else, but I accept that the wording here was ambiguous. The
phrasing in https://github.com/schemaorg/schemaorg/blob/main/README.md is
similar but less ambiguous on this point:

"""We try to prioritize
<https://lists.w3.org/Archives/Public/public-schemaorg/2015Dec/0016.html>
simple fixes and improvements to our existing schemas, examples and documentation
over the addition of new vocabulary, and we are most likely to add new
schemas when there is evidence that some (preferably large-scale) consuming
application will make use of the data. Consuming applications need not be
search engines; software tools e.g. opensource, markup-enriched approaches
to Web analytics, browser add-ons or cloud tools are all rich areas for
exploration and collaboration. The important thing is that there should be
some reasonable expectation of data consumers making good use of the
changes."""


Legislation markup is, I believe, a good example of a situation where there
is a strong case for adding something to schema.org (initially via Pending and
eventually into the Core) even if there are not yet any large-scale
*consuming* applications. Similar situations might include other official
(typically government) data, as well as the use of schemas to share and
integrate massive publicly accessible "Knowledge Graph" datasets, such as
Wikidata.org, dataCommons.org, or Yago (which republishes subsets of Wikidata
re-expressed using Schema.org markup - https://yago-knowledge.org/).
However, I don't think we should move too far from the focus on data that is
- ultimately - consumed somewhere for some user-benefitting purpose.

> BTW, Legislation is still in Pending; do we have a timeframe for the
> integration of this in the core, or is the steering group still waiting for
> implementation evidence?
>

This follows from the above points - it would benefit everyone to
have some implementation evidence, i.e. to know that the (fantastic) work
that has gone into publishing Legislation-enriched markup in Luxembourg,
Ireland and elsewhere is reaching its full potential by being used
(consumed, parsed, extracted etc.) in some kind of useful application.
Historically, for schema.org, those applications have tended to be around web
search, but they needn't be. For example, extracting legislation summaries
to add to Wikidata or similar systems would be great. We should try to
think of all this not just in terms of meeting some criteria to get
"approved" into schema.org's core, but in terms of the underlying reality:
we want the markup to be used and useful. Given the amount of
interesting/useful data already being published using
https://schema.org/Legislation, we could just do the easier thing and simply
move the Legislation vocabulary into core schema.org. But it would be good
to have the larger conversation about how to help everyone make the most of
that data. If there are practical obstacles - e.g. lack of good opensource
tooling - which stand in the way of having more consuming applications,
that would be a very positive focus for collaborations in this group.

Thanks,

Dan


> Thanks
>
> --
>
> *Thomas Francart* - *SPARNA*
> Web of *data* | *Information* architecture | Access to
> *knowledge*
> blog : blog.sparna.fr, site : sparna.fr, linkedin :
> fr.linkedin.com/in/thomasfrancart
> tel :  +33 (0)6.71.11.25.97, skype :
> francartthomas
>
Received on Tuesday, 8 September 2020 12:16:06 UTC