Re: Clarify contribution process to "suggestions-questions-brainstorming" repo / Legislation from Thomas Francart on 2020-09-10 (public-schemaorg@w3.org from September 2020)

From: Thomas Francart <thomas.francart@sparna.fr>
Date: Thu, 10 Sep 2020 10:05:32 +0200
To: Dan Brickley <danbri@google.com>
Cc: "schema.org Mailing List" <public-schemaorg@w3.org>, "R.V. Guha" <guha@google.com>
Message-ID: <CAPugn7UBqaeo+K4XP6V76Cq8Z=W9efkR3aBXcmOz2_hqcDXoxA@mail.gmail.com>

Hi Dan,

Thanks a lot for the detailed answer; I think these clarifications were
needed.

On Tue, 8 Sep 2020 at 14:15, Dan Brickley <danbri@google.com> wrote:

> On Tue, 8 Sep 2020 at 10:24, Thomas Francart <thomas.francart@sparna.fr>
> wrote:
>
>> Hello
>>
>> Recently some discussion items were moved from the original schemaorg
>> Github repo to the new "suggestions-questions-brainstorming" repo at
>> https://github.com/schemaorg/suggestions-questions-brainstorming.
>>
>> I would like the contribution process for this new repository to be made
>> explicit, especially regarding the proposal to refine the description of
>> Legislation (
>> https://github.com/schemaorg/suggestions-questions-brainstorming/issues/24);
>> feedback is welcome on this issue.
>> Once sufficient discussion has taken place and we have come to an
>> agreement, what is the process? Should I create a PR? Or open another
>> issue in the original GitHub repo? Or is this other repo only meant for
>> discussions that are not considered mature enough (in that case, I gave
>> some justifications on the issue as to the possible implementers of this)?
>>
>
> hi Thomas,
>
> Thanks for raising this - you ask very fair questions. The basic idea
> behind this new repo (essentially just a new issue list) is that the main
> Schema.org issue tracker has, as the years have gone by, become rather full
> with hundreds and hundreds of perfectly sensible suggestions for possible
> new schemas. It always seemed rather awkward to close such issues, but as
> the issue list has grown it became harder for anyone (including me in my
> editorial capacity) to keep an overview of the state of schema discussions.
> For these reasons I investigated how other high profile projects deal with
> related problems (https://github.com/schemaorg/schemaorg/issues/2573) and
> amongst other approaches, proceeded with creating a second repository for
> issue discussions that can stay open and collect as many ideas as possible.
>
> I think in the specific case of the Legislation vocabulary, you've
> identified a principle we should bear in mind more generally as a working
> pattern: your issue related to work that was currently agreed and underway
> within Schema.org (i.e. to Pending terms around legislation markup), and
> this is a case where feedback, discussion, improvements etc. should very
> naturally live in the main repository. At some point intuitively there is a
> cut-off where sophisticated and complex improvement ideas for a Pending
> term become something new, and something we might call "brainstorming"
> of new work ideas. However, looking at your proposal in
> https://github.com/schemaorg/suggestions-questions-brainstorming/issues/24 it
> is very solidly focussed on the existing work, and on keeping it in sync
> with upstream design improvements at ELI.
>

You are correct that one of the goals of the proposal was to keep the SDO
Legislation description in sync with ELI.


> Since the purpose of having a Pending stage is to allow time for designs
> to "settle in", be tweaked, and have discussion about their relationship to
> other terms (within schema.org and elsewhere), I think we should have
> left the conversation in the main repo's issue list, and I will fix that.
>

OK Thanks


>
> Your issue also raises another matter, which is that in
> https://github.com/schemaorg/suggestions-questions-brainstorming/issues/7 I
> wrote the following:
>
> """If your issue is associated with a serious intent to implement new
> schema designs in a major user-facing platform (regardless of whether
> opensource / public sector or commercial) please make this clear in the
> discussions. Schema.org has a stated preference for schemas that get
> *used* in the sense of consumed by applications that explicitly use
> particular schema features. We also continue to improve schemas in general,
> but can't take on every suggestion (however insightful). This repository
> provides a place for that kind of ongoing brainstorming and discussion,
> i.e. to inform the general evolution of our schemas."""
>
> This reflects schema.org's founding and ongoing concern to be a host for
> schemas that are *used* and *useful*, rather than a hosting service for all
> possible schemas. In that sense the phrase you highlighted, "*a serious
> intent to implement new schema designs in major user-facing platform(s)*"
> was oriented more towards usage in the sense of data consumption
> (harvesting / extraction / interpretation / parsing / etc.) rather than
> publication; specifically applications that take schema.org data and then
> do something with it. In that sense, publishing data using the schemas is
> something else, but I accept that the wording here was ambiguous. The
> phrasing in https://github.com/schemaorg/schemaorg/blob/main/README.md is
> similar but less ambiguous on this point:
>
> """We try to prioritize
> <https://lists.w3.org/Archives/Public/public-schemaorg/2015Dec/0016.html> simple
> fixes and improvements to our existing schemas, examples and documentation
> over the addition of new vocabulary, and we are most likely to add new
> schemas when there is evidence that some (preferably large-scale) consuming
> application will make use of the data. Consuming applications need not be
> search engines; software tools e.g. opensource, markup-enriched approaches
> to Web analytics, browser add-ons or cloud tools are all rich areas for
> exploration and collaboration. The important thing is that there should be
> some reasonable expectation of data consumers making good use of the
> changes."""
>
>
> Legislation markup is, I believe, a good example of a situation where
> there is a strong case to add something to schema.org (initially via
> Pending and eventually into the Core) even if there are not yet any
> large-scale *consuming* applications.
>

To be totally honest: I had indeed understood that the sentence "*a serious
intent to implement new schema designs in major user-facing platform(s)*"
implied data consumption rather than publication. I was/am trying to find
arguments for a better description of Legislation in SDO.
The point, as you saw, is that with the ELI / SDO Legislation initiative
we are not in a *data-demand-driven* approach but rather in a
*data-offer-driven* approach. A consortium of Official Journals from many
countries is pushing to make Legislation more interoperable, transparent and
visible on the web, by relying on structured data exchange and
dissemination. They are willing _now_ to make the effort of publishing
structured data through official public legislation web portals, and are
convinced that this implies using SDO markup, even though no large-scale
consuming application exists yet. And the ELI consortium is working to
"bridge the gap" with any interested data consumer, to make their life
easier in terms of data acquisition, data modeling in SDO, data quality, etc.
So a purely demand-driven approach in SDO would not really work in our case,
also because the description of Legislation is not trivial, and if a
proposed model does not come from legal practitioners the risk is that it
could be overly simplistic.


> Similar situations might include other official (typically government)
> data, as well as the use of schemas to share and integrate massive
> publicly-accessible "Knowledge Graph" datasets, such as Wikidata.org,
> dataCommons.org, or Yago (who republish subsets of Wikidata re-expressed
> using Schema.org markup - https://yago-knowledge.org/).
>

+1


> However I don't think we should move too far from the focus on data that
> is - ultimately - consumed somewhere for some user-benefitting purpose.
>
>> BTW, Legislation is still in pending; do we have a timeframe for the
>> integration of this in the core, or is the steering group still waiting for
>> implementation evidence?
>>
>
> This follows along from the above points - it would benefit everyone to
> have some implementation evidence, i.e. to know that the (fantastic) work
> that has gone into publishing Legislation-enriched markup in Luxembourg,
> Ireland and elsewhere is reaching its full potential by being used
> (consumed, parsed, extracted etc.) in some kind of useful application.
> Historically for schema.org those applications have tended to be around
> web search, but they needn't be. For example, extracting legislation
> summaries to add to Wikidata or similar systems would be great. We should
> try to think of all this not just in terms of meeting some criteria to get
> "approved" into schema.org's core, but in terms of the underlying
> reality: we want the markup to be used and useful. Given the amount of
> interesting/useful data already being published using
> https://schema.org/Legislation we could just do the easier thing and
> simply move the Legislation vocabulary into core schema.org. But it would
> be good to have the larger conversation about how to help everyone make the
> most of that data. If there are practical obstacles - e.g. lack of good
> opensource tooling - which stand in the way of having more consuming
> applications, that would be a very positive focus for collaborations in
> this group.
>

We are thinking along the same lines about how to best help data consumers
do something useful in ELI; as an example, we were thinking about providing
an opensource tool to crawl/parse the official legal portals and recreate
the full dataset of a legislation corpus. BTW, this obstacle of
crawling/parsing one or more websites to recreate full datasets is not
specific to Legislation and comes up in many use cases; running a crawler
with an integrated structured data parser is not trivial. I'd like to hear if
anyone has had a similar problem and if/how it has been solved.
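
As a rough illustration of the parsing half of that problem only, here is a
minimal sketch. It assumes a portal embeds its schema.org markup as JSON-LD
inside <script type="application/ld+json"> elements (portals using RDFa or
Microdata would need a different parser), and it leaves out crawling
entirely. The names JsonLdExtractor and extract_legislation are hypothetical,
chosen for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of JSON-LD <script> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the page


def extract_legislation(html):
    """Return top-level JSON-LD items whose @type includes "Legislation"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for doc in parser.documents:
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Legislation" in types:
                found.append(item)
    return found
```

This only matches the exact type name "Legislation" on top-level items, so
subtypes and items nested under "@graph" would need extra handling, which is
part of why a reusable tool would be worth having.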

Thanks
Thomas


>
> Thanks,
>
> Dan
>
>
>> Thanks
>>
>> --
>>
>> *Thomas Francart* -* SPARNA*
>> Web of *data* | *Information* architecture | Access to
>> *knowledge*
>> blog : blog.sparna.fr, site : sparna.fr, linkedin :
>> fr.linkedin.com/in/thomasfrancart
>> tel :  +33 (0)6.71.11.25.97, skype :
>> francartthomas
>>
>

-- 

*Thomas Francart* -* SPARNA*
Web of *data* | *Information* architecture | Access to
*knowledge*
blog : blog.sparna.fr, site : sparna.fr, linkedin :
fr.linkedin.com/in/thomasfrancart
tel :  +33 (0)6.71.11.25.97, skype : francartthomas
Received on Thursday, 10 September 2020 08:06:00 UTC