Re: Propose astro.schema.org from Dan Brickley on 2015-05-29 (public-schemaorg@w3.org from May 2015)

From: Dan Brickley <danbri@google.com>
Date: Fri, 29 May 2015 12:46:25 +0100
To: Ed Summers <ehs@pobox.com>
Cc: "mfhepp@gmail.com" <mfhepp@gmail.com>, Barry Carter <carter.barry@gmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
Message-ID: <CAK-qy=7Fd-hKbtZEQCf7iOKARSHY-zF2nWG6vSs6+gYZFxyMZg@mail.gmail.com>

On 29 May 2015 at 10:59, Ed Summers <ehs@pobox.com> wrote:
>
>> On May 29, 2015, at 5:49 AM, mfhepp@gmail.com wrote:
>>
>> This will likely not pass the Alexa 1M test.
>
> I haven’t been paying attention :) what is the Alexa 1M test?

I don't know either :) Presumably roughly "used on a lot of major
sites". But I haven't thought about Alexa in a while I must admit.

There was some related discussion on public-vocabs about when
something ought to go in the core versus be handled as an extension -
https://lists.w3.org/Archives/Public/public-vocabs/2015May/0009.html

Here's a distinction that I don't think we make often enough. It is
fairly intuitive (I argue, without evidence):

Schema.org's vocabularies are fundamentally for large scale
*communication* of structured data. Its core vocabularies are much
less likely to meet the needs of people choosing schemas to actually
manage/store/create such data, i.e. for the source or master format.

Putting things in those terms makes clear that there remains plenty of
work to do in the non-schema.org RDF/OWL universe (as well as in
extensions, where the distinction perhaps gets blurred).

Publishing data using schema.org typically involves transformation,
mapping and conversion from some underlying representation (which
often enough for publishers will be SQL or Java interfaces or
something custom or application-oriented e.g. Drupal's data storage
abstractions). What you do in the privacy of your own database is
entirely your own business. OWL and other Linked Data RDF vocabularies
may or may not be useful there, depending on your situation. It is
highly unlikely that schema.org's core vocabulary alone will be enough
to be the sole, ultimate and underlying representation for most
databases. Creating your own more focussed schemas/ontologies in
RDFS/OWL, rather than using planet-wide ontologies, can help to bridge
that gap. Creating more focussed schema.org extensions may also help -
but this is new territory for us all. Schema.org's emphasis remains on
Web-scale communication, rather than as a supplier of master formats.
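To make the transformation step concrete, here is a minimal sketch in Python. It assumes a hypothetical internal record, standing in for a row from a local SQL table, whose column names (evt_title, evt_start, venue_name, evt_internal_id) are invented for illustration. It maps that record onto the schema.org Event and Place types as JSON-LD and deliberately leaves an internal-only field unpublished:

```python
import json

# Hypothetical internal record, e.g. a row fetched from a local SQL table.
# All column names here are illustrative assumptions, not a real schema.
row = {
    "evt_internal_id": 4711,  # internal bookkeeping, not published
    "evt_title": "Public Observing Night",
    "evt_start": "2015-06-12T21:00:00+01:00",
    "venue_name": "Example Observatory",
}

def to_schema_org(record):
    """Map an internal record onto schema.org Event/Place terms as JSON-LD.

    Only the fields worth communicating on the Web are mapped; internal
    details such as the database id stay in the master store.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Event",
        "name": record["evt_title"],
        "startDate": record["evt_start"],
        "location": {"@type": "Place", "name": record["venue_name"]},
    }

markup = to_schema_org(row)
print(json.dumps(markup, indent=2))
```

The master record keeps whatever fine-grained structure the publisher needs, while the published markup trades that detail for widely understood core terms.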

When you choose the terminology that actually structures your own
database(s), every detail matters - subtleties of definition, quirks
of the specific datasets and sources you're dealing with, versioning
etc. When publishing such data for consumption elsewhere in the Web,
particularly in search-oriented apps, it is natural to trade some of
that fine grained control for a wider audience. So for the astro case,
it might be that carefully modeled independent ontologies would be the
way to go within the astronomy community, but that once this work is
done or identified it could be mapped into a schema.org extension. As
Martin notes this is pretty much the route taken with the GoodRelations,
Automotive etc. work. There is no reason that independent
astro-oriented ontologies couldn't also be expressed as an
astro.schema.org extension, and perhaps this would help make the
resulting data markup clearer and easier for publishers. It wouldn't
guarantee search engines would suddenly start adding astro-related
search features, any more than the presence of "Volcano" in
schema.org's core (i.e. http://schema.org/Volcano) has led to many
volcano-oriented search features. But it could help focus discussion
on the astronomical aspects rather than on all the supporting
background vocabulary that is also needed.

We have created the updated Extensions mechanism to help reduce the
gap between custom schemas and schema.org, rather than to replace
non-schema.org schemas.

From my own perspective schema.org serves much of the purpose that
FOAF originally aimed at: a general "utility" vocabulary to bootstrap
this kind of structured data sharing - e.g. in
https://web.archive.org/web/20140331104046/http://www.foaf-project.org/original-intro
FOAF (then "RDFWeb") was described as a "starter vocabulary".
See also http://www.w3.org/TR/NOTE-MCF-XML/#secA. for an earlier
bootstrap vocab which inspired FOAF. Schema.org is also a starter
vocabulary, but it is on a larger scale (number of sites, size of
vocabulary, impact of consuming apps) than we ever achieved with FOAF.

I would say three things stopped FOAF itself from evolving to serve as
such a "starter vocabulary":

1. We were too cautious about the size of the schema; 100 terms seemed
at the time terribly large. By relying on the RDFS approach of multiple
independent namespaces, we pushed complexity onto publishers, who
suffered from the lack of "attention to detail" coordination across
vocabularies. Real world descriptive problems do not map cleanly onto
independently managed RDF namespaces, resulting in fragmentation and
confusion on how best to express various situations in RDF.

2. Few consuming applications. We made some fun demos and prototypes,
but there was relatively little serious consumption of the data.
Without high profile consumption, data quality suffers and mainstream
publishers are not motivated, so it remains hard to break out of the
early-adopter tech/standards/research scene.

3. We were too cautious about evolving the schemas once they were
being used on many sites. Early design errors and compromises got
frozen in.

The approach at schema.org differs on all three of these points:

- the vocabulary is much larger, allowing many common scenarios to be
described purely by schema.org terms.
- there is an explicit up-front link to large scale consumption of the
data by mainstream user-facing applications.
- the schemas are constantly being tweaked and improved, even
including name changes and redesigns when we think they improve
usability and integration.

This last point brings me back to my first: if you are choosing the
underlying format for managing your data, this kind of constant
improvement can be ... annoying. While we do make frozen snapshots
available (see http://schema.org/version/ ), the general approach
we take is to keep improving and integrating things. This is not
something that those of us in the wider RDF community have done enough
of, in terms of improving how independently managed vocabularies fit
together. The hope with schema.org extensions is that we can find a
balance and have more decentralization of vocabulary creation, while
still keeping a broad community communicating here who care about
integration and consistency, and while acknowledging that changes often
have to be incremental and pragmatic.

For example, the Bibliographic extensions community at
https://www.w3.org/community/schemabibex/ ... automotive ontologies at
https://www.w3.org/community/gao/ and most recently a proposed
health/medical extensions community, see
https://www.w3.org/community/blog/2015/05/21/proposed-group-healthcare-ontology-community-group/
... these all have scope to go deeper into their focus areas than
core schema.org efforts. But they also keep some connection via the
schema.org core vocabulary, so that shared notions of CreativeWork,
Organization, Event etc. don't get repeatedly re-invented. For
Astronomy I'd suggest perhaps a W3C Community Group would also make
sense, both to draw together existing work, sites/publishers and
toolmakers who are interested in collaborating on shared vocabulary, but
also to give a coordination point so that we can keep a conversation
open on how these things all fit together. At this stage I think
studying what's out there is of far greater value than worrying about
what should go into a hypothetical astro.schema.org or into any new
ontologies...

verbosely,

Dan

Received on Friday, 29 May 2015 11:46:53 UTC