Re: Propose astro.schema.org from Thad Guidry on 2015-05-29 (public-schemaorg@w3.org from May 2015)

From: Thad Guidry <thadguidry@gmail.com>
Date: Fri, 29 May 2015 09:30:22 -0500
To: Dan Brickley <danbri@google.com>
Cc: Ed Summers <ehs@pobox.com>, "mfhepp@gmail.com" <mfhepp@gmail.com>, Barry Carter <carter.barry@gmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
Message-ID: <CAChbWaO0ne6c5D2r2n8h+3X1DFAMFGS-z5Dybq+vW+aBUeCVTw@mail.gmail.com>
Personally, I often approach "Schema" questions with this mindset:
"It would be cool if a high school student could ask (BLAH) and get this
(RESULT)"

I then begin thinking about the bits of data that would be needed to answer
that high school student's question. Next I think about what the followup
question would be, and there lies the key to proper Schema development
for any domain: those followup questions are often the MOST revealing.

Hope that helps in your Astronomical Observations of Needed Schema For The
Masses. :)


Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Fri, May 29, 2015 at 9:24 AM, Thad Guidry <thadguidry@gmail.com> wrote:

> Agreed with Dan.
>
> Barry, to help you even more... Here's a nice project that actually aims
> to bridge the gap between astrophysical models and publicly available data -
> https://github.com/trillian/trillian
>
> I would envision that at some point in the future, the work of Trillian
> could be queried by consumers at web-scale, and optionally through new App
> and Site Indexing technologies being developed by Google and other
> stakeholders with Wikidata alignment.
>
>
> Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>
>
> On Fri, May 29, 2015 at 6:46 AM, Dan Brickley <danbri@google.com> wrote:
>
>> On 29 May 2015 at 10:59, Ed Summers <ehs@pobox.com> wrote:
>> >
>> >> On May 29, 2015, at 5:49 AM, mfhepp@gmail.com wrote:
>> >>
>> >> This will likely not pass the Alexa 1M test.
>> >
>> > I haven’t been paying attention :) what is the Alexa 1M test?
>>
>> I don't know either :)  Presumably roughly "used on a lot of major
>> sites". But I haven't thought about Alexa in a while I must admit.
>>
>> There was some related discussion on public-vocabs about when
>> something ought to go in the core versus be handled as an extension -
>> https://lists.w3.org/Archives/Public/public-vocabs/2015May/0009.html
>>
>> Here's a distinction that I don't think we make often enough. It is
>> fairly intuitive (I argue, without evidence):
>>
>> Schema.org's vocabularies are fundamentally for large scale
>> *communication* of structured data. Its core vocabularies are much
>> less likely to meet the needs of people choosing schemas to actually
>> manage/store/create such data, i.e. for the source or master format.
>>
>> Putting things in those terms makes clear that there remains plenty of
>> work to do in the non-schema.org RDF/OWL universe (as well as in
>> extensions, where the distinction perhaps gets blurred).
>>
>> Publishing data using schema.org typically involves transformation,
>> mapping and conversion from some underlying representation (which
>> often enough for publishers will be SQL or Java interfaces or
>> something custom or application-oriented e.g. Drupal's data storage
>> abstractions). What you do in the privacy of your own database is
>> entirely your own business. OWL and other Linked Data RDF vocabularies
>> may or may not be useful there, depending on your situation. It is
>> highly unlikely that schema.org's core vocabulary alone will be enough
>> to be the sole, ultimate and underlying representation for most
>> databases. Creating your own more focussed schemas/ontologies in
>> RDFS/OWL, rather than using planet-wide ontologies, can help to bridge
>> that gap. Creating more focussed schema.org extensions may also help -
>> but this is new territory for us all. Schema.org's emphasis remains on
>> Web-scale communication, rather than as a supplier of master formats.
>>
>> When you choose the terminology that actually structures your own
>> database(s), every detail matters - subtleties of definition, quirks
>> of the specific datasets and sources you're dealing with, versioning
>> etc. When publishing such data for consumption elsewhere in the Web,
>> particularly in search-oriented apps, it is natural to trade some of
>> that fine grained control for a wider audience. So for the astro case,
>> it might be that carefully modeled independent ontologies would be the
>> way to go within the astronomy community, but that once this work is
>> done or identified it could be mapped into a schema.org extension. As
>> Martin notes this is pretty much the route taken with the
>> GoodRelations, Automotive etc work. There is no reason that independent
>> astro-oriented ontologies couldn't also be expressed as an
>> astro.schema.org extension, and perhaps this would help make the
>> resulting data markup clearer and easier for publishers. It wouldn't
>> guarantee search engines would suddenly start adding astro-related
>> search features, any more than the presence of "Volcano" in
>> schema.org's core (i.e. http://schema.org/Volcano) has led to many
>> volcano-oriented search features. But it could help focus discussion
>> on the astronomical aspects rather than on all the supporting
>> background vocabulary that is also needed.
>>
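To make the mapping step concrete, here is a minimal sketch in Python of publishing an internal record as schema.org JSON-LD. It assumes a made-up source row: the field names (volc_name, lat_deg, lon_deg, elev_m, survey_batch, last_edit_rev), the values, and the helper to_schema_org are all hypothetical. Only Volcano, GeoCoordinates and the name/geo/latitude/longitude/elevation terms come from schema.org's core.

```python
import json

# Hypothetical internal record; field names and values are invented for
# illustration and do not come from any real dataset.
row = {
    "volc_name": "Example Volcano",
    "lat_deg": 10.5,
    "lon_deg": -20.25,
    "elev_m": 1234,
    "survey_batch": "B-17",   # internal bookkeeping, not published
    "last_edit_rev": 42,      # internal versioning, not published
}


def to_schema_org(record):
    """Map the internal record onto schema.org core terms.

    Only the fields that have an obvious schema.org counterpart are
    carried over; everything else stays in the source database.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Volcano",
        "name": record["volc_name"],
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": record["lat_deg"],
            "longitude": record["lon_deg"],
            "elevation": record["elev_m"],
        },
    }


if __name__ == "__main__":
    # Serialize as JSON-LD suitable for embedding in a page.
    print(json.dumps(to_schema_org(row), indent=2))
```

The internal bookkeeping fields are dropped on purpose: that is the trade of fine grained control for a wider audience described above. A domain extension would be one place to carry more of that detail.
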
>> We have created the updated Extensions mechanism to help reduce the
>> gap between custom schemas and schema.org, rather than to replace
>> non-schema.org schemas.
>>
>> From my own perspective schema.org serves much of the purpose that
>> FOAF originally aimed at: a general "utility" vocabulary to bootstrap
>> this kind of structured data sharing - e.g. in
>>
>> https://web.archive.org/web/20140331104046/http://www.foaf-project.org/original-intro
>> FOAF (then "RDFWeb") was described as a "starter vocabulary".
>> See also http://www.w3.org/TR/NOTE-MCF-XML/#secA. for an earlier
>> bootstrap vocab which inspired FOAF. Schema.org is also a starter
>> vocabulary, but it is on a larger scale (number of sites, size of
>> vocabulary, impact of consuming apps) than we ever achieved with FOAF.
>>
>> I would say three things stopped FOAF itself from evolving to serve as such
>> a "starter vocabulary":
>>
>> 1. We were too cautious about the size of the schema; 100 terms seemed
>> at the time terribly large. By relying on the multiple independent
>> namespaces RDFS approach we pushed complexity onto publishers, who
>> suffered from the lack of "attention to detail" coordination across
>> vocabularies. Real world descriptive problems do not map cleanly onto
>> independently managed RDF namespaces, resulting in fragmentation and
>> confusion on how best to express various situations in RDF.
>>
>> 2. Few consuming applications. We made some fun demos and prototypes,
>> but there was relatively little serious consumption of the data.
>> Without high profile consumption, data quality suffers and mainstream
>> publishers are not motivated, so it remains hard to break out of the
>> early-adopter tech/standards/research scene.
>>
>> 3. We were too cautious about evolving the schemas once they were
>> being used on many sites. Early design errors and compromises got
>> frozen in.
>>
>> The approach at schema.org differs on all three of these points:
>>
>> - the vocabulary is much larger, allowing many common scenarios to be
>> described purely by schema.org terms.
>> - there is an explicit up-front link to large scale consumption of the
>> data by mainstream user-facing applications
>> - the schemas are constantly being tweaked and improved, even
>> including name changes and redesigns when we think they improve
>> usability and integration.
>>
>> This last point brings me back to my first: if you are choosing the
>> underlying format for managing your data, this kind of constant
>> improvement can be ... annoying. We do make frozen snapshots
>> available (see http://schema.org/version/ ) but the general approach
>> we take is to keep improving and integrating things.  This is not
>> something that those of us in the wider RDF community have done enough
>> of, in terms of improving how independently managed vocabularies fit
>> together. The hope with schema.org extensions is that we can find a
>> balance and have more decentralization of vocabulary creation while
>> still keeping a broad community communicating here who care about
>> integration and consistency, while acknowledging that changes often
>> have to be incremental and pragmatic.
>>
>> For example, the Bibliographic extensions community at
>> https://www.w3.org/community/schemabibex/ ... automotive ontologies at
>> https://www.w3.org/community/gao/ and most recently a proposed
>> health/medical extensions community, see
>>
>> https://www.w3.org/community/blog/2015/05/21/proposed-group-healthcare-ontology-community-group/
>>  ... these all have scope to go deeper into their focus areas than
>> core schema.org efforts. But they keep some connection also via
>> schema.org core vocabulary, so that shared notions of CreativeWork,
>> Organization, Event etc etc don't get repeatedly re-invented. For
>> Astronomy I'd suggest perhaps a W3C Community Group would also make
>> sense, both to draw together existing work, sites/publishers and
>> toolmakers who are interested to collaborate on shared vocabulary, but
>> also to give a coordination point so that we can keep a conversation
>> open on how these things all fit together. At this stage I think
>> studying what's out there is of far greater value than worrying about
>> what should go in to a hypothetical astro.schema.org or into any new
>> ontologies...
>>
>> verbosely,
>>
>> Dan
>>
>>
>
Received on Friday, 29 May 2015 14:30:52 UTC