- From: Dan Brickley <danbri@google.com>
- Date: Fri, 29 May 2015 12:46:25 +0100
- To: Ed Summers <ehs@pobox.com>
- Cc: "mfhepp@gmail.com" <mfhepp@gmail.com>, Barry Carter <carter.barry@gmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
On 29 May 2015 at 10:59, Ed Summers <ehs@pobox.com> wrote:
>
>> On May 29, 2015, at 5:49 AM, mfhepp@gmail.com wrote:
>>
>> This will likely not pass the Alexa 1M test.
>
> I haven’t been paying attention :) what is the Alexa 1M test?

I don't know either :) Presumably roughly "used on a lot of major sites". But I haven't thought about Alexa in a while I must admit. There was some related discussion on public-vocabs about when something ought to go in the core versus be handled as an extension - https://lists.w3.org/Archives/Public/public-vocabs/2015May/0009.html

Here's a distinction that I don't think we make often enough. It is fairly intuitive (I argue, without evidence): Schema.org's vocabularies are fundamentally for large scale *communication* of structured data. Its core vocabularies are much less likely to meet the needs of people choosing schemas to actually manage/store/create such data, i.e. for the source or master format.

Putting things in those terms makes clear that there remains plenty of work to do in the non-schema.org RDF/OWL universe (as well as in extensions, where the distinction perhaps gets blurred). Publishing data using schema.org typically involves transformation, mapping and conversion from some underlying representation (which often enough for publishers will be SQL or Java interfaces or something custom or application-oriented e.g. Drupal's data storage abstractions).

What you do in the privacy of your own database is entirely your own business. OWL and other Linked Data RDF vocabularies may or may not be useful there, depending on your situation. It is highly unlikely that schema.org's core vocabulary alone will be enough to be the sole, ultimate and underlying representation for most databases. Creating your own more focussed schemas/ontologies in RDFS/OWL, rather than using planet-wide ontologies, can help to bridge that gap. Creating more focussed schema.org extensions may also help - but this is new territory for us all.

Schema.org's emphasis remains on Web-scale communication, rather than as a supplier of master formats. When you choose the terminology that actually structures your own database(s), every detail matters - subtleties of definition, quirks of the specific datasets and sources you're dealing with, versioning etc. When publishing such data for consumption elsewhere in the Web, particularly in search-oriented apps, it is natural to trade some of that fine grained control for a wider audience.

So for the astro case, it might be that carefully modeled independent ontologies would be the way to go within the astronomy community, but that once this work is done or identified it could be mapped into a schema.org extension. As Martin notes, this is pretty much the route taken with the Good Relations, Automotive etc work.

There is no reason that independent astro-oriented ontologies couldn't also be expressed as an astro.schema.org extension, and perhaps this would help make the resulting data markup clearer and easier for publishers. It wouldn't guarantee search engines would suddenly start adding astro-related search features, any more than the presence of "Volcano" in schema.org's core (i.e. http://schema.org/Volcano) has led to many volcano-oriented search features. But it could help focus discussion on the astronomical aspects rather than on all the supporting background vocabulary that is also needed.

We have created the updated Extensions mechanism to help reduce the gap between custom schemas and schema.org, rather than to replace non-schema.org schemas.

From my own perspective schema.org serves much of the purpose that FOAF originally aimed at: a general "utility" vocabulary to bootstrap this kind of structured data sharing - e.g. in https://web.archive.org/web/20140331104046/http://www.foaf-project.org/original-intro FOAF (then "RDFWeb") was described as a "starter vocabulary". See also http://www.w3.org/TR/NOTE-MCF-XML/#secA.
for an earlier bootstrap vocab which inspired FOAF. Schema.org is also a starter vocabulary, but it is on a larger scale (number of sites, size of vocabulary, impact of consuming apps) than we ever achieved with FOAF.

I would say three things stopped FOAF itself evolving to serve as such a "starter vocabulary":

1. We were too cautious about the size of the schema; 100 terms seemed at the time terribly large. By relying on the multiple independent namespaces RDFS approach we pushed complexity onto publishers, who suffered from the lack of "attention to detail" coordination across vocabularies. Real world descriptive problems do not map cleanly onto independently managed RDF namespaces, resulting in fragmentation and confusion on how best to express various situations in RDF.

2. Few consuming applications. We made some fun demos and prototypes, but there was relatively little serious consumption of the data. Without high profile consumption, data quality suffers and mainstream publishers are not motivated, so it remains hard to break out of the early-adopter tech/standards/research scene.

3. We were too cautious about evolving the schemas once they were being used on many sites. Early design errors and compromises got frozen in.

The approach at schema.org differs on all of these three points:

- the vocabulary is much larger, allowing many common scenarios to be described purely by schema.org terms.
- there is an explicit up-front link to large scale consumption of the data by mainstream user-facing applications
- the schemas are constantly being tweaked and improved, even including name changes and redesigns when we think they improve usability and integration.

This last point brings me back to my first: if you are choosing the underlying format for managing your data, this kind of constant improvement can be ... annoying.
While we do make frozen snapshots available (see http://schema.org/version/ ), the general approach we take is to keep improving and integrating things. This is not something that those of us in the wider RDF community have done enough of, in terms of improving how independently managed vocabularies fit together.

The hope with schema.org extensions is that we can find a balance and have more decentralization of vocabulary creation while still keeping a broad community communicating here who care about integration and consistency, while acknowledging that changes often have to be incremental and pragmatic. For example, the Bibliographic extensions community at https://www.w3.org/community/schemabibex/ ... automotive ontologies at https://www.w3.org/community/gao/ and most recently a proposed health/medical extensions community, see https://www.w3.org/community/blog/2015/05/21/proposed-group-healthcare-ontology-community-group/ ... these all have scope to go deeper into their focus areas than core schema.org efforts. But they also keep some connection via the schema.org core vocabulary, so that shared notions of CreativeWork, Organization, Event etc etc don't get repeatedly re-invented.

For Astronomy I'd suggest perhaps a W3C Community Group would also make sense, both to draw together existing work, sites/publishers and toolmakers who are interested to collaborate on shared vocabulary, but also to give a coordination point so that we can keep a conversation open on how these things all fit together. At this stage I think studying what's out there is of far greater value than worrying about what should go into a hypothetical astro.schema.org or into any new ontologies...

verbosely,

Dan
Received on Friday, 29 May 2015 11:46:53 UTC