Re: Propose astro.schema.org

Dear Barry:

> On 29 May 2015, at 01:56, Barry Carter <carter.barry@gmail.com> wrote:
> 
> Thanks, Dan.
> 
> It would be easy to convert existing data into an astro.schema.org format (once we created it), but I guess this brings me back to a fundamental question: what is the purpose of schema.org?
> 

In a nutshell, schema.org is a vocabulary, i.e. a conceptual data model with global identifiers for types, properties, and values, mainly targeted at better information extraction from Web content.

I would say that schema.org is first and foremost a "general-purpose Web Information Extraction support ontology".

The original description says this pretty nicely:

"Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, search engines have come together to provide a shared collection of schemas that webmasters can use."

http://web.archive.org/web/20150318004607/http://schema.org/
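As a minimal illustration of the idea in the quoted description (structured data from a database turned into on-page markup), the Python sketch below serializes one hypothetical product record as schema.org JSON-LD inside a script element. The record fields and the function name are made up for illustration. Product, name, sku and description are schema.org terms. Microdata or RDFa would carry the same data equally well.

```python
import json


def record_to_jsonld(record):
    """Serialize one (hypothetical) database row as schema.org JSON-LD for an HTML page."""
    data = {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": record["name"],
        "sku": record["sku"],
        "description": record["description"],
    }
    # Wrap the JSON in the script element that consumers look for in the page.
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


# Hypothetical row, as it might come out of a shop database.
row = {
    "name": "80 mm refractor telescope",
    "sku": "T-80",
    "description": "Entry-level refractor on an alt-azimuth mount.",
}
print(record_to_jsonld(row))
```

The page template stays as it is; the snippet simply exposes the original structured record next to the human-readable HTML, so an extractor does not have to reverse-engineer it from the layout.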

> I realize the site and the blog try to explain this, but I'm still not
> entirely clear on the purpose.
> 
> Is schema.org supposed to replace OWL and other ontologies?
OWL is an ontology language. Schema.org can be expressed in OWL, but it is not formally specified in OWL.
Schema.org can be used in combination with other ontologies and re-uses existing ontologies in various ways. Well-known ontologies like FOAF and Cyc inspired the modeling choices in schema.org, and other ontologies, such as LRMI and GoodRelations, have been integrated into it.

> Is it a
> build-from-scratch ontology for Google/Microsoft/Yahoo/Yandex?
> 
Historically, it was an attempt to collate, under a single namespace, a subset of elements from other ontologies that are relevant for marking up typical Web content, so that developers could more easily tell which elements search engines would understand. It was not built from scratch; note that key people in the schema.org effort have been long-term contributors to Web ontologies (e.g. FOAF) and to formalisms and models (RDF, RDFS, OWL, RDFa, JSON-LD).

>Or does it serve some other purpose?

As said, it is mainly an ontology/vocabulary that increases the efficiency and reliability of information extraction from Web content. That is not exactly what proponents of the traditional Semantic Web vision have in mind when they talk about Web ontologies.

Over the past year, the number of consumers of schema.org data has grown; it now includes, e.g., e-mail clients, browser extensions, and more.

> 
> More specifically, under what circumstances is it useful to create a new
> vocabulary for schema.org and under what circumstances is it not?

A schema.org extension makes sense, roughly, when

- a large number of Web sites are likely to have the respective data,
- those sites would be willing to add markup, and
- search engines or other major consumers of Web data would find that data useful.

As rough guidance, an extension proposal should have at least 10k matching sites in the Alexa 1M dataset.

You can always start with an independent RDFS/OWL Web ontology and turn that into a schema.org extension later on.

A bit of warning: Building Web ontologies is a time-consuming, difficult task. I have spent almost a decade working on GoodRelations (only 28 classes). The auto extension for schema.org, as recently added, was an 18-month effort (see http://www.w3.org/wiki/WebSchemas/Vehicles to get an idea).

> 
> And in what way should schema.org ontologies differ from OWL's?

Schema.org extensions do not depend on OWL as a formalism. They are based on a simpler and self-contained meta-model. 
In essence, you just have classes, properties, and enumerated values; range and domain indications, and subclass/subproperty relations.
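To make that meta-model concrete, here is a minimal Python sketch of it: classes with subclass links, properties with domain/range indications and subproperty links, and enumerations with fixed member values. This is not how schema.org itself is implemented; the names Vocabulary, reaches, applies_to, is_subproperty and is_member are hypothetical. The example terms (Thing, Product, Vehicle, Car, sku, ItemAvailability with InStock/OutOfStock) are modeled on schema.org, but check schema.org for their actual definitions. Domains and ranges are treated as guidance, in the spirit of schema.org's domainIncludes/rangeIncludes, not as formal OWL/RDFS axioms.

```python
from dataclasses import dataclass, field


def reaches(graph, start, target):
    """True if start equals target or reaches it by following parent links."""
    if start == target:
        return True
    return any(reaches(graph, parent, target) for parent in graph.get(start, []))


@dataclass
class Vocabulary:
    # class name -> direct superclasses (subclass relations)
    classes: dict = field(default_factory=dict)
    # property name -> direct superproperties (subproperty relations)
    property_parents: dict = field(default_factory=dict)
    # property name -> (domain classes, range types), used only as guidance
    signatures: dict = field(default_factory=dict)
    # enumeration name -> its enumerated member values
    enumerations: dict = field(default_factory=dict)

    def add_class(self, name, *parents):
        self.classes[name] = list(parents)

    def add_property(self, name, domains, ranges, parents=()):
        self.signatures[name] = (list(domains), list(ranges))
        self.property_parents[name] = list(parents)

    def applies_to(self, prop, cls):
        """Domain check: prop fits cls if cls is a listed domain or a subclass of one."""
        domains, _ = self.signatures[prop]
        return any(reaches(self.classes, cls, d) for d in domains)

    def is_subproperty(self, prop, ancestor):
        return reaches(self.property_parents, prop, ancestor)

    def is_member(self, enumeration, value):
        return value in self.enumerations.get(enumeration, [])


vocab = Vocabulary()
vocab.add_class("Thing")
vocab.add_class("Product", "Thing")
vocab.add_class("Vehicle", "Product")
vocab.add_class("Car", "Vehicle")
vocab.add_property("sku", domains=["Product"], ranges=["Text"])
vocab.enumerations["ItemAvailability"] = ["InStock", "OutOfStock"]

print(vocab.applies_to("sku", "Car"))    # Car -> Vehicle -> Product
print(vocab.applies_to("sku", "Thing"))  # Thing sits above Product
print(vocab.is_member("ItemAvailability", "InStock"))
```

Because the check walks subclass links upward, a property declared for Product is accepted on Car without restating it, and no OWL reasoner is needed.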
> 
> As a note, I think math.schema.org is another vocabulary that schema.org
> needs, but I sense I'm misunderstanding the point.

This will likely not pass the Alexa 1M test.

Note that schema.org is not an attempt to build a generic, true conceptual model of the world as a whole. While we draw upon the history and techniques from ontology engineering in the more philosophical sense of the term, the overall goal is enhancing information extraction at Web scale. We typically do not argue whether a caterpillar and a butterfly are the same entity, unless it matters for a practically more useful modeling pattern.

Hope that helps!

Best

Martin


> 
> On Thu, 28 May 2015, Dan Brickley wrote:
> 
>> Date: Thu, 28 May 2015 22:18:58 +0100
>> From: Dan Brickley <danbri@google.com>
>> To: Barry Carter <carter.barry@gmail.com>
>> Cc: schema.org Mailing List <public-schemaorg@w3.org>
>> Subject: Re: Propose astro.schema.org
>> On 27 May 2015 at 22:11, Barry Carter <carter.barry@gmail.com> wrote:
>>> I'd like to propose an ontology for astronomical objects such as stars,
>>> planets, satellites, asteroids/planetoids, etc.
>>> 
>>> We could either use the existing OWL astronomy ontology:
>>> 
>>> http://www.astro.umd.edu/~eshaya/astro-onto/ontologies/astronomy.html
>>> 
>>> or create a simplified subset.
>>> 
>>> I'll flesh this out a bit more if there is sufficient community interest.
>> 
>> Thanks. Are you aware of publishers who are putting relevant
>> structured information into HTML sites already that would be
>> candidates for adoption? Or that are doing other kinds of Web-based
>> data sharing (XML/CSV/JSON etc.)...?
>> 
>> Dan
>> 
> 
> 

Received on Friday, 29 May 2015 09:49:47 UTC