Re: Google Structured Data Testing Tool - improved support for multiple independent types from Jacob Jett on 2017-03-01 (public-schemaorg@w3.org from March 2017)

From: Jacob Jett <jjett2@illinois.edu>
Date: Wed, 1 Mar 2017 16:35:33 -0600
To: "schema.org Mailing List" <public-schemaorg@w3.org>
Cc: Brian Tremblay <schema@btrem.com>
Message-ID: <CABzPtB+=mvU-uE2Opd-2pR4uyZtbULW7iv1tm98vpRX8Nx+npQ@mail.gmail.com>
Hi,

I'm an information science expert specializing in ontologies and IR systems
(and have been lurking here for quite some time), so I thought I'd answer
this question. The benefit of a more specific type like
GovernmentOrganization over plain Organization relates directly to the
quantity of data you have on hand and how finely you want to slice and dice
it. Say you have five records of organizations in your database. The fact
that two are government organizations, two are businesses, and one is a
church probably doesn't matter. At that scale a human can tell these
organization types apart without needing any mediation.

Now imagine that your database contains 5 million records, of which roughly
1 million are government organizations, 2 million are businesses, 120k are
churches, and the remaining 1.88 million are a jumbled long tail of other
organization types not covered by the first three sub-types. In this
situation, narrower sub-classes definitely help the end user zero in on the
things they are interested in. This is why taxonomies like the Library of
Congress Subject Headings dive so deeply into fine-grained distinctions.
There is a great deal of method to the seeming madness of vocabulary
development.
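To put rough numbers on the point above, here is a minimal Python sketch
using the hypothetical counts from the 5-million-record scenario. The
dictionary, the category labels, and the candidates() helper are
illustrative assumptions, not part of any real system; the sketch only
counts how many records a type filter leaves a user to sift through.

```python
# Hypothetical record counts from the 5-million-record scenario above.
# The labels are illustrative, not a claim about any real database.
counts = {
    "GovernmentOrganization": 1_000_000,
    "Business": 2_000_000,
    "Church": 120_000,
    "Other": 1_880_000,
}

# Every record is at least an organization, so the total is what a
# user faces when only the generic type is available.
total = sum(counts.values())

def candidates(subtype=None):
    """Number of records a user must sift through for a given type filter.

    With no subtype (generic Organization only), nothing can be excluded.
    """
    return total if subtype is None else counts[subtype]

for subtype in [None, *counts]:
    label = subtype or "Organization (no subtype)"
    n = candidates(subtype)
    print(f"{label:26} {n:>9,}  ({n / total:.1%} of all records)")
```

Filtering on the Church subtype shrinks the candidate set from 5,000,000 to
120,000 records (2.4% of the total, a reduction by a factor of about 41.7).
With only five records there is nothing comparable to gain.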

These distinctions are important for answering the basic question, "What do
you have?" (related to the question, "What do you know?").

Regards,

Jacob


_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
School of Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

On Wed, Mar 1, 2017 at 4:15 PM, Brian Tremblay <schema@btrem.com> wrote:

> On 3/1/17 1:24 PM, Robb Shecter wrote:
>
>> Martin Hepp wrote:
>>
>>> On 01 Mar 2017, at 01:57, Brian Tremblay wrote:
>>>
>>>>
>>>> On 2/10/17 7:07 PM, Robb Shecter wrote:
>>>>
>>>>> What's the relationship between the tool's understanding of
>>>>> schema.org and the Google search engine's?
>>>>>
>>>>
>>>> Google only uses a few types. The ones I've seen used by Google
>>>> include Person, Product, Review, and Recipe. There are probably a
>>>> few others.
>>>>
>>>>> a deep (or new) subclass of Organization or LocalBusiness. If
>>>>> the tool recognizes it, do you happen to know whether the
>>>>> search engine will as well?
>>>>>
>>>>
>>> It's tempting but misleading to just check whether your markup has
>>> an immediate visual effect.
>>>
>>
>> True. However, the uncertainty of the extent of schema.org support
>> puts publishers in a difficult position.
>>
>> Imagine I have an Organization record for the State of Oregon's web
>> page on my site oregonlaws.org.
>>
>> But now...I learn about the more specific...organization sub-type for
>> my content: GovernmentOrganization....But this is a potential trap:
>>
>> Google's list of supported schema types does not include
>> GovernmentOrganization. It's possible that the Google crawler will
>> interpret the token GovernmentOrganization as a typing mistake or
>> unknown type, and simply ignore it.
>>
>
> Right. That's the problem with being so far out front. schema.org is
> busy creating new vocabs without considering whether anyone is consuming
> those vocabs. We're left with increasingly specific (and complex)
> markups that increase the cost of development with no appreciable,
> verifiable benefit.
>
> In this case, is there some benefit to labeling something
> GovernmentOrganization instead of just Organization? I'd like to know
> what that benefit is. To take another example, is anyone or anything
> doing something meaningful (that is, something more than they do with
> LocalBusiness or Restaurant) with IceCreamShop? Or BarOrPub? How about
> CampingPitch or FireStation?
>
>> There are probably other failure & semi-success modes I haven't
>> thought of. So to me, one problem is that there's no native way, in
>> json-ld, to identify a new & previously unknown subtype.
>>
>
> The problem is not limited to json-ld. It applies to microdata and rdfa
> as well.
>
> --
> Brian Tremblay
>
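The failure mode raised in the quoted thread (a consumer that doesn't
recognize GovernmentOrganization and simply ignores it) can be sketched in
Python. This is a minimal sketch under stated assumptions: PARENT is a small
hand-copied excerpt of the schema.org hierarchy, SUPPORTED is a hypothetical
consumer's type list (not Google's), and resolve() and markup() are
illustrative helpers, not any search engine's API. It shows a consumer that
falls back to the nearest supertype it knows, and one hedge a publisher
could use: listing both the specific and the generic type in @type.

```python
import json

# Small excerpt of the schema.org type hierarchy (child -> parent).
# Only one parent per type is modelled; some real types (e.g.
# LocalBusiness, which is also a Place) have more than one.
PARENT = {
    "GovernmentOrganization": "Organization",
    "LocalBusiness": "Organization",
    "FoodEstablishment": "LocalBusiness",
    "Restaurant": "FoodEstablishment",
    "IceCreamShop": "FoodEstablishment",
    "BarOrPub": "FoodEstablishment",
    "Organization": "Thing",
}

# Hypothetical set of types one consumer acts on. This is NOT Google's
# actual list; it only models a consumer that knows fewer types than
# the vocabulary defines.
SUPPORTED = {"Organization", "LocalBusiness", "Restaurant"}

def ancestors(t):
    """Yield t, then each known supertype up to the root."""
    while t is not None:
        yield t
        t = PARENT.get(t)

def resolve(t, supported=SUPPORTED):
    """Nearest supported type for t, or None if no ancestor is supported.

    A type absent from PARENT (e.g. a brand-new subtype) has no known
    ancestors, so unless it is itself supported this returns None: the
    "simply ignore it" case.
    """
    return next((a for a in ancestors(t) if a in supported), None)

def markup(name, specific, fallback):
    """JSON-LD listing a specific type and a broader one side by side.

    @type may be an array in JSON-LD; whether a given consumer uses the
    extra type is not guaranteed.
    """
    return json.dumps({
        "@context": "https://schema.org",
        "@type": [specific, fallback],
        "name": name,
    }, indent=2)

print(resolve("IceCreamShop"))            # LocalBusiness
print(resolve("GovernmentOrganization"))  # Organization
print(markup("State of Oregon", "GovernmentOrganization", "Organization"))
```

Note that the array does not tell a consumer that GovernmentOrganization is
a subtype of Organization; a consumer that only knows Organization just sees
a second type it can use. That is the gap pointed out above: the markup
itself has no native way to identify a new, previously unknown subtype.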
>
Received on Wednesday, 1 March 2017 22:36:49 UTC