- From: Jacob Jett <jjett2@illinois.edu>
- Date: Wed, 1 Mar 2017 16:35:33 -0600
- To: "schema.org Mailing List" <public-schemaorg@w3.org>
- Cc: Brian Tremblay <schema@btrem.com>
- Message-ID: <CABzPtB+=mvU-uE2Opd-2pR4uyZtbULW7iv1tm98vpRX8Nx+npQ@mail.gmail.com>
Hi,

I'm an info science expert specializing in ontologies and IR systems (who has been lurking here for quite some time), so I thought I'd answer this question.

It directly relates to the quantity of data that you have on hand and how deeply you want to slice and dice it.

Say you have five records of organizations in your database. The fact that two are government organizations and two are businesses and one is a church probably doesn't matter. At that scale a human can identify the distinctions between these organization types without needing any mediation.

Now imagine that your database contains 5 million records of which roughly 1 million are government organizations, 2 million are businesses, 120k are churches, and the other 1.88 million are a jumbled long tail of other organization types not covered by the first three sub-types. This is a situation where having narrower sub-classes definitely helps the end user zero in on the things they are interested in.

This is why taxonomies like LoC subject headings look the way that they look (diving quite deeply into fine-grained distinctions). There is a great deal of method to the seeming madness of vocabulary development. These distinctions are important for answering the basic question, "What do you have?" (related to the question, "What do you know?").

Regards,

Jacob

_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
School of Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

On Wed, Mar 1, 2017 at 4:15 PM, Brian Tremblay <schema@btrem.com> wrote:

> On 3/1/17 1:24 PM, Robb Shecter wrote:
>
>> Martin Hepp wrote:
>>
>>> On 01 Mar 2017, at 01:57, Brian Tremblay wrote:
>>>
>>>>
>>>> On 2/10/17 7:07 PM, Robb Shecter wrote:
>>>>
>>>>> What's the relationship between the tool's understanding of
>>>>> schema.org and the Google search engine's?
>>>>>
>>>>
>>>> Google only uses a few types. The ones I've seen used by Google
>>>> include Person, Product, Review, and Recipe. There are probably a
>>>> few others.
>>>>
>>>> a deep (or new) subclass of Organization or LocalBusiness. If
>>>>> the tool recognizes it, do you happen to know whether the
>>>>> search engine will as well?
>>>>>
>>>>
>> It's tempting but misleading to just check whether your markup has
>>> an immediate visual effect.
>>>
>>
>> True. However, the uncertainty of the extent of schema.org support
>> puts publishers in a difficult position.
>>
>> Imagine I have an Organization record for the State of Oregon's web
>> page on my site oregonlaws.org.
>>
>> But now...I learn about the more specific...organization sub-type for
>> my content: GovernmentOrganization....But this is a potential trap:
>>
>> Google's list of supported schema types do not include
>> GovernmentOrganization. It's possible that the Google crawler will
>> interpret the token GovernmentOrganization as a typing mistake or
>> unknown type, and simply ignore it.
>>
>
> Right. That's the problem with being so far out front. schema.org is
> busy creating new vocabs without considering whether anyone is consuming
> those vocabs. We're left with increasingly specific (and complex)
> markups that increase the cost of development with no appreciable,
> verifiable benefit.
>
> In this case, is there some benefit to labeling something
> GovernmentOrganization instead of just Organization? I'd like to know
> what that benefit is. To take another example, is anyone or anything
> doing something meaningful (that is, something more than they do with
> LocalBusiness or Restaurant) with IceCreamShop? Or BarOrPub? How about
> CampingPitch or FireStation?
>
> There are probably other failure & semi-success modes I haven't
>> thought of. So to me, one problem is that there's no native way, in
>> json-ld, to identify a new & previously unknown subtype.
>>
>
> The problem is not limited to json-ld. It applies to microdata and rdfa
> as well.
>
> --
> Brian Tremblay
Received on Wednesday, 1 March 2017 22:36:49 UTC