Re: schema.org growth what are the limits?

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Mon, 29 Jul 2013 23:40:01 +0200
Cc: Bernard Vatant <bernard.vatant@mondeca.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-Id: <37E4AD09-94BD-4016-A0BF-F6A45867F332@ebusiness-unibw.org>
To: Guha <guha@google.com>

Dear Guha:

On Jul 29, 2013, at 10:59 PM, Guha wrote:

> Based on the size of other 'vocabularies' (like the set of apis for platforms), I don't believe that 1000 is a limit.

Note that I was referring to the number of *types* in a feasible *core* hierarchy, not properties. I think we can have 20 - 50 properties per type then.

I stick to my prediction from [1] that the number of adopters of a standard decreases logarithmically with the number of conceptual elements in the standard.
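To make the shape of that claim concrete, here is a minimal sketch of one possible reading of "decreases logarithmically". The functional form a - b * ln(n) and the parameters `a` and `b` are assumptions chosen for illustration; they are not taken from [1] and are not fitted to any data.

```python
import math


def expected_adopters(elements: int, a: float, b: float) -> float:
    """Assumed model: adopters = a - b * ln(elements), floored at zero."""
    if elements < 1:
        raise ValueError("elements must be >= 1")
    return max(0.0, a - b * math.log(elements))


# Purely illustrative parameters (hypothetical, arbitrary units).
A, B = 100.0, 10.0

# Under this reading, every tenfold increase in size costs the same
# absolute number of adopters, namely b * ln(10).
drop_100_to_1000 = expected_adopters(100, A, B) - expected_adopters(1000, A, B)
drop_1000_to_10000 = expected_adopters(1000, A, B) - expected_adopters(10000, A, B)
```

If this reading is roughly right, going from 100 to 1,000 elements would cost about as many adopters as going from 1,000 to 10,000.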

Also note that I meant the "core" of schema.org. If there are more than 1,000 types, one will need a more powerful architecture for extensions or modularization. Also, a lot of specific conceptual elements could be delegated to external, community-driven platforms like Freebase, DBpedia, Wikidata, or www.productontology.org.

And while this is a partly unfair comparison: HTML5 defines just ca. 100 elements, Python ca. 30 keywords ;-)

Finally, you (Google ;-)) still have text and NLP to infer more specific entity types in Web content :-)

Martin

[1] http://www.heppnetz.de/files/IEEE-IC-PossibleOntologies-published.pdf



> guha
> 
> 
> On Mon, Jul 29, 2013 at 9:09 AM, Martin Hepp <martin.hepp@ebusiness-unibw.org> wrote:
> Hi Bernard,
> I currently cannot find a better reference than [1], but I have already said on this list that I think the schema.org approach will scale only up to ca. 1,000 types. Beyond that, navigating the type hierarchy and learning how to use the standard will become too burdensome, and reaching consensus will become too difficult.
> 
> See also [2] on the assumed effects between vocabulary size and adoption.
> 
> One could likely push the boundaries a little bit by adopting a strictly frame-based paradigm with properties officially attached only to a type or its subtypes (i.e. no global identifiers, and thus no common meaning for properties across types). This would free us from the need to find catchy, intuitive, yet generally valid names for properties (e.g. "effect" for a MedicalTreatment could mean something different than "effect" for WebService; all property names and types made up in this example).
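A minimal sketch of what such type-scoped properties could look like as a data structure. The class `FrameVocabulary`, its method names, and the subtype `DrugTherapy` are hypothetical and exist only for illustration; this is not how schema.org is implemented.

```python
from typing import Dict, Optional


class FrameVocabulary:
    """Properties are keyed by the type that declares them, not globally."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.props: Dict[str, Dict[str, str]] = {}

    def add_type(self, name: str, parent: Optional[str] = None) -> None:
        self.parent[name] = parent
        self.props[name] = {}

    def add_property(self, type_name: str, prop: str, meaning: str) -> None:
        # The same property name may be declared on unrelated types.
        self.props[type_name][prop] = meaning

    def lookup(self, type_name: str, prop: str) -> str:
        # Walk up the hierarchy so that subtypes inherit the meaning.
        current: Optional[str] = type_name
        while current is not None:
            if prop in self.props[current]:
                return self.props[current][prop]
            current = self.parent[current]
        raise KeyError(f"{prop!r} is not defined for {type_name!r}")


vocab = FrameVocabulary()
vocab.add_type("Thing")
vocab.add_type("MedicalTreatment", parent="Thing")
vocab.add_type("DrugTherapy", parent="MedicalTreatment")  # hypothetical subtype
vocab.add_type("WebService", parent="Thing")
vocab.add_property("MedicalTreatment", "effect", "clinical outcome of the treatment")
vocab.add_property("WebService", "effect", "state change caused by invoking the service")
```

Here `vocab.lookup("DrugTherapy", "effect")` resolves to the MedicalTreatment meaning, while `vocab.lookup("WebService", "effect")` resolves to an unrelated one, and neither definition has to be valid for the other type.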
> 
> Then schema.org could maybe grow to a somewhat bigger, rather "flat" collection of types and associated properties.
> 
> Personally I am convinced that 1,000 well-chosen types in combination with the additionalType property will be sufficient for very, very powerful modeling. On the other hand, I would be very hesitant to accept big bulk imports of types from external schemas. Let's delegate the more specific (and also more frequently changing, see [3]) specializations to Wikipedia-based services, like www.productontology.org or Wikidata.
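As a rough sketch of that combination: a core schema.org type carries the general meaning, and additionalType points to a more specific external class. The helper `describe`, the item name, and the particular productontology.org class URI below are assumptions made for illustration only.

```python
import json
from typing import Dict


def describe(core_type: str, specific_type_uri: str, name: str) -> Dict[str, str]:
    """Build a JSON-LD-style description with a core type plus additionalType."""
    return {
        "@context": "http://schema.org/",
        "@type": core_type,                    # one of the core types
        "additionalType": specific_type_uri,   # externally maintained specialization
        "name": name,
    }


# Illustrative values; the class URI is an assumed example.
item = describe(
    "Product",
    "http://www.productontology.org/id/Racing_bicycle",
    "Example racing bike",
)
markup = json.dumps(item, indent=2)
```

The core vocabulary only needs to know Product; the finer distinction lives in the external, Wikipedia-based service and can change there without touching schema.org.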
> 
> Martin
> 
> [1] http://lists.w3.org/Archives/Public/public-vocabs/2013Jan/0059.html
> [2] http://www.heppnetz.de/files/IEEE-IC-PossibleOntologies-published.pdf
> [3] http://www.heppnetz.de/files/ConceptualDynamics-EKAW2008-CRC-final6.pdf
> 
> On Jul 26, 2013, at 4:13 PM, Bernard Vatant wrote:
> 
> > Hello all
> >
> > This is a question I have been wanting to raise here for quite a while.
> > If my count is correct, the latest version of schema.org has 428 classes + 582 properties = 1010 elements.
> > The number of candidate and potential extensions is likely to grow at a steady pace. Now that a handful of early-adopter industries and communities have successfully pushed their vocabularies into schema.org, many others are likely to follow once they discover their obvious interest in doing so. And that moment is now, or quite soon.
> >
> > This growth is a good thing, but it will hit, and actually has already hit, known limits of this kind of exercise, which once again boils down to representing the whole world in a unique model and a unique namespace.
> >
> > The first point is not really an issue. The semantics of schema.org are "soft" enough to accommodate slight inconsistencies between various branches of the vocabulary; for example, the same property used here and there with slightly different semantics will not really be an issue if those branches are unlikely to be used in the same context.
> >
> > The unique namespace is another issue. Once a name has been used to identify a class or a property, it can't be reused for something else. New extensions will have to cope with the legacy. Suppose I want to use http://schema.org/study for something other than a MedicalEntity and MedicalStudy. Suppose DDI people want to introduce their concept of Study [1]. What will be the negotiation process?
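The constraint described here can be sketched as a first-come, first-served registry. The class `SingleNamespace` and the definition strings are hypothetical; the sketch only illustrates why a second, different use of the same local name has nowhere to go.

```python
from typing import Dict


class SingleNamespace:
    """One base URI; each local name can be bound to one definition only."""

    def __init__(self, base: str) -> None:
        self.base = base
        self.terms: Dict[str, str] = {}

    def register(self, name: str, definition: str) -> str:
        if name in self.terms:
            # The name is taken; a later proposal cannot reuse it.
            raise ValueError(f"{self.base}{name} is already defined")
        self.terms[name] = definition
        return self.base + name


ns = SingleNamespace("http://schema.org/")
first = ns.register("study", "study in the medical sense")  # illustrative text

try:
    ns.register("study", "study in the DDI sense")  # illustrative text
    collided = False
except ValueError:
    collided = True
```

In this sketch the second proposal simply fails; in practice it would have to pick another name or renegotiate the existing definition, which is the process the question asks about.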
> >
> > More generally, is there a limit one could set for a manageable, sensible size of the vocabulary? 10,000? 100,000?
> > Is there a plan of any kind to limit the growth of the vocabulary, in size or in time?
> >
> > Thanks for your thoughts.
> >
> > Bernard
> >
> > [1] http://rdf-vocabulary.ddialliance.org/discovery
> >
> > --
> > Bernard Vatant
> > Vocabularies & Data Engineering
> > Tel :  + 33 (0)9 71 48 84 59
> > Skype : bernard.vatant
> > Blog : the wheel and the hub
> > Linked Open Vocabularies : lov.okfn.org
> > --------------------------------------------------------
> > Mondeca
> > 3 cité Nollez 75018 Paris, France
> > www.mondeca.com
> > Follow us on Twitter : @mondecanews
> > ----------------------------------------------------------
> > Mondeca is co-chairing
> > Long-term Preservation and Governance of RDF Vocabularies
> > at Dublin Core Conference
> 
> --------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
> 
> e-mail:  hepp@ebusiness-unibw.org
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>          http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
> 
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/

Received on Monday, 29 July 2013 21:40:33 UTC
