
Re: schema.org growth what are the limits?

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Mon, 29 Jul 2013 11:55:55 +0200
Message-ID: <CAK4ZFVGYiFC4_oP+060p-pV+3_eAiBpdhLmDNrTXF7OzY2zazQ@mail.gmail.com>
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "public-vocabs@w3.org" <public-vocabs@w3.org>
Dear all

Just to feed this debate on vocabulary overload, I used the Linked Open
Vocabularies aggregator endpoint to find homonyms of schema.org elements
in other LOV vocabularies. Query and results here: http://bit.ly/1cg5BIs

Note this is a very rough query: only exact homonyms, based on identical
values of rdfs:label, are in the results. More would be found with more
tolerant matching algorithms. For example, I was surprised not to find any
element of GoodRelations, but GoodRelations often includes comments in the
labels (such as cardinality for properties). There is also no distinction
between classes and properties. Schema.org has some pairs of classes and
properties which differ only by case, such as http://schema.org/Dataset
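
To make the difference between exact and tolerant matching concrete, here is a minimal sketch in Python. It is not the query behind the link above: the sample rows, the example.org URIs and the function names are all made up for illustration, and the "tolerant" rule (drop a trailing parenthetical comment, ignore case) is only one possible choice.

```python
import re
from collections import defaultdict

# Hypothetical (vocabulary, element URI, rdfs:label) rows.
# Only the schema.org URIs are real; everything else is invented.
TERMS = [
    ("schema", "http://schema.org/Dataset", "Dataset"),
    ("schema", "http://schema.org/dataset", "dataset"),
    ("schema", "http://schema.org/name", "name"),
    ("ex1", "http://example.org/v1#Dataset", "Dataset"),
    ("ex2", "http://example.org/v2#name", "name (0..1)"),
    ("ex2", "http://example.org/v2#title", "title"),
]


def exact_key(label):
    """Exact matching: the label is compared as-is."""
    return label


def tolerant_key(label):
    """Tolerant matching (an assumption, one rule among many):
    strip a trailing parenthetical comment, collapse whitespace,
    and ignore case."""
    label = re.sub(r"\s*\([^)]*\)\s*$", "", label)
    return " ".join(label.split()).casefold()


def homonyms(terms, base_vocab, key):
    """Map each element of base_vocab to the elements of other
    vocabularies whose label has the same key."""
    others = defaultdict(list)
    for vocab, uri, label in terms:
        if vocab != base_vocab:
            others[key(label)].append(uri)
    found = {}
    for vocab, uri, label in terms:
        if vocab == base_vocab and key(label) in others:
            found[uri] = sorted(others[key(label)])
    return found


exact = homonyms(TERMS, "schema", exact_key)
tolerant = homonyms(TERMS, "schema", tolerant_key)
```

On this toy data the exact rule only pairs the two "Dataset" labels, while the tolerant rule also catches the label carrying a cardinality comment. The price is that ignoring case matches both the class and the property named "dataset" against the same external element, which is the class/property confusion mentioned above.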


2013/7/26 Karen Coyle <kcoyle@kcoyle.net>

> On 7/26/13 8:27 AM, Dan Brickley wrote:
>> On 26 July 2013 16:13, Bernard Vatant <bernard.vatant@mondeca.com>
>> wrote:
>>     This growth is a good thing, but it will, and actually has already
>>     hit known limits in this kind of exercise, which once again boils
>>     down to represent the whole world in a unique model, and a unique
>>     namespace.
> Very large models can work well (cf various library classification
> systems) if they are well organized. However, I agree that the unique
> namespace, the shallowness of the identifiers, and the use of natural
> language terms in the identifiers do suggest that your concern is
> well-warranted.[1]  I also think that the "bottom up" approach, while
> practical in some ways, is exhibiting its weakness.
> As in the point I brought up about re-defining, there are properties in
> schema.org that exhibit a kind of bias. For example, AudioObject and
> VideoObject were originally defined as "embedded non-text objects,"
> undoubtedly because someone was thinking of them in terms of marking up a
> web page. Perhaps we need a concept like that of Wikipedia's neutral point
> of view - but it would be more of a 'neutral context' - to keep general
> terms from being used as if they exist only in one context or situation.
> This was the situation we found with /citation, which had been buried in
> the medical area, as I recall. Paying more attention to this neutrality
> might help prevent some of the proliferation of properties because the ones
> that are there will be more re-usable.
> kc
> [1] This reminds me of the big debate in library classification: must the
> notation reflect the structure of the classification? In schema, the
> "notation" (the URI) does not reveal the structure of the vocabulary. The
> advantage of this, IMO, is that we can redefine and move properties without
> affecting the identifier. The down-side is that we just might run out of
> viable terms.
>>     The first point is not really an issue. The semantics of schema.org
>>     are "soft" enough to accommodate slight
>>     inconsistencies between various branches of the vocabulary; for
>>     example, the same property used here and there with slightly
>>     different semantics will not really be an issue if those branches
>>     are unlikely to be used in the same context.
>>     The unique namespace is another issue. Once a name has been used to
>>     identify a class or a property, it can't be reused for something
>>     else. New extensions will have to cope with the legacy. Suppose I
>>     want to use http://schema.org/study for something other than a
>>     MedicalEntity and MedicalStudy. Suppose DDI people want to introduce
>>     their concept of Study [1]. What will be the negotiation process?
>>     More generally is there a limit one could set for a manageable
>>     sensible size of the vocabulary? 10,000? 100,000?
>>     Is there a plan of any kind to put a limit in size or in time to the
>>     vocabulary growth?
>> We don't have an iron-clad policy for any of this. It's more
>> opportunistic than that. If we can improve the Web by adding more types
>> and properties here and there, we'll try to do so. At the same time, as
>> you point out, there are natural limits. So the 'external enumerations'
>> discussion was important, for example. And external efforts like
>> Wikidata are very important.  My favourite example is
>> http://schema.org/PlaceOfWorship ... we can't really have schema.org
>> enumerate all the kinds of place of worship that
>> might fit there. But we can expect schema.org to
>> show how external lists (like Wikidata's, or Freebase's) plug in.
>> For vocabulary clash, it is an issue, but perhaps we can work around it
>> in many cases. I noticed rather too recently that the 'action' property
>> name has already been used up for a relatively niche use (on 'Muscle'):
>> http://schema.org/action - "The movement the muscle generates." What I'd
>> suggest here is a combination of asking around to see if people are
>> using it heavily or planning to, plus (at the search engines) looking at
>> crawl data. If it isn't widely used yet, it might be a good candidate
>> for renaming to muscleAction. This might seem awkward, but there is a lot
>> also to be gained from having a simple namespace structure...
>> Dan
>>     Thanks for your thoughts.
>>     Bernard
>>     [1] http://rdf-vocabulary.ddialliance.org/discovery
>>     --
>>     Bernard Vatant
>>     Vocabularies & Data Engineering
>>     Tel : + 33 (0)9 71 48 84 59
>>     Skype : bernard.vatant
>>     Blog : the wheel and the hub <http://bvatant.blogspot.com>
>>     Linked Open Vocabularies : lov.okfn.org <http://lov.okfn.org>
>>     --------------------------------------------------------
>>     Mondeca
>>     3 cité Nollez 75018 Paris, France
>>     www.mondeca.com <http://www.mondeca.com/>
>>     Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
>>     --------------------------------------------------------
>>     Mondeca is co-chairing
>>     Long-term Preservation and Governance of RDF Vocabularies
>>     <http://dcevents.dublincore.org/IntConf/index/pages/view/vocPres>
>>     at Dublin Core Conference
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet

Bernard Vatant
Vocabularies & Data Engineering
Tel :  + 33 (0)9 71 48 84 59
Skype : bernard.vatant
Blog : the wheel and the hub <http://bvatant.blogspot.com>
Linked Open Vocabularies : lov.okfn.org
Mondeca
3 cité Nollez 75018 Paris, France
Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
Mondeca is co-chairing
Long-term Preservation and Governance of RDF Vocabularies
at Dublin Core Conference


Received on Monday, 29 July 2013 09:56:45 UTC
