Re: schema.org growth what are the limits? from Karen Coyle on 2013-07-26 (public-vocabs@w3.org from July 2013)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Fri, 26 Jul 2013 10:59:53 -0700
To: public-vocabs@w3.org
Message-ID: <51F2B919.5000306@kcoyle.net>

On 7/26/13 8:27 AM, Dan Brickley wrote:
>
>
> On 26 July 2013 16:13, Bernard Vatant <bernard.vatant@mondeca.com
> <mailto:bernard.vatant@mondeca.com>> wrote:

>
>     This growth is a good thing, but it will hit, and actually has
>     already hit, known limits in this kind of exercise, which once
>     again boils down to representing the whole world in a unique model,
>     and a unique namespace.

Very large models can work well (cf. various library classification 
systems) if they are well organized. However, I agree that the unique 
namespace, the shallowness of the identifiers, and the use of natural 
language terms in the identifiers do suggest that your concern is 
well-warranted.[1]  I also think that the "bottom up" approach, while 
practical in some ways, is exhibiting its weakness.

As in the point I brought up about re-defining, there are types and 
properties in schema.org that exhibit a kind of bias. For example, 
AudioObject and 
VideoObject were originally defined as "embedded non-text objects," 
undoubtedly because someone was thinking of them in terms of marking up 
a web page. Perhaps we need a concept like that of Wikipedia's neutral 
point of view - but it would be more of a 'neutral context' - to keep 
general terms from being used as if they exist only in one context or 
situation. This was the situation we found with /citation, which had 
been buried in the medical area, as I recall. Paying more attention to 
this neutrality might help prevent some of the proliferation of 
properties because the ones that are there will be more re-usable.

kc
[1] This reminds me of the big debate in library classification: must 
the notation reflect the structure of the classification? In schema.org, 
the "notation" (the URI) does not reveal the structure of the vocabulary. 
The advantage of this, IMO, is that we can redefine and move properties 
without affecting the identifier. The downside is that we just might 
run out of viable terms.
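
The trade-off in note [1] can be sketched in a few lines of Python. This is 
only an illustration under stated assumptions: the FlatVocabulary class, its 
add/move methods, and the type placements used below are hypothetical and 
are not part of schema.org or its tooling. It shows that a term's URI stays 
the same when the term is moved, and that a second definition of an existing 
name is rejected.

```python
# Hypothetical, simplified model of a flat vocabulary namespace.
# Nothing here is schema.org tooling; names and placements are illustrative.
BASE = "http://schema.org/"


class FlatVocabulary:
    def __init__(self):
        # term name -> the type the term is currently attached to
        self._domain = {}

    def uri(self, name):
        # The identifier depends only on the name, never on the hierarchy.
        return BASE + name

    def domain(self, name):
        return self._domain[name]

    def add(self, name, domain):
        # One flat namespace: a name can be claimed only once.
        if name in self._domain:
            raise ValueError(
                f"{name!r} is already taken by {self._domain[name]}"
            )
        self._domain[name] = domain
        return self.uri(name)

    def move(self, name, new_domain):
        # Re-homing a term changes its place in the structure, not its URI.
        self._domain[name] = new_domain
        return self.uri(name)


vocab = FlatVocabulary()

# Moving a term (illustrative placements) leaves the identifier untouched.
before = vocab.add("citation", "MedicalEntity")
after = vocab.move("citation", "CreativeWork")

# A second community wanting the same name runs into the existing one.
vocab.add("study", "MedicalEntity")
try:
    vocab.add("study", "DDI Study")
    clash = False
except ValueError:
    clash = True
```

A notation that encoded the hierarchy (for example, a path-like identifier) 
would avoid the clash on "study" but would change the identifier on every 
move, which is the other side of the same debate.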


>
>     The first point is not really an issue. The semantics of schema.org
>     are "soft" enough to accommodate slight inconsistencies between
>     various branches of the vocabulary; for example, the same property
>     used here and there with slightly different semantics will not
>     really be an issue if those branches are unlikely to be used in the
>     same context.
>
>     The unique namespace is another issue. Once a name has been used to
>     identify a class or a property, it can't be reused for something
>     else. New extensions will have to cope with the legacy. Suppose I
>     want to use http://schema.org/study for something other than a
>     MedicalEntity and MedicalStudy. Suppose DDI people want to introduce
>     their concept of Study [1]. What will be the negotiation process?
>
>     More generally, is there a limit one could set for a manageable,
>     sensible size of the vocabulary? 10,000? 100,000?
>     Is there a plan of any kind to put a limit, in size or in time, on
>     the vocabulary growth?
>
>
> We don't have an iron-clad policy for any of this. It's more
> opportunistic than that. If we can improve the Web by adding more types
> and properties here and there, we'll try to do so. At the same time, as
> you point out, there are natural limits. So the 'external enumerations'
> discussion was important, for example. And external efforts like
> Wikidata are very important.  My favourite example is
> http://schema.org/PlaceOfWorship ... we can't really have schema.org
> enumerate all the kinds of place of worship that might fit there. But
> we can expect schema.org to show how external lists (like Wikidata's,
> or Freebase's) plug in.
>
> For vocabulary clash, it is an issue, but perhaps we can work around it
> in many cases. I noticed rather too recently that the 'action' property
> name has already been used up for a relatively niche use (on 'Muscle'):
> http://schema.org/action - "The movement the muscle generates." What I'd
> suggest here is a combination of asking around to see if people are
> using it heavily or planning to, plus (at the search engines) looking at
> crawl data. If it isn't widely used yet, it might be a good candidate
> for renaming to muscleAction. This might seem awkward, but there is
> also a lot to be gained from having a simple namespace structure...
>
> Dan
>
>
>     Thanks for your thoughts.
>
>     Bernard
>
>     [1] http://rdf-vocabulary.ddialliance.org/discovery
>
>
>
>
>     --
>     Bernard Vatant
>     Vocabularies & Data Engineering
>     Tel : + 33 (0)9 71 48 84 59
>     Skype : bernard.vatant
>     Blog : the wheel and the hub <http://bvatant.blogspot.com>
>     Linked Open Vocabularies : lov.okfn.org <http://lov.okfn.org>
>     --------------------------------------------------------
>     Mondeca
>     3 cité Nollez 75018 Paris, France
>     www.mondeca.com <http://www.mondeca.com/>
>     Follow us on Twitter : @mondecanews
>     <http://twitter.com/#%21/mondecanews>
>     ----------------------------------------------------------
>     Mondeca is co-chairing
>     Long-term Preservation and Governance of RDF Vocabularies
>     <http://dcevents.dublincore.org/IntConf/index/pages/view/vocPres>
>     at Dublin Core Conference
>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Friday, 26 July 2013 18:00:17 UTC