Re: Need for a new type Activity

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Wed, 2 Nov 2011 17:56:16 +0100
Cc: Roy Lachica <roy@webnodes.com>, Dan Brickley <danbri@danbri.org>, "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-Id: <170EBFF2-75C8-4EA6-A895-53A0719C1195@ebusiness-unibw.org>
To: Aaron Bradley <aaranged@yahoo.com>
Hi Aaron:

A centralized schema must not exceed a certain size; otherwise it is unmanageable and hard to navigate. Even Google could not easily maintain enumerations for arbitrary domains.

Take for example recipes: If you want to encode herbs used for a recipe, DBPedia/Wikipedia provides ca. 200 Web-scale identifiers, most of them defined in multiple languages and with a picture showing the herb.

Importing all those into schema.org would quickly break the whole schema.org approach.

I may have, unintentionally, scared potential adopters by Linked Data geek speak of "DBPedia identifiers".

What I mean by that is pretty simple and straightforward:

For a certain Microdata property with the defined range "URL" or "DBPedia URL" (tbd),

1. Search the English Wikipedia for the best matching page

Example:

Oregano --> http://en.wikipedia.org/wiki/Oregano

2. Strip off http://en.wikipedia.org/wiki

--> /Oregano

3. Attach the DBPedia base URI http://dbpedia.org/resource/

--> http://dbpedia.org/resource/Oregano

4. Use this with the respective property

<div itemscope itemtype="http://schema.org/Recipe">
  <span itemprop="name">Oregano Bread</span>
  <link itemprop="ingredients" href="http://dbpedia.org/resource/Oregano" /> Oregano
...
</div>
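
Steps 2 to 4 above are mechanical, so they could be scripted. The following is only a minimal sketch in Python, not part of any schema.org or DBPedia tooling: the function names `wikipedia_to_dbpedia` and `ingredient_link` are hypothetical, and it assumes the page title can be carried over unchanged (titles with non-ASCII or percent-encoded characters may need extra handling that is ignored here).

```python
from html import escape
from urllib.parse import urlsplit

WIKIPEDIA_HOST = "en.wikipedia.org"
WIKIPEDIA_PATH_PREFIX = "/wiki/"
DBPEDIA_BASE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(wikipedia_url: str) -> str:
    """Map an English Wikipedia page URL to a DBPedia resource URI (hypothetical helper)."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != WIKIPEDIA_HOST or not parts.path.startswith(WIKIPEDIA_PATH_PREFIX):
        raise ValueError(f"not an English Wikipedia page URL: {wikipedia_url!r}")
    # Step 2: strip the Wikipedia prefix, keeping only the page title.
    title = parts.path[len(WIKIPEDIA_PATH_PREFIX):]
    if not title:
        raise ValueError("URL has no page title")
    # Step 3: attach the DBPedia base URI (assumes the title needs no re-encoding).
    return DBPEDIA_BASE + title


def ingredient_link(wikipedia_url: str, label: str) -> str:
    """Step 4: build the Microdata <link> element used in the recipe example."""
    uri = wikipedia_to_dbpedia(wikipedia_url)
    return f'<link itemprop="ingredients" href="{escape(uri, quote=True)}" /> {escape(label)}'


if __name__ == "__main__":
    print(ingredient_link("http://en.wikipedia.org/wiki/Oregano", "Oregano"))
```

Step 1, finding the best matching Wikipedia page, is left to the person doing the markup; the sketch only covers the URL rewriting.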

I think this is a fairly feasible approach, with several nice features:

1. Broad coverage
2. Many translations
3. Good textual definitions
4. Typically an image
5. Broad community involvement (most popular pages have undergone 200+ public edits and thus been subject to much more intense discourse than any other dictionary of named entities).
6. Fairly stable URIs (see [1]).

Note that while Google is big, the staffing available for maintaining schema.org is likely very limited. It makes no sense to reinvest the 350 man-years needed to complete Cyc, the all-purpose ontology of the world.

I can tell you from the work on GoodRelations that the effort for maintaining a global schema is grossly underestimated. GoodRelations with ca. 30 classes and ca. 90 properties took more than 10 man-years to build and maintain to date.

Martin



[1] Hepp, Martin; Siorpaes, Katharina; Bachlechner, Daniel: Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management, IEEE Internet Computing, Vol. 11, No. 5, pp. 54-65, Sept-Oct 2007.
Additional data and materials related to this paper are available at a dedicated Web page: http://www.heppnetz.de/harvesting-wikipedia/

PDF at http://www.heppnetz.de/files/hepp-siorpaes-bachlechner-harvesting%20wikipedia%20w5054.pdf


On Nov 2, 2011, at 5:26 PM, Aaron Bradley wrote:

>> From: Roy Lachica <roy@webnodes.com>
> 
> 
>> Using external identifiers e.g. from DBPedia or Freebase sounds like a good solution. However the drawback of this approach is that it creates confusion for "ordinary" web developers and webmasters. At the moment Schema.org is so simple that "anyone" can use it. If we begin to rely on webmasters to use external identifiers and begin to encourage webmasters to mix vocabularies you create confusion about what identifiers and vocabs to use, how to use them, where to find them, etc. 
> 
> 
> I could not agree more.  In some ways, relying on external namespaces and the syntactical complexity that comes with correctly employing them belies the whole point of schema.org.  If the answer to "I need a new type" is "use this-or-that external namespace (and, in the process, ditch microdata and employ RDFa instead)" then why have an extension mechanism for schema.org?
> 
> 
> Martin's follow-up point that it is impossible to "maintain an up-to-date inventory of value definitions across all industries" and "that referring to DBPedia ... is acceptable for Web developers" raises a couple of issues.  Sure, neither schema.org nor another vocabulary will encompass all terms that are required by webmasters - well-identified by many commentators as the problems implicit in trying to create an "ontology of everything" - but in that case what are the boundaries of schema.org?  The limits of schema.org are not ill defined, they are not defined at all.  And in this there is a disconnect between enterprise outfits (and semantic web developers) and Roy's "ordinary" webmasters which threatens to alienate the latter.
> 
> An example of what I mean relates to the evolution of the software extension in schema.org.  An "ordinary" webmaster wanted to know whether software was suitably handled by the CreativeWork type, and how he would go about correctly extending it:
> 
> http://groups.google.com/group/schemaorg-discussion/browse_thread/thread/fd2d007dd60e4c5e?pli=1
> 
> Crickets.
> 
> But in September Google determined it should extend CreativeWork to include SoftwareApplication:
> http://www.google.com/support/webmasters/bin/answer.py?answer=1645432
> 
> Voilà, a well-formed, well-documented extension - with, of course, plenty of incentive for webmasters dealing with software to adopt it, as presumably Google wouldn't have developed the extension unless they were planning on parsing it.  If Google was Igor would we have pointed them to DBpedia?  Told them that you can't have an ontology of everything?
> 
> (This also raises the interesting point of how an extension makes it "officially" into the schema.org vocabulary.  Are we waiting to see whether or not it gains "significant adoption on the web" so that it "may be moved into the core schema.org vocabulary"?  How did the properties from rNews make it onto the site (and I'm still looking for a reasoned response to my "ordinary" webmaster concerns about tickerSymbol:)?  Not a complaint or an accusation, I genuinely don't know the process, or if there even is a process.)
> 
> 
> All of this is in turn related to what is reasonable for "ordinary" webmasters.  While "referring to DBPedia" may be "acceptable for Web developers" it is not appropriate for "ordinary" webmasters that are doing their best to nail basic microdata syntax and incorporate schema.org-based markup onto their sites.  Anytime you want to ask a question of an "ordinary" webmaster I'll happily volunteer - I don't think I'm a technical slouch, but I'm certainly no developer.  I think I can deal with schema.org microdata fairly aptly.  But if one were to tell me, when I inquired about an extension, "use DBPedia" I wouldn't have the faintest clue how to incorporate this into my code, I would have absolutely no idea if this accomplished the stated schema.org goal of allowing webmasters to "markup their pages in ways recognized by major search providers," and I would have little possibility of garnering development resources to do so without being able to make a reasonable case for the return on investment.
> 
> 
>> A place to see the extensions used by webmasters would be really cool.
>> A webpage that shows all extensions with the number of sites using it and the number of pages would bring more transparency to the process.
> 
> I've argued for something along these lines before, and I could not agree more!
> 
> Thanks,
> Aaron
> 
> 
>> 
>> Dan
>> I understand adding all these subtypes poses all kinds of problems such as ontological and perhaps even political issues. Anyway if you at least just add the new type Activity webmasters could use extensions to define the subtypes so you (Schema.org) don't have to take a stance on what subtypes to define. In a bottom up way you could then later define the sub types based on the extensions we provide.
>> I would also guess that type-subtypes are better than enumerations here since it would allow for further sub types. Activity sub types might perhaps also have different properties. 
>> 
>> I strongly believe a new type Activity would be a great contribution as a large portion of the content on the web today is about activities. E.g. pages about working, talking, eating, running, sleeping, drinking, learning, dancing, teaching, driving, making love, fighting, bird watching etc. 
>> 
>> A place to see the extensions used by webmasters would be really cool.
>> A webpage that shows all extensions with the number of sites using it and the number of pages would bring more transparency to the process.
>> 
>> 
>> Best regards
>> .roy
>> 
>> 
>> 
>> -----Original Message-----
>> From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org] 
>> Sent: 28. oktober 2011 20:49
>> To: Dan Brickley
>> Cc: Roy Lachica; public-vocabs@w3.org
>> Subject: Re: Need for a new type Activity
>> 
>> I can strongly recommend the approach taken in e.g. http://purl.org/vso/ns: 
>> 
>> 1. Define a class for the type of a value.
>> 2. Recommend DBPedia or Freebase IDs for the individuals, if they exist.
>> 3. If there is no standardized identifier yet, Web site owners can define one locally and hope that clients can do the entity consolidation.
>> 
>> This decouples the maintenance of the schema from the evolution of new values.
>> 
>> See e.g.
>> 
>>   http://www.heppnetz.de/ontologies/vso/ns.html#FuelTypeValue
>> 
>> Best
>> 
>> Martin
>> 
>> On Oct 28, 2011, at 2:20 PM, Dan Brickley wrote:
>> 
>>> Hi Roy,
>>> 
>>> On 26 October 2011 19:24, Roy Lachica <roy@webnodes.com> wrote:
>>>> Hi
>>>> I was just adding Schema.org for a tourist site. In particular activities (Things to do at a location) and found a need for an Activity type.
>>>> Many tourism websites separate tourist activities from tourist sites (what to do from what to see).
>>>> 
>>>> Existing types such as http://schema.org/Event seem to be meant for a single event happening at a certain time at a certain location.
>>>> http://schema.org/TouristAttraction doesn't really match for activities such as shopping and eating out? From Wikipedia: "A tourist attraction is a place of interest where tourists visit, typically for its inherent or exhibited cultural value, historical significance, natural or built beauty, or amusement opportunities."
>>>> A page about Hiking should have links to places where you can go hiking, but the activity hiking itself is not really a tourist attraction?
>>>> 
>>>> I would like to add activities that are also relevant in a non-tourist setting. Using TouristAttraction (typically used for things you go to see) therefore seems wrong. TouristAttraction is also a sub type of Place. Many activities are place independent.
>>>> 
>>>> I would guess that search engine queries like "What to do in Rio" are just as much used as "What to see in Rio".
>>>> When searching for Shopping it would be nice to specify that I mean the activity of shopping so I can get a list of pages about shopping rather than a list of online shops.
>>>> 
>>>> There are also many sites that contain general non-tourist activities that could benefit from an Activity type.
>>>> By having an activity type, search engine users can differentiate between football as an activity, a football club or the object football.
>>>> 
>>>> I would therefore like to suggest the type: Activity  (description: 
>>>> Something you can do at will, regularly or perhaps once in a 
>>>> lifetime)
>>>> 
>>>> I am sure someone else has better suggestions for sub types, 
>>>> properties and descriptions. I was not able to find a good taxonomy 
>>>> or vocabulary but here's a few suggestions for sub types [...]
>>> 
>>> Thanks for raising this, and the suggestions. Before jumping into 
>>> those specifics I think it's worth pointing out a difficulty we'll all 
>>> have here: if Schema.org starts to include big enumerated lists it 
>>> could become rather hard to maintain in the future. This was discussed 
>>> at last month's workshop; Guha and others suggested that Schema.org 
>>> should avoid such enumerations. Where they exist already in well 
>>> established systems, Schema.org could serve as a documentation hub, 
>>> pointing to those pre-existing lists. In this case, the scope of the 
>>> list is quite broad --- it's all kinds of things that people do, or 
>>> do for leisure.
>>> 
>>> One possibility to investigate here is to look to larger collections 
>>> to add in such detail. For example using a collection like Wikipedia 
>>> (or its RDFization as DBpedia, or the proposed Wikidata work; or 
>>> Freebase...). At some point with Schema.org we have to say "ok, 
>>> enough! let's cut over to a larger community-maintained dataset". It 
>>> is not exactly clear where that cut point should be. There are points 
>>> within the current schema (eg. http://schema.org/HairSalon) where you 
>>> could argue the limit has been reached.  I wonder how many of the 
>>> activity-types listed here have nice solid obvious Wikipedia URLs 
>>> associated with them (and which are handled as wiki Categories), and 
>>> also whether they are modeled/described in anything like a consistent 
>>> manner there...
>>> 
>>> cheers,
>>> 
>>> Dan
>>> 
>> 
>> 
>> 
>> 
>> 
Received on Wednesday, 2 November 2011 16:56:52 GMT
