Re: Expected type(s) for email should include URL from Jarno van Driel on 2013-07-30 (public-vocabs@w3.org from July 2013)

From: Jarno van Driel <jarno@quantumspork.nl>
Date: Tue, 30 Jul 2013 11:00:39 +0200
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Cc: Stéphane Corlosquet <scorlosquet@gmail.com>, Aaron Bradley <aaranged@gmail.com>, Public Vocabs <public-vocabs@w3.org>
Message-ID: <CAFQgrbYxvw8OmXe8xLcPH29pKm_T4B_bW3S7jgANP3QkieZZxQ@mail.gmail.com>

Hi Martin,

First off, thanks for giving my 'complaint' a serious reply. That's
actually very much what I was hoping for. I am not out to be the
proverbial pain in the *** but am very hopeful to get some sort of a
discussion going about the 'learnability' of structured data. (sorry
to Aaron if I deviate from his original post with this).

I do remember the days before schema.org though. I have built sites
which were later mentioned as 'early adopters' of semantic
technologies (RDFa to be more specific). And true, the difficulty of
finding information in those days was absolute madness. It almost
drove me to the point of letting the entire semantic 'thing' go, so I
definitely acknowledge the point you make with this.

Since those days things have improved, and we have many people to
thank for this. But IMHO, where the technologies surrounding
semantics have made great steps forward, the information needed to get
a proper understanding of it all has not, or at least not to the
same degree.

Where my issue mostly lies is that there is a big gap between the
'academic' discussion surrounding semantics and the information
available for people who want to get into it. Sure, by now there are
plenty of tutorials/guides/articles which help you get past the
introductory level of structured data but from that point onward it's,
sorry to say, a mess.

One of the things I am wondering (and am frustrated about) is how we
can get the information organised and extended at the same speed as the
technologies advance, and with the same dedication as well. Because if
structured data needs to be implemented by the masses, something has to
be done about the information available, how it's kept up-to-date and
how to clarify it with real-life examples.

Surely, within the limits you described, the search engines behind
schema.org have the capacity to do so, and shouldn't we expect them to
do so as well? Because it seems unrealistic to expect this from all the
volunteers who help the technology advance (not saying the community
has no part to play in this).

On Tue, Jul 30, 2013 at 7:28 AM, Martin Hepp
<martin.hepp@ebusiness-unibw.org> wrote:
> Hi Jarno:
>
> On Jul 30, 2013, at 12:21 AM, Jarno van Driel wrote:
>
>> Isn't the amount of reference pages given in this discussion a bit
>> crazy? I have been playing around with structured data for years
>> now and all that time, from all kinds of angles, new details, articles
>> and specifications keep popping up. How is somebody supposed to know
>> where to look anymore or is it just me? Can we expect people (who want
>> to start with semantics) to look at so many different sources and
>> shouldn't it be centralised a lot more?
>>
>
> I think you are right, but the situation is a bit more complicated.
>
> First, schema.org *did* a lot to provide a one-stop reference for the vocabulary for webmasters. Before schema.org was launched, you e.g. combined FOAF properties with GoodRelations types and DBPedia enumerations, and you had to be a real master to know where to look. The broad community may not be aware of that, but before schema.org, it was a lot worse ;-)
>
> Second, marking up data is by orders of magnitude (potentially) more complex than simple HTML design, since you have to translate arbitrary data from a site and its back-end systems to a global data structure. That is thirty years of data integration and knowledge engineering pains in essence. It just has many additional degrees of freedom and is much more difficult to validate. You cannot simply check whether it looks right in a browser.
>
> Third, and most importantly, there are two distinct aspects of schema.org documentation:
>
> a) the conceptual model - how information should be structured in a common way
> b) the implementation  - how search engines consume schema.org markup
>
> It was already a very remarkable achievement (in particular legally) that the big search engines were actually able to jointly release and promote schema.org.
>
> However,
>
> 1. they remain competitors, and thus everything related to what they *do* with the respective technology, e.g. how they parse, cleanse, rank, render, ... the data is not a concerted action, and, for fear of anti-trust laws, they must be very careful with joint action in their product space. Otherwise they risk being faced with anti-competitive conduct charges. Also, there is likely a lot of valuable business know-how in the implementation details. This is why a validation tool that shows how Google/Bing/Yahoo/Yandex understand your mark-up is not part of the schema.org site.
>
> 2. They cannot give guarantees on how the markup will be considered because that invites spam and other forms of abuse. Instead, they have to constantly learn how to distinguish trustworthy data markup from unreliable or fraudulent markup.
>
> Clearly, schema.org should contain more examples, but on the other hand, there are only three resources that are currently most relevant:
>
> 1. The schema.org page
>      http://schema.org
>
> 2. The syntax specifications for RDFa 1.1 and Microdata
>     http://www.w3.org/TR/rdfa-core/
>     http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
>
> 3. The Google Structured Data Testing Tool
>     http://www.google.com/webmasters/tools/richsnippets
>
>
> Martin
>
>
>> On Tue, Jul 30, 2013 at 12:10 AM, Stéphane Corlosquet
>> <scorlosquet@gmail.com> wrote:
>>>
>>> On Mon, Jul 29, 2013 at 4:37 PM, Aaron Bradley <aaranged@gmail.com> wrote:
>>>>
>>>> The expected type for the property "email" [1], used on the types
>>>> Organization, Person and ContactPoint, is text.
>>>>
>>>> However, an email address is as often as not expressed as a mailto:
>>>> address.  And, in fact, almost all of the schema.org microdata examples that
>>>> include this property express it as a URL, such as this example for Person
>>>> [2]:
>>>>
>>>> <a href="mailto:jane-doe@xyz.edu" itemprop="email">jane-doe@xyz.edu</a>
>>>>
>>>> Google's Structured Data Testing Tool [3] does not complain if the Person
>>>> example is run through it, but Google's Schema Validator [4] returns this
>>>> warning - as it should, as per the spec:
>>>>
>>>> The property http://schema.org/email expects a value of type Text
>>>>
>>>> Note that it's perfectly possible for a page to legitimately use something
>>>> other than text for the hyperlink anchor:
>>>>
>>>> <a href="mailto:jane-doe@xyz.edu" itemprop="email"><img
>>>> src="gigantic-email-me-now-button.jpg"></a>
>>>>
>>>> Given this, doesn't it make sense to have the expected types for email to
>>>> be set to "Text or URL", as with (for example) the property "menu"?
>>>
>>>
>>> I like this idea of being tolerant on the expected value, and that's more
>>> in line with other properties in schema.org like you point out. This is implied
>>> in the documentation about expected types [1], which says:
>>>
>>>> When browsing the schema.org types, you will notice that many properties
>>>> have "expected types". This means that the value of the property can itself
>>>> be an embedded item (see section 1d: embedded items). But this is not a
>>>> requirement—it's fine to include just regular text or a URL.
>>>
>>>
>>> I also expect some people will not necessarily always remember or care to
>>> append mailto: in the case for example where an email address would be
>>> displayed without a link, but simply as text in a span element (visible to
>>> the end user).
>>>
>>> Steph.
>>>
>>> [1] http://schema.org/docs/gs.html#schemaorg_expected
>>>
>>>>
>>>> [1] http://schema.org/email
>>>> [2] http://schema.org/Person
>>>> [3] http://www.google.com/webmasters/tools/richsnippets
>>>> [4] https://developers.google.com/gmail/schemas/testing-your-schema
>>>> [5] http://schema.org/menu
>>>>
>>>
>>
>
> -------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail:  hepp@ebusiness-unibw.org
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>          http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/
>
>
>
Received on Tuesday, 30 July 2013 09:01:11 UTC