Re: Multiple types from different vocabularies (ACTION-7) from Dan Brickley on 2011-10-30 (public-html-data-tf@w3.org from October 2011)

From: Dan Brickley <danbri@danbri.org>
Date: Sun, 30 Oct 2011 10:55:02 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <CAFNgM+Yx9iYKEKen0A9-vhx8DHYSo-9qQdCmoFQxYBuMpVthYg@mail.gmail.com>

On 30 October 2011 10:02, Ivan Herman <ivan@w3.org> wrote:
> *Formally* one can of course put an
>
> <http://schema.org/type> owl:equivalentProperty <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> .

Yes, we ought to probably do that, if 'type' gets added.

> into the schema.owl file. But if we do that, strictly from the Semantic Web/OWL world (I know, this may be of no interest for many, but I still have to consider that) what this would mean is that schema.owl would push everything into OWL Full (it is not allowed in OWL DL to make statements on the core vocabulary). If anybody ever would like to use the schema.org vocabulary within the framework of a more complex reasoning but staying within OWL DL, then that person would be doomed.

I'd be surprised if we're not already in Full with most real-world
Schema.org data.  I haven't run any formal checker over the current
OWL representation of the schema, though.

e.g. http://schema.org/docs/datamodel.html "We also expect that often,
where we expect a property value of type Person, Place, Organization
or some other subClassOf Thing, we will get a text string. In the
spirit of "some data is better than none", we will accept this markup
and do the best we can." ... this doesn't seem very DL-friendly.

Anyhow "doomed" is too strong. What we need imho is a change of
perspective towards more pipeline-based processing models for RDF
data. It is too optimistic (I won't say 'naive', but it's tempting) to
generally expect to be able to take data-bearing pages from mainstream
Web sites, pull out triples with a parser, and drop them unmediated
into a trusting, truth-centric OWL environment. Much more likely,
we'll see a more pipeline centric workflow, with all kinds of cleanup,
mappings, filterings and other enrichments happening between doing the
HTTP GET and doing anything that involves relying on the factual
claims that our parsers extract from Web pages. So it is far from
"doom" to expect large scale consumption of in-page factual data to
involve a bit of filtering, post-processing and cleanup. Not to
mention source-selection and provenance-sensitive decision making.
That seems a perfectly reasonable assumption to me.
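
As a rough illustration of such a pipeline (a sketch only, not anything
Schema.org or the Task Force has specified), the Python below chains two
small steps over extracted triples: a mapping step that rewrites the
proposed http://schema.org/type property to rdf:type, and a cleanup step
that drops empty values. Triples are modelled as plain string tuples; the
function names and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Step = Callable[[Iterable[Triple]], Iterator[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Proposed property under discussion in this thread; an assumption here.
SCHEMA_TYPE = "http://schema.org/type"


def map_type_property(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Mapping step: rewrite the proposed schema.org/type to rdf:type."""
    for s, p, o in triples:
        yield (s, RDF_TYPE if p == SCHEMA_TYPE else p, o)


def drop_blank_values(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Cleanup step: discard triples whose object is empty or whitespace."""
    for s, p, o in triples:
        if o.strip():
            yield (s, p, o)


def run_pipeline(triples: Iterable[Triple], steps: List[Step]) -> List[Triple]:
    """Apply each step in order to the output of the previous one."""
    for step in steps:
        triples = step(triples)
    return list(triples)


# Illustrative input, as a parser might emit it after the HTTP GET.
extracted = [
    ("#store", SCHEMA_TYPE, "http://purl.org/goodrelations/v1#Location"),
    ("#store", "http://schema.org/name", "   "),
    ("#store", "http://schema.org/name", "Corner shop"),
]

cleaned = run_pipeline(extracted, [map_type_property, drop_blank_values])
```

Source selection and provenance checks would slot in as further steps
of the same shape, ahead of any OWL processing.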

I guess for a modest percentage of Schema.org deployments the data
will be super-high quality (eg. some opendata-in-science collab
wikis), and in those cases OWL Full might be imposing an inconvenience
on OWL-centric consumers. But since Dublin Core and FOAF are currently
also in OWL Full afaik, the problem isn't unique to Schema.org. I can
see some kind of pre-semantic robustification filter being a useful
addition to any OWL environment, ie. a filter that 'tidies' things
into a form that is more friendly towards OWL DL tooling.

All that said, OWL is a garbage-in-garbage-out machine; if you give it
true stuff, it'll figure out more true stuff. If you give it stale or
factually incorrect input, you'll get dodgy conclusions. I'm much much
more worried about that kind of 'doom' than about having instance data
'cross the streams' and talk about the built-in terminology, as are
the other Schema.org folk, hence all the concern about
author/publisher error rates and syntax simplicity.

> Strictly speaking, even OWL 2 RL would break, though that is the closest to any ground RDF reasoning around. The same holds, of course, if we used www.w3.org/ns/type. That is _one_ of the reasons I was saying that this is equally bad. (I do not think this is a far fetched possibility. Elsewhere in one of the threads the issue of mixing schema.org terms with, say, medical ontology terms came up, and that would be a very reasonable thing to do. The medical informatics world makes a significant use of OWL reasoning with their particular ontologies. Ie, we are not talking about exclusively research topics here.)

Yes, there is a lot of healthcare-related information in the
mainstream Web. If you/we can find a medical informatics person or
project who is genuinely concerned about Schema.org slipping into OWL
Full, then I'll take that as a significant issue re whether schema.org
publishes the claim " <http://schema.org/type> owl:equivalentProperty
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ."

However my suggested approach is simpler:

 * Applications that will fail if they encounter
owl:equivalentProperty claims linking OWL built-in terms with vocab
from the wild, should consider offering a robustness mode that simply
ignores such claims, treating them as non-semantic annotations that
can be removed before serious OWL processing.
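
A minimal sketch of such a robustness mode, in Python, assuming the
schema has already been parsed into plain (subject, predicate, object)
string tuples. The function names are hypothetical, and the choice of
which namespaces count as "built-in" (RDF, RDFS, OWL) is an assumption
made for illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Namespaces treated as "built-in" for this sketch (an assumption).
BUILT_IN_NAMESPACES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
)


def is_built_in(term: str) -> bool:
    """True if the term's URI falls in one of the built-in namespaces."""
    return term.startswith(BUILT_IN_NAMESPACES)


def strip_built_in_equivalences(
    triples: Iterable[Triple],
) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (kept, ignored).

    A triple is ignored when it is an owl:equivalentProperty claim with a
    built-in term on either side; everything else is kept unchanged.
    """
    kept: List[Triple] = []
    ignored: List[Triple] = []
    for triple in triples:
        s, p, o = triple
        if p == OWL_EQUIVALENT_PROPERTY and (is_built_in(s) or is_built_in(o)):
            ignored.append(triple)
        else:
            kept.append(triple)
    return kept, ignored


# Illustrative input: the claim discussed above, plus a hypothetical
# equivalence between two non-built-in properties that should survive.
schema_triples = [
    ("http://schema.org/type", OWL_EQUIVALENT_PROPERTY,
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    ("http://example.org/vocab#label", OWL_EQUIVALENT_PROPERTY,
     "http://example.com/other#title"),
]

kept, ignored = strip_built_in_equivalences(schema_triples)
```

Returning the ignored claims rather than silently discarding them lets
a consumer log them or keep them as annotations outside the reasoner.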

> The alternative is that microdata->RDF mapping would have an extra rule that makes a mapping of schema.org/type to rdf:type, essentially breaking the uniformity of the mapping.

I wouldn't support any special case mention of schema.org in the
official mapping of microdata to RDF. If this is going to be
special-cased, then special-case it in the syntax rather than
introduce a dependency on an external project (even if it's a project
I work on).

> The same for an RDFa mapping for the schema.org vocab. A hack, that is.
>
> There is no doubt in my mind that the clean solution would be to allow for a multiple type on the microdata syntax level, just as RDFa does. Anything else is a hack, and an ugly one at that. We may have to go there, but we should be aware of what we are doing...

Yup. Has anyone talked to Hixie about this lately?

Dan

> Ivan
>
>
> On Oct 30, 2011, at 07:38 , Jeni Tennison wrote:
>
>> Picking up on this from a while ago…
>>
>> Given that a common 'other types' property is a reasonable workaround in some circumstances for the fact that microdata won't support multiple types from different vocabularies itself, we do need to sort out whether we recommend / support in a microdata-to-RDF mapping any other property.
>>
>> On 16 Oct 2011, at 10:25, Ivan Herman wrote:
>>> However, *if* we consider microdata as a simple syntax to add structured data to HTML which happens to be used by schema.org as well (even if we say that for which schema.org is the biggest 'customer'), but can also be used, eg, to encode microformat vocabularies, then using a schema.org/type is not really the good solution. Indeed, I do not see any major difference between using schema.org/type or www.w3.org/ns/type or, for that matter, the current rdf:type: indeed, in my view usage of *all* these options are equally bad insofar as it binds microdata to a particular vocabulary which, as far as I can understand it, is not the design of microdata. (Let us forget about the microdata->RDF mapping which is a different matter.)
>>
>>
>> None of these properties bind microdata to using any particular vocabulary, as it is always possible to use the URI version of the property with any type. It comes down to two factors: what property URIs we might reasonably use (eg Ivan indicated that http://www.org/type would be hard to support) and what will be easy for people to use.
>>
>> Given the three options:
>>
>> 1. http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>> 2. http://www.w3.org/ns/type
>> 3. http://schema.org/type
>>
>> #2 and #3 have the clear advantages over #1 of brevity, readability and rememberability.
>>
>> #3 has the clear advantage over #2 that in the very common case where someone is using a schema.org type they do not have to write out the full URI. Here are a couple of examples:
>>
>> A. w3.org/ns/type with schema.org item type
>>
>>  <div itemscope itemtype="http://schema.org/Place" itemid="#store">
>>    <link itemprop="http://www.w3.org/ns/type"
>>          href="http://purl.org/goodrelations/v1#Location" />
>>    ...
>>  </div>
>>
>> B. schema.org/type with schema.org item type
>>
>>  <div itemscope itemtype="http://schema.org/Place" itemid="#store">
>>    <link itemprop="type" href="http://purl.org/goodrelations/v1#Location" />
>>    ...
>>  </div>
>>
>> C. w3.org/ns/type with non-schema.org item type
>>
>>  <div itemscope itemtype="http://purl.org/goodrelations/v1#Location" itemid="#wc">
>>     <link itemprop="http://www.w3.org/ns/type"
>>           href="http://www.productontology.org/id/Public_toilet" />
>>     ...
>>  </div>
>>
>> D. schema.org/type with non-schema.org item type
>>
>>  <div itemscope itemtype="http://purl.org/goodrelations/v1#Location" itemid="#wc">
>>     <link itemprop="http://schema.org/type"
>>           href="http://www.productontology.org/id/Public_toilet" />
>>     ...
>>  </div>
>>
>> In these examples, there's really nothing in it between C and D, but there's a clear win of B (schema.org/type) over A (w3.org/ns/type) in the most common case of a schema.org item type.
>>
>> Cheers,
>>
>> Jeni
>> --
>> Jeni Tennison
>> http://www.jenitennison.com
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
Received on Sunday, 30 October 2011 09:55:41 UTC