Re: Multiple types from different vocabularies (ACTION-7) from Ivan Herman on 2011-10-30 (public-html-data-tf@w3.org from October 2011)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 30 Oct 2011 12:26:01 +0100
To: Dan Brickley <danbri@danbri.org>
Cc: Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <D177A529-F158-4BEE-91B8-E714143952BC@w3.org>
On Oct 30, 2011, at 10:55 , Dan Brickley wrote:

> On 30 October 2011 10:02, Ivan Herman <ivan@w3.org> wrote:
>> *Formally* one can of course put an
>> 
>> <http://schema.org/type> owl:equivalentProperty <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> .
> 
> Yes, we ought to probably do that, if 'type' gets added.
> 
>> into the schema.owl file. But if we do that, strictly from the Semantic Web/OWL world (I know, this may be of no interest for many, but I still have to consider that) what this would mean is that schema.owl would push everything into OWL Full (it is not allowed in OWL DL to make statements on the core vocabulary). If anybody ever would like to use the schema.org vocabulary within the framework of a more complex reasoning but staying within OWL DL, then that person would be doomed.
> 
> I'd be surprised if we're not already in Full with most real-world
> Schema.org data.  I haven't run any formal checker over the current
> OWL representation of the schema, though.
> 

As far as I could see the schema.org OWL file is o.k., though I have not run any checker either. It is really a bunch of range and domain specifications that make use, for example, of OWL's union. Pretty good stuff.


> e.g. http://schema.org/docs/datamodel.html "We also expect that often,
> where we expect a property value of type Person, Place, Organization
> or some other subClassOf Thing, we will get a text string. In the
> spirit of "some data is better than none", we will accept this markup
> and do the best we can." ... this doesn't seem very DL-friendly.
> 

Correct. What this means is that some _data_ will be faulty. But at least if I use clean data, then I would like to be fine...

> Anyhow "doomed" is too strong. What we need imho is a change of
> perspective towards more pipeline-based processing models for RDF
> data. It is too optimistic (I won't say 'naive', but it's tempting) to
> generally expect to be able to take data-bearing pages from mainstream
> Web sites, pull out triples with a parser, and drop them unmediated
> into a trusting, truth-centric OWL environment.

I do not find this _that_ optimistic. Many of the microdata sites will be generated by, say, CMS systems and, if there are incentives, they would produce kosher data.

I am not naïve:-) I _know_ that much of the data will be dirty. I also _know_ that many (ok, most...) will not care about OWL in the first place. I can even imagine getting answers to this thread saying that what I was talking about is uninteresting because the real world does not care about OWL and I should have stayed silent instead:-) But I am uneasy about closing off even the possibility of, or, say, adding extra difficulties for, people using schema.org with reasoning.


> Much more likely,
> we'll see a more pipeline centric workflow, with all kinds of cleanup,
> mappings, filterings and other enrichments happening between doing the
> HTTP GET and doing anything that involves relying on the factual
> claims that our parsers extract from Web pages. So it is far from
> "doom" to expect large scale consumption of in-page factual data to
> involve a bit of filtering, post-processing and cleanup. Not to
> mention source-selection and provenance-sensitive decision making.
> That seems a perfectly reasonable assumption to me.

Yes, I can see that point.

> 
> I guess for a modest percentage of Schema.org deployments the data
> will be super-high quality (eg. some opendata-in-science collab
> wikis), and in those cases OWL Full might be imposing an inconvenience
> on OWL-centric consumers. But since Dublin Core and FOAF are currently
> also in OWL Full afaik, the problem isn't unique to Schema.org. I can
> see some kind of pre-semantic robustification filter being a useful
> addition to any OWL environment, ie. a filter that 'tidies' things
> into a form that is more friendly towards OWL DL tooling.

Agree. That is a way out, so 'doom' was too strong. But it is a drag.

> 
> All that said, OWL is a garbage-in-garbage-out machine; if you give it
> true stuff, it'll figure out more true stuff. If you give it stale or
> factually incorrect input, you'll get dodgy conclusions. I'm much much
> more worried about that kind of 'doom' than about having instance data
> 'cross the streams' and talk about the built-in terminology, as are
> the other Schema.org folk, hence all the concern about
> author/publisher error rates and syntax simplicity.
> 
>> Strictly speaking, even OWL 2 RL would break, though that is the closest to any ground RDF reasoning around. The same holds, of course, if we used www.w3.org/ns/type. That is _one_ of the reasons I was saying that this is equally bad. (I do not think this is a far fetched possibility. Elsewhere in one of the threads the issue of mixing schema.org terms with, say, medical ontology terms came up, and that would be a very reasonable thing to do. The medical informatics world makes a significant use of OWL reasoning with their particular ontologies. Ie, we are not talking about exclusively research topics here.)
> 
> Yes, there is a lot of healthcare-related information in the
> mainstream Web. If you/we can find a medical informatics person or
> project who is genuinely concerned about Schema.org slipping into OWL
> Full, then I'll take that as a significant issue re whether schema.org
> publishes the claim " <http://schema.org/type> owl:equivalentProperty
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ."
> 
> However my suggested approach is simpler:
> 
> * Applications that will fail if they encounter
> owl:equivalentProperty claims linking OWL built-in terms with vocab
> from the wild, should consider offering a robustness mode that simply
> ignores such claims, treating them as non-semantic annotations that
> can be removed before serious OWL processing.

Yes, fine. This should be added to any guideline we produce, though.
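
To make the idea concrete, here is a minimal sketch of what such a robustness mode could look like, in Python, with plain string tuples standing in for RDF triples. The function names and the choice of which namespaces count as "built-in" are assumptions made for illustration only; they are not taken from any spec or existing tool.

```python
# Sketch of a "robustness mode" pre-filter; (s, p, o) string tuples stand in for triples.
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Namespaces treated as built-in vocabulary (an assumption for this sketch).
BUILTIN_NAMESPACES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
)


def is_builtin(term):
    """True if the URI lives in one of the namespaces listed above."""
    return term.startswith(BUILTIN_NAMESPACES)


def drop_builtin_equivalences(triples):
    """Return the triples minus owl:equivalentProperty claims that involve built-in terms."""
    kept = []
    for s, p, o in triples:
        if p == OWL_EQUIVALENT_PROPERTY and (is_builtin(s) or is_builtin(o)):
            continue  # treat the claim as a non-semantic annotation and skip it
        kept.append((s, p, o))
    return kept
```

A consumer would run this over the vocabulary triples before handing them to an OWL DL tool; equivalences between two "wild" properties pass through untouched.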

> 
>> The alternative is that microdata->RDF mapping would have an extra rule that makes a mapping of schema.org/type to rdf:type, essentially breaking the uniformity of the mapping.
> 
> I wouldn't support any special case mention of schema.org in the
> official mapping of microdata to RDF. If this is going to be
> special-cased, then special-case it in the syntax rather than
> introduce a dependency on an external project (even if it's a project
> I work on).
> 
>> The same for an RDFa mapping for the schema.org vocab. A hack, that is.
>> 

Well... as an implementer, even if the spec does not say so, that is what I would do. Ie, I will not, in an RDFa mapping, for example, read the schema.owl file and process it only for that single thing. Besides, the current extra @vocab management, which is defined in RDFa, is not required (actually, it requires the knowledge of rdfs properties only, ie, subProperties rather than owl notions). So what I would do is to hardwire a schema.org/type -> rdf:type mapping in any case!
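
A minimal sketch of such a hardwired mapping, in Python, assuming the extractor produces (subject, predicate, object) string tuples. The function name is hypothetical, and this is only one way an implementation might do it:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_TYPE = "http://schema.org/type"  # the property under discussion, not (yet) part of schema.org


def hardwire_type(triples):
    """Rewrite every schema.org/type predicate to rdf:type; leave other triples alone."""
    return [(s, RDF_TYPE if p == SCHEMA_TYPE else p, o) for s, p, o in triples]
```

No vocabulary file is fetched or interpreted; the rewrite is a fixed, one-entry lookup applied after extraction.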


>> There is no doubt in my mind that the clean solution would be to allow for a multiple type on the microdata syntax level, just as RDFa does. Anything else is a hack, and an ugly one at that. We may have to go there, but we should be aware of what we are doing...
> 
> Yup. Has anyone talked to Hixie about this lately?

See Jeni's answer.

To make it really clear, and to avoid any kind of misunderstanding: If the microdata spec is not changed in this respect, then I see that adding schema.org/type is the most user friendly option for schema.org/microdata users. As a consequence, it will also be used by RDFa users with the schema.org vocabulary. Ie, I agree with doing that. I am just expressing my unhappiness here:-)
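
The user friendliness comes from how short property names get expanded. A rough sketch in Python, assuming (and this is only an assumption for illustration, not what the microdata spec or any agreed mapping says) that a short itemprop is resolved against the item type's URI up to its last '#' or '/'; both function names are hypothetical:

```python
def vocabulary_of(itemtype):
    """Assumed rule: keep the item type URI up to and including its last '#' or '/'."""
    cut = max(itemtype.rfind("#"), itemtype.rfind("/"))
    return itemtype[: cut + 1]


def expand_itemprop(itemprop, itemtype):
    """Absolute property URIs are used as they are; short names are prefixed."""
    if "://" in itemprop:  # crude check for an absolute URI
        return itemprop
    return vocabulary_of(itemtype) + itemprop
```

Under that assumption, itemprop="type" on a http://schema.org/Place item becomes http://schema.org/type, whereas on a GoodRelations item the full http://schema.org/type URI has to be written out.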

Ivan

> 
> Dan
> 
>> Ivan
>> 
>> 
>> On Oct 30, 2011, at 07:38 , Jeni Tennison wrote:
>> 
>>> Picking up on this from a while ago…
>>> 
>>> Given that a common 'other types' property is a reasonable workaround in some circumstances for the fact that microdata won't support multiple types from different vocabularies itself, we do need to sort out whether we recommend / support in a microdata-to-RDF mapping any other property.
>>> 
>>> On 16 Oct 2011, at 10:25, Ivan Herman wrote:
>>>> However, *if* we consider microdata as a simple syntax to add structured data to HTML which happens to be used by schema.org as well (even if we say that for which schema.org is the biggest 'customer'), but can also be used, eg, to encode microformat vocabularies, then using a schema.org/type is not really the good solution. Indeed, I do not see any major difference between using schema.org/type or www.w3.org/ns/type or, for that matter, the current rdf:type: indeed, in my view usage of *all* these options are equally bad insofar as it binds microdata to a particular vocabulary which, as far as I can understand it, is not the design of microdata. (Let us forget about the microdata->RDF mapping which is a different matter.)
>>> 
>>> 
>>> None of these properties bind microdata to using any particular vocabulary, as it is always possible to use the URI version of the property with any type. It comes down to two factors: what property URIs we might reasonably use (eg Ivan indicated that http://www.org/type would be hard to support) and what will be easy for people to use.
>>> 
>>> Given the three options:
>>> 
>>> 1. http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>> 2. http://www.w3.org/ns/type
>>> 3. http://schema.org/type
>>> 
>>> #2 and #3 have the clear advantages over #1 of brevity, readability and rememberability.
>>> 
>>> #3 has the clear advantage over #2 that in the very common case where someone is using a schema.org type they do not have to write out the full URI. Here are a couple of examples:
>>> 
>>> A. w3.org/ns/type with schema.org item type
>>> 
>>>  <div itemscope itemtype="http://schema.org/Place" itemid="#store">
>>>    <link itemprop="http://w3.org/ns/type"
>>>          href="http://purl.org/goodrelations/v1#Location" />
>>>    ...
>>>  </div>
>>> 
>>> B. schema.org/type with schema.org item type
>>> 
>>>  <div itemscope itemtype="http://schema.org/Place" itemid="#store">
>>>    <link itemprop="type" href="http://purl.org/goodrelations/v1#Location" />
>>>    ...
>>>  </div>
>>> 
>>> C. w3.org/ns/type with non-schema.org item type
>>> 
>>>  <div itemscope itemtype="http://purl.org/goodrelations/v1#Location" itemid="#wc">
>>>     <link itemprop="http://w3.org/ns/type"
>>>           href="http://www.productontology.org/id/Public_toilet" />
>>>     ...
>>>  </div>
>>> 
>>> D. schema.org/type with non-schema.org item type
>>> 
>>>  <div itemscope itemtype="http://purl.org/goodrelations/v1#Location" itemid="#wc">
>>>     <link itemprop="http://schema.org/type"
>>>           href="http://www.productontology.org/id/Public_toilet" />
>>>     ...
>>>  </div>
>>> 
>>> In these examples, there's really nothing in it between C and D, but there's a clear win of B (schema.org/type) over A (w3.org/ns/type) in the most common case of a schema.org item type.
>>> 
>>> Cheers,
>>> 
>>> Jeni
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
>> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Sunday, 30 October 2011 11:23:58 UTC