Re: Multiple itemtypes in microdata from Bradley Allen on 2011-10-14 (public-html-data-tf@w3.org from October 2011)

From: Bradley Allen <bradley.p.allen@gmail.com>
Date: Thu, 13 Oct 2011 22:51:12 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Stéphane Corlosquet <scorlosquet@gmail.com>, "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <CAKpM4Lm9J=JUsSzJjVKyK-dBOLvVEOdo5c4DE5wXzH67Xbknpg@mail.gmail.com>
Hixie, responses inline.

On Thursday, October 13, 2011, Ian Hickson <ian@hixie.ch> wrote:
> On Thu, 13 Oct 2011, Bradley Allen wrote:
>>
>> SWAN provides a vocabulary for describing scientific hypotheses; AO
>> provides a vocabulary for annotation of scholarly documents.
>
> Do you have any links to pages where I might learn more about these? (I
> tried searching Google for [SWAN scientific hypotheses] and [AO scholarly
> documents] but didn't get any useful results. :-( )

Try these:
SWAN: http://www.w3.org/TR/hcls-swan/
AO: http://code.google.com/p/annotation-ontology/

>
>> They are distinct vocabularies, developed for distinct purposes. Due to
>> their highly technical nature, they are unlikely to be specializations
>> of any meaningful class within schema.org.
>
> Yes, I wouldn't expect schema.org to have any relevance to this particular
> use case.
>

Agreed.

>
>> Furthermore, tools and workflows have been created to produce and
>> consume content marked up with these vocabularies, to provide support
>> for peer review and collaborative research, for example in the context
>> of communities like the Alzheimer Research Forum
>> (http://www.alzforum.org).
>
> Do you have any links to documentation about these consuming tools? I
> would love to be able to study them further.
>

Check out http://www.alzforum.org/res/adh/default.asp for a discussion of
the use of SWAN at www.alzforum.org. Additionally, Paolo Ciccarese's deck on
AO covers tools and integration across a number of ontologies and
vocabularies:
http://www.slideshare.net/paolociccarese/history-and-overview-of-ao-annotation-ontology

>
>> As a publisher of scientific content, IMO HTML5 with microdata would be
>> a valuable delivery format for scholarly content marked up with such
>> structured data. What I would like to do, in that case, is be able to
>> express the following as something that a subject matter expert could
>> insert into an article about Alzheimer's Disease:
>>
>> <p itemscope itemtype="http://purl.org/ao/core/Annotation
>>    http://swan.mindinformatics.org/ontologies/1.2/discourse-elements/ResearchStatement">
>>   Testosterone may play an important role in the prevention of
>> Alzheimer's Disease (AD) in men.
>> </p>
>>
>> The content of the <p> tag is both a ResearchStatement and an
>> Annotation.
>
> I assume those two itemtypes are supposed to be examples (they both 404).
> I'm not sure what an "Annotation" is supposed to be here. I would presume
> a research statement is a property of a scholarly document.
>

An annotation is a statement added to a document post-publication, with the
intent of providing commentary or gloss on the original text.

A research statement is an assertion, for example, of an observation or
hypothesis, intended to advance a viewpoint relevant to a line of research.

An annotation can be a research statement, and it is a first-class object
distinct from the scholarly document it annotates. Scholarly documents can
contain research statements, and scholarly documents can be annotated. Not
all research statements related to a document are annotations.
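
To make those relationships concrete, here is a minimal sketch in Python.
It is only an illustration: the Item class and the example texts other than
the testosterone statement are hypothetical, and nothing here is defined by
SWAN, AO, or microdata. The two type URLs are the ones from my markup above.

```python
from dataclasses import dataclass

# Type URLs as written in the markup example in this thread.
AO_ANNOTATION = "http://purl.org/ao/core/Annotation"
SWAN_RESEARCH_STATEMENT = (
    "http://swan.mindinformatics.org/ontologies/1.2/"
    "discourse-elements/ResearchStatement"
)


@dataclass(frozen=True)
class Item:
    """Hypothetical in-memory item: a piece of text plus the types it carries."""
    text: str
    types: frozenset = frozenset()

    def is_a(self, type_url: str) -> bool:
        """True if this item declares the given type."""
        return type_url in self.types


# A research statement made in the article body: not an annotation.
in_article = Item(
    "(hypothetical) A statement made by the article's authors.",
    frozenset({SWAN_RESEARCH_STATEMENT}),
)

# A post-publication gloss that asserts nothing new: annotation only.
gloss = Item(
    "(hypothetical) See also the discussion in the comments.",
    frozenset({AO_ANNOTATION}),
)

# The case from this thread: one item that is both at once.
annotation_statement = Item(
    "Testosterone may play an important role in the prevention of "
    "Alzheimer's Disease (AD) in men.",
    frozenset({AO_ANNOTATION, SWAN_RESEARCH_STATEMENT}),
)
```

The third item is the one that needs two types on a single element; the
first two show that neither type implies the other.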

> Assuming that there is an HTML page that is a scholarly document and that
> contains scientific hypotheses, you can use microdata today to mark such a
> page up with no problems, as far as I can tell. No single item would be
> both a scholarly document and a scientific hypothesis; instead you would
> mark them up, something like this:
>
>   ...
>   <body itemscope itemtype="http://swan.example.org/scholarly-doc">
>    <h1>A study of birds</h1>
>    <section>
>     <h1>Abstract</h1>
>     <p itemprop="abstract">We look at birds and see if they have
>     wings or lips.</p>
>    </section>
>    <section>
>     <h1>Hypotheses</h1>
>     <p itemscope itemtype="http://example.net/ao/hypothesis">
>      <span itemprop="description">We hypothesise that birds have
>      wings.</span> <span itemprop="reason">We base this on the
>      circumstancial evidence that birds have wings according to the
>      dictionary.</span>
>     </p>
>     <p itemscope itemtype="http://example.net/ao/hypothesis">
>      We also consider lips. <span itemprop="description">We presume
>      that birds have lips.</span> You may ask why we think that. <span
>      itemprop="reason">We are just guessing.</span>
>     </p>
>    </section>
>    ...
>   </body>
>

The nature of research communication is changing from being purely focused
on print-centric containers of information such as journal articles and
book chapters (i.e., the traditional notion of a scholarly document), to a
much finer-grained representation of statements derived from experimental
data that can be aggregated into scholarly documents. The move from print
to digital has enabled this new freedom of expression. We must be careful
not to constrain ourselves to thinking and working in the old metaphors.

> This is a microdata document with three items. As JSON, this data looks
> like this:
>
> {
>  "items": [
>    {
>      "type": "http://swan.example.org/scholarly-doc",
>      "properties": {
>        "abstract": [
>          "We look at birds and see if they have\n     wings or lips."
>        ]
>      }
>    },
>    {
>      "type": "http://example.net/ao/hypothesis",
>      "properties": {
>        "description": [
>          "We hypothesise that birds have\n      wings."
>        ],
>        "reason": [
>          "We base this on the\n      circumstancial evidence that birds have wings according to the\n      dictionary."
>        ]
>      }
>    },
>    {
>      "type": "http://example.net/ao/hypothesis",
>      "properties": {
>        "description": [
>          "We presume   \n      that birds have lips."
>        ],
>        "reason": [
>          "We are just guessing."
>        ]
>      }
>    }
>  ]
> }
>
> (See http://goo.gl/OgF8C for a live view of this.)
>

Useful to see. But the specific use case is a different one: someone
contributes a new statement in the context of an existing document, adding
to that document after the fact rather than authoring it.

>
>> I am using emerging standard vocabularies that have been developed for
>> separate purposes in a succinct, clear manner. IMO, that is the way in
>> which most of the people at the workshop would have assumed that support
>> for multiple itemtypes would work.
>

> I don't see any reason in this case that any one item would have multiple
> types, but maybe that's because I don't understand the vocabularies in
> question. I would be very interested in studying the vocabularies and the
> software that consumes them.
>

Individual items can have different senses. We can represent these different
senses as distinct types. Those types can be obtained from different
vocabularies.
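
As a rough illustration of what I mean, here is a small Python sketch that
takes the itemtype value from my example, splits it into individual types,
and groups them by vocabulary. The function names are made up, and the rule
for deciding which vocabulary a type belongs to (everything up to the last
"/" or "#") is purely an assumption for this sketch, not something the
microdata spec defines. Python's str.split() also treats more characters as
whitespace than HTML does, which does not matter for this input.

```python
from collections import defaultdict


def split_itemtype(value: str) -> list:
    """Split an itemtype value on whitespace, keeping order, dropping repeats."""
    types = []
    for token in value.split():
        if token not in types:
            types.append(token)
    return types


def vocabulary_of(type_url: str) -> str:
    """Assumed heuristic: the URL up to and including its last '/' or '#'."""
    cut = max(type_url.rfind("/"), type_url.rfind("#"))
    return type_url[: cut + 1]


def group_by_vocabulary(value: str) -> dict:
    """Map each (assumed) vocabulary prefix to the type URLs found under it."""
    groups = defaultdict(list)
    for type_url in split_itemtype(value):
        groups[vocabulary_of(type_url)].append(type_url)
    return dict(groups)


# The attribute value from the markup example in this thread, line break included.
ITEMTYPE = """http://purl.org/ao/core/Annotation
http://swan.mindinformatics.org/ontologies/1.2/discourse-elements/ResearchStatement"""

groups = group_by_vocabulary(ITEMTYPE)
for vocabulary, type_urls in groups.items():
    print(vocabulary, type_urls)
```

Under that assumed heuristic the single item ends up with one type in each
of two vocabularies, which is exactly the situation I am describing.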

>
> Incidentally, note that you can't just take, say, an RDF vocabulary, or a
> Microformats vocabulary, and just use it in microdata directly. A
> microdata vocabulary has to define processing rules that are often not
> provided for RDF and Microformats vocabularies, and has to use the terms
> defined in the HTML specification to describe how the terms work. You can
> see examples of how to define vocabularies in the HTML standard:
>
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#mdvocabs
>

Fair enough. Given that, I'm going to want subject matter experts to provide
me with different microdata vocabularies to cover the different senses that
I'd like to capture in my structured data markup. Expecting those all to be
provided in a single vocabulary is unrealistic. That's the motivation for
supporting multiple itemtypes without the constraint that they all be from
the same vocabulary.
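
Here is a sketch of how I imagine a consumer could cope with that, again in
Python and again hypothetical throughout: the handler registry, the handler
names, and the item shape (a dict with "types" and "text") are invented for
illustration and are not part of microdata, SWAN, or AO. The idea is that
each vocabulary supplies its own handler, and an item is passed to every
handler whose type it declares.

```python
from typing import Callable, Dict, List

# Hypothetical registry: type URL -> function that processes an item of that type.
HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handles(type_url: str):
    """Decorator that registers a handler for one type URL."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[type_url] = fn
        return fn
    return register


@handles("http://purl.org/ao/core/Annotation")
def index_annotation(item: dict) -> str:
    # Stand-in for whatever an AO-aware tool would do with the item.
    return "annotation: " + item["text"]


@handles(
    "http://swan.mindinformatics.org/ontologies/1.2/"
    "discourse-elements/ResearchStatement"
)
def index_research_statement(item: dict) -> str:
    # Stand-in for whatever a SWAN-aware tool would do with the item.
    return "research statement: " + item["text"]


def process(item: dict) -> List[str]:
    """Run every registered handler whose type the item declares.

    Types with no registered handler are skipped rather than rejected.
    """
    return [HANDLERS[t](item) for t in item["types"] if t in HANDLERS]


item = {
    "types": [
        "http://purl.org/ao/core/Annotation",
        "http://swan.mindinformatics.org/ontologies/1.2/"
        "discourse-elements/ResearchStatement",
        "http://example.net/unknown/Type",  # hypothetical, no handler registered
    ],
    "text": "Testosterone may play an important role in the prevention of "
            "Alzheimer's Disease (AD) in men.",
}

results = process(item)
```

Neither handler needs to know about the other vocabulary, which is why I do
not think the types have to come from a single one.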

> HTH,
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
Received on Friday, 14 October 2011 05:51:44 UTC