Re: Multiple itemtypes in microdata

From: Bradley Allen <bradley.p.allen@gmail.com>
Date: Fri, 14 Oct 2011 16:45:22 -0700
Message-ID: <CAKpM4L=H3F49pAkgDErkpExXB3SRNn2LwrxHVr3Y0komd6CT1Q@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: Jeni Tennison <jeni@jenitennison.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Hixie- Responses within. - BPA

Bradley P. Allen
http://bradleypallen.org

On Fri, Oct 14, 2011 at 3:18 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Thu, 13 Oct 2011, Bradley Allen wrote:

>
>> An annotation is a statement added to a document post-publication, with
>> the intent of providing commentary or gloss on the original text.
>
> Can you elaborate on how such an annotation would make its way onto the
> page that contains the document? By "post-publication" do you mean after
> the HTML document is put on the Web, or something else?
>
> Microdata doesn't really help with post-publication annotation, since you
> have to be able to edit the document to add microdata.
>

By "post-publication," I mean "after initial publication." A document
being annotated in this fashion would have many revisions, each new
revision being posted to the Web. This is typical practice in print
publishing today; on the Web, we use content management and editorial
systems to update and republish content that has changed in this
fashion.

>
>> But the specific use case is one where someone has provided a statement,
>> in the context of an existing document, that adds an additional
>> statement to it.
>
> I don't really understand what that means. Can you show me an example? (A
> live example of a real case would be ideal.)
>

Here is perhaps the archetypal example.

Around 1637, Pierre Fermat took a copy of Diophantus' Arithmetica and
wrote the equivalent of the following in the margin of one of the
pages:

<p itemscope itemtype="http://purl.org/ao/core/Annotation
http://swan.mindinformatics.org/ontologies/1.2/discourse-elements/ResearchStatement">
  No three positive integers a, b, and c can satisfy the equation a^n
+ b^n = c^n for any integer value of n greater than two.
</p>

It was a statement that later came to be known as Fermat's Last
Theorem, and it resulted in a line of mathematical research that
culminated in the publication of Andrew Wiles' proof 358 years later.

It was an annotation that was placed by Fermat into the context of the
book, on the page on which it was written.

It was not a statement that originally occurred elsewhere in the book.

>
>> Individual items can have different senses. We can represent these
>> different senses as distinct types. Those types can be obtained from
>> different vocabularies.
>
> I'm not sure we are using the word "item" in the same way here. An "item"
> is just a self-contained group of name-value pairs, such as a particular
> instance of movie metadata, or a particular instance of the description of
> a hypothesis, or some such.
>

That's an intensional way of describing an item. I'm using an
extensional way. They amount to the same thing.

>
>> A research statement is an assertion, for example, of an observation or
>> hypothesis, intended to advance a viewpoint relevant to a line of
>> research.
>>
>> Annotations can be research statements, and first-class objects distinct
>> from scholarly documents. Scholarly documents can contain research
>> statements, and scholarly documents can be annotated with annotations.
>> Not all research statements related to a document are annotations.
>
>> I'm going to want subject matter experts to provide me with different
>> microdata vocabularies to cover the different senses that I'd like to
>> capture in my structured data markup. Expecting those all to be provided
>> in a single vocabulary is unrealistic. That's the motivation for
>> supporting multiple itemtypes without the constraint that they all be
>> from the same vocabulary.
>
> You can use multiple vocabularies on one page without any trouble today.
> You would not typically have a single item that uses multiple vocabularies
> in the cases you've described. An instance of an annotation is not also an
> instance of a research statement -- you might annotate a research
> statement, but they are not one and the same. Right?
>
> To put it another way, you could annotate a research statement twice,
> right? And the annotations wouldn't be the same annotation.
>

The counterexample of Fermat's Last Theorem shows that we can have
research statements that are annotations.
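
To make the point concrete, here is a minimal sketch in Python of how a
consumer of the markup might read the margin-note example above as a
single item carrying both types. It assumes that itemtype is treated as
a whitespace-separated list of type URLs, which is the behavior being
argued for in this thread and not something taken from the microdata
draft. The names ItemTypeCollector, item_types and MARGIN_NOTE are made
up for illustration.

```python
from html.parser import HTMLParser

# The two type URLs used in the margin-note example.
ANNOTATION = "http://purl.org/ao/core/Annotation"
RESEARCH_STATEMENT = (
    "http://swan.mindinformatics.org/ontologies/1.2/"
    "discourse-elements/ResearchStatement"
)


class ItemTypeCollector(HTMLParser):
    """Hypothetical helper: records the itemtype tokens of each itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []  # one set of type URLs per item found

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            # Assumption: itemtype is a whitespace-separated list of URLs.
            raw = attr_map.get("itemtype") or ""
            self.items.append(set(raw.split()))


def item_types(html):
    """Return a list with one set of type URLs for each item in the markup."""
    collector = ItemTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.items


# The margin note, with both types on a single item.
MARGIN_NOTE = (
    f'<p itemscope itemtype="{ANNOTATION}\n{RESEARCH_STATEMENT}">'
    "No three positive integers a, b, and c can satisfy the equation "
    "a^n + b^n = c^n for any integer value of n greater than two."
    "</p>"
)

types = item_types(MARGIN_NOTE)
is_both = {ANNOTATION, RESEARCH_STATEMENT} <= types[0]
```

Under that assumption, the margin note comes back as one item whose
type set contains both the Annotation and the ResearchStatement URLs,
so is_both is true for it.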

>
>> The nature of research communication is changing from being purely
>> focused on print-centric containers of information such as journal
>> articles and book chapters (i.e., the traditional notion of a scholarly
>> document), to a much finer-grained representation of statements derived
>> from experimental data that can be aggregated into scholarly documents.
>> The move from print to digital has enabled this new freedom of
>> expression. We must be careful not to constrain ourselves to thinking
>> and working in the old metaphors.
>
> It sounds somewhat like rather than wanting to put hypotheses and
> annotations and so forth in Web pages that are primarily prose, what you
> are describing and what I've seen in the documents you cited above is more
> a database that would be directly filled in, in which case microdata
> really has no bearing on the discussion. You wouldn't want to use
> microdata unless the document you are annotating is primarily prose --
> articles, book chapters, and the like. If the data is primarily this
> structured information, HTML isn't the right place to put it. It should
> just be put straight into its native form in the database.
>

The problem that arises from leaving this kind of rich scientific
content in databases is that it becomes part of the deep Web, and
hence undiscoverable using resources like Google. By expressing these
statements in HTML, and using microdata to capture structured data
that places statements on the Web page in the correct context of the
(primarily prose-centric) scientific discourse of which they are part,
we can enable both better discoverability and a much richer user
experience for the researcher. In addition, embedding the structured
data in the page and allowing processing that uses the structured data
to drive the user experience in the browser, rather than requiring
calls to a back-end database, yields significant benefits for us
operationally. Working examples of this enriched content can be seen
in the work we've done at Elsevier as part of our Article of the
Future effort (http://www.articleofthefuture.com/).

I'm making these arguments because I think HTML5 and microdata would
be a wonderful way to deliver this kind of enriched content. -
regards, BPA
Received on Friday, 14 October 2011 23:46:00 GMT