RE: Annotation Serializations

Ivan-

We haven't so far done a lot of work with Open Annotation in RDFa, but in
answer to your question about whether annotations themselves marked up in
HTML could be a significant use case, one class of use cases that created
some interest in RDFa for OA is the idea of blog entries as annotations. 

So, for example, imagine that a Math Overflow user asks a question about a
proof appearing on page 17 of a 30-page article that appeared in a recent
issue of Journal of Algebra. Subsequently the author of this article issues
an erratum concerning this proof which is posted to the journal publisher's
Website. Subsequent to that the Math Overflow question is answered with
reference to the substance of the erratum.  Having these modeled as a chain
of annotations, albeit all embedded in HTML, might facilitate discovery
and use. Users of the publisher Website could more readily be made aware of
the Math Overflow blog entries. Users of Math Overflow would know about the
relevant data on the publisher Website -- not just the 30-page article, but
exactly where in the article the proof appeared. Ultimately, as you suggest,
Math Overflow, the publisher, or a 3rd party would likely want to store this
information somewhere as annotations. But especially at the outset or if the
ingest to the annotation store is done by a 3rd party after the HTML is
posted, having the appropriate OA serialized as RDFa and embedded in the
HTML might facilitate things.  

This is not entirely idle speculation. I've been involved in some
discussions concerning next steps to follow on last decade's World Digital
Math Library initiative, and this exact scenario has come up as of interest
with a successor to the WDML in the role of 3rd party.

Tim Cole
University of Illinois at UC


-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Sunday, January 19, 2014 7:26 AM
To: Doug Schepers
Cc: public-openannotation
Subject: Re: Annotation Serializations

Hi Doug, everybody,

I am trying to understand what you mean... Are we talking about some sort of a
family of use case templates? Or a formal and thorough serialization
specification in HTML, ie, some sort of a specialized RDFa? The latter may
be quite a lot of work... (having gone through the RDFa exercise myself). A
template library could probably be done more easily; for RDF usage one could
then make some sort of a preprocessor to RDFa, and then let the existing
RDFa processors take over.
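
A minimal sketch of what such a template preprocessor could look like, in
Python. The template text, the field names (author_uri, author_name, created,
target, body) and the function render_annotation are all hypothetical, and the
generated markup has not been checked against an RDFa processor; it only
illustrates filling a fixed RDFa template from simple values.

```python
from html import escape
from string import Template

# Hypothetical template: the attribute layout is an assumption made for
# illustration and has not been validated against an RDFa processor.
ANNOTATION_TEMPLATE = Template("""\
<aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
  <a property="annotatedBy" href="$author_uri">$author_name</a>
  <time property="annotatedAt" datetime="$created">$created</time>
  <a property="hasTarget" href="$target">$target</a>
  <p property="hasBody" typeof=""><span property="rdf:value">$body</span></p>
</aside>""")


def render_annotation(fields):
    """Fill the template, HTML-escaping every value first."""
    safe = {key: escape(value, quote=True) for key, value in fields.items()}
    return ANNOTATION_TEMPLATE.substitute(safe)


markup = render_annotation({
    "author_uri": "http://example.com/people/shepazu",
    "author_name": "Shepazu",
    "created": "2014-01-14T01:28:22-0500",
    "target": "http://example.com/sourcedoc.html",
    "body": "Annotations are at the Web's core.",
})
print(markup)
```

The output of such a step would then be handed to an existing RDFa processor
unchanged, so the template author never has to touch RDF tooling directly.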

I looked at your example, and, for the purpose of the discussion, I did
re-cast it into RDFa Lite. I *think* it is what you meant but probably not
exactly; I did remove the internal properties for Bush because you annotate
<http://example.com/sourcedoc.html> and not the snippet and, I must admit, I
was not sure how that 'cite' would translate into OA (I am not sure it can,
it may need some additional properties). I was also not sure whether the
tagging is properly mapped onto the OA. With that, I believe the snippet
below is more-or-less correct:

    <aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
      <p>
        <a property="annotatedBy" href="http://example.com/people/shepazu"
           typeof="foaf:Person">
          <span property="foaf:name">Shepazu</span>
        </a>
      </p>
      <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
        <a href="http://example.com/annotations/shepazu-1389680902"
           title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
      </time>

      <blockquote property="hasTarget"
                  resource="http://example.com/sourcedoc.html"
                  cite="http://example.com/sourcedoc.html"
                  data-prefix="essential feature of the memex. "
                  data-suffix=" When the user is building a tra" typeof="">
        <p>The process of tying two items together is the important thing.</p>
        <footer>
          - <cite>
              <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">
                <span>Vannevar Bush</span>
              </a>
            </cite>
        </footer>
      </blockquote>
      <p property="hasBody" typeof=""><span property="rdf:value">Annotations are at the Web's core.</span></p>
      <ul property="hasBody" typeof="SemanticTag">
         <li property="rdf:value">annotations</li>
         <li property="rdf:value">web</li>
         <li property="rdf:value">standards</li>
      </ul>
    </aside>

There are some quirks, because I tried to keep it within RDFa Lite (mainly
the usage of @typeof=""). Also, RDFa+HTML5 does not understand the @cite
attribute in <blockquote>; it could be easily added to RDFa Lite, if there
is a great demand for it, but that would require some extra spec rounds.
Hence the @resource attribute that repeats the URI :-(

I believe the correct mapping to OA is to have two different bodies; one is
your remark, the other is the set of tags. (I have added the generated Turtle
at the end, where I have taken out some statements that an RDFa processor
generates into the resulting graph but that are irrelevant for us here.)

Yes, it is slightly more complex than your thing. (Note that, I believe,
mapping this to microdata would be even more complex; indeed, microdata does
not allow mixing different vocabularies, like I do here with OA and foaf and
rdf.) I am not sure which direction one should/could take in simplifying it.

But... I have also generated JSON-LD from the RDFa above, and then
simplified it (my JSON-LD knowledge is not perfect, but I have checked it
with a JSON-LD checker):

{
    "@context": "http://www.w3.org/ns/oa.json",
    "@type": "Annotation",
    "annotatedAt": "2014-01-14T01:28:22-0500",
    "annotatedBy": {
        "@id": "http://example.com/people/shepazu",
        "name": "Shepazu",
        "@type" : "Person"
    },
    "hasBody": [
        {
            "value" : "Annotations are at the Web's core."
        },
        {
            "@type": "SemanticTag",
            "value": [
                "web",
                "standards",
                "annotations"
            ]
        }
    ],
    "hasTarget": "http://example.com/sourcedoc.html"

}

with the supposition that the oa.json contains a lot of information on
mapping the data to RDF that can be hidden from the end user, like the fact
that 'value' or 'Person' are terms from another vocabulary (RDF and FOAF,
respectively). In this sense, JSON-LD is more flexible than RDFa. For a JSON
user the only slightly unusual thing is the usage of the "@" character. The
"@context" can also be omitted for those who do not want to care about RDF;
actually, if used on the Web, the context can also be transferred through an
HTTP header.
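
As a rough illustration of that supposition, the Python sketch below shows how
a context could map the short terms back to full IRIs. The CONTEXT dictionary
is a hypothetical stand-in for whatever oa.json actually declares, and
expand_terms is a toy helper, not the JSON-LD expansion algorithm.

```python
# Hypothetical stand-in for the term mappings that oa.json might declare;
# the real context file is not reproduced here.
CONTEXT = {
    "Annotation": "http://www.w3.org/ns/oa#Annotation",
    "SemanticTag": "http://www.w3.org/ns/oa#SemanticTag",
    "annotatedAt": "http://www.w3.org/ns/oa#annotatedAt",
    "annotatedBy": "http://www.w3.org/ns/oa#annotatedBy",
    "hasBody": "http://www.w3.org/ns/oa#hasBody",
    "hasTarget": "http://www.w3.org/ns/oa#hasTarget",
    "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#value",
    "name": "http://xmlns.com/foaf/0.1/name",
    "Person": "http://xmlns.com/foaf/0.1/Person",
}


def expand_terms(node, context):
    """Recursively replace short keys and @type values with full IRIs."""
    if isinstance(node, list):
        return [expand_terms(item, context) for item in node]
    if not isinstance(node, dict):
        return node  # plain literal: leave untouched
    expanded = {}
    for key, val in node.items():
        if key == "@context":
            continue  # the context itself is dropped from the result
        if key == "@type":
            types = val if isinstance(val, list) else [val]
            mapped = [context.get(t, t) for t in types]
            expanded[key] = mapped if isinstance(val, list) else mapped[0]
        elif key.startswith("@"):
            expanded[key] = val  # other keywords such as @id pass through
        else:
            expanded[context.get(key, key)] = expand_terms(val, context)
    return expanded


annotation = {
    "@context": "http://www.w3.org/ns/oa.json",
    "@type": "Annotation",
    "annotatedBy": {
        "@id": "http://example.com/people/shepazu",
        "name": "Shepazu",
        "@type": "Person",
    },
    "hasBody": [{"value": "Annotations are at the Web's core."}],
    "hasTarget": "http://example.com/sourcedoc.html",
}

expanded = expand_terms(annotation, CONTEXT)
print(expanded["http://www.w3.org/ns/oa#annotatedBy"]["@type"])
```

The JSON author only ever writes "name" or "Person"; the knowledge that these
come from FOAF lives in the context alone.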

I actually find the JSON-LD the simplest. And I begin to wonder whether we
really have annotations themselves marked up in HTML, or, more exactly,
whether that is a major use case. I have the impression that annotations are
built up through user interactions and are stored somewhere, and the storage
would not necessarily happen in HTML but, rather, in JSON (e.g., in a JSON
database, or something like that).

(Note that it is also possible to embed a JSON(-LD) snippet into an HTML
file[1]. This is an approach that the schema.org people have also taken for
some of their clients[2].)
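
For illustration, here is a small Python sketch of how a consumer might pull
such an embedded snippet back out of a page, using only the standard library.
The class name JsonLdExtractor and the sample page are made up for this
example; the only assumption taken from [1] is that the snippet sits in a
script element with type "application/ld+json".

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Made-up sample page carrying a cut-down version of the annotation.
PAGE = """<html><body>
<script type="application/ld+json">
{
  "@context": "http://www.w3.org/ns/oa.json",
  "@type": "Annotation",
  "hasTarget": "http://example.com/sourcedoc.html"
}
</script>
</body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
extractor.close()
found = extractor.documents
print(found[0]["hasTarget"])
```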

Ivan

[1] http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
[2] http://blog.schema.org/2013/06/schemaorg-and-json-ld.html


P.S. Here is the turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://example.com/people/shepazu> a foaf:Person ;
    foaf:name "Shepazu" .

[] a oa:Annotation ;
    oa:annotatedAt "2014-01-14T01:28:22-0500" ;
    oa:annotatedBy <http://example.com/people/shepazu> ;
    oa:hasBody 
        [ rdf:value "Annotations are at the Web's core." ],
        [ a oa:SemanticTag ;
            rdf:value "annotations", "standards", "web" 
        ] ;
    oa:hasTarget <http://example.com/sourcedoc.html> .


On 19 Jan 2014, at 24:29 , Doug Schepers <schepers@w3.org> wrote:

> Hi, folks-
> 
> The work this group has done so far is excellent. I think the data model
> is really solid. I'd like to see it applied broadly, not just for
> annotations proper, but also for comments, footnotes, bookmarks, and other
> similar things along the same lines.
> 
> And I'd like annotations to be supported by browsers natively; I think
> that would dramatically increase their usage and usability.
> 
> To that end, I'd like to introduce a few topics that I think can build on
> the data model, and couch it in terms that the average web developer can
> easily understand and apply, and which browser vendors might get behind.
> 
> The first of these is some suggestions on different serializations, for
> those who aren't interested in the RDF aspects (yes, hard to believe, but
> such people do exist!).
> 
> Here's a (terrible, almost certainly incorrect) strawman for an HTML
> serialization of an annotation (consider it the bastard child of
> OpenAnnotation and Twitter):
> 
> <aside vocab="http://www.w3.org/ns/oa#">
>   <p>
>     <a property="annotatedBy"
>         href="http://example.com/people/shepazu"
>         typeof="Person">
>        <span property="name">Shepazu</span>
>      </a>
>   </p>
> 
>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>     <a href="http://example.com/annotations/shepazu-1389680902"
>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>   </time>
> 
>   <blockquote property="hasTarget"
>               cite="http://example.com/sourcedoc.html"
>               data-prefix="essential feature of the memex. "
>               data-suffix=" When the user is building a tra">
>     <p>The process of tying two items together is the important thing.</p>
>     <footer>
>       - <cite>
>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush"
>              typeof="Person">
>             <span property="name">Vannevar Bush</span>
>           </a>
>         </cite>
>     </footer>
>   </blockquote>
> 
>   <p property="hasBody">Annotations are at the Web's core.</p>
> 
>    <ul>
>      <li property="tag">annotations</li>
>      <li property="tag">web</li>
>      <li property="tag">standards</li>
>    </ul>
> </aside>
> 
> 
> Another serialization could be in very lightweight JSON, for sockets
> interchange.
> 
> All of these serializations should be defined in such a way that they are
> losslessly transformable into any of the other serializations; any missing
> data (for example, values omitted for brevity) should have default (or
> lacunae) values that are populated for other serializations that might need
> them, such as RDF.
> 
> Thoughts?
> 
> 
> Regards-
> -Doug
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Sunday, 19 January 2014 17:34:59 UTC