Re: Annotation Serializations from Doug Schepers on 2014-01-20 (public-openannotation@w3.org from January 2014)

From: Doug Schepers <schepers@w3.org>
Date: Sun, 19 Jan 2014 22:30:16 -0500
To: Ivan Herman <ivan@w3.org>
CC: public-openannotation <public-openannotation@w3.org>
Message-ID: <52DC9848.8000303@w3.org>
Hi, Ivan–

Having settled the use-case justification, I'd like to follow up on a
couple more points from this message, inline...

On 1/19/14 8:25 AM, Ivan Herman wrote:
> Hi Doug, everybody,
>
> I try to understand what you mean... Are we talking about some sort
> of a family of use case templates? Or a formal and thorough
> serialization specification in HTML, ie, some sort of a specialized
> RDFa? The latter may be quite a lot of work... (having gone through
> the RDFa exercise myself). A template library could probably be done
> more easily; for RDF usage one could then make some sort of a
> preprocessor to RDFa, and then let the existing RDFa processors take
> over.

I'm not really sure, to be honest... my inclination is to say we should 
have a formal serialization, though I'd defer to what browsers or other 
implementers would say.

I actually do think there's a case to be made for a <note> element, with 
an API, used as the root of a specific content model that can be 
transformed into the OA data model.

For example, I could see a <note> element being used for footnotes 
(similar to how Wikipedia treats them); in that case, there would be two 
different anchoring schemes available:

1) Classic annotations would include a robust anchor selector in the 
annotation itself, since a common use case is that the annotator doesn't 
control the document and can't insert a permanent link into it.

2) Footnotes: since the document owner usually does have change control 
over the document source, the link could live in the document source 
rather than in the body of the annotation; this would also allow the 
author to link to the same footnote from multiple places in the document.

With a standard format for both, you could point at and extract the 
annotation in either scenario, though each has its own distinct adaptation.
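The "robust anchor selector" in scenario 1 corresponds roughly to what the 
OA model calls a text quote selector (oa:TextQuoteSelector, with oa:exact, 
oa:prefix and oa:suffix), which is the same data the data-prefix and 
data-suffix attributes carry in the HTML strawmen in this thread. As a 
minimal sketch, assuming the document has already been reduced to one 
plain-text string, re-finding the quote might look like this (the class 
and function names are hypothetical, and real anchoring code usually also 
needs fuzzy matching and DOM-range mapping, which this omits):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextQuoteSelector:
    """Exact quoted text plus a little surrounding context (hypothetical type)."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def anchor(text: str, sel: TextQuoteSelector) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the quote in `text`, or None if absent.

    Occurrences whose surrounding text matches both prefix and suffix win;
    if none do, fall back to the first bare occurrence of `exact`.
    """
    if not sel.exact:
        return None
    # Collect every position where the exact quote occurs.
    hits: List[int] = []
    pos = text.find(sel.exact)
    while pos != -1:
        hits.append(pos)
        pos = text.find(sel.exact, pos + 1)
    if not hits:
        return None
    # Use the context to disambiguate repeated quotes.
    for start in hits:
        end = start + len(sel.exact)
        if text[:start].endswith(sel.prefix) and text[end:].startswith(sel.suffix):
            return start, end
    return hits[0], hits[0] + len(sel.exact)
```

Without the prefix and suffix, a quote that occurs more than once would be 
ambiguous; with them, an annotator can re-find the passage in a page they 
don't control.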

> I looked at your example, and, for the purpose of the discussion, I
> did re-cast it into RDFa Lite. I *think* it is what you meant but
> probably not exactly; I did remove the internal properties for Bush
> because you annotate <http://example.com/sourcedoc.html> and not the
> snippet and, I must admit, I was not sure how that 'cite' would
> translate into OA (I am not sure it can, it may need some additional
> properties).

Yeah, actually, unless I'm missing something, I think there should be 
some way in the OA model to indicate the author(s) of a quote. This 
would be most useful when the annotation is being viewed as a document 
itself, or when the source document is not actually available on the Web 
(behind a paywall, in an ebook or paper book, spoken during a 
non-recorded or time-delayed presentation, or what have you), but the 
annotator still wants to attribute it as much as possible (think of 
tweets about conference presentations that contain quotes and a link to 
the speaker's Twitter ID).


> I was also not sure whether the tagging is properly
> mapped onto the OA. With that, I believe the snippet below is
> more-or-less correct:
>
> <aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
>   <p>
>     <a property="annotatedBy" href="http://example.com/people/shepazu"
>        typeof="foaf:Person">
>       <span property="foaf:name">Shepazu</span>
>     </a>
>   </p>
>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>     <a href="http://example.com/annotations/shepazu-1389680902"
>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>   </time>
>
>   <blockquote property="hasTarget"
>               resource="http://example.com/sourcedoc.html"
>               cite="http://example.com/sourcedoc.html"
>               data-prefix="essential feature of the memex. "
>               data-suffix=" When the user is building a tra"
>               typeof="">
>     <p>The process of tying two items together is the important thing.</p>
>     <footer>
>       - <cite>
>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">
>             <span>Vannevar Bush</span>
>           </a>
>         </cite>
>     </footer>
>   </blockquote>
>   <p property="hasBody" typeof="">
>     <span property="rdf:value">Annotations are at the Web's core.</span>
>   </p>
>   <ul property="hasBody" typeof="SemanticTag">
>     <li property="rdf:value">annotations</li>
>     <li property="rdf:value">web</li>
>     <li property="rdf:value">standards</li>
>   </ul>
> </aside>
>
> There are some quirks, because I tried to keep it within RDFa Lite
> (mainly the usage of @typeof=""). Also, RDFa+HTML5 does not
> understand the @cite attribute in <blockquote>; it could be easily
> added to RDFa Lite, if there is a great demand for it, but that would
> require some extra spec rounds. Hence the @resource attribute that
> repeats the URI :-(
>
> I believe the correct mapping to OA is to have two different bodies;
> one is your remark, the other are the tags. (I have added the
> generated Turtle at the end, where I have taken out some statements
> that an RDFa processor generates into the resulting graph, but is
> irrelevant for us here.)
>
> Yes, it is slightly more complex than your thing. (Note that, I
> believe, mapping this to microdata would be even more complex;
> indeed, microdata does not allow mixing different vocabularies, like
> I do here with OA and foaf and rdf.) I am not sure which direction
> one should/could take in simplifying it.

To be honest, I think there is more to be gained by simplifying the HTML 
serialization than by making it expressible within the generalized 
RDFa/RDFa Lite syntax. (I'm sure I'm going to be raked over the coals 
for saying this, but…)

For example, if we mapped specific HTML5 semantic elements to OA-model 
equivalents, we could decompose conforming HTML into the OA model 
losslessly, while not imposing much overhead on the generator.

Here's a radically simpler strawman:

  <note>
    <p>
      <a property="annotatedBy"
         href="http://example.com/people/shepazu">Shepazu</a>
    </p>
    <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
      <a href="http://example.com/annotations/shepazu-1389680902"
         title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
    </time>

    <blockquote cite="http://example.com/sourcedoc.html"
                data-prefix="essential feature of the memex. "
                data-suffix=" When the user is building a tra">
      <p>The process of tying two items together is the important thing.</p>
      <footer>
        - <cite>
            <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">Vannevar Bush</a>
          </cite>
      </footer>
    </blockquote>
    <p>Annotations are at the Web's core.</p>
    <ul>
       <li property="tag">annotations</li>
       <li property="tag">web</li>
       <li property="tag">standards</li>
    </ul>
  </note>

And here's a legend:

<note> = oa:Annotation
<a property="annotatedBy"> = oa:annotatedBy
<time property="annotatedAt"> = oa:annotatedAt
<blockquote cite="..."> = oa:hasTarget
<li property="tag"> = oa:Tag
(<p>) = oa:hasBody


Each element has a unique semantic role, with the exception of the body 
of the annotation, which is everything that's not otherwise assigned a 
role. Semantics are inferred, but no less precise for lack of explicit 
property assignments.

I know this couldn't be extracted with a general RDF processor; there 
would need to be an intermediate processor that understands a "Web Note" 
(to coin a friendly buzzword) and converts it into RDF (or JSON, or 
whatever).
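To give a feel for how small that intermediate processor could be, here is 
a rough sketch using only Python's standard html.parser. It applies the 
legend literally: <a property="annotatedBy">, <time property="annotatedAt">, 
<blockquote cite>, <li property="tag">, and any <p> not inside another role 
becomes a body. Its output keys loosely follow the JSON-LD shape Ivan quotes 
further down, but it is not a JSON-LD or RDF processor; the class and 
function names, the "TextQuoteSelector" wrapper, and treating <footer> as 
the quote's attribution are assumptions made for illustration only.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class WebNoteParser(HTMLParser):
    """Hypothetical 'Web Note' processor: maps the strawman legend onto a dict."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[list] = []  # open elements as [tag, role, text_chunks]
        self.note: Dict[str, Any] = {"@type": "Annotation", "hasBody": []}
        self.tags: List[str] = []

    def _innermost_role(self) -> Optional[str]:
        for _tag, role, _chunks in reversed(self.stack):
            if role is not None:
                return role
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        role = None
        if tag == "a" and prop == "annotatedBy":
            role = "annotatedBy"
            self.note["annotatedBy"] = {"@id": a.get("href")}
        elif tag == "time" and prop == "annotatedAt":
            role = "annotatedAt"
            self.note["annotatedAt"] = a.get("datetime")
        elif tag == "blockquote" and a.get("cite"):
            role = "target"
            self.note["hasTarget"] = {
                "source": a["cite"],
                "selector": {
                    "@type": "TextQuoteSelector",
                    "prefix": a.get("data-prefix", ""),
                    "suffix": a.get("data-suffix", ""),
                },
            }
        elif tag == "footer" and self._innermost_role() == "target":
            role = "attribution"  # who is quoted; kept out of the quote text
        elif tag == "li" and prop == "tag":
            role = "tag"
        elif tag == "p" and self._innermost_role() is None:
            role = "body"  # content with no other role becomes a body
        self.stack.append([tag, role, []])

    def handle_data(self, data):
        # Text belongs to the innermost open element that has a role.
        for entry in reversed(self.stack):
            if entry[1] is not None:
                entry[2].append(data)
                return

    def handle_endtag(self, tag):
        # Find the matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                break
        else:
            return
        while len(self.stack) > i:
            _tag, role, chunks = self.stack.pop()
            text = " ".join("".join(chunks).split())  # collapse whitespace
            if role == "annotatedBy":
                self.note["annotatedBy"]["name"] = text
            elif role == "target":
                self.note["hasTarget"]["selector"]["exact"] = text
            elif role == "tag" and text:
                self.tags.append(text)
            elif role == "body" and text:
                self.note["hasBody"].append({"value": text})


def web_note_to_dict(markup: str) -> Dict[str, Any]:
    parser = WebNoteParser()
    parser.feed(markup)
    parser.close()
    note = parser.note
    if parser.tags:
        note["hasBody"].append({"@type": "SemanticTag", "value": parser.tags})
    return note
```

Fed the <note> strawman above, this should yield annotatedBy, annotatedAt, 
a hasTarget whose exact text is the quoted sentence, one text body, and one 
SemanticTag body holding the three tags.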

But what we stand to gain is much more annotation content on the Web, 
because this would be a simple model for developers to pick up and use, 
and use correctly; by contrast, I doubt I could consistently recreate 
from memory the RDFa that you and I generated, or teach it to someone 
else. But given a few simple rules (e.g. "use a <note> element as the 
container", "add @property="tag" to each tag", "use <blockquote> to 
quote the source", "use @property to indicate who wrote the annotation 
and when"), developers could do it right nearly every time.
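Rules that small can also be checked mechanically. A hypothetical 
lint-style checker, again assuming Python's html.parser and treating tags 
as optional, might be no more than this:

```python
from html.parser import HTMLParser
from typing import List, Optional, Set


class NoteRuleChecker(HTMLParser):
    """Records which of the strawman's authoring rules a fragment follows."""

    def __init__(self) -> None:
        super().__init__()
        self.root: Optional[str] = None  # first start tag seen
        self.seen: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.root is None:
            self.root = tag
        if tag == "blockquote" and a.get("cite"):
            self.seen.add("target")
        if a.get("property") in ("annotatedBy", "annotatedAt", "tag"):
            self.seen.add(a["property"])


def check_note(markup: str) -> List[str]:
    """Return human-readable problems; an empty list means the rules are met."""
    checker = NoteRuleChecker()
    checker.feed(markup)
    checker.close()
    problems = []
    if checker.root != "note":
        problems.append("use a <note> element as the container")
    if "target" not in checker.seen:
        problems.append("use <blockquote cite=...> to quote the source")
    for prop in ("annotatedBy", "annotatedAt"):
        if prop not in checker.seen:
            problems.append(f'missing @property="{prop}"')
    return problems
```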

I'm not knocking the idea of a generalized mechanism like RDFa; you 
can't create a specialized vocabulary and semantic element set for every 
possible feature of the Web; there are just too many possibilities! But 
I happen to think that annotations (and their kissing cousins, comments 
and footnotes) are important enough and common enough that they deserve 
to be first-class concepts.

(As an aside, many people wanted a <comment> element in HTML5, but the 
spec says that comments should use the <article> element; I think a 
<note> element would serve that use case better.)

Okay, everyone, now please give me a moment to put on my helmet before 
you commence to throwing heavy and/or sharp objects at me... :P


> But... I have also generated a JSON-LD code from the RDFa above, and
> then simplified it (my JSON-LD knowledge is not perfect, but I have
> checked it by a JSON-LD checker):
>
> {
>   "@context": "http://www.w3.org/ns/oa.json",
>   "@type": "Annotation",
>   "annotatedAt": "2014-01-14T01:28:22-0500",
>   "annotatedBy": {
>     "@id": "http://example.com/people/shepazu",
>     "name": "Shepazu",
>     "@type": "Person"
>   },
>   "hasBody": [
>     { "value": "Annotations are at the Web's core." },
>     { "@type": "SemanticTag",
>       "value": [ "web", "standards", "annotations" ] }
>   ],
>   "hasTarget": "http://example.com/sourcedoc.html"
> }
>
> with the supposition that the oa.json contains a lot of information
> on mapping the data to RDF that can be hidden from the end user, like
> the fact that 'value' or 'Person' are terms from another vocabulary
> (RDF and FOAF, respectively). In this sense, JSON-LD is more flexible
> than RDFa. For a JSON user the only slightly unusual thing is the
> usage of the "@" character. The "@context" can also be omitted for
> those who do not want to care about RDF; actually, if used on the
> Web, the context can also be transferred through an HTTP header.
>
> I actually find the JSON-LD the simplest. And I begin to wonder
> whether we really have annotations themselves marked up in HTML, or,
> more exactly, whether that is a major use case. I have the impression
> that annotations are built up through user interactions and are
> stored somewhere, and the storage would not necessarily happen in
> HTML but, rather, in JSON (e.g., in a JSON database, or something
> like that).

As I mentioned elsewhere, I totally agree that JSON is a major format 
for transfer and storage (though HTML matters more for display).


> (Note that it is also possible to embed a JSON(-LD) snippet into an
> HTML file[1]. This is an approach that the schema.org people have
> also done for some of their clients[2].)

Yeah, but that's not much of a display format, as I suspect you'll agree 
(obviously, since <script>s don't display). It could be consumed by a 
web crawler, but my intuition is that cases where an annotation is 
present twice in a document (once as data, once for display) are going 
to be rare.
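For comparison, this is roughly what a crawler does with the embedded form 
Ivan points to in [1]: find each <script type="application/ld+json"> and 
parse its contents as JSON, ignoring everything that is actually displayed. 
A minimal sketch with Python's standard library (class and function names 
are hypothetical; no JSON-LD context processing is attempted):

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: List[str] = []
        self.items: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # html.parser delivers <script> contents as raw text.
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buf)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> List[Any]:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Nothing in the visible page is consulted, which is the point above: a page 
that wants both display and data has to carry the annotation twice.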


I don't want to scare anyone here! I'm just tossing these ideas out for 
consideration, and I'm interested in everyone's thoughts about this. I'm 
new to this community, so I've got a lot to learn, and I hope something 
to offer.

Regards-
-Doug

> Ivan
>
> [1] http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
> [2] http://blog.schema.org/2013/06/schemaorg-and-json-ld.html
>
>
> P.S. Here is the turtle:
>
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> @prefix oa:   <http://www.w3.org/ns/oa#> .
> @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>
> <http://example.com/people/shepazu> a foaf:Person ;
>     foaf:name "Shepazu" .
>
> [] a oa:Annotation ;
>     oa:annotatedAt "2014-01-14T01:28:22-0500" ;
>     oa:annotatedBy <http://example.com/people/shepazu> ;
>     oa:hasBody [ rdf:value "Annotations are at the Web's core." ] ,
>                [ a oa:SemanticTag ;
>                  rdf:value "annotations", "standards", "web" ] ;
>     oa:hasTarget <http://example.com/sourcedoc.html> .
>
>
> On 19 Jan 2014, at 24:29 , Doug Schepers <schepers@w3.org> wrote:
>
>> Hi, folks–
>>
>> The work this group has done so far is excellent. I think the data
>> model is really solid. I'd like to see it applied broadly, not just
>> for annotations proper, but also for comments, footnotes,
>> bookmarks, and other similar things along the same lines.
>>
>> And I'd like annotations to be supported by browsers natively; I
>> think that would dramatically increase their usage and usability.
>>
>> To that end, I'd like to introduce a few topics that I think can
>> build on the data model, and couch it in terms that the average web
>> developer can easily understand and apply, and which browser
>> vendors might get behind.
>>
>> The first of these is some suggestions on different serializations,
>> for those who aren't interested in the RDF aspects (yes, hard to
>> believe, but such people do exist!).
>>
>> Here's a (terrible, almost certainly incorrect) strawman for an
>> HTML serialization of an annotation (consider it the bastard child
>> of OpenAnnotation and Twitter):
>>
>> <aside vocab="http://www.w3.org/ns/oa#">
>>   <p>
>>     <a property="annotatedBy" href="http://example.com/people/shepazu"
>>        typeof="Person">
>>       <span property="name">Shepazu</span>
>>     </a>
>>   </p>
>>
>>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>>     <a href="http://example.com/annotations/shepazu-1389680902"
>>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>>   </time>
>>
>>   <blockquote property="hasTarget"
>>               cite="http://example.com/sourcedoc.html"
>>               data-prefix="essential feature of the memex. "
>>               data-suffix=" When the user is building a tra">
>>     <p>The process of tying two items together is the important thing.</p>
>>     <footer>
>>       – <cite>
>>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush" typeof="Person">
>>             <span property="name">Vannevar Bush</span>
>>           </a>
>>         </cite>
>>     </footer>
>>   </blockquote>
>>
>>   <p property="hasBody">Annotations are at the Web’s core.</p>
>>
>>   <ul>
>>     <li property="tag">annotations</li>
>>     <li property="tag">web</li>
>>     <li property="tag">standards</li>
>>   </ul>
>> </aside>
>>
>>
>> Another serialization could be in very lightweight JSON, for
>> sockets interchange.
>>
>> All of these serializations should be defined in such a way that
>> they are losslessly transformable into any of the other
>> serializations; any missing data (for example, values omitted for
>> brevity) should have default (or lacunae) values that are populated
>> for other serializations that might need them, such as RDF.
>>
>> Thoughts?
>>
>>
>> Regards-
>> -Doug
>>
>
>
> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
> FOAF: http://www.ivan-herman.net/foaf
Received on Monday, 20 January 2014 03:30:26 UTC