Re: Annotation Serializations from Ivan Herman on 2014-01-19 (public-openannotation@w3.org from January 2014)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 19 Jan 2014 20:39:17 +0100
To: Doug Schepers <schepers@w3.org>
Cc: "t-cole3@illinois.edu" <t-cole3@illinois.edu>, public-openannotation <public-openannotation@w3.org>
Message-Id: <F5D61451-2A32-4A88-A4F0-3A202E3BCD85@w3.org>
Ok. I accept these as proof that an HTML-based serialization fulfills a real demand. How we would do that is something that a possible WG will have to define/show; having some ideas jotted down on the wiki will be useful.

But I do not think we should disregard JSON either; I could see use cases for that, too. E.g., if the annotations cannot be attached to the core text (this is the way Diigo, as well as most ebook reading systems, do it) but are rather stored outside the text (e.g., on a server), then the simplicity of JSON, as well as its wide usage in different tools, becomes a big plus.

The beauty of OA is that it defines an abstract model, and the serialization is well separated. That is a major feature to embrace, and showing/documenting different serializations is a major asset.
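
As an illustration of that separation, here is a minimal Python sketch, not something defined by OA: a hypothetical `Annotation` class stands in for the abstract model, and two independent functions write it out as JSON and as Turtle. The class, its field names, and both function names are assumptions made for this example; the property names follow the examples quoted further down in this message.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Hypothetical in-memory stand-in for the abstract model."""
    target: str
    body_text: str
    tags: List[str] = field(default_factory=list)


def to_json(anno: Annotation) -> str:
    """Write the annotation as JSON, shaped like the JSON-LD example in this thread."""
    bodies = [{"value": anno.body_text}]
    if anno.tags:
        bodies.append({"@type": "SemanticTag", "value": list(anno.tags)})
    doc = {"@type": "Annotation", "hasBody": bodies, "hasTarget": anno.target}
    return json.dumps(doc, indent=2)


def to_turtle(anno: Annotation) -> str:
    """Write the same annotation as Turtle.

    Simplification: json.dumps is used to quote string literals, which is
    adequate for plain text but is not a complete Turtle escaper.
    """
    bodies = [f"[ rdf:value {json.dumps(anno.body_text)} ]"]
    if anno.tags:
        values = ", ".join(json.dumps(tag) for tag in anno.tags)
        bodies.append(f"[ a oa:SemanticTag ; rdf:value {values} ]")
    lines = [
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "",
        "[] a oa:Annotation ;",
        "    oa:hasBody " + ",\n        ".join(bodies) + " ;",
        f"    oa:hasTarget <{anno.target}> .",
    ]
    return "\n".join(lines) + "\n"
```

Both writers read from the same object and neither depends on the other, so adding a third output format would not require touching the model.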

(Thanks to Doug for having started this...)

Cheers

Ivan

---
Ivan Herman
Tel:+31 641044153
http://www.ivan-herman.net

(Written on mobile, sorry for brevity and misspellings...)



> On 19 Jan 2014, at 18:51, Doug Schepers <schepers@w3.org> wrote:
> 
> Hi, Tim, Ivan–
> 
> Yeah, what Tim said.
> 
> But I also think there's a broader, more straightforward rationale:
> 
> For a vast number of use cases, possibly the majority, the consumer of an annotation will see it in HTML, in the form of footnotes, comments, or proper annotations (like the Annotator sidebar or stickynotes); in many of these cases, the annotation will even be stored as HTML, or some intermediate format like markdown that is predictably transformable into HTML, rather than in some abstracted form like RDF or JSON or even normalized SQL. Many annotations will be only accessible as HTML, even through a service's API; think of Twitter, which will let you embed a tweet on a page by hitting their API and getting back the HTML snippet.
> 
> So, if we think it would be desirable to be able to consume, aggregate, and catalog those annotations, we will want to provide clear guidance for how developers of annotation-producing software (e.g., microblogs, commenting systems, annotation libraries, and so on) can make markup that can be explicitly consumed as conforming to the OA data model.
> 
> Regards-
> -Doug
> 
>> On 1/19/14 12:25 PM, Tim Cole wrote:
>> Ivan-
>> 
>> We haven't so far done a lot of work with Open Annotation in RDFa, but in
>> answer to your question about whether annotations themselves marked up in
>> HTML could be a significant use case, one class of use cases that created
>> some interest in RDFa for OA is the idea of blog entries as annotations.
>> 
>> So, for example, imagine that a Math Overflow user asks a question about a
>> proof appearing on page 17 of a 30-page article that appeared in a recent
>> issue of Journal of Algebra. Subsequently the author of this article issues
>> an errata concerning this proof which is posted to the journal publisher's
>> Website. Subsequent to that the Math Overflow question is answered with
>> reference to the substance of the errata.  Having these modeled as a chain
>> of annotations, albeit all embedded in HTML, might facilitate discovery
>> and use. Users of the publisher Website could more readily be made aware of
>> the Math Overflow blog entries. Users of Math Overflow would know about the
>> relevant data on the publisher Website -- not just the 30-page article, but
>> exactly where in the article the proof appeared. Ultimately, as you suggest,
>> Math Overflow, the publisher, or a 3rd party would likely want to store this
>> information somewhere as annotations. But especially at the outset or if the
>> ingest to the annotation store is done by a 3rd party after the HTML is
>> posted, having the appropriate OA serialized as RDFa and embedded in the
>> HTML might facilitate things.
>> 
>> This is not entirely idle speculation. I've been involved in some
>> discussions concerning next steps to follow on last decade's World Digital
>> Math Library initiative, and this exact scenario has come up as of interest
>> with a successor to the WDML in the role of 3rd party.
>> 
>> Tim Cole
>> University of Illinois at UC
>> 
>> 
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Sunday, January 19, 2014 7:26 AM
>> To: Doug Schepers
>> Cc: public-openannotation
>> Subject: Re: Annotation Serializations
>> 
>> Hi Doug, everybody,
>> 
>> I am trying to understand what you mean... Are we talking about some sort of a
>> family of use case templates? Or a formal and thorough serialization
>> specification in HTML, ie, some sort of a specialized RDFa? The latter may
>> be quite a lot of work... (having gone through the RDFa exercise myself). A
>> template library could probably be done more easily; for RDF usage one could
>> then make some sort of a preprocessor to RDFa, and then let the existing
>> RDFa processors take over.
>> 
>> I looked at your example, and, for the purpose of the discussion, I did
>> re-cast it into RDFa Lite. I *think* it is what you meant but probably not
>> exactly; I did remove the internal properties for Bush because you annotate
>> <http://example.com/sourcedoc.html> and not the snippet and, I must admit, I
>> was not sure how that 'cite' would translate into OA (I am not sure it can,
>> it may need some additional properties). I was also not sure whether the
>> tagging is properly mapped onto the OA. With that, I believe the snippet
>> below is more-or-less correct:
>> 
>>     <aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
>>       <p>
>>         <a property="annotatedBy" href="http://example.com/people/shepazu"
>>            typeof="foaf:Person">
>>            <span property="foaf:name">Shepazu</span>
>>          </a>
>>       </p>
>>       <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>>         <a href="http://example.com/annotations/shepazu-1389680902"
>>            title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>>       </time>
>> 
>>       <blockquote property="hasTarget"
>>                   resource="http://example.com/sourcedoc.html"
>>                   cite="http://example.com/sourcedoc.html"
>>                   data-prefix="essential feature of the memex. "
>>                   data-suffix=" When the user is building a tra" typeof="">
>>         <p>The process of tying two items together is the important thing.</p>
>>         <footer>
>>           - <cite>
>>                  <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">
>>                     <span>Vannevar Bush</span>
>>                  </a>
>>             </cite>
>>         </footer>
>>       </blockquote>
>>       <p property="hasBody" typeof=""><span property="rdf:value">Annotations are at the Web's core.</span></p>
>>       <ul property="hasBody" typeof="SemanticTag">
>>          <li property="rdf:value">annotations</li>
>>          <li property="rdf:value">web</li>
>>          <li property="rdf:value">standards</li>
>>       </ul>
>>     </aside>
>> 
>> There are some quirks, because I tried to keep it within RDFa Lite (mainly
>> the usage of @typeof=""). Also, RDFa+HTML5 does not understand the @cite
>> attribute in <blockquote>; it could be easily added to RDFa Lite, if there
>> is a great demand for it, but that would require some extra spec rounds.
>> Hence the @resource attribute that repeats the URI :-(
>> 
>> I believe the correct mapping to OA is to have two different bodies; one is
>> your remark, the other is the set of tags. (I have added the generated Turtle
>> at the end, where I have taken out some statements that an RDFa processor
>> generates into the resulting graph but that are irrelevant for us here.)
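
To make the two-body mapping concrete, here is a rough Python sketch that walks markup like the snippet above and lists the `property` attributes it finds. It is only an illustration and not a conforming RDFa processor: it ignores `vocab`, prefixes and subject chaining, and returns a flat list rather than a graph. The class name `PropertyCollector`, the `_:node` placeholder and the shortened `SNIPPET` are assumptions made for this example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, value) pairs from RDFa-like markup (illustration only)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # open elements as (tag, pending property or None, text buffer)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        ref = a.get("resource") or a.get("href") or a.get("datetime")
        if prop and ref:
            # The value is carried by an attribute.
            self.pairs.append((prop, ref))
            prop = None
        elif prop and "typeof" in a:
            # The value is a new node; "_:node" is a placeholder of this sketch.
            self.pairs.append((prop, "_:node"))
            prop = None
        self._stack.append((tag, prop, []))

    def handle_data(self, data):
        # Text belongs to every open element still waiting for a literal value.
        for _, prop, buf in self._stack:
            if prop:
                buf.append(data)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag; assumes reasonably well-nested input.
        while self._stack:
            open_tag, prop, buf = self._stack.pop()
            if prop:
                self.pairs.append((prop, " ".join("".join(buf).split())))
            if open_tag == tag:
                break


SNIPPET = """
<aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
  <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">A few minutes ago</time>
  <blockquote property="hasTarget" resource="http://example.com/sourcedoc.html" typeof="">
    <p>The process of tying two items together is the important thing.</p>
  </blockquote>
  <p property="hasBody" typeof=""><span property="rdf:value">Annotations are at the Web's core.</span></p>
  <ul property="hasBody" typeof="SemanticTag">
    <li property="rdf:value">annotations</li>
    <li property="rdf:value">web</li>
  </ul>
</aside>
"""

collector = PropertyCollector()
collector.feed(SNIPPET)
collector.close()
pairs = collector.pairs
```

In `pairs`, `hasBody` shows up twice with the placeholder node, once for the remark and once for the tag list, which is the two-body reading described above.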
>> 
>> Yes, it is slightly more complex than your thing. (Note that, I believe,
>> mapping this to microdata would be even more complex; indeed, microdata does
>> not allow mixing different vocabularies, like I do here with OA and foaf and
>> rdf.) I am not sure which direction one should/could take in simplifying it.
>> 
>> But... I have also generated JSON-LD code from the RDFa above, and then
>> simplified it (my JSON-LD knowledge is not perfect, but I have checked it with
>> a JSON-LD checker):
>> 
>> {
>>     "@context": "http://www.w3.org/ns/oa.json",
>>     "@type": "Annotation",
>>     "annotatedAt": "2014-01-14T01:28:22-0500",
>>     "annotatedBy": {
>>         "@id": "http://example.com/people/shepazu",
>>         "name": "Shepazu",
>>         "@type" : "Person"
>>     },
>>     "hasBody": [
>>         {
>>             "value" : "Annotations are at the Web's core."
>>         },
>>         {
>>             "@type": "SemanticTag",
>>             "value": [
>>                 "web",
>>                 "standards",
>>                 "annotations"
>>             ]
>>         }
>>     ],
>>     "hasTarget": "http://example.com/sourcedoc.html"
>> 
>> }
>> 
>> with the supposition that the oa.json contains a lot of information on
>> mapping the data to RDF that can be hidden from the end user, like the fact
>> that 'value' or 'Person' are terms from another vocabulary (RDF and FOAF,
>> respectively). In this sense, JSON-LD is more flexible than RDFa. For a JSON
>> user the only slightly unusual thing is the usage of the "@" character. The
>> "@context" can also be omitted for those who do not want to care about RDF;
>> actually, if used on the Web, the context can also be transferred through an
>> HTTP header.
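
A small Python sketch of what "hidden in the context" could mean in practice. The `CONTEXT` table below is a guess at the kind of term-to-IRI mapping a file such as oa.json might hold (its real contents are not shown in this thread), and `expand` is a deliberately naive stand-in, not the JSON-LD expansion algorithm.

```python
OA = "http://www.w3.org/ns/oa#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical context: short terms on the left, full IRIs on the right.
CONTEXT = {
    "Annotation": OA + "Annotation",
    "SemanticTag": OA + "SemanticTag",
    "hasBody": OA + "hasBody",
    "hasTarget": OA + "hasTarget",
    "annotatedBy": OA + "annotatedBy",
    "annotatedAt": OA + "annotatedAt",
    "value": RDF + "value",    # borrowed from RDF
    "name": FOAF + "name",     # borrowed from FOAF
    "Person": FOAF + "Person",
}


def expand(node, context):
    """Replace short terms by IRIs, recursively (naive illustration)."""
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    if isinstance(node, dict):
        out = {}
        for key, val in node.items():
            if key == "@type":
                types = val if isinstance(val, list) else [val]
                out[key] = [context.get(t, t) for t in types]
            elif key.startswith("@"):
                out[key] = val  # other keywords such as @id pass through
            else:
                out[context.get(key, key)] = expand(val, context)
        return out
    return node  # plain strings and numbers are left untouched


doc = {
    "@type": "Annotation",
    "annotatedBy": {
        "@id": "http://example.com/people/shepazu",
        "name": "Shepazu",
        "@type": "Person",
    },
    "hasTarget": "http://example.com/sourcedoc.html",
}
expanded = expand(doc, CONTEXT)
```

The author of `doc` only ever writes `name` and `Person`; the fact that they resolve to FOAF lives entirely in the context table.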
>> 
>> I actually find the JSON-LD the simplest. And I begin to wonder whether we
>> really have annotations themselves marked up in HTML, or, more exactly,
>> whether that is a major use case. I have the impression that annotations are
>> built up through user interactions and are stored somewhere, and the storage
>> would not necessarily happen in HTML but, rather, in JSON (e.g., in a JSON
>> database, or something like that).
>> 
>> (Note that it is also possible to embed a JSON(-LD) snippet into an HTML
>> file[1]. This is an approach that the schema.org people have also taken for
>> some of their clients[2].)
>> 
>> Ivan
>> 
>> [1] http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
>> [2] http://blog.schema.org/2013/06/schemaorg-and-json-ld.html
>> 
>> 
>> P.S. Here is the turtle:
>> 
>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>> @prefix oa: <http://www.w3.org/ns/oa#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> 
>> <http://example.com/people/shepazu> a foaf:Person ;
>>     foaf:name "Shepazu" .
>> 
>> [] a oa:Annotation ;
>>     oa:annotatedAt "2014-01-14T01:28:22-0500" ;
>>     oa:annotatedBy <http://example.com/people/shepazu> ;
>>     oa:hasBody
>>         [ rdf:value "Annotations are at the Web's core." ],
>>         [ a oa:SemanticTag ;
>>             rdf:value "annotations", "standards", "web"
>>         ] ;
>>     oa:hasTarget <http://example.com/sourcedoc.html> .
>> 
>> 
>>> On 19 Jan 2014, at 24:29 , Doug Schepers <schepers@w3.org> wrote:
>>> 
>>> Hi, folks-
>>> 
>>> The work this group has done so far is excellent. I think the data model
>>> is really solid. I'd like to see it applied broadly, not just for
>>> annotations proper, but also for comments, footnotes, bookmarks, and other
>>> similar things along the same lines.
>>> 
>>> And I'd like annotations to be supported by browsers natively; I think
>>> that would dramatically increase their usage and usability.
>>> 
>>> To that end, I'd like to introduce a few topics that I think can build on
>>> the data model, and couch it in terms that the average web developer can
>>> easily understand and apply, and which browser vendors might get behind.
>>> 
>>> The first of these is some suggestions on different serializations, for
>>> those who aren't interested in the RDF aspects (yes, hard to believe, but
>>> such people do exist!).
>>> 
>>> Here's a (terrible, almost certainly incorrect) strawman for an HTML
>>> serialization of an annotation (consider it the bastard child of
>>> OpenAnnotation and Twitter):
>>> 
>>> <aside vocab="http://www.w3.org/ns/oa#">
>>>   <p>
>>>     <a property="annotatedBy"
>>>         href="http://example.com/people/shepazu"
>>>         typeof="Person">
>>>        <span property="name">Shepazu</span>
>>>      </a>
>>>   </p>
>>> 
>>>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>>>     <a href="http://example.com/annotations/shepazu-1389680902"
>>>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>>>   </time>
>>> 
>>>   <blockquote property="hasTarget"
>>>               cite="http://example.com/sourcedoc.html"
>>>               data-prefix="essential feature of the memex. "
>>>               data-suffix=" When the user is building a tra">
>>>     <p>The process of tying two items together is the important thing.</p>
>>>     <footer>
>>>       - <cite>
>>>             <a href="http://en.wikipedia.org/wiki/Vannevar_Bush"
>>>                typeof="Person">
>>>             <span property="name">Vannevar Bush</span>
>>>             </a>
>>>         </cite>
>>>     </footer>
>>>   </blockquote>
>>> 
>>>   <p property="hasBody">Annotations are at the Web's core.</p>
>>> 
>>>    <ul>
>>>      <li property="tag">annotations</li>
>>>      <li property="tag">web</li>
>>>      <li property="tag">standards</li>
>>>    </ul>
>>> </aside>
>>> 
>>> 
>>> Another serialization could be in very lightweight JSON, for sockets
>>> interchange.
>>> 
>>> All of these serializations should be defined in such a way that they are
>>> losslessly transformable into any of the other serializations; any missing
>>> data (for example, values omitted for brevity) should have default (or
>>> lacunae) values that are populated for other serializations that might need
>>> them, such as RDF.
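
One possible reading of the default-value idea, as a Python sketch. The `DEFAULTS` table and both function names are invented for illustration; neither OA nor this thread defines which values would be defaulted.

```python
import copy

# Hypothetical defaults a lightweight serialization might be allowed to omit.
DEFAULTS = {
    "@type": "Annotation",
    "hasBody": [],
}

_MISSING = object()  # sentinel for "this key has no default"


def with_defaults(compact: dict) -> dict:
    """Fill in omitted values before converting to a fuller serialization."""
    full = copy.deepcopy(DEFAULTS)
    full.update(copy.deepcopy(compact))
    return full


def without_defaults(full: dict) -> dict:
    """Drop values equal to their default to get back the lightweight form."""
    return {
        key: value
        for key, value in full.items()
        if DEFAULTS.get(key, _MISSING) != value
    }


compact = {"hasTarget": "http://example.com/sourcedoc.html"}
full = with_defaults(compact)
roundtrip = without_defaults(full)
```

The round trip is lossless only as long as the lightweight form never spells out a value that equals its default; a real specification would have to state whether an explicit default and an omitted one are considered the same.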
>>> 
>>> Thoughts?
>>> 
>>> 
>>> Regards-
>>> -Doug
>> 
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf
> 
Received on Sunday, 19 January 2014 19:39:49 UTC