Re: Annotation Serializations

On 20 Jan 2014, at 04:30 , Doug Schepers <schepers@w3.org> wrote:

> Hi, Ivan–
> 
> Having settled the use-case justification, I'd like to follow up on a
> couple more points from this message, inline...

... with my inline comments. 

> 
> On 1/19/14 8:25 AM, Ivan Herman wrote:
>> Hi Doug, everybody,
>> 
>> I try to understand what you mean... Are we talking about some sort
>> of a family of use case templates? Or a formal and thorough
>> serialization specification in HTML, ie, some sort of a specialized
>> RDFa? The latter may be quite a lot of work... (having gone through
>> the RDFa exercise myself). A template library could probably be done
>> more easily; for RDF usage one could then make some sort of a
>> preprocessor to RDFa, and then let the existing RDFa processors take
>> over.
> 
> I'm not really sure, to be honest... my inclination is to say we should have a formal serialization, though I'd defer to what browsers or other implementers would say.
> 
> I actually do think there's a case to be made for a <note> element, with an API, to be used as the root with a specific content model transformable into the OA data model.
> 
> For example, I could see a <note> element being used for footnotes (similar to how Wikipedia treats them); in that case, there would be two different anchoring schemes available:
> 
> 1) Classic annotations would include a robust anchor selector in the annotation itself, since a common use case is that the annotator doesn't control the document and can't insert a permanent link into it.
> 
> 2) Footnotes, where the document owner usually does have change control over the document source, the link could be in the document source rather than in the body of the annotation; this would also allow the author to add multiple links in the document body to the same footnote.
> 
> With a standard format for both, you could point at and extract the annotation with either scenario, though each has its own distinct adaption.

O.k. What I was reacting to is the idea of having some sort of a mapping of OA into the generic HTML world. The issue would be (and it is the RDFa experience talking) that, though the fundamental approach would look simple, mapping to the huge number of different combinations of HTML elements, and having a clear spec for all the different setups, is what complicates things big time.

The approach of a dedicated element (say, <note>) with a very specific and not-too-complex content model may work. Of course, the problem may be how to define that content model; after all, a user might expect to use the full power of the HTML content elements within a <note>, so much of the definition should also specify what happens if that is done (essentially, what is ignored). But we can probably make the content model fairly restrictive, which may make things manageable (see also my comment below). I am still a bit worried about the hidden complexities, but it may be worth a try.
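
On the first anchoring scheme quoted above (the quoted text plus its data-prefix/data-suffix context), here is a minimal sketch in Python of how a consumer might re-locate the quote in the source text. The function name find_quote and the sample source string are assumptions made up for illustration; nothing here is defined by OA, and a really robust anchor would also need fuzzy matching, which this does not attempt.

```python
def find_quote(text, exact, prefix="", suffix=""):
    """Return the start offset of `exact` in `text`.

    If the quote occurs more than once, prefer the occurrence whose
    surrounding text matches `prefix` and/or `suffix`.
    Returns -1 if the quote does not occur at all.
    """
    best, best_score = -1, -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        score = 0
        if prefix and text[:start].endswith(prefix):
            score += 1
        if suffix and text[end:].startswith(suffix):
            score += 1
        if score > best_score:
            best, best_score = start, score
        start = text.find(exact, start + 1)
    return best


# Hypothetical stand-in for the text of http://example.com/sourcedoc.html
source = ("This is the essential feature of the memex. "
          "The process of tying two items together is the important thing. "
          "When the user is building a trail, he names it.")
quote = "The process of tying two items together is the important thing."

offset = find_quote(source, quote,
                    prefix="essential feature of the memex. ",
                    suffix=" When the user is building a tra")
print(source[offset:offset + len(quote)])
```

The prefix and suffix only break ties between several exact matches; if the source text has been edited, the function simply reports -1.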

> 
>> I looked at your example, and, for the purpose of the discussion, I
>> did re-cast it into RDFa Lite. I *think* it is what you meant but
>> probably not exactly; I did remove the internal properties for Bush
>> because you annotate <http://example.com/sourcedoc.html> and not the
>> snippet and, I must admit, I was not sure how that 'cite' would
>> translate into OA (I am not sure it can, it may need some additional
>> properties).
> 
> Yeah, actually, unless I'm missing something, I think there should be some way in the OA model to indicate the author(s) of a quote. This would be most useful when the annotation is being viewed as a document itself, or when the source document is not actually available on the Web (behind a paywall, in an ebook or paper book, spoken during a non-recorded or time-delayed presentation, or what have you), but the annotator still wants to attribute it as much as possible (think of tweets about conference presentations which contain quotes and a link to the speaker's twitter id).
> 

I will let our OA experts chime in on that one...

> 
>> I was also not sure whether the tagging is properly
>> mapped onto the OA. With that, I believe the snippet below is
>> more-or-less correct:
>> 
>> <aside vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
>>   <p>
>>     <a property="annotatedBy" href="http://example.com/people/shepazu"
>>        typeof="foaf:Person">
>>       <span property="foaf:name">Shepazu</span>
>>     </a>
>>   </p>
>>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>>     <a href="http://example.com/annotations/shepazu-1389680902"
>>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>>   </time>
>> 
>>   <blockquote property="hasTarget"
>>               resource="http://example.com/sourcedoc.html"
>>               cite="http://example.com/sourcedoc.html"
>>               data-prefix="essential feature of the memex. "
>>               data-suffix=" When the user is building a tra"
>>               typeof="">
>>     <p>The process of tying two items together is the important thing.</p>
>>     <footer>
>>       - <cite>
>>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">
>>             <span>Vannevar Bush</span>
>>           </a>
>>         </cite>
>>     </footer>
>>   </blockquote>
>>   <p property="hasBody" typeof=""><span property="rdf:value">Annotations are at the Web's core.</span></p>
>>   <ul property="hasBody" typeof="SemanticTag">
>>     <li property="rdf:value">annotations</li>
>>     <li property="rdf:value">web</li>
>>     <li property="rdf:value">standards</li>
>>   </ul>
>> </aside>
>> 
>> There are some quirks, because I tried to keep it within RDFa Lite
>> (mainly the usage of @typeof=""). Also, RDFa+HTML5 does not
>> understand the @cite attribute in <blockquote>; it could be easily
>> added to RDFa Lite, if there is a great demand for it, but that would
>> require some extra spec rounds. Hence the @resource attribute that
>> repeats the URI :-(
>> 
>> I believe the correct mapping to OA is to have two different bodies;
>> one is your remark, the other is the set of tags. (I have added the
>> generated Turtle at the end, where I have taken out some statements
>> that an RDFa processor generates into the resulting graph but that
>> are irrelevant for us here.)
>> 
>> Yes, it is slightly more complex than your thing. (Note that, I
>> believe, mapping this to microdata would be even more complex;
>> indeed, microdata does not allow mixing different vocabularies, like
>> I do here with OA and foaf and rdf.) I am not sure which direction
>> one should/could take in simplifying it.
> 
> To be honest, I think there is more to be gained by simplifying the HTML serialization than to make it expressible within the generalized RDFa/RDFa-Lite syntax. (I'm sure I'm going to be raked over the coals for saying this, but…)
> 
> For example, if we mapped specific HTML5 semantic elements to OA-model equivalents, we could decompose conforming HTML into the OA model losslessly, while not imposing much overhead on the generator.
> 
> Here's a radically simpler strawman:
> 
> <note>
>   <p>
>     <a property="annotatedBy" href="http://example.com/people/shepazu">Shepazu</a>
>   </p>
>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>     <a href="http://example.com/annotations/shepazu-1389680902"
>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>   </time>
> 
>   <blockquote cite="http://example.com/sourcedoc.html"
>               data-prefix="essential feature of the memex. "
>               data-suffix=" When the user is building a tra">
>     <p>The process of tying two items together is the important thing.</p>
>     <footer>
>       - <cite>
>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">Vannevar Bush</a>
>         </cite>
>     </footer>
>   </blockquote>
>   <p>Annotations are at the Web's core.</p>
>   <ul>
>      <li property="tag">annotations</li>
>      <li property="tag">web</li>
>      <li property="tag">standards</li>
>   </ul>
> </note>
> 
> And here's a legend:
> 
> <note> = oa:Annotation
> <a property="annotatedBy"> = oa:annotatedBy
> <time property="annotatedAt"> = oa:annotatedAt
> <blockquote cite="..."> = oa:hasTarget
> <li property="tag"> = oa:Tag
> (<p>) = oa:hasBody
> 
> 
> Each element has a unique semantic role, with the exception of the body of the annotation, which is everything that's not otherwise assigned a role. Semantics are inferred, but no less precise for lack of explicit property assignments.
> 
> I know this couldn't be extracted with a general RDF processor; there would need to be an intermediate processor that understands a "Web Note" (to coin a friendly buzzword) and converts it into RDF (or JSON, or whatever).

Right; and that is fine as far as I am concerned. For the purpose of RDF I may regard this as some sort of a pre-processor to RDFa and let an RDFa processor take over (e.g., that means an annotation may smoothly integrate with other semantic information elsewhere in the same content), but that is a detail and my personal bias.
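
To make the idea of such an intermediate processor concrete, here is a minimal sketch in Python that applies the legend above to the <note> strawman and produces a plain dictionary. The class name WebNoteParser, the function parse_web_note and the keys of the result are all assumptions made up for this illustration; no spec defines them, and the sketch assumes well-formed input containing a single <note>.

```python
from html.parser import HTMLParser


class WebNoteParser(HTMLParser):
    """Hypothetical 'Web Note' reader following the strawman legend."""

    def __init__(self):
        super().__init__()
        self.roles = []  # one entry per open element; None = no role
        self.note = {"type": "oa:Annotation", "tags": [], "body": []}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        role = None
        if tag == "a" and a.get("property") == "annotatedBy":
            role = "annotatedBy"
            self.note["annotatedBy"] = a.get("href")
        elif tag == "time" and a.get("property") == "annotatedAt":
            role = "annotatedAt"
            self.note["annotatedAt"] = a.get("datetime")
        elif tag == "blockquote" and "cite" in a:
            role = "hasTarget"
            self.note["hasTarget"] = a["cite"]
        elif tag == "li" and a.get("property") == "tag":
            role = "tag"
        self.roles.append(role)

    def handle_endtag(self, tag):
        if self.roles:
            self.roles.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        active = [r for r in self.roles if r is not None]
        if not active:
            # Text with no assigned role is part of the annotation body.
            self.note["body"].append(text)
        elif active[-1] == "tag":
            self.note["tags"].append(text)
        # Text inside author, time or quote elements is not body text.


def parse_web_note(html):
    parser = WebNoteParser()
    parser.feed(html)
    parser.close()
    return parser.note


NOTE_HTML = """
<note>
  <p><a property="annotatedBy" href="http://example.com/people/shepazu">Shepazu</a></p>
  <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
    <a href="http://example.com/annotations/shepazu-1389680902">A few minutes ago</a>
  </time>
  <blockquote cite="http://example.com/sourcedoc.html">
    <p>The process of tying two items together is the important thing.</p>
  </blockquote>
  <p>Annotations are at the Web's core.</p>
  <ul>
    <li property="tag">annotations</li>
    <li property="tag">web</li>
    <li property="tag">standards</li>
  </ul>
</note>
"""

result = parse_web_note(NOTE_HTML)
print(result)
```

The resulting dictionary could then be serialized as JSON or turned into RDFa/Turtle by a second step; that part is left out here.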

> 
> But what we stand to gain is much more content, because this would be a simple model for developers to pick up and use, and use correctly; by contrast, I doubt I could consistently recreate from memory the RDFa that you and I generated, or teach it to someone else. But having a few simple rules (e.g. "use a <note> element as the container", "add a tag @property for tags", "use <blockquote> to quote the source", "use @property to indicate who wrote the annotation and when"), developers could do it right nearly every time.
> 
> I'm not knocking the idea of a generalized mechanism like RDFa; you can't create a specialized vocabulary and semantic element set for every possible feature of the Web; there are just too many possibilities! But I happen to think that annotations (and their kissing cousins, comments and footnotes) are important enough and common enough that they deserve to be first-class concepts.

Actually, I do not think the two are mutually exclusive either. A thorough analysis of the OA model has to be made, looking at how it maps onto RDFa (or JSON-LD, but that is probably less of an issue), in order to distill the best and most widespread use cases for the purpose of that specialized <note> element. I.e., I am not sure it is worth trying to put _the whole_ of OA into <note>; let us try an 80/20 cut. The long-tail part can still be encoded in RDFa, although that would require specialized knowledge; but that is the nature of the long tail, and it is left for experts. In other words, we may have to define some sort of an OA profile...

> 
> (As an aside, there were many people who wanted a <comment> element in HTML5; but the spec says that comments should use the <article> element; I think this would be an improvement on that use case.)
> 
> Okay, everyone, now please give me a moment to put on my helmet before you commence to throwing heavy and/or sharp objects at me... :P

:-)

The heavy and sharp objects will not come from my direction. I am more concerned about the discussion it will take for the HTML5 WG to accept a <note> element into HTML5.{1,2,...}.

> 
> 
>> But... I have also generated a JSON-LD code from the RDFa above, and
>> then simplified it (my JSON-LD knowledge is not perfect, but I have
>> checked it by a JSON-LD checker):
>> 
>> {
>>   "@context": "http://www.w3.org/ns/oa.json",
>>   "@type": "Annotation",
>>   "annotatedAt": "2014-01-14T01:28:22-0500",
>>   "annotatedBy": {
>>     "@id": "http://example.com/people/shepazu",
>>     "name": "Shepazu",
>>     "@type": "Person"
>>   },
>>   "hasBody": [
>>     { "value": "Annotations are at the Web's core." },
>>     { "@type": "SemanticTag", "value": [ "web", "standards", "annotations" ] }
>>   ],
>>   "hasTarget": "http://example.com/sourcedoc.html"
>> }
>> 
>> with the supposition that the oa.json contains a lot of information
>> on mapping the data to RDF that can be hidden from the end user, like
>> the fact that 'value' or 'Person' are terms from another vocabulary
>> (RDF and FOAF, respectively). In this sense, JSON-LD is more flexible
>> than RDFa. For a JSON user the only slightly unusual thing is the
>> usage of the "@" character. The "@context" can also be omitted for
>> those who do not want to care about RDF; actually, if used on the
>> Web, the context can also be transferred through an HTTP header.
>> 
>> I actually find the JSON-LD the simplest. And I begin to wonder
>> whether we really have annotations themselves marked up in HTML, or,
>> more exactly, whether that is a major use case. I have the impression
>> that annotations are built up through user interactions and are
>> stored somewhere, and the storage would not necessarily happen in
>> HTML but, rather, in JSON (e.g., in a JSON database, or something
>> like that).
> 
> As I mentioned elsewhere, I totally agree that JSON is a major use case for transfer or storage (though less so than HTML for display).
> 
> 
>> (Note that it is also possible to embed a JSON(-LD) snippet into an
>> HTML file[1]. This is an approach that the schema.org people have
>> also done for some of their clients[2].)
> 
> Yeah, but that's not much of a display format, as I suspect you'll agree (obviously, since <script>s don't display).

:-)
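
That said, for a crawler or a script, pulling such an embedded snippet back out of the page is not much work. A minimal sketch in Python, assuming the annotation is embedded in a <script type="application/ld+json"> element as described in [1]; the class name JsonLdExtractor and the sample page are made up for illustration, and the sketch only parses the JSON, without doing any JSON-LD processing of the context.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD <script> in a page."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.chunks = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.documents.append(json.loads("".join(self.chunks)))
            self.in_jsonld = False


# Hypothetical page embedding the annotation as data only.
page = """<html><body>
<p>Some visible text.</p>
<script type="application/ld+json">
{
  "@context": "http://www.w3.org/ns/oa.json",
  "@type": "Annotation",
  "annotatedAt": "2014-01-14T01:28:22-0500",
  "hasBody": [ { "value": "Annotations are at the Web's core." } ],
  "hasTarget": "http://example.com/sourcedoc.html"
}
</script>
</body></html>"""

extractor = JsonLdExtractor()
extractor.feed(page)
extractor.close()
annotations = extractor.documents
print(annotations[0]["hasTarget"])
```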

Cheers

Ivan

> It could be consumed by a web crawler, but my intuition is that the cases where an annotation is going to be redundantly present twice in a document (once as data, once for display) are going to be rare.
> 
> 
> I don't want to scare anyone here! I'm just tossing these ideas out for consideration, and I'm interested in everyone's thoughts about this. I'm new to this community, so I've got a lot to learn, and I hope something to offer.
> 
> Regards-
> -Doug
> 
>> Ivan
>> 
>> [1] http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
>> [2] http://blog.schema.org/2013/06/schemaorg-and-json-ld.html
>> 
>> 
>> P.S. Here is the turtle:
>> 
>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>> @prefix oa: <http://www.w3.org/ns/oa#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> 
>> <http://example.com/people/shepazu> a foaf:Person ;
>>     foaf:name "Shepazu" .
>> 
>> [] a oa:Annotation ;
>>     oa:annotatedAt "2014-01-14T01:28:22-0500" ;
>>     oa:annotatedBy <http://example.com/people/shepazu> ;
>>     oa:hasBody [ rdf:value "Annotations are at the Web's core." ],
>>                [ a oa:SemanticTag ;
>>                  rdf:value "annotations", "standards", "web" ] ;
>>     oa:hasTarget <http://example.com/sourcedoc.html> .
>> 
>> 
>> On 19 Jan 2014, at 24:29 , Doug Schepers <schepers@w3.org> wrote:
>> 
>>> Hi, folks–
>>> 
>>> The work this group has done so far is excellent. I think the data
>>> model is really solid. I'd like to see it applied broadly, not just
>>> for annotations proper, but also for comments, footnotes,
>>> bookmarks, and other similar things along the same lines.
>>> 
>>> And I'd like annotations to be supported by browsers natively; I
>>> think that would dramatically increase their usage and usability.
>>> 
>>> To that end, I'd like to introduce a few topics that I think can
>>> build on the data model, and couch it in terms that the average web
>>> developer can easily understand and apply, and which browser
>>> vendors might get behind.
>>> 
>>> The first of these is some suggestions on different serializations,
>>> for those who aren't interested in the RDF aspects (yes, hard to
>>> believe, but such people do exist!).
>>> 
>>> Here's a (terrible, almost certainly incorrect) strawman for an
>>> HTML serialization of an annotation (consider it the bastard child
>>> of OpenAnnotation and Twitter):
>>> 
>>> <aside vocab="http://www.w3.org/ns/oa#">
>>>   <p>
>>>     <a property="annotatedBy" href="http://example.com/people/shepazu"
>>>        typeof="Person">
>>>       <span property="name">Shepazu</span>
>>>     </a>
>>>   </p>
>>> 
>>>   <time property="annotatedAt" datetime="2014-01-14T01:28:22-0500">
>>>     <a href="http://example.com/annotations/shepazu-1389680902"
>>>        title="1:28 AM - 14 Jan 2014">A few minutes ago</a>
>>>   </time>
>>> 
>>>   <blockquote property="hasTarget"
>>>               cite="http://example.com/sourcedoc.html"
>>>               data-prefix="essential feature of the memex. "
>>>               data-suffix=" When the user is building a tra">
>>>     <p>The process of tying two items together is the important thing.</p>
>>>     <footer>
>>>       – <cite>
>>>           <a href="http://en.wikipedia.org/wiki/Vannevar_Bush" typeof="Person">
>>>             <span property="name">Vannevar Bush</span>
>>>           </a>
>>>         </cite>
>>>     </footer>
>>>   </blockquote>
>>> 
>>>   <p property="hasBody">Annotations are at the Web’s core.</p>
>>> 
>>>   <ul>
>>>     <li property="tag">annotations</li>
>>>     <li property="tag">web</li>
>>>     <li property="tag">standards</li>
>>>   </ul>
>>> </aside>
>>> 
>>> 
>>> Another serialization could be in very lightweight JSON, for
>>> sockets interchange.
>>> 
>>> All of these serializations should be defined in such a way that
>>> they are losslessly transformable into any of the other
>>> serializations; any missing data (for example, values omitted for
>>> brevity) should have default (or lacunae) values that are populated
>>> for other serializations that might need them, such as RDF.
>>> 
>>> Thoughts?
>>> 
>>> 
>>> Regards- -Doug
>>> 
>> 
>> 
>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Monday, 20 January 2014 09:53:44 UTC