Re: OA in HTML (was Annotation Serializations) from Tim Clark on 2014-01-22 (public-openannotation@w3.org from January 2014)

From: Tim Clark <tim_clark@harvard.edu>
Date: Tue, 21 Jan 2014 22:54:33 -0500
To: Bob Morris <morris.bob@gmail.com>
Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, Ivan Herman <ivan@w3.org>, Robert Sanderson <azaroth42@gmail.com>, Doug Schepers <schepers@w3.org>, public-openannotation <public-openannotation@w3.org>
Message-Id: <27145EF1-3FB0-46FB-83A4-30814BCE1862@harvard.edu>
Hi Bob,

I'll respond to your second point about data citations, as I have been involved extensively in some of the data citation "pressure groups" including serving (with Ivan) on the Board of Directors of Force11 (http://force11.org), and participating (again with Ivan) in Force11's Data Citation Principles WG (http://force11.org/datacitation).  

This WG harmonized the prior work of a number of important stakeholder groups on this topic including CODATA, NISO, RDA, Library of Congress, NAS, DataCite, ICSU-WDS, publishers, etc. 

Basically, if you read the Data Citation Principles noted above, which seem to be pretty broadly supported (http://www.force11.org/node/4785), you'll see that they propose to bake data citations into the scientific article *as distributed by the publisher*, similarly to how ordinary publication citations are treated.

This approach appears to be favored by the community because it requires fewer changes to the ecosystem as a whole and treats data citations as part of the authoritative record of the publication.  So far as I am aware, Nature, Elsevier, PLOS, and several other smaller publishers plan to implement this model.

There are still plenty of applications for OA annotations, however!  :-)  

Best

Tim Clark

PS I urge people involved in OA from the perspective of scientific publishing and data citation to also become Force11 members. It's very easy: just go to the Force11 site, provide your contact and institutional info, and you're good to go.



On Jan 21, 2014, at 4:33 PM, Bob Morris <morris.bob@gmail.com> wrote:

> +0.95
> I'm generally supportive of the directions in this thread but I find
> two related things borderline regrettable. (Rob and Paolo will find my
> position predictable.  :-)  )
> 
> (1.) The thread, and much writing nowadays, conflates "Semantic" with
> "Semantic Web"  and this often is taken to mean semantic applications
> against web documents, but web documents are not the only store of
> knowledge on the internet.
> 
> (2.) As we argue in [1], OA is almost, but not quite, suitable for
> annotating (some kinds of) data. Herein lies my second
> regret---hopefully misguided.  There is immense, and welcome, pressure
> by science funding agencies for authors to expose the data supporting
> a scientific publication. To me it looks like the DIGPUB IG, and maybe
> the Digital Publishing Activity in general, is not addressing the use
> cases that arise from this huge piece of digital publishing activity,
> namely scientific publishing in particular, but also other scholarly
> epubs, data-backed government and NGO epubs, etc.  Even if I'm right,
> the proposed approach by Rob is likely to lead to sufficient
> extensibility  that even developers with only modest html forms
> development skills could annotate the data behind, e.g. data
> visualizations. But if I'm wrong, and the Digital Publishing Activity
> already encompasses  data publication, then I hereby change my vote to
> +1.10.  Meanwhile, I am waaaatttching..... :-)
> 
> Bob Morris
> 
> [1]  Morris et al. "Semantic Annotation of Mutable Data", PLOS ONE,
> http://bit.ly/1bvMUPl
> 
> 
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> 
> 
> Filtered Push Project
> Harvard University Herbaria
> Harvard University
> 
> email: morris.bob@gmail.com
> web: http://efg.cs.umb.edu/
> web: http://wiki.filteredpush.org
> http://www.cs.umb.edu/~ram
> ===
> The content of this communication is made entirely on my
> own behalf and in no way should be deemed to express
> official positions of The University of Massachusetts at Boston or
> Harvard University.
> 
> 
> 
> On Tue, Jan 21, 2014 at 12:52 PM, Paolo Ciccarese
> <paolo.ciccarese@gmail.com> wrote:
>> I am available for working on the  RDFa.
>> 
>> Best,
>> Paolo
>> 
>> 
>> 
>> On Tue, Jan 21, 2014 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> 
>>> On 21 Jan 2014, at 17:58 , Robert Sanderson <azaroth42@gmail.com> wrote:
>>> 
>>>> 
>>>> That sounds perfect to me :)  This is something that the WG would have
>>>> to do anyway (IMO), but if it's done first we save both time and avoid the
>>>> WG being front-loaded with semantic web style tasks, making it less likely
>>>> that non semantic web folk would participate... seems like a win/win to me
>>>> at least.
>>>> 
>>>> I'm happy to lead the JSON-LD refactor discussion, and contribute to the
>>>> RDFa discussion if there's someone with more experience in that realm
>>>> willing to lead it?  Ivan? Tim? Paolo?
>>> 
>>> I am not sure I would have enough knowledge of the OA model for leading
>>> this:-( But I can certainly contribute
>>> 
>>> Ivan
>>> 
>>>> 
>>>> Rob
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jan 21, 2014 at 3:40 AM, Ivan Herman <ivan@w3.org> wrote:
>>>> Because we are all wildly agreeing, I believe, here is a somewhat more
>>>> specific proposal.
>>>> 
>>>> 1. The OA CG makes a document on the RDFa mapping of OA as a first
>>>> iteration. It is perfectly a prerogative of the CG to do that and
>>>> complements the work around OA (and I am of course happy to help). A good
>>>> corresponding tutorial/primer would be a major plus.
>>>> 
>>>> 2. The WG charter would include a work item on serialization in general,
>>>> and would also specify the CG RDFa mapping as an input document to the WG
>>>> (alongside the OA document itself). Being an input document does not mean
>>>> that the WG is under obligation to use it as is, but it is then in position
>>>> to look at it and go through a possible targeted syntax around <note> or
>>>> anything similar that could map on the OA CG's serialization or... whatever.
>>>> 
>>>> By doing so we avoid the impression that the WG would concentrate on
>>>> Semantic Web related work which would be off-putting for some (alas! I would
>>>> say:). But we would base our work on more solid foundations, as you just
>>>> say.
>>>> 
>>>> The chartering work for a WG has just begun (witness this discussion!),
>>>> ie, the existence of the WG is still several months away. That should be
>>>> enough for the CG to do that...
>>>> 
>>>> (Actually, similar work around JSON-LD would also make sense, ie, to
>>>> develop a more complex @context file hiding many of the difficulties.)
>>>> 
>>>> How does that sound?
>>>> 
>>>> ivan
>>>> 
>>>> 
>>>> On 21 Jan 2014, at 02:20 , Robert Sanderson <azaroth42@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> And to clarify, in case my attempt to manage scope and process was
>>>>> perceived as pushback against the entire idea...
>>>>> 
>>>>> * I (personally, and as co-chair!) am very happy for such application
>>>>> driven discussions to happen, and to happen here.  I think it would be a
>>>>> great benefit to the community, writ large, to have them in the open and the
>>>>> OACG list seems an appropriate venue.  As the majority of discussions here
>>>>> are more data model or semantics oriented, application issues may not be
>>>>> seen as germane, and they absolutely are (again, IMO).
>>>>> 
>>>>> * Topics to be explored in a potential WG are also very welcome to be
>>>>> discussed, and indeed it is greatly beneficial to have them discussed early so
>>>>> as to allow people to think about their positions and potential solutions
>>>>> for the WG. Better to start early than to sit on our hands knowing there are
>>>>> platforms burning away to nothing, and missing the opportunity.
>>>>> 
>>>>> * That said, just because such discussions happen here, doesn't
>>>>> automatically generate new deliverables for the CG or the necessity for a
>>>>> new iteration of the specification. If the W3C decides that there isn't
>>>>> sufficient interest in the community (again writ large) for an Annotation
>>>>> WG, then we (writ smaller) will need to collectively decide what to do about
>>>>> those discussions :)  Hopefully that doesn't eventuate.
>>>>> 
>>>>> * I do stand by my opinion (writ very small) that having an RDFa HTML
>>>>> serialization would be both a good thing and a necessary step towards any
>>>>> OA-in-HTML solution, and the same for JSON-LD. No need to jump outside the
>>>>> standards arena unless it's important to.
>>>>> 
>>>>> I don't think that anyone is against the idea of the increased scope
>>>>> to include more than the data model (of which we are, justifiably, very
>>>>> proud) in the WG, and that there will be compromises to be made in order to
>>>>> increase adoption. That's a natural and important part of standards and
>>>>> extending the community to include other voices and stakeholders.  It is
>>>>> actually to our detriment that we weren't able to have those voices present
>>>>> to begin with, and "please bring these people" is not a challenge or demand
>>>>> but a perhaps overly heartfelt request to solve this current lack! :)
>>>>> 
>>>>> Rob
>>>>> 
>>>>> 
>>>>> On Mon, Jan 20, 2014 at 12:17 PM, Robert Sanderson
>>>>> <azaroth42@gmail.com> wrote:
>>>>> 
>>>>> Hi Doug,
>>>>> 
>>>>> As often, I think we're in violent agreement :)
>>>>> 
>>>>> On Mon, Jan 20, 2014 at 11:53 AM, Doug Schepers <schepers@w3.org>
>>>>> wrote:
>>>>> 
>>>>> * A simple HTML-based serialization would be valuable
>>>>>   -- Embedding an annotation in a page by hitting an API and getting
>>>>> the HTML back
>>>>> 
>>>>> 
>>>>> I think we're in danger of mixing up a few topics here: UI, API and
>>>>> serialization.  Is the requirement for an API that returns pre-formatted
>>>>> HTML for direct inclusion into other OWP applications, or is it an HTML
>>>>> serialization of the data model that will be interpreted and rendered in
>>>>> some way by a User Agent, perhaps using completely different HTML?  The
>>>>> former implies, but does not require, a particular look and feel, such
>>>>> as "a few minutes ago" in the time part of Doug's strawman HTML.
>>>>> 
>>>>> I hope it was clear that the strawman I made was meant as sort of an
>>>>> "idealized" and minimalist example of an annotation, with only some
>>>>> essential features.
>>>>> 
>>>>> Yes, it was.  I meant here that any HTML representation intended for
>>>>> direct inclusion (ala tweet streams) into another app or page will
>>>>> necessarily include styling and design, and thus standardization of that
>>>>> across vendors will be, in my opinion, impossible and unnecessary.
>>>>> 
>>>>> A real annotation produced by an authoring tool would likely be full
>>>>> of <div>s and <span>s and other cluttered markup inserted for other reasons
>>>>> (often for styling, or artifacts of composite generation). For example, view
>>>>> the source of a tweet, a Disqus comment, or a Facebook post; this is what
>>>>> will be generated. The key is that no matter what other junk was found in
>>>>> the content of the root element, certain well-formatted bits would be
>>>>> extracted as specifically mapping to the OA model, while the rest would be
>>>>> treated as body (or ignored).
>>>>> 
>>>>> Yes, which is why I'm keen to explore the limits of RDFa first before
>>>>> turning to a home grown solution.
>>>>> 
>>>>> 
>>>>> I'd like to bring up another point: while HTML semantics might seem
>>>>> very lax to RDF folks, they are treated very seriously by many web
>>>>> developers and designers. They like consistent patterns, and if we can
>>>>> provide them some, that will go a long way toward making them comfortable
>>>>> with producing distillable annotations.
>>>>> 
>>>>> +1.  And if there's recommendations as to providing a more clearly
>>>>> defined set of usage patterns for representing annotations in HTML, I'm all
>>>>> for it :)
>>>>> 
>>>>> The API providing pre-formatted HTML seems very community and situation
>>>>> specific, and thus difficult to standardize directly or effectively.  You
>>>>> would likely not want to include the same HTML into an EPUB reading
>>>>> system, as inline into a web page of the same text, or into a stream of
>>>>> the user's annotations due to the different contexts in which that same
>>>>> annotation is being used.  So my perspective is that while this is good
>>>>> background, it's not itself a requirement that we need to address in
>>>>> this CG (or a potential future WG) without vendors first coming to us
>>>>> with a need to interoperate in this way. On the other hand, having a
>>>>> best practice for HTML serialized annotations such that the contents are
>>>>> able to be understood, regardless of the exact manner in which they were
>>>>> obtained, would be very valuable and the scope is much clearer.
>>>>> 
>>>>> I think you may have misunderstood what I meant by an API; I was
>>>>> talking about a client-side JavaScript API for the <note> element, not a
>>>>> server-side API for outputting HTML... though I think that's something
>>>>> people will do, and in fact, already do (again, see Twitter).
>>>>> 
>>>>> Yes, I wasn't meaning client side here, just the notion of the
>>>>> twitter-like stream of annotations in HTML.  So for my points above, think
>>>>> twitter.  For client side, that's another matter entirely that will be
>>>>> essential to discuss and work on with the input of the stakeholders.
>>>>> 
>>>>> 
>>>>> The use of RDFa, as Tim and Ivan discuss, is more clearly a
>>>>> serialization topic -- how can RDFa be leveraged to provide a
>>>>> serialization in HTML that is friendly to web developers?  This also
>>>>> addresses the completeness issue that Doug brings up in his original
>>>>> email.  I think it would be extremely presumptuous not to first do the
>>>>> full RDFa mapping and see what we can come up with, perhaps recruiting
>>>>> additional expertise in the area if needed to help us.
>>>>> 
>>>>> Then we can assess the utility and friendliness of the mapping towards
>>>>> Doug's points of adoption.  It may be that the mapping is great, and
>>>>> hence no need to go any further, or it may be that vendors come back and
>>>>> say it's too arcane and there should be further work done. But that's
>>>>> for the future to determine :)
>>>>> 
>>>>> I have already discussed this informally with at least one vendor, and
>>>>> I got the feedback I expected: they want us to address their use cases, and
>>>>> are less interested in the data model unless it's bundled with other parts
>>>>> of the larger puzzle that will make the ecosystem work.
>>>>> 
>>>>> For sure. But the first step in having that conversation be more
>>>>> focused is, IMO, to produce the RDFa mapping such that it can be evaluated.
>>>>> No one wants to make additional work that's unnecessary, but there's a fine
>>>>> line between rigour and adoption.  At this point, I feel we should err on
>>>>> the side of rigour as adoption is unable to be determined without the direct
>>>>> input from potential adopters... and they need something to give their input
>>>>> on rather than just "we want to use HTML".  See also the analogous
>>>>> JSON/JSON-LD topic too.
>>>>> 
>>>>> I don't think we have the luxury to put it off to the future; if we
>>>>> don't get some key stakeholders from the beginning, and set the right tone
>>>>> for the WG, I don't see us getting W3C support for forming an Annotations
>>>>> WG. The data model is great, but it's not enough.
>>>>> 
>>>>> Said stakeholders would be strongly encouraged to join the community
>>>>> group and discuss their requirements, or if that's not possible for IP or
>>>>> other legal logistics, it would be great to share the details in some
>>>>> anonymized fashion.
>>>>> 
>>>>> 
>>>>> And to be honest, I think that is as it should be; I don't think
>>>>> there's much chance for success with a working group unless we involve a
>>>>> broader set of stakeholders, including browser vendors, JavaScript library
>>>>> authors, annotation webapps services, and others, as well as the
>>>>> data-centric folks already on this list. Just standardizing a data model is
>>>>> not going to interest those other players, because there's nothing for them
>>>>> to do there, and no win for them or their constituents; we also need to talk
>>>>> about things like serializations, DOM events, selection anchoring, styling,
>>>>> and other topics; in other words, things that get implemented in browsers,
>>>>> and which will make doing annotations in browsers easier.
>>>>> 
>>>>> Agreed completely.  The only point I'd like to make is that "web
>>>>> developers want this to be easier" is not a design constraint or even useful
>>>>> feedback -- of course people want things to be easier, that's perfectly
>>>>> clear and something that we've known from the inception of the CG.  What we
>>>>> need to know is where the pain points are in implementation, what the use
>>>>> cases and requirements are, and so forth, such that we can evaluate
>>>>> proposals against agreed upon criteria, not personal anecdotes and feelings.
>>>>> [Ahem, literal bodies]
>>>>> 
>>>>> Hence, my suggestion is to follow the rigorous path first of
>>>>> generating and discussing a description of how Annotation-in-HTML would look
>>>>> in RDFa, solicit feedback, and iterate.
>>>>> 
>>>>> Rob
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> GPG: 0x343F1A3D
>>>> FOAF: http://www.ivan-herman.net/foaf
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Dr. Paolo Ciccarese
>> http://www.paolociccarese.info/
>> Biomedical Informatics Research & Development
>> Instructor of Neurology at Harvard Medical School
>> Assistant in Neuroscience at Mass General Hospital
>> Member of the MGH Biomedical Informatics Core
>> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>> 
> 
> 
> 
> -- 
> Robert A. Morris
> 
> 



Received on Wednesday, 22 January 2014 03:55:08 UTC