Re: Minutes of the breakout on annotation at TPAC 2015

http://www.w3.org/2015/10/28-annotation-minutes.html



W3C

Annotation break-out, TPAC meeting, Sapporo, Japan
28 Oct 2015

See also: IRC log
Attendees

Present
     csarven, rhiaro, tantek, rob sanderson, manu sporny, benjamin young
Regrets
Chair
Scribe
     m4nu

Contents

     Topics
         Introductions
         Brief Walkthrough
         The Annotation Ecosystem
     Summary of Action Items

Introductions

<m4nu> rob: Hi Rob Sanderson, co-chair for Web Annotations WG... Ben 
Young is a co-chair.

<m4nu> Rob: Academically, I've always been interested in annotations, 
allows people to give feedback on annotations.

<m4nu> David: Hi, David Burns - co-editor for web ??? specification - 
user emulation in the browser, wanted to see what this was about.

<m4nu> John: Hi John Jansen from Microsoft, working on Webdriver spec, 
mapping test suite to spec, leveraging annotations to map words in a 
paragraph to spec - interested in where it's going.

<m4nu> Malena: Hi Malena - working on automatic annotation on fashion 
data - image recognition on Semantic Web.

<m4nu> Amy: Hi just joined

<m4nu> Sarven: Joined web annotations recently, at MIT -

<m4nu> Alex: Hi Alex Milowski - worked on annotations and scientific 
data - anything web/annotations piques my interest.

<m4nu> ???: Wanted to see what's going on here.

<m4nu> Philippe: ? Hi, from Paris - research in field, would like to 
extension of annotation to semantic

<m4nu> Kazui: Kazui Sako - wanted to learn more about web annotations - 
personal data store, manage it, control it, etc, working on it.

<m4nu> Takari: Takari from ???

<m4nu> Ben: Hi Benjamin Young with Hypothesis and co-editor of data model.

Brief Walkthrough

<m4nu> Rob: Just wanted to do a brief walkthrough - five to ten minutes. 
Let's keep questions until the end, put yourself on queue.

<m4nu> Rob: Why people care about annotations - user comments on pages - 
don't read the comments, but want to solve that problem, want to tag 
posts, make review of products, academic paper, describe content that's 
not easily accessible for screen readers

<m4nu> Rob: Transcription of video, audio, replying in a threaded mode 
could be an annotation on an annotation, copyediting - instead of having 
editor wars, anyone might be able to propose a change by making an 
annotation.

<m4nu> Rob: System for annotating and moderating - those have been 
indirections on content, moving your own annotations between systems. 
Moving annotations from Kindle to another ereader.

<m4nu> Rob: Doesn't need to go to anyone outside your circle. We have a 
long list of use cases

<m4nu> Rob: We lay out a set of sorts of things you'd want to do.

<m4nu> Rob: A brief history of annotations on the Web. This was in 
Mosaic in 1993 - annotations. Running a server where annotations would 
be stored, would cause legal and operational problems.

<danbri> see also https://lists.w3.org/Archives/Public/www-annotation/

<m4nu> Rob: Ever since the very beginning of the Web, the notion of users 
annotating the Web - both read and write has been part of the vision and 
part of technical reality. Now that we have superior capabilities, time 
is right again to try it.

<m4nu> Rob: In 2001, there was an Annotea protocol

<m4nu> Rob: In 2009, Google created Sidewiki, but stopped in 2011

<danbri> https://lists.w3.org/Archives/Public/www-talk/msg02698.html 
Announcing www-annotation@w3.org and www-collaboration@w3.org (1996)

<m4nu> Rob: In 2011, the Open Annotation Community Group was created - 
OAC was focused more on humanities.

<bigbluehat> also, those email lists are public, so *please* join in the 
conversation

<m4nu> Rob: In 2014 the Web Annotation Working Group was chartered.

The Annotation Ecosystem

<m4nu> Rob: You have someone that creates a page, then someone else 
annotates that web page.

<m4nu> Rob: We have a serialization using JSON-LD to write that 
annotation down, write that down to persistent storage mechanism.
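
A minimal sketch of what writing an annotation down as JSON-LD might look like. The key names follow the Web Annotation Data Model as it was later published, which may differ from the 2015 working drafts discussed here; the annotation id, page URL, and comment text are made-up example values.

```python
import json

# Hypothetical example values: the annotation id, target page, and comment
# text are invented for illustration only.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/annotations/1",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This paragraph needs a citation.",
    },
    "target": "http://example.com/page.html",
}


def serialize(anno: dict) -> str:
    """Turn the annotation into a JSON string suitable for persistent storage."""
    return json.dumps(anno, indent=2, sort_keys=True)


def deserialize(text: str) -> dict:
    """Read an annotation back from its stored JSON string."""
    return json.loads(text)


stored = serialize(annotation)
restored = deserialize(stored)
```

Because the serialization is plain JSON, any store that can hold a JSON document can hold the annotation without needing an RDF toolchain.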

<danbri> see also e.g. 1998 debates around Netscape "what's related" 
functionality — http://www.interhack.net/pubs/whatsrelated/ (Netscape 
fetched RDF annotations from its related link service for each page you 
viewed)

<m4nu> Rob: We also have an anchoring mechanism to talk about a bit of 
the page.

<m4nu> Rob: If you have a small range of text, fragments don't help you. 
Commenting on regions of images rather than ranges of text.

<m4nu> Rob: The annotations may need to be read by the person or other 
people as well, new user might want to comment on the annotation.

<m4nu> Rob: They may have their own store. Final step, as far as 
protocol goes, for notification, for all annotations, it would be useful 
to have a system that aggregates them. Publish that there were 
annotations about image - tells publisher, maybe render them as a part 
of the display.

<m4nu> Rob: That's the architectural vision that we're making good 
progress towards - we have two WDs, data model

<m4nu> Rob: We are trying to simplify implementations, our focus is on 
looking at JSON and JSON-LD for ease of development, so that devs can 
look at document, rather than having to worry about lots of RDF stuff, 
even though it's RDF underneath.

<m4nu> Rob: Focus has been on CRUD - use hypermedia APIs to do that, 
also discovery, fortunate enough to have TimBL to join our meetings over 
last couple of days - if you can't discover annotations and where you 
create them, one to one mapping of client and server - that's one area 
that we're trying to solve w/ Web Architecture.

<m4nu> Rob: Ongoing implementation - we don't want new query languages - 
we're looking at what the minimal filtering requirements are that might 
have millions of annotations, internationalization

<m4nu> Rob: On client-side, we're looking for 'find text API' - context 
- we need to work on finding text.

<m4nu> Rob: We are trying to make it simpler, and make it more 
internationalized.

<m4nu> JohnJansen: Is there deep linking in the specs? A URI to point to 
an anchor?

<m4nu> Rob: There has been discussion around find text API and to see if 
that could be used in the fragment - you wouldn't want to put the entire 
text of a chapter into a URI - so there are technical issues - around 
fragments and URIs

<m4nu> Rob: Technically, fragments are defined by media type - lots of 
requirements and discussion, we may not deliver anything in that space. 
We do have people asking for exactly that.

<JohnJansen> :-)

<m4nu> Rob: This stuff is pretty simple (shows example)

<kevinmarks> how small a fragment is not useful?

<m4nu> Rob: There are technical details that we could go through.

<kevinmarks> across the entire web a ten word phrase is pretty good at 
uniqueness

<m4nu> Rob: if you want to comment on our specs, you can annotate them - 
we're dogfooding.

<kevinmarks> within a document how small a phrase do you need?

<m4nu> Rob: To try and dogfood, we enabled annotation on the working drafts.

<m4nu> Rob: There are specs on TR space and on Github.

<m4nu> ???: What about the use of this stuff for ePub standard?

<m4nu> Rob: Working with markus on epub - been implemented in a few 
reading systems, timing is always problematic.

<kevinmarks> yep

<kevinmarks> if someone has done work on uniqueness length I'd love to 
hear about it

<m4nu> Rob: There are a few changes from the CG spec that are not 
backwards compatible - for example, to embed text body - use EARL but it 
was never sent to CR.

<csarven> BartvanLeeuwen: There is 
http://www.w3.org/annotation/diagrams/annotation-architecture.svg

<JohnJansen> +1 to kevinmarks

<m4nu> Rob: There are a few changes that would have to be made, but 
relatively minor.

<m4nu> Rob: Several of us are also a part of digital publishing WG - we 
share a staff contact.

<m4nu> Rob: There are ongoing conversations between communities.

<m4nu> ???: All annotations are human produced?

<m4nu> Rob: At the moment, the majority of the use cases assume human 
produces them, in scientific area - a lot of work being done in NLP.

<m4nu> Rob: UIMA annotates text and uses CG spec to produce those 
annotations, all machine produced. We've done our best to allow for 
those use cases without modifying the model either way. If there are 
issues, we don't want to make them unusable.

<m4nu> Helena: Machines have special knowledge - the knowledge that they 
have is of a different quality.

<m4nu> Rob: One thing we don't have in there yet is the confidence of 
the implementation - only 50% confident (a machine might say that)

<m4nu> Rob: Since focus has been on annotating web resources, we haven't 
put it into the model - since it's JSON-LD, we can add those features 
later w/o breaking the model. We had a meeting w/ I18N folks, NLP 
interchange format on Monday, they're interested in assisting to see if 
NIF can use annotation model.

<m4nu> Rob: Confidence is one of the requirements.

<m4nu> Helena: I'm part of multi-modal interaction WG - we have EMA - 
supports annotation use cases also - have supported recommendation 
already - in discovery w/ another approach using semantic web also.

<Zakim> kevinmarks, you wanted to ask if there has been work done on 
uniqueness length

<kevinmarks> I'm not physically present, so if someone can read that out…

<m4nu> manu: PLACEKEEPR

<m4nu> Rob: There has been some work done on uniqueness

<m4nu> Rob: 32 characters was good enough for some very high 
percentage... 64 characters was almost 100% accurate - experiment was 
wikipedia corpus, randomly select region of text, then see if you got 
back to the right block of text - so that test was done in english.
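
A rough sketch of the kind of experiment described above, not the original test code: pick random regions of a given length from a page and count how often the region occurs exactly once in that page. The function name, the trial count, and the fixed seed are assumptions made for illustration.

```python
import random


def unique_fraction(page: str, length: int, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of randomly chosen `length`-character regions of `page`
    that occur exactly once in `page`."""
    if length <= 0 or length > len(page):
        raise ValueError("length must be between 1 and len(page)")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    unique = 0
    for _ in range(trials):
        start = rng.randrange(len(page) - length + 1)
        snippet = page[start:start + length]
        # Look for a second occurrence; overlapping matches count as repeats.
        first = page.find(snippet)
        if page.find(snippet, first + 1) == -1:
            unique += 1
    return unique / trials
```

Running this over a corpus at several lengths (for example 32 and 64 characters) would give the kind of per-length accuracy figures mentioned, although results will depend on the language and the corpus.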

<m4nu> Benjamin: If you are only sending the thing you want highlighted, 
in those scenarios, we provide prefix and suffix, which help w/ 
reanchoring. In Hypothesis, we have robust anchoring to provide edit 
space away from text.
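
A minimal sketch of how a prefix and suffix can help re-anchor a highlighted quote when the same text occurs more than once. This is an exact-match illustration only, not Hypothesis's robust anchoring; the function names and the scoring rule are assumptions.

```python
from typing import Optional, Tuple


def _common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that `a` and `b` share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def _common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters that `a` and `b` share."""
    return _common_prefix_len(a[::-1], b[::-1])


def anchor_quote(text: str, exact: str, prefix: str = "",
                 suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the occurrence of `exact` whose surrounding
    text best matches `prefix` and `suffix`, or None if `exact` is absent."""
    best = None
    best_score = -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        # Score each candidate by how much of the stored context still matches.
        score = (_common_suffix_len(text[:start], prefix)
                 + _common_prefix_len(text[end:], suffix))
        if score > best_score:
            best, best_score = (start, end), score
        start = text.find(exact, start + 1)
    return best


page = "the cat sat on the mat; the cat ran off the mat"
span = anchor_quote(page, "the cat", prefix="; ", suffix=" ran")
```

Without the prefix and suffix, the first occurrence wins; with them, the second occurrence is chosen because its context matches.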

<Zakim> m4nu, you wanted to ask where are you in your timeline?

<kevinmarks> I suspect human generated ones would be word rather than 
character focused

<m4nu> manu: PLACEKEEPER2

<m4nu> Rob: We have a two year charter, we're pretty much halfway 
through - we're confident that we'll get the model and protocol and at 
least a stripped-down version of find text to CR by middle of next year.

<kevinmarks> is the 64/32 for unique across whole wikipedia corpus or 
within a page?

<m4nu> Rob: Then it's a question of how long CR takes - if we have to 
handle an extension - that would likely be granted.

<m4nu> Rob: We would hope that the next year will create enough momentum 
around things that we're not going to take to CR to get more use cases 
and requirements to know how to fulfill them, then sketch out pre-FPWD 
material.

<m4nu> Rob: We don't want there to be a gap

<m4nu> Rob: We're certainly thinking about it - people wanting to 
participate over next year would help w/ continuation of group.

<m4nu> Bart: How would this work w/ real-world objects?

<m4nu> Rob: If there is a URL to the object, you can talk about it - 
it's RDF, so all you need is a URI for the object.

<m4nu> Bart: If you take a real-world object, would augmented reality 
see the annotations.

<m4nu> Benjamin: Data model is RDF-based, you could put in geolocation - 
specific resource is more fine-grained, a component of a thing - 
highlight - form of selectors, prefix quote suffix, data position inside 
data file, text position, those are the ones that run into I18N 
problems. With a different set of selectors you could do a geolocation 
type thing.

<m4nu> Bart: It's been a while since I read the spec, should work.

<m4nu> Rob: Yes

<m4nu> Benjamin: Yeah, should work.

<m4nu> Rob: Catarina's project after she left Flickr - HistoryPin? wants 
to annotate items in the real world.

<Zakim> kevinmarks, you wanted to ask if the uniqueness was within a 
page or through whole corpus.

<m4nu> Benjamin: Other use case that came up - digitally storing 
annotations in physical books - page number, use digital selector, text 
position to anchor inside book - closer shot at anchoring - we had most 
of what they needed.

<m4nu> Kevin: Was it the entire wikipedia corpus or per page?

<m4nu> Rob: it was on a per-page basis - so if you use a fragid on a 
page, what's the likelihood that you mis-link?

<m4nu> Rob: You need to see how long it takes for the anchoring to 
become obsolete... don't remember if anchoring was through time.

<Zakim> m4nu, you wanted to ask about credentials and digital signatures.

<m4nu> Manu: What about digital signatures, what about credentials, 
where does that fit into your timeline?

<m4nu> Rob: Very trivial agents that can be associated with annotation, 
target could be another organizations, however, w/o credentials or 
signatures you could trivially spoof information - publish annotations 
that claim you're the author of body - reputation models, that's an issue.

<m4nu> Rob: This is really important to get right - if you want to spam 
someone, million followers, a million followers get spammed w/ 
extraneous content - we know it's going to be important - we want to 
make sure it'll be possible in future, we don't have time to do that 
right now, but it's certainly on radar to work on actively - would want 
to get started - could be another tick mark on ledger

<m4nu> to seeking a second charter

<Zakim> alexmilowski, you wanted to ask What is the status of the other 
items on the charter or have they been rolled up into the existing WD 
documents?

<m4nu> Alex: Are these things rolled into other things, are there things 
that still need to be done - six areas of work - serialization, data 
model, protocol, client-side APIs that use protocol.

<m4nu> Rob: Data model for specification, model + vocab + serializations 
- protocol stands alone, but doesn't have stuff for notification or search

<m4nu> Rob: We can't solve search in this iteration - client-side API, 
robust anchoring - great if you have experience requirements interest - 
find text API will start to be addressed for robust anchoring.

<m4nu> Rob: However, we do not yet have a client-side, make it easy to 
create/manipulate annotations in a browser - there was a pre-WD spec 
written up by Nick Stenning, but we haven't been able to take that 
forwards w/ enough momentum, one of the issues in the WG - lack of input 
from WebApps side. We've been trying to work with WebApps - find text, 
trying to make sure we don't spec something that

<m4nu> doesn't work w/ other APIs.

<m4nu> Rob: Call for help in that space - collaboration.

<Zakim> m4nu, you wanted to ask about @id and @type

<m4nu> Rob: If our answers are not complete, please ask.

<m4nu> Benjamin: There are a lot of JSON databases that use ID and Type 
in a different way - annotator already uses UUID, for those developers, 
they would have to change the context than putting square brackets - so 
namespace is a new thing - sorry about brackets - lower pain point, 
change all IDs to URLs. We felt this could co-exist next to existing 
JSON - could be thrown away.

<m4nu> Benjamin: Leaving them around is problematic - but less so than ...

<m4nu> Benjamin: Spec - context is optional - could express JSON in this 
shape, or you could upgrade it - you could add a link header.
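
A small sketch of the Link header approach mentioned above. The JSON-LD specification describes serving plain JSON with an HTTP Link header that points at a context, using the relation http://www.w3.org/ns/json-ld#context; the helper name and the context URL below are hypothetical.

```python
# Link relation defined by the JSON-LD specification for referencing a
# context from an ordinary application/json response.
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_link_header(context_url: str) -> str:
    """Build the value of an HTTP Link header that points at a JSON-LD context."""
    return f'<{context_url}>; rel="{JSONLD_CONTEXT_REL}"; type="application/ld+json"'


# Hypothetical context URL, used only for illustration.
header_value = context_link_header("http://example.org/contexts/annotation.jsonld")
```

A server could send this value in a Link response header alongside unmodified JSON, so existing clients keep working while JSON-LD aware clients can interpret the data.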

<scribe> scribenick: rhiaro_

m4nu: the best option seems to be to get rid of all the @ signs to make 
it easier for js developers
... alias @id to id and @type to type
... are you working with legacy data that has id and type?

bigbluehat: we're working with annotations systems all over the web, and 
we're trying to get them to upgrade
... this was the easiest way. So we're not destroying keys they already 
depend on
... It's still an open question
... It was put in and taken back out
... We could consider undoing that

m4nu: how much of your user base are you going to destroy by introducing 
this weird @ stuff into this data
... do you want to get more adoption at the risk of making the data 
uglier forever?

bigbluehat: I come from CouchDB land which has these ugly underscore 
prefix things
... people have just got used to these
... as being not their stuff
... everything else can be their stuff
... that's not great either
... we don't want to pollute someone else's keyspace at all
... Some amount of developers, mongo and couch, are okay with the shape 
of their json changing if I use that database, I now have these keys I 
don't like
... But the @ one is a little awkward because it requires 4 more 
characters, underscore does not

m4nu: we use mongo and couch and have aliased everything and it worked fine

bigbluehat: I just mentioned those as they are doing awkward ids

m4nu: just because they are doesn't mean you do
... The @ signs were put in there so legacy json data could easily be 
ported to json-ld
... I understand you have legacy data, I understand you don't want to 
alienate those devs or make them rewrite applications, that's valid
... but if you could change it and they could agree the change to their 
data and their apps, and have a nicer format that looks just like json, 
that would be best

azaroth: one other issue that came up in the discussion was we want to 
use things like activitystreams for having eg. a collection of annotations
... if at all possible we don't want to create another collection spec
... as2 uses @id and @type
... we were concerned that if we did aliasing and they don't and we 
wanted to use them together that would cause problems

m4nu: right, that would be a problem
... when we first did the @id thing we hoped that would never happen, 
but now it is
... and in schema.org

tantek: we can fix it

m4nu: alias it

danbri: just the @ sign?

m4nu: alias any json-ld keyword that starts with an @ sign
... to not have an @ sign
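
A minimal sketch of the aliasing being proposed, assuming a context that maps "id" to "@id" and "type" to "@type" (keyword aliasing is described in the JSON-LD specification). The renaming helper is plain key rewriting for illustration, not a JSON-LD processor, and the example document is made up.

```python
# JSON-LD context that aliases the "@id" and "@type" keywords to plain keys.
ALIAS_CONTEXT = {"id": "@id", "type": "@type"}


def alias_keywords(value, context=ALIAS_CONTEXT):
    """Recursively rename keyword keys (e.g. "@id") to their aliases (e.g. "id").

    Any embedded "@context" is left untouched, since keywords inside a
    context definition have a different meaning.
    """
    keyword_to_alias = {keyword: alias for alias, keyword in context.items()}
    if isinstance(value, dict):
        return {
            keyword_to_alias.get(key, key):
                (item if key == "@context" else alias_keywords(item, context))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [alias_keywords(item, context) for item in value]
    return value


# Hypothetical annotation written with the "@" keywords.
with_at_signs = {
    "@id": "http://example.org/annotations/1",
    "@type": "http://www.w3.org/ns/oa#Annotation",
    "body": {"@id": "http://example.org/comments/1"},
}

# Same data with plain "id" and "type" keys, plus the context that explains them.
aliased = {"@context": ALIAS_CONTEXT, **alias_keywords(with_at_signs)}
```

With the context in place, a JSON-LD processor reads "id" and "type" exactly as it would read "@id" and "@type", while JavaScript developers can use ordinary dotted property access.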

bigbluehat: as1 has id but it is a uri
... so no problem there
... and means the same thing
... and used objectType earlier, so no cost to change
... For most of the formats an id and type shift is probably the most 
marginal change that they'd have to make
... Other things are harder
... Thank you. I didn't know about json-ld, could you put that in the spec

<tantek> m4nu: 
https://github.com/jasnell/w3c-socialwg-activitystreams/issues/

m4nu: we're trying to put together a best practices thing about that

tantek: could you file an issue against as2?

m4nu: okay

azaroth: after 5, q is empty!

<m4nu> scribe: m4nu

<tantek> https://indiewebcamp.com/annotation-use-cases

<danbri> m4nu, got an example of proper @context mapping syntax for 
@type -> type etc?

<danbri> i.e. for 
https://github.com/schemaorg/schemaorg/blob/sdo-phobos/api.py#L663

Tantek: Most of those annotations are post-types that they're annotating 
- if you want to look at more examples, is that compatible w/ your model 
- as input to social web work - best examples in the model - here are 
people posting replies/reviews. Mostly JSON-focused.

Benjamin: take found JSON, wordpress comments that are not JSON - 
upgrade those into the model
... Schema.org has JSON shapes that match or don't match - what you're 
doing, what you're not doing.

Dan: No way world will adopt single mechanism for annotations - there 
are too many different ways to do it.

Benjamin: This annotation, if you want 3 different types - knock 
yourself out.

xidorn: One more annotation use case - not aware of - why east asian 
video sites - Danmaku - text host where video is created, text will move 
with the video in some direction - in-video comment, probably another 
use case.

Benjamin: We don't have that written up as use case - fragment selector 
- media fragment on time-based positioning - this video and this 10 
second mark.
... Any fragment-based selector ontology - 10 that you dereference - 
make it an RFC - reference non-RFC specs. - here's value hash, media time

<tantek> bigbluehat, hopefully you can mention 
https://indiewebcamp.com/fragmentions as well!

Benjamin: If XPointer had become something, it would've worked.

<tantek> fragmentions is essentially a modern HTML-based replacement for 
XPointer ranges

Rob: The list of things is not affected - these are examples. Most of 
them can be used in URIs. From 30 seconds to 60 seconds of this video, 
fragment according to fragment, media fragment
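
A short sketch of the 30 to 60 second example as a Media Fragments temporal fragment (the `t=start,end` form). The helper names and the video URL are hypothetical, and only plain seconds are handled here, not the hh:mm:ss forms the Media Fragments specification also allows.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def with_time_fragment(url: str, start: float, end: float) -> str:
    """Return `url` with a temporal media fragment covering start..end seconds."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    base, _ = urldefrag(url)  # drop any existing fragment
    return f"{base}#t={start:g},{end:g}"


def parse_time_fragment(url: str) -> Optional[Tuple[float, Optional[float]]]:
    """Extract (start, end) in seconds from a `t=` fragment, or None if absent."""
    _, fragment = urldefrag(url)
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "t":
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start_s, _, end_s = value.partition(",")
            start = float(start_s) if start_s else 0.0
            end = float(end_s) if end_s else None
            return (start, end)
    return None


# Hypothetical video URL, used only for illustration.
clip = with_time_fragment("http://example.com/video.mp4", 30, 60)
```

The resulting URI can be used directly as an annotation target, which is the sense in which most of the fragment examples can be expressed in URIs.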

<danbri> m4nu, ok i found http://www.w3.org/TR/json-ld/#aliasing-keywords :)

<Zakim> m4nu, you wanted to ask about type coercion in JSON-LD.

Manu: Why do you have so many @id?

Rob: We had a long discussion about this - let's take it to hallway 
discussion.

Summary of Action Items
[End of minutes]

Received on Monday, 2 November 2015 07:21:29 UTC