- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Thu, 23 May 2013 21:37:00 +0200
- To: Robert Sanderson <azaroth42@gmail.com>
- CC: public-openannotation <public-openannotation@w3.org>
- Message-ID: <519E6FDC.1090208@informatik.uni-leipzig.de>
Hi Rob,

I remember when the change was made. The problem is the semantics of the property.

For "This is a sentence." you will have:

"a" -> oa:start "9" oa:end "9"

while NIF and most other NLP tools use:

nif:beginIndex "8" nif:endIndex "9"

Actually, I was just about to add:

nif:beginIndex rdfs:subPropertyOf oa:start .

Now I have to add:

nif:beginIndex owl:propertyDisjointWith oa:start .

Furthermore, all NLP tools will have to add 1 and will lose precision, because annotations on gaps are lost; e.g. the "sentence beginning" is at 0,0.

RFC 5147 spends about a page explaining exactly how it counts positions:
http://tools.ietf.org/html/rfc5147#section-2.1.1

Actually, the definition in NIF is even more specific, as it defines that the property counts Unicode code units, which can sometimes make a difference for Asian languages.

All the best,
Sebastian

On 23.05.2013 19:22, Robert Sanderson wrote:
>
> This was changed from offset and range in the previous version to make
> it easier to query, and the same information is (fundamentally)
> available. It's the choice between having to do range=end-start, or
> end=start+range in code.
>
> I'm not sure that I follow the concern about counting characters
> rather than gaps. Also, RFC5147 uses start and end, not start and
> range, but perhaps that's my misunderstanding of your issue?
>
> E.g. one of their examples:
> ftp://example.com/text.txt#line=10,20;length=9876,UTF-8
> As in the second example, this URI identifies lines 11 to 20 of the
> text.txt MIME entity.
>
> Hope that helps,
>
> Rob
>
>
> On Thu, May 23, 2013 at 11:09 AM, Sebastian Hellmann
> <hellmann@informatik.uni-leipzig.de> wrote:
>
> > Dear OA community,
> > oa:start[1] and oa:end are incompatible with most of the NLP tools
> > including several Web standards.
> > Maybe this is just my limited knowledge, but I am unaware of any
> > standards that count characters. Most are counting the gaps.
> > So these properties are incompatible with:
> > - http://www.w3.org/TR/xptr-xpointer/#b2b1b1b3b6b6
> > - http://tools.ietf.org/html/rfc5147#section-2.1.1
> > - LAF/Graf ISO standard
> > - UIMA
> > - Gate
> > - Apache Stanbol and FISE
> > - NIF
> >
> > I was absent from this list for a while. Is it still possible to
> > change the semantics or add new properties?
> >
> > All the best,
> > Sebastian
> >
> > [1] http://www.w3.org/ns/oa#start
> >
> > --
> > Dipl. Inf. Sebastian Hellmann
> > Department of Computer Science, University of Leipzig
> > Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
> > Deadline: *July 8th*)
> > Come to Germany as a PhD:
> > http://bis.informatik.uni-leipzig.de/csf
> > Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> > http://dbpedia.org/Wiktionary , http://dbpedia.org
> > Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> > Research Group: http://aksw.org
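
The two counting conventions discussed in this message can be compared with a small sketch. It assumes the reading given above: oa:start/oa:end name one-based, inclusive character positions, while nif:beginIndex/nif:endIndex name zero-based gaps between characters. All function names are hypothetical and are not part of OA or NIF.

```python
TEXT = "This is a sentence."


def gap_span(text, begin, end):
    """Gap counting (assumed NIF-style): offsets name the positions
    between characters, zero-based, so begin == end is an empty span."""
    return text[begin:end]


def char_span(text, start, end):
    """Character counting (the reading of oa:start/oa:end assumed in
    this message): one-based positions, both ends inclusive."""
    return text[start - 1:end]


def gaps_to_chars(begin, end):
    """Convert gap offsets to inclusive character positions (the "+1").
    An empty span has no character to point at, so it cannot be
    represented and None is returned."""
    if begin == end:
        return None
    return (begin + 1, end)


def utf16_code_units(text):
    """Length in UTF-16 code units, which can differ from the number
    of code points for characters outside the Basic Multilingual Plane."""
    return len(text.encode("utf-16-le")) // 2


print(gap_span(TEXT, 8, 9))    # a
print(char_span(TEXT, 9, 9))   # a
print(gaps_to_chars(8, 9))     # (9, 9)
print(gaps_to_chars(0, 0))     # None: the "sentence beginning" is lost

rare_cjk = "\U0002000B"        # a CJK ideograph outside the BMP
print(len(rare_cjk), utf16_code_units(rare_cjk))  # 1 2
```

The last line shows why the unit of counting matters: the same string has length 1 in code points and 2 in UTF-16 code units, so offsets after such a character differ by one depending on the definition.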
Received on Thursday, 23 May 2013 19:37:43 UTC