Re: oa:start and end from Sebastian Hellmann on 2013-05-23 (public-openannotation@w3.org from May 2013)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 23 May 2013 21:37:00 +0200
To: Robert Sanderson <azaroth42@gmail.com>
CC: public-openannotation <public-openannotation@w3.org>
Message-ID: <519E6FDC.1090208@informatik.uni-leipzig.de>

Hi Rob,
I remember when the change was made.  The problem is the semantics of 
the properties.

For the text "This is a sentence.", counting characters (the first 
character is 1), you will have:
"a" ->
oa:start "9"
oa:end "9"

NIF and most other NLP tools count the gaps between characters instead 
(the gap before the first character is 0) and use:
nif:beginIndex "8"
nif:endIndex "9"
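
To make the difference concrete, here is a minimal Python sketch of the 
two conventions as I describe them in this mail. The function names are 
made up for illustration, and the reading of oa:start/oa:end as 1-based 
character positions with an inclusive end is an assumption taken from the 
example above, not from the spec text.

```python
TEXT = "This is a sentence."

def gaps_to_chars(begin, end):
    """Gap-based span (0-based, end exclusive) -> character positions
    (1-based, end inclusive). Hypothetical helper, not part of OA or NIF."""
    if not 0 <= begin < end:
        raise ValueError("span must cover at least one character")
    return begin + 1, end

def chars_to_gaps(start, end):
    """Character positions (1-based, end inclusive) -> gap-based span."""
    if not 1 <= start <= end:
        raise ValueError("positions must be 1-based with start <= end")
    return start - 1, end

# The word "a" sits between gap 8 and gap 9, i.e. it is the 9th character.
print(TEXT[8:9])            # a
print(gaps_to_chars(8, 9))  # (9, 9)
print(chars_to_gaps(9, 9))  # (8, 9)
```

Gap-based offsets map directly onto slicing in most programming 
languages, which is one reason NLP tools tend to prefer them.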

Actually, I was just about to add:
nif:beginIndex rdfs:subPropertyOf oa:start .
Now I have to add:
nif:beginIndex owl:propertyDisjointWith oa:start .

Furthermore, all NLP tools will have to add 1 and will lose precision, 
because annotations on gaps are lost, e.g. the "sentence beginning" at 0,0.
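
A small Python sketch of the lost case, under the same assumption as in 
my example (character positions are 1-based and the end is inclusive). 
The helper names are hypothetical.

```python
TEXT = "This is a sentence."

def select_by_gaps(text, begin, end):
    """Gap-based selection: begin == end is a valid, zero-width span."""
    return text[begin:end]

def gaps_to_chars_or_none(begin, end):
    """Return 1-based inclusive character positions, or None when the
    gap-based span is zero-width and therefore covers no character."""
    if begin == end:
        return None
    return begin + 1, end

# The "sentence beginning" is the gap before the first character.
print(repr(select_by_gaps(TEXT, 0, 0)))  # ''
print(gaps_to_chars_or_none(0, 0))       # None
print(gaps_to_chars_or_none(0, 4))       # (1, 4)
```

Any scheme that can only name characters needs an extra convention to 
point at a position between two characters.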

RFC 5147 spends about a page explaining exactly how it counts positions.
http://tools.ietf.org/html/rfc5147#section-2.1.1

Actually, the definition of NIF is even more specific, as it defines that 
the property counts Unicode code units, which can sometimes make a 
difference for Asian languages.
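
A short Python sketch of why the unit matters. I assume UTF-16 code 
units here purely for illustration; Python strings themselves are 
indexed by code point, and the helper names are made up.

```python
# One supplementary-plane character (U+20BB7) followed by two BMP characters.
text = "\U00020BB7\u91ce\u5bb6"

def utf16_units(s):
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2

def to_utf16_offset(s, code_point_offset):
    """Convert a code point offset into a UTF-16 code unit offset."""
    return utf16_units(s[:code_point_offset])

print(len(text))                 # 3 code points
print(utf16_units(text))         # 4 code units (the first character is a surrogate pair)
print(to_utf16_offset(text, 1))  # 2: the second character starts at code unit 2
```

Two tools that agree on counting gaps can still disagree on every 
offset after such a character if they count different units.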

All the best,
Sebastian

On 23.05.2013 19:22, Robert Sanderson wrote:
>
> This was changed from offset and range in the previous version to make 
> it easier to query, and the same information is (fundamentally) 
> available.  It's the choice between having to do range=end-start, or 
> end=start+range in code.
>
> I'm not sure that I follow the concern about counting characters 
> rather than gaps. Also, RFC5147 uses start and end, not start and 
> range, but perhaps that's my misunderstanding of your issue?
>
> E.g. one of their examples:
>     ftp://example.com/text.txt#line=10,20;length=9876,UTF-8
>     As in the second example, this URI identifies lines 11 to 20 of the
>     text.txt MIME entity.
> Hope that helps,
>
> Rob
>
>
> On Thu, May 23, 2013 at 11:09 AM, Sebastian Hellmann 
> <hellmann@informatik.uni-leipzig.de 
> <mailto:hellmann@informatik.uni-leipzig.de>> wrote:
>
>     Dear OA community,
>     oa:start[1] and oa:end are incompatible with most NLP tools and
>     with several Web standards.
>     Maybe this is just my limited knowledge, but I am unaware of any
>     standards that count characters. Most count the gaps.
>     So these properties are incompatible with:
>     - http://www.w3.org/TR/xptr-xpointer/#b2b1b1b3b6b6
>     - http://tools.ietf.org/html/rfc5147#section-2.1.1
>     - LAF/Graf ISO standard,
>     - UIMA
>     - Gate
>     - Apache Stanbol and FISE
>     - NIF
>
>     I was absent from this list for a while. Is it still possible to
>     change the semantics or add new properties?
>
>     All the best,
>     Sebastian
>
>     [1] http://www.w3.org/ns/oa#start
>
>
>
>     -- 
>     Dipl. Inf. Sebastian Hellmann
>     Department of Computer Science, University of Leipzig
>     Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
>     Deadline: *July 8th*)
>     Come to Germany as a PhD:
>     http://bis.informatik.uni-leipzig.de/csf
>     Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>     http://dbpedia.org/Wiktionary , http://dbpedia.org
>     Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>     Research Group: http://aksw.org
>
>



Received on Thursday, 23 May 2013 19:37:43 UTC