Re: HTML and text Re: Question on annotating a text section from Sebastian Hellmann on 2012-07-26 (public-openannotation@w3.org from July 2012)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 26 Jul 2012 08:04:17 +0200
To: Reto Bachmann-Gmür <reto@apache.org>
CC: public-openannotation <public-openannotation@w3.org>, Robert Sanderson <azaroth42@gmail.com>
Message-ID: <5010DDE1.9030607@informatik.uni-leipzig.de>

On 26.07.2012 00:04, Reto Bachmann-Gmür wrote:
> On Jul 25, 2012 2:19 AM, "Sebastian Hellmann" <
> hellmann@informatik.uni-leipzig.de> wrote:
> ....
>> E.g. <h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>
>>
>> I assume that your TextOffsetSelector assumes plain text and works on the
>> HTML sources?
>
> I would have assumed it works on the actual text represented, so that
> &ouml;, <b>o</b> and ö in the html source all count as one character.
What do you mean by "actual text represented"? Do you mean the text nodes in 
the DOM?
Counting on those doesn't seem feasible to me. Suppose this is your primary data:

<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>

How are you measuring offset and range for "Hallöchen!" then?
<_:Selector1> a oax:TextOffsetSelector ;
    oax:offset 44 ;
    oax:range 15 .
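
To make the ambiguity concrete, here is a minimal Python sketch of the different ways the same span could be measured. It is only an illustration: the variable names are made up, the tag stripping is a naive regex rather than a real DOM, and it assumes 0-based counting and a UTF-8 encoded source, none of which is fixed by the selector above. Under those assumptions the value 44 matches the byte offset, while counting characters in the source gives 42, and the rendered text has 10 characters rather than 15.

```python
import html
import re

# The primary data from the example above, as a Python string.
source = '<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>'

# The annotated span as it appears in the HTML source.
raw = "Hall&ouml;chen!"

# Offset counted in characters (code points) of the source, 0-based.
char_offset = source.index(raw)

# Offset counted in bytes, assuming the source is stored as UTF-8.
# "ü" and "ß" in the title attribute take two bytes each.
byte_offset = len(source[:char_offset].encode("utf-8"))

# Length of the span in the source, entity included.
raw_length = len(raw)

# Naive approximation of the "text represented": strip tags, decode entities.
rendered = html.unescape(re.sub(r"<[^>]+>", "", source))

print(char_offset, byte_offset, raw_length, rendered, len(rendered))
```

The point of the sketch is that the three readings (source characters, source bytes, rendered text) give different numbers for the same heading, so a selector needs to state which one it uses.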

Sebastian

>
> Cheers,
> Reto
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Thursday, 26 July 2012 06:04:46 UTC