Re: HTML and text Re: Question on annotating a text section

On 26.07.2012 00:04, Reto Bachmann-Gmür wrote:
> On Jul 25, 2012 2:19 AM, "Sebastian Hellmann" <
> hellmann@informatik.uni-leipzig.de> wrote:
> ....
>> E.g. <h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>
>>
>> I assume that your TextOffsetSelector assumes plain text and works on the
>> HTML sources?
>
> I would have assumed it works on the actual text represented, so that
> &ouml;, <b>o</b> and ö in the html source all count as one character.
What do you mean by "actual text represented"? Do you mean the text nodes
in the DOM?
That doesn't seem feasible to me. Suppose this is your primary data:

<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>

How would you measure the offset and range for "Hallöchen!" then?
<_:Selector1> a oax:TextOffsetSelector ;
    oax:offset 44 ;
    oax:range 15 .
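
To make the ambiguity concrete, here is a rough Python sketch that counts
the same heading in several ways. It is only an illustration: the variable
names are made up, and reading 44 as a UTF-8 byte offset is an assumption
on my part (it just happens to match that count, while a character count
gives 42). The range of 15 matches the raw markup "Hall&ouml;chen!", not
the 10 characters a reader actually sees.

```python
import html

# The primary data from the example above, as a Unicode string.
source = '<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>'

# The element content as it appears in the HTML source (entity not decoded).
raw = "Hall&ouml;chen!"

# Offset counted in Unicode characters of the HTML source.
char_offset = source.index(raw)

# Offset counted in UTF-8 bytes of the HTML source
# (assumption: the file is stored as UTF-8; ü and ß take two bytes each).
byte_offset = len(source[:char_offset].encode("utf-8"))

# Length of the raw markup versus the text a reader actually sees.
raw_range = len(raw)
rendered = html.unescape(raw)
rendered_range = len(rendered)

print(char_offset, byte_offset, raw_range, rendered, rendered_range)
```

So a selector that counts on the source and one that counts on the
represented text disagree on both numbers, and even two source-based
counts disagree once non-ASCII characters precede the selection.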

Sebastian

>
> Cheers,
> Reto
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Thursday, 26 July 2012 06:04:46 UTC