
Re: HTML and text Re: Question on annotating a text section

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 26 Jul 2012 08:04:17 +0200
Message-ID: <5010DDE1.9030607@informatik.uni-leipzig.de>
To: Reto Bachmann-Gmür <reto@apache.org>
CC: public-openannotation <public-openannotation@w3.org>, Robert Sanderson <azaroth42@gmail.com>
On 26.07.2012 00:04, Reto Bachmann-Gmür wrote:
> On Jul 25, 2012 2:19 AM, "Sebastian Hellmann" <
> hellmann@informatik.uni-leipzig.de> wrote:
> ....
>> E.g. <h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>
>>
>> I assume that your TextOffsetSelector assumes plain text and works on the
>> HTML sources?
>
> I would have assumed it works on the actual text represented, so that
> &ouml;, <b>o</b> and ö in the html source all count as one character.
What do you mean by "actual text represented"? Do you mean the text nodes
in the DOM?
That doesn't seem feasible. Suppose this is your primary data:

<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>

How are you measuring offset and range for "Hallöchen!" then?
<_:Selector1> a oax:TextOffsetSelector ;
    oax:offset 44 ;
    oax:range 15 .
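
To make the ambiguity concrete, here is a minimal Python sketch that measures the same heading three ways: as a code-point offset into the raw HTML source, as a UTF-8 byte offset into that source, and as the length of the decoded text. The names SOURCE, RAW, TextExtractor and rendered_text are made up for this illustration and are not part of any Open Annotation vocabulary or tool. That the 44 above corresponds to a UTF-8 byte offset is only an assumption; the sketch merely shows that the numbers coincide.

```python
from html.parser import HTMLParser

# The primary data from the example above, as a Unicode string.
SOURCE = '<h2 title="Begrüßung" id="welcomeheader" >Hall&ouml;chen!</h2>'
# The heading content exactly as it is written in the HTML source.
RAW = 'Hall&ouml;chen!'


class TextExtractor(HTMLParser):
    """Collects character data only, with entities such as &ouml; decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(markup):
    """Return the text content of the markup (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# 1. Zero-based offset in Unicode code points into the raw source.
char_offset = SOURCE.index(RAW)

# 2. Zero-based offset in UTF-8 bytes: "ü" and "ß" in the title
#    attribute take two bytes each, so this is larger by two.
byte_offset = len(SOURCE[:char_offset].encode("utf-8"))

# 3. Length of the selection measured on the raw source ...
raw_range = len(RAW)

# 4. ... versus on the decoded text, where &ouml; is a single character.
text = rendered_text(SOURCE)
text_range = len(text)

print(char_offset, byte_offset, raw_range, text, text_range)
```

Under these assumptions the code-point offset is 42, the UTF-8 byte offset is 44, the range on the source is 15, and the decoded text "Hallöchen!" has length 10. A selector therefore needs to state which of these representations, and which unit, its numbers refer to.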

Sebastian

>
> Cheers,
> Reto
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Thursday, 26 July 2012 06:04:46 UTC
