NLP Interchange Format was: Re: Let's drop RDFa in the requirements ! from Sebastian Hellmann on 2012-05-10 (public-multilingualweb-lt@w3.org from May 2012)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 10 May 2012 13:15:56 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
CC: Maxime Lefrançois <maxime.lefrancois@inria.fr>, Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>, public-ontolex@w3.org
Message-ID: <4FABA36C.9060300@informatik.uni-leipzig.de>
Dear all,
I was following the conversation about RDFa and would like to draw your 
attention to the NLP Interchange Format (NIF), which we are still 
developing within LOD2. Although I am not 100% up to date with all your 
requirements, I assume that NIF tackles some of the issues you 
are having, e.g. the "no literals as subject" problem or a general 
uncertainty about how to handle things.

Please find the latest document (one week old) about it here:
http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf

We are currently gathering requirements for NIF version 2.0. We will 
prepare a draft within the next two months, followed by a community 
review phase.
I will be in Dublin, so please feel free to ask me any questions.

NIF is already compatible with the lemon model and NERD.

To compare it with Tadej's example, I made one of my own.
It concerns the first occurrence of "Semantic Web" on 
http://www.w3.org/DesignIssues/LinkedData.html , highlighted here:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web

Here is the NIF example for it (sso:oen is probably the same as 
itsx:mentions):
<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
       a str:StringInContext ;
       itsx:mentions <http://dbpedia.org/resource/Semantic_Web> ;
       sso:oen <http://dbpedia.org/resource/Semantic_Web> .

Additionally, "semantic" could have a lexical entry. Note that (1) the 
end offset is 4 characters shorter and (2) the DBpedia Wiktionary link 
already works and is of type lemon:LexicalEntry.

<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_725>
     a str:StringInContext ;
     sso:hasLexicalEntry <http://wiktionary.dbpedia.org/resource/semantic> .
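
As a rough illustration of how the offset-based URIs in the two examples 
above relate to each other, here is a minimal Python sketch. It assumes 
that offsets are character positions in the extracted document text 
(begin inclusive, end exclusive) and that the fragment has the form 
#offset_<begin>_<end> as in the examples. The function name, the sample 
text and the example.org URI are hypothetical; this is not the NIF 
reference implementation.

```python
def offset_uri(document_uri: str, text: str, phrase: str) -> str:
    """Build an offset-based fragment URI for the first occurrence of
    `phrase` in `text` (hypothetical helper, character offsets assumed)."""
    begin = text.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} does not occur in the text")
    end = begin + len(phrase)  # end offset is exclusive
    return f"{document_uri}#offset_{begin}_{end}"


# Hypothetical sample text, NOT the real content of the W3C page.
sample_text = "Linked Data is the Semantic Web done right."
sample_doc = "http://example.org/doc.html"

whole_phrase = offset_uri(sample_doc, sample_text, "Semantic Web")
first_word = offset_uri(sample_doc, sample_text, "Semantic")
print(whole_phrase)
print(first_word)
```

Both URIs share the same begin offset, and the end offset of the second 
is 4 smaller, because "Semantic" is 4 characters shorter than 
"Semantic Web".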


All the best,
Sebastian



On 05/08/2012 03:46 PM, Dave Lewis wrote:
> Hi Maxime,
> Thank you for this further clarification.
>
> I think the formulation you define, where the literal would be the 
> _object_ of the triple while the span is the subject, may be 
> sufficient for what ITS is looking for. We only want to mark the 
> literal for further processing, rather than wanting to make direct 
> assertions about it as a subject.
>
> The question of whether we should be using RDFa for this at all is a 
> broader one. It would be good to get other views on this, especially 
> from potential implementors of ITS2.0.
>
> Also, to reinforce Maxime's point, the ontolex members and their 
> expertise would be very welcome at the upcoming Dublin workshop. On 
> the 11th of June we are looking at future roadmaps for convergence of 
> the multilingual web with LOD. On the 12th and 13th we will be focussing 
> directly on the requirements for the ITS2.0 recommendation that the 
> MLW-LT WG is currently producing. We've not finalised the schedule 
> yet, but I imagine that these RDFa issues would be examined early on 
> the 12th in the context of terminology management and its tool support 
> in localization.
>
> Kind Regards,
> Dave
>
>
> On 02/05/2012 11:08, Maxime Lefrançois wrote:
>> Hi Dave, and the MSW-CG and MLW-LT-XG members,
>> my answers are below.
>>
>> ------------------------------------------------------------------------
>>
>>     *From: *"David Lewis" <dave.lewis@cs.tcd.ie>
>>     *To: *public-multilingualweb-lt@w3.org
>>     *Sent: *Tuesday, 1 May 2012 02:23:47
>>     *Subject: *Re: Let's drop RDFa in the requirements !
>>
>>     Hi Maxime,
>>     Some comments below:
>>
>>     On 27/04/2012 15:57, Maxime Lefrançois wrote:
>>
>>         Hi,
>>
>>         In the mail
>>         http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0131.html,
>>         I wrote a possible RDFa markup to represent the fact that "a
>>         fragment of text is identified as a named entity". I stressed
>>         that there is a shift of meaning: the meaning using RDFa is
>>         "there is a resource in the document that its:lexicalizes a
>>         named entity, and that has for its:value in English some
>>         fragment of text".
>>
>>         Actually, there will always be a shift of meaning if we are to
>>         use RDFa, and this is a strong conceptualization
>>         incompatibility between ITS and RDF. In fact, in ITS one
>>         annotates fragments of text (literals), but in RDF literals
>>         can't be the subject of a triple. As simple as that.
>>
>>
>>     But does wrapping the literal in a span and then adding an id
>>     attribute to that not make it dereferenceable and therefore
>>     the potential subject of a triple?
>>
>> Yes and no,
>>  - the URI could be the subject of a triple anywhere on the web, but 
>> the URI refers to the span, and not to the text fragment that the 
>> span contains.
>>  - if you want to add a triple in the very same document, you need 
>> RDFa, and in RDF/RDFa there is no mechanism to use a literal as a 
>> subject; it is forbidden. In RDFa Lite, the minimal triple needs a 
>> property="" attribute to define the property of the triple, and the 
>> text fragment is the object of the triple:
>> <span id="myid" property="its:property">mytext</span> -----> [:myid 
>> its:property "mytext"]
>
>
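
The span-to-triple mapping quoted above can be mimicked with a short 
Python sketch using only the standard library. It follows the 
simplified mapping as written in Maxime's mail (span id as subject, 
property attribute as predicate, text content as literal object); the 
class name is hypothetical, and this is not an RDFa processor. A 
conforming processor derives the subject from attributes such as 
"about" or "resource", which this sketch ignores.

```python
from html.parser import HTMLParser


class SpanPropertyParser(HTMLParser):
    """Collect (subject, predicate, literal) tuples from <span> elements
    that carry both an id and a property attribute (simplified mapping,
    not real RDFa processing)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = None   # (subject, predicate) of the span being read
        self._text = []     # text chunks seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "id" in attrs and "property" in attrs:
            self._open = (":" + attrs["id"], attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            subject, predicate = self._open
            # The text fragment ends up as the object, never the subject.
            self.triples.append((subject, predicate, "".join(self._text)))
            self._open = None


parser = SpanPropertyParser()
parser.feed('<span id="myid" property="its:property">mytext</span>')
parser.close()
print(parser.triples)
```

The point of the sketch is only that the literal can appear in the 
object position of the resulting tuple, which is the constraint 
discussed in the quoted thread.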


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Thursday, 10 May 2012 11:16:38 UTC