Re: NLP Interchange Format was: Re: Let's drop RDFa in the requirements !

Dear Felix,

On 05/11/2012 09:33 AM, Felix Sasaki wrote:
> Thanks a lot for this, Sebastian.
>
> https://www.w3.org/International/multilingualweb/lt/track/issues/2
The link to the issues says "Unauthorized - These pages are restricted 
to W3C Members."

> we envisage ITS attributes (its-*) for HTML5 and an automatic conversion to
> RDFa
>
> "
>
>     - the working group will provide an algorithm to convert its- attributes
>     into RDFa and Microdata markup, to serve the needs of the Semantic Web
>     community and of search engine optimization.
>     - The conversion to RDFa will add URIs to each metadata item in an HTML5
>     document. This is needed as reference points for the metadata items after
>     extraction of RDF.
>
> "
Are RDFa and Microdata enough to fulfill your requirements? Their usual 
deployment scenario is limited: a web content (XHTML) publisher embeds 
additional structured information into his own web pages. Is this enough?

Although not technically impossible, I would consider these issues 
difficult to tackle with RDFa:
- overlapping annotations
- multiple annotations, i.e. from more than one layer or provider
- merging of annotated documents (I am not sure whether this is possible)
- third-party annotations (RDFa and Microdata can only be embedded by 
the "owner" of the web document)
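
To illustrate the first two points, here is a minimal sketch, not taken from NIF or ITS. It assumes annotations are stored stand-off as character offsets; the class, function, and provider names are hypothetical. Two annotations from different providers can cross each other, which well-nested inline elements cannot express directly, while a purely nested pair is unproblematic:

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    begin: int      # start offset in the text (inclusive)
    end: int        # end offset in the text (exclusive)
    provider: str   # hypothetical label for the annotation source


def crosses(a: Annotation, b: Annotation) -> bool:
    """Return True if a and b overlap without one containing the other."""
    overlap = a.begin < b.end and b.begin < a.end
    a_in_b = b.begin <= a.begin and a.end <= b.end
    b_in_a = a.begin <= b.begin and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


text = "the Semantic Web of Data"
first = Annotation(4, 16, "provider-A")    # covers "Semantic Web"
second = Annotation(13, 24, "provider-B")  # covers "Web of Data"
nested = Annotation(4, 12, "provider-C")   # covers "Semantic"

# Crossing spans cannot be written as properly nested inline elements.
print(crosses(first, second))
# Nested spans can, so this pair is not a problem for inline markup.
print(crosses(first, nested))
```

With stand-off storage both cases are simply separate records, so nothing has to be restructured when a second provider adds its layer.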


> Tadej is likely to work describing that conversion algorithm (which I guess
> will be pretty straightforward). Sebastian or others, how would NIF fit
> into this picture? What alignment between the conversion to RDFa and
> potentially to NIF is needed?
As far as I know there are few, if any, approaches that mix stand-off 
and inline annotations, but with RDF and RDFa this might actually not be 
much of a stretch.
We would definitely need to add another URI scheme to NIF that allows 
the transition to and from RDFa. The RDF properties can be reused directly.
@Tadej, did you already make an RDFa example? I think choosing the right 
URIs is challenging and a general problem, so you would need to tackle 
it anyhow.
Blank nodes are not advisable (they are difficult to merge and, IIRC, 
increase the complexity of RDFS entailment from P to NP. Please ask if you 
want the references, I have them somewhere...)
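
As a rough illustration of the URI question, here is a minimal sketch that builds and parses offset-based fragment URIs of the shape used in my NIF example quoted below (#offset_717_729). The helper names are hypothetical, and reading the two numbers as begin and end character offsets is an assumption taken from that example, not a statement of the NIF specification:

```python
import re
from typing import Tuple

# Fragment shape as it appears in the quoted example: offset_<begin>_<end>
OFFSET_FRAGMENT = re.compile(r"^offset_(\d+)_(\d+)$")


def offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Build an offset-based URI for a substring of a document (hypothetical helper)."""
    if not 0 <= begin <= end:
        raise ValueError("offsets must satisfy 0 <= begin <= end")
    return f"{document_uri}#offset_{begin}_{end}"


def parse_offset_uri(uri: str) -> Tuple[str, int, int]:
    """Split an offset-based URI back into document URI, begin and end."""
    document_uri, _, fragment = uri.partition("#")
    match = OFFSET_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError(f"not an offset-based URI: {uri}")
    return document_uri, int(match.group(1)), int(match.group(2))


doc = "http://www.w3.org/DesignIssues/LinkedData.html"
uri = offset_uri(doc, 717, 729)
print(uri)
print(parse_offset_uri(uri))
```

Because such a URI is derived only from the document URI and the offsets, two providers annotating the same string arrive at the same subject without coordination, which is what makes merging easier than with blank nodes.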


I have not read all the documents regarding MLW-LT yet; I hope to get a much 
clearer picture at the workshop.

All the best,
Sebastian

>
> Felix
>
>
> 2012/5/10 Sebastian Hellmann<hellmann@informatik.uni-leipzig.de>
>
>> Dear all,
>> I was following the conversation about RDFa and would like to draw your
>> attention to the NLP Interchange Format (NIF), which we are still
>> developing within LOD2. Although I am not 100% up to date with all your
>> requirements, I would assume that NIF tackles some of the issues you are
>> having, e.g. the "no literals as subjects" problem or a general uncertainty
>> about how to handle things.
>>
>> Please find the latest document (one week old) about it here:
>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>
>> We are currently gathering requirements for NIF version 2.0. We will
>> prepare a draft within the next two months and then a community reviewing
>> phase.
>> I will be at Dublin, so please feel free to ask me any questions.
>>
>> NIF is already compatible with the lemon model and NERD.
>>
>> To compare it with Tadej's example, I made one here:
>> It concerns the first occurrence of "Semantic Web" on
>> http://www.w3.org/DesignIssues/LinkedData.html, highlighted here:
>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>
>> Here is the NIF example for it (sso:oen is probably the same as
>> itsx:mentions):
>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>       a str:StringInContext ;
>>       itsx:mentions <http://dbpedia.org/resource/Semantic_Web> ;
>>       sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>>
>> Additionally, "semantic" could have a lexical entry. Note that (1) the
>> end offset is 4 smaller and (2) the DBpedia Wiktionary link already works
>> and is of type lemon:LexicalEntry.
>>
>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_725>
>>     a str:StringInContext ;
>>     sso:hasLexicalEntry <http://wiktionary.dbpedia.org/resource/semantic> .
>>
>>
>> All the best,
>> Sebastian
>>
>>
>>
>> On 05/08/2012 03:46 PM, Dave Lewis wrote:
>>
>>> Hi Maxime,
>>> Thank you for this further clarification.
>>>
>>> I think the formulation you define, where the literal would be the
>>> _object_ of the triple while the span is the subject, may be sufficient for
>>> what ITS is looking for. We only want to mark the literal for further
>>> processing, rather than wanting to make direct assertions about it as a
>>> subject.
>>>
>>> The question of whether we should be using RDFa for this at all is a
>>> broader one. It would be good to get other views on this, especially from
>>> potential implementors of ITS2.0.
>>>
>>> Also, to reinforce Maxime's point, the ontolex members and their
>>> expertise would be very welcome at the upcoming Dublin workshop. On 11
>>> June we are looking at future roadmaps for convergence of the multilingual
>>> web with LOD. On the 12th and 13th we will be focussing directly on the
>>> requirements for the ITS2.0 recommendation that the MLW-LT WG is currently
>>> producing. We've not finalised the schedule yet, but I imagine that these
>>> RDFa issues would be examined early on the 12th in the context of
>>> terminology management and its tool support in localization.
>>>
>>> Kind Regards,
>>> Dave
>>>
>>>
>>> On 02/05/2012 11:08, Maxime Lefrançois wrote:
>>>
>>>> Hi Dave, The MSW-CG and MLW-LT-XG members,
>>>> my answers below
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>     *From: *"David Lewis"<dave.lewis@cs.tcd.ie>
>>>>     *To: *public-multilingualweb-lt@w3.org
>>>>     *Sent: *Tuesday, 1 May 2012 02:23:47
>>>>     *Subject: *Re: Let's drop RDFa in the requirements !
>>>>
>>>>     Hi Maxime,
>>>>     Some comments below:
>>>>
>>>>     On 27/04/2012 15:57, Maxime Lefrançois wrote:
>>>>
>>>>         Hi,
>>>>
>>>>         in mail
>>>>         http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0131.html,
>>>>         I wrote a possible RDFa markup to represent the fact that "a
>>>>         fragment of text is identified as a named entity". I stressed
>>>>         that there is a shift of meaning: the meaning using RDFa is
>>>>         "there is a resource in the document that its:lexicalizes a
>>>>         named entity, and that has for its:value in English some
>>>>         fragment of text".
>>>>
>>>>         Actually, there will always be a shift of meaning if we are to
>>>>         use RDFa, and this is a strong conceptualization
>>>>         incompatibility between ITS and RDF. In fact, in ITS one
>>>>         annotates fragments of text (literals), but in RDF literals
>>>>         can't be the subject of a triple. As simple as that.
>>>>
>>>>
>>>>     But does wrapping the literal in a span and then adding an id
>>>>     attribute to that not make it dereferenceable and therefore
>>>>     the potential subject of a triple?
>>>>
>>>> Yes and no,
>>>>   - the URI could be the subject of a triple anywhere on the web, but the
>>>> URI refers to the span, and not to the text fragment that the span
>>>> contains.
>>>>   - if you want to add a triple in the very same document, you need RDFa,
>>>> and in RDF/RDFa there is no mechanism to use a literal as a subject; it is
>>>> forbidden. In RDFa Lite, the minimal triple needs a property="" attribute
>>>> to define the property of the triple, and the text fragment is the object
>>>> of the triple:
>>>> <span id="myid" property="its:property">mytext</span>  ----->  [:myid
>>>> its:property "mytext"]
>>>>
>>>
>>>
>> --
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Projects: http://nlp2rdf.org , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>>
>>
>>
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Friday, 11 May 2012 08:48:17 UTC