Re: [ISSUE-29][ACTION-164] ITS2NIF2ITS - RDF roundtrip from Felix Sasaki on 2012-08-15 (public-multilingualweb-lt@w3.org from August 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 15 Aug 2012 22:58:35 +0200
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Cc: Jirka Kosek <jirka@kosek.cz>, public-multilingualweb-lt@w3.org
Message-ID: <CAL58czpyU_sqCuPsXjAPPkiCR5-9t_w5U=+nQZsYR=hGa5if8Q@mail.gmail.com>
Hi Sebastian,

thanks for the feedback. I think I have now caught all issues, and with the
example input file at
http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS
I am generating the attached RDF/XML output. The Turtle syntax issues were
(I think) due to the converter I used:
http://www.rdfabout.com/demo/validator/

2012/8/15 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Felix,
> there are some minor issues:
>
> - turtle syntax => "<str:Context>" should either be "str:Context" (no <>)
> or full <http://nlp2rdf.lod2.eu/schema/string/Context>
> - "offset" is missing sometimes: "http://example.com/exampledoc.html#23_30"
> - there is the open question, whether the fragment that covers the whole
> content of the document is equal to the document:
> <http://example.com/exampledoc.html>
> owl:sameAs <http://example.com/exampledoc.html#offset_0_29>
> But this might be rather philosophical.
>

I agree.


> - RDF recommends Unicode Normal Form C:
> http://www.w3.org/TR/rdf-concepts/#section-Literals
> This is why we will make it mandatory. Some RDF parsers might
> complain if any literals are not in Unicode Normal Form C. Sometimes these
> are just warnings and sometimes parsing fails completely.
>

This might create an issue for HTML5, which doesn't require normalization,
but I'll come back to you if this is the case.
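
A minimal sketch of why normalization matters for the offsets, assuming Python 3.8+ and its standard unicodedata module. The sample string and the helper name to_nfc are made up for illustration; they are not taken from the ITS or NIF drafts:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", text)


# Hypothetical sample: "Cafe" followed by a combining acute accent,
# i.e. a string that is NOT in Normal Form C.
decomposed = "Cafe\u0301 in Dublin"
composed = to_nfc(decomposed)

# The decomposed form is one code point longer, so every offset after
# the accent shifts by one once the text is normalized.
print(unicodedata.is_normalized("NFC", decomposed))          # False
print(decomposed.index("Dublin"), composed.index("Dublin"))  # 9 8
```

If HTML5 input is not normalized, offsets computed on the raw text and offsets computed on NFC literals can disagree in exactly this way.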

For now I'll start converting the test cases we have, see
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-tests/2012Aug/0003.html
http://phaedrus.scss.tcd.ie/its2.0/its-testsuite.html
to NIF. At the end we might just add another column parallel to "expected
result" to cover the NIF conversion.

Best,

Felix


>
>
> Please see below for the correct output for the string "Dublin" in the
> context "Welcome to Dublin in Ireland!" occurring in
> http://example.com/exampledoc.html
> I validated it with the command line tool rapper (from libraptor2) for Unix:
> http://librdf.org/raptor/rapper.html
>
>
> [
>
> @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
> @prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
> # the reference context, i.e. the whole string that occurs in
> # http://example.com/exampledoc.html
> <http://example.com/exampledoc.html#offset_0_29>
> # encodes some simple provenance
> str:occursIn <http://example.com/exampledoc.html> ;
> # includes the whole string
> str:isString "Welcome to Dublin in Ireland!" ;
> a str:Context.
> # this is the substring "Dublin"
> <http://example.com/exampledoc.html#offset_11_17>
> a str:String ;
> str:anchorOf "Dublin";
> itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo> ;
> # all substrings have a reference to their context
> str:referenceContext <http://example.com/exampledoc.html#offset_0_29> .
>
> ]
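
The offsets in the example above can be reproduced with a short Python sketch. The helper name offset_uri is hypothetical, and len() counts code points here, which only matches a code-unit count for text like this ASCII-only example:

```python
def offset_uri(doc_uri: str, context: str, substring: str) -> str:
    """Build an offset-based fragment URI for the first match of substring."""
    begin = context.index(substring)  # raises ValueError if not found
    end = begin + len(substring)
    return f"{doc_uri}#offset_{begin}_{end}"


DOC = "http://example.com/exampledoc.html"
CONTEXT = "Welcome to Dublin in Ireland!"

# The whole context string and the substring "Dublin".
print(offset_uri(DOC, CONTEXT, CONTEXT))   # http://example.com/exampledoc.html#offset_0_29
print(offset_uri(DOC, CONTEXT, "Dublin"))  # http://example.com/exampledoc.html#offset_11_17
```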
>
> All the best,
> Sebastian
>
> On 10.08.2012 10:03, Felix Sasaki wrote:
>
>> Hi Sebastian, Jirka, all,
>>
>> thanks for the feedback. I have tried to integrate it into the output,
>> with the (X)HTML file attached. This is the output:
>>
>> [
>>
>> @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
>> @prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
>> <http://example.com/exampledoc.html#0_29>
>>         str:anchorOf <http://example.com/exampledoc.html>;
>>         str:isString "Welcome to Dublin in Ireland!";
>>         a <str:Context>.
>> <http://example.com/exampledoc.html#11_17>
>>         str:anchorOf <http://example.com/exampledoc.html>;
>>         str:isString "Dublin";
>>         a <str:Context>.
>> <http://example.com/exampledoc.html#23_30>
>>         str:anchorOf <http://example.com/exampledoc.html>;
>>         str:isString "Ireland";
>>         a <str:Context>.
>> <http://example.com/exampledoc.html#offset_0_29>
>>         str:referenceContext <http://example.com/exampledoc.html#0_29>;
>>         a <str:String>;
>>         itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>> <http://example.com/exampledoc.html#offset_11_17>
>>         str:referenceContext <http://example.com/exampledoc.html#11_17>;
>>         a <str:String>;
>>         itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>> <http://example.com/exampledoc.html#offset_23_30>
>>         str:referenceContext <http://example.com/exampledoc.html#23_30>;
>>         a <str:String>;
>>         itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>
>> ]
>>
>>
>> Let me know if there are any open issues with this output. One question: I
>> don't understand your reference to normalization form C - do you require
>> Unicode normalization for generating the output? The offsets above are based
>> on non-normalized processing; let me know if this needs to be changed. We
>> just need to have clear rules with regard to whitespace and normalization.
>>
>> Thanks,
>>
>> Felix
>>
>> 2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>
>>> Hi Felix,
>>> there are some syntactic errors: <str:String> .
>>>
>>> Maybe this helps:
>>> curl -X POST --data-urlencode input="Apache Stanbol can detect entities."
>>> --data input-type=text  --data format=turtle
>>> http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
>>>
>>> curl -X POST --data-urlencode input="Apache Stanbol can detect entities."
>>> --data input-type=text  --data format=turtle
>>> --data-urlencode prefix="http://example.com/exampledoc.html#"
>>> http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
>>>
>>> curl -X POST --data-urlencode input="Apache Stanbol can detect entities."
>>> --data input-type=text  --data format=turtle
>>> --data-urlencode prefix="urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#"
>>> http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
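
For readers without curl, the same form fields can be assembled with Python's standard library. This is only a sketch: the function name build_nif_request is hypothetical, the request is built but deliberately not sent, and nothing here assumes the demo service is still reachable or that the encoded body is byte-identical to what curl produces:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl commands above.
SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def build_nif_request(text, prefix=None):
    """Build (but do not send) a POST request with the form fields used above."""
    params = {"input": text, "input-type": "text", "format": "turtle"}
    if prefix is not None:
        params["prefix"] = prefix
    body = urlencode(params).encode("ascii")
    return Request(SERVICE, data=body, method="POST")


req = build_nif_request(
    "Apache Stanbol can detect entities.",
    prefix="http://example.com/exampledoc.html#",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out on purpose.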
>>>
>>>
>>> I also attached the output. It is the Stanford POS tagger NIF 2.0 draft
>>> wrapper. (Erratum: Context uses anchorOf instead of isString.)
>>> Normally, the prefix parameter is variable and set as a config option.
>>> Please don't worry about UUIDs. NIF and ITS in NIF don't need them. The
>>> reason I included them was that I am writing a converter for Apache
>>> Stanbol to NIF and ITS, and Stanbol uses UUIDs. I removed them from the
>>> wiki page.
>>>
>>> So here are some corrections:
>>>
>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString
>>> "\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a <str:Context>.
>>>
>>> Should be:
>>> <http://example.com/exampledoc.html#0_54> str:isString
>>> "\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a str:Context .
>>> Character length of 54 is correct as this is based on Unicode Normal Form
>>> C, counted in Code Units: http://unicode.org/faq/char_combmark.html#7
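
The count of 54 can be checked with a small Python sketch. It assumes that "Code Units" means UTF-16 code units, which the message does not state explicitly; the helper name utf16_code_units is hypothetical:

```python
import unicodedata


def utf16_code_units(text: str) -> int:
    """Length of text in UTF-16 code units after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte order mark.
    return len(nfc.encode("utf-16-le")) // 2


context = "\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n"
print(utf16_code_units(context))  # 54

# Outside the Basic Multilingual Plane, code points and code units differ:
# one code point, two UTF-16 code units.
print(len("\U0001F600"), utf16_code_units("\U0001F600"))  # 1 2
```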
>>>
>>>
>>> **************************
>>>
>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString
>>> "Dublin";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a <str:Context>.
>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString
>>> "Ireland";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a <str:Context>.
>>>
>>> Should be:
>>> <http://example.com/exampledoc.html#31_37>
>>>      str:anchorOf "Dublin";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a str:Context.
>>> <http://example.com/exampledoc.html#41_48>
>>>      str:anchorOf "Ireland";
>>>      str:occursIn <http://example.com/exampledoc.html>;
>>>      a str:Context.
>>>
>>> The counts seem to be wrong. Other than that it looks already quite
>>> close.
>>> All the best,
>>> Sebastian
>>>
>>> On 09.08.2012 13:30, Felix Sasaki wrote:
>>>
>>>> Hi Sebastian, all,
>>>>
>>>> I tried to create the NIF output (since we need two implementations) for
>>>>
>>>> <html xmlns:its="http://www.w3.org/2005/11/its">
>>>>       <body>
>>>>           <h2 its:translate="yes">Welcome to <span its:translate="no"
>>>>                   >Dublin</span> in <b its:translate="no">Ireland</b>!
>>>> </h2>
>>>>       </body>
>>>> </html>
>>>>
>>>> (I used an XML input here, but otherwise this is the same as your
>>>> example in the wiki.)
>>>>
>>>> Does the below output make sense? I am sure that the uuid is wrong, but I
>>>> don't know how to generate one.
>>>>
>>>>
>>>> [
>>>>
>>>> @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
>>>> @prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
>>>> <http://example.com/exampledoc.html#offset_0_50>
>>>>          str:referenceContext
>>>>              <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50>;
>>>>          a <str:String>;
>>>>          itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>>> <http://example.com/exampledoc.html#offset_14_44>
>>>>          str:referenceContext
>>>>              <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44>;
>>>>          a <str:String>;
>>>>          itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>>> <http://example.com/exampledoc.html#offset_25_31>
>>>>          str:referenceContext
>>>>              <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31>;
>>>>          a <str:String>;
>>>>          itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>>> <http://example.com/exampledoc.html#offset_25_32>
>>>>          str:referenceContext
>>>>              <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32>;
>>>>          a <str:String>;
>>>>          itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>>> <http://example.com/exampledoc.html#offset_5_49>
>>>>          str:referenceContext
>>>>              <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49>;
>>>>          a <str:String>;
>>>>          itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
>>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString
>>>> "\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n";
>>>>          str:occursIn <http://example.com/exampledoc.html>;
>>>>          a <str:Context>.
>>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44> str:isString
>>>> "Welcome to Dublin in Ireland! ";
>>>>          str:occursIn <http://example.com/exampledoc.html>;
>>>>          a <str:Context>.
>>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString
>>>> "Dublin";
>>>>          str:occursIn <http://example.com/exampledoc.html>;
>>>>          a <str:Context>.
>>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString
>>>> "Ireland";
>>>>          str:occursIn <http://example.com/exampledoc.html>;
>>>>          a <str:Context>.
>>>> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49> str:isString
>>>> "\r\n        Welcome to Dublin in Ireland! \r\n    ";
>>>>          str:occursIn <http://example.com/exampledoc.html>;
>>>>          a <str:Context>.
>>>>
>>>> ]
>>>>
>>>> Thanks,
>>>>
>>>> Felix
>>>>
>>>> 2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>>
>>>>> Hi Jirka,
>>>>>
>>>>> thanks for your feedback. I thought it was a requirement that the DOM
>>>>> should not be touched. I really never had any whitespace problems in any
>>>>> RDF serialization formats, so this was new to me. By the way, I can
>>>>> understand now what your problem with the bloated mapping is. We really
>>>>> don't need to serialize it. Actually it can be kept in memory, which is
>>>>> more efficient. I added serialization as optional. Also I made an XML
>>>>> version, because XML is much better suited for transferring this kind of
>>>>> data. (Is the XML alright?) I made all the changes you suggested; the
>>>>> new version is online here:
>>>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=622#Example
>>>>>
>>>>>
>>>>> all the best,
>>>>> Sebastian
>>>>>
>>>>>
>>>>> On 09.08.2012 11:59, Jirka Kosek wrote:
>>>>>
>>>>>> On 9.8.2012 11:47, Sebastian Hellmann wrote:
>>>>>>
>>>>>>> you found an interesting point.
>>>>>>>
>>>>>>> I wrote some notes on the optimization:
>>>>>>> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
>>>>>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>>>>>>>
>>>>>>>
>>>>>>> I think, it  generally depends on the use case, whether you would
>>>>>>> optimize.  Do you think we should specify/limit what optimizations
>>>>>>> are
>>>>>>> possible?
>>>>>>> It might be easier to explain implications to help developers,
>>>>>>> but leave the implementation under-specified.
>>>>>>> Do you think I should remove them from the algorithm description and
>>>>>>> move them to a completely different section? Would this help the
>>>>>>> structure of the document?
>>>>>>>
>>>>>> I think that the NIF mapping is so unnatural as is that optimization can
>>>>>> make it really messy. If the goal of optimization was to create a less
>>>>>> complex RDF representation, with no blank text nodes and with
>>>>>> whitespace-heavy text nodes trimmed, I think an easier and workable
>>>>>> approach would be to:
>>>>>>
>>>>>> - remove all whitespace optimization from the mapping algorithm
>>>>>>
>>>>>> - state that the algorithm can produce a lot of "phantom" predicates from
>>>>>> excessive whitespace
>>>>>>
>>>>>> - recommend normalizing whitespace in the input XML/HTML/DOM in
>>>>>> order to minimize such phantom predicates
>>>>>>
>>>>>> This way each user/application can create custom whitespace
>>>>>> normalization based on the nature of the input data and we don't have to
>>>>>> care about it.
>>>>>>
>>>>>> For example, for your sample document it is safe (knowing HTML whitespace
>>>>>> handling rules) to normalize it to
>>>>>>
>>>>>> <html><body><h2 translate = "yes" >Welcome to <span
>>>>>> its-disambig-ident-ref = "http://dbpedia.org/resource/Dublin" translate
>>>>>> = "no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
>>>>>>
>>>>>> (Actually one line with no excessive whitespace.)
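
A rough Python sketch of the effect of such normalization on the offsets. It works on already-extracted text only and simply collapses whitespace runs; it is not a DOM-aware implementation of the HTML whitespace rules, and the helper name collapse_whitespace is hypothetical:

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse each whitespace run to a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# Text content of the sample document, as quoted elsewhere in this thread.
raw = "\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n"
clean = collapse_whitespace(raw)

print(len(raw), len(clean))  # 54 29
begin = clean.index("Dublin")
print(begin, begin + len("Dublin"))  # 11 17
```

With the excessive whitespace gone, the context shrinks from 54 to 29 characters and "Dublin" lands at 11 to 17, matching the offsets used for the one-line variant of the document.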
>>>>>>
>>>>>> Does this sound reasonable to my SemWeb-educated friends?
>>>>>>
>>>>>>                           Jirka
>>>>>>
>>>>>>
>>>>> --
>>>>> Dipl. Inf. Sebastian Hellmann
>>>>> Department of Computer Science, University of Leipzig
>>>>> Events:
>>>>>     * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>>>>>     * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>> Research Group: http://aksw.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Attachments

text/xml attachment: nodelist-nif.xml
Received on Wednesday, 15 August 2012 20:59:03 UTC