
Re: [ISSUE-29][ACTION-164] ITS2NIF2ITS - RDF roundtrip

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 9 Aug 2012 13:30:40 +0200
Message-ID: <CAL58czrfu55tEA-1_Rv+1=B2FG3KToSOxw1DJ_Eqrz+V7+-OUw@mail.gmail.com>
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Cc: Jirka Kosek <jirka@kosek.cz>, public-multilingualweb-lt@w3.org

Hi Sebastian, all,

I tried to create the NIF output (since we need two implementations) for

<html xmlns:its="http://www.w3.org/2005/11/its">
        <h2 its:translate="yes">Welcome to <span its:translate="no"
                >Dublin</span> in <b its:translate="no">Ireland</b>! </h2>

(I used an XML input here, but otherwise this is the same as your example
in the wiki.)

Does the output below make sense? I am sure that the uuid is wrong, but I
don't know how to generate one.


@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
<http://example.com/exampledoc.html#offset_0_50> str:referenceContext
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_14_44> str:referenceContext
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_25_31> str:referenceContext
	a <str:String>;
	itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_25_32> str:referenceContext
	a <str:String>;
	itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_5_49> str:referenceContext
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString
"\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44> str:isString
"Welcome to Dublin in Ireland! ";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString "Dublin";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString "Ireland";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49> str:isString
"\r\n        Welcome to Dublin in Ireland! \r\n    ";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
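
One way to approach the two open points above (the offsets and the uuid) is sketched below in Python. This is only a sketch under assumptions: the sample is wrapped in html/body as in the wiki example, offsets count characters of the document's concatenated text with LF line endings, and any random UUID is acceptable for the context IRI. The names `element_offsets` and `new_context_urn` are hypothetical and are not part of NIF or ITS.

```python
import uuid
import xml.etree.ElementTree as ET

# Sample input. The <body> wrapper and the closing tags are assumptions
# taken from the shape of the wiki example, not from the fragment above.
DOC = (
    '<html xmlns:its="http://www.w3.org/2005/11/its">\n'
    '    <body>\n'
    '        <h2 its:translate="yes">Welcome to <span its:translate="no"'
    '>Dublin</span> in <b its:translate="no">Ireland</b>! </h2>\n'
    '    </body>\n'
    '</html>'
)

BASE = "http://example.com/exampledoc.html"


def element_offsets(root):
    """Return (tag, begin, end) for every element in document order.

    Offsets are character positions in the concatenated text of the
    whole document, with begin inclusive and end exclusive.
    """
    spans = []
    pos = 0

    def walk(elem):
        nonlocal pos
        entry = [elem.tag, pos, None]   # begin is known on entry
        spans.append(entry)
        pos += len(elem.text or "")     # text before the first child
        for child in elem:
            walk(child)
            pos += len(child.tail or "")  # text following the child
        entry[2] = pos                  # end is known on exit

    walk(root)
    return [tuple(e) for e in spans]


def new_context_urn():
    """Return a fresh random UUID as a URN (urn:uuid:...)."""
    return uuid.uuid4().urn


root = ET.fromstring(DOC)
for tag, begin, end in element_offsets(root):
    print(f"<{BASE}#offset_{begin}_{end}>  # {tag}")
print(new_context_urn())
```

Under these assumptions the sketch yields the spans 0_50, 5_49, 14_44 and 25_31 for html, body, h2 and span, and 35_42 for the b element.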




2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Jirka,
> thanks for your feedback. I thought it was a requirement that the DOM
> should not be touched. I have never had any whitespace problems in any
> RDF serialization format, so this was new to me. By the way, I now
> understand what your problem with the bloated mapping is. We really
> don't need to serialize it. It can be kept in memory instead, which is
> more efficient. I added serialization as optional. I also made an XML
> version, because XML is much better suited for transferring this kind
> of data. (Is the XML alright?) I made all the changes you suggested; the
> new version is online here:
> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=622#Example
> all the best,
> Sebastian
> On 09.08.2012 11:59, Jirka Kosek wrote:
>> On 9.8.2012 11:47, Sebastian Hellmann wrote:
>>> you found an interesting point.
>>> I wrote some notes on the optimization:
>>> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>>> I think it generally depends on the use case whether you would
>>> optimize. Do you think we should specify/limit which optimizations are
>>> possible?
>>> It might be easier to explain the implications to help developers,
>>> but leave the implementation under-specified.
>>> Do you think I should remove them from the algorithm description and
>>> move them to a completely different section? Would this help the
>>> structure of the document?
>> I think the NIF mapping is so unnatural as it is that optimization can
>> make it really messy. If the goal of the optimization is to create a
>> less complex RDF representation, with no blank text nodes and with
>> whitespace-heavy text nodes trimmed, I think an easier and more
>> workable approach would be to:
>> - remove all whitespace optimization from the mapping algorithm
>> - say that the algorithm can produce a lot of "phantom" predicates from
>> excessive whitespace
>> - recommend normalizing whitespace in the input XML/HTML/DOM in
>> order to minimize such phantom predicates
>> This way each user/application can create its own whitespace
>> normalization based on the nature of the input data, and we don't have
>> to care about it.
>> For example for your sample document it is safe (knowing HTML whitespace
>> handling rules) to normalize it to
>> <html><body><h2 translate="yes">Welcome to <span
>> its-disambig-ident-ref="http://dbpedia.org/resource/Dublin"
>> translate="no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
>> (Actually one line with no excessive whitespace.)
>> Does this sound reasonable to my SemWeb-educated friends?
>>                         Jirka
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events:
>   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
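
The input-side whitespace normalization that Jirka proposes in the quoted message could look roughly like the Python sketch below. It is an illustration under assumptions, not an agreed algorithm: whitespace-only text is dropped only directly inside html, head and body (the hypothetical `CONTAINERS` set), and every other run of whitespace is collapsed to a single space. `normalize_whitespace` and `SAMPLE` are hypothetical names introduced for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: whitespace-only text directly inside these elements is never
# rendered, so it can be dropped. Extend the set per application.
CONTAINERS = {"html", "head", "body"}
_WS_RUN = re.compile(r"[ \t\r\n]+")

SAMPLE = (
    "<html>\n    <body>\n        <h2 translate='yes'>Welcome to\n"
    "            <span translate='no'>Dublin</span> in "
    "<b translate='no'>Ireland</b>! </h2>\n    </body>\n</html>"
)


def _normalize(text, parent_tag):
    """Normalize one text node whose parent element is parent_tag."""
    if text is None:
        return None
    if parent_tag in CONTAINERS and not text.strip(" \t\r\n"):
        return None                    # drop ignorable whitespace
    return _WS_RUN.sub(" ", text)      # collapse runs to one space


def normalize_whitespace(elem):
    """Normalize whitespace in elem and its descendants, in place."""
    elem.text = _normalize(elem.text, elem.tag)
    for child in elem:
        normalize_whitespace(child)
        # A child's tail is text that belongs to the parent element.
        child.tail = _normalize(child.tail, elem.tag)
    return elem


root = normalize_whitespace(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

The trailing space before the closing h2 tag survives in this sketch. Trimming it, as in the one-line form quoted above, needs knowledge of which elements are block-level, which is the kind of input-specific decision the proposal leaves to each application.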

Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 9 August 2012 11:31:18 UTC
