- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Thu, 09 Aug 2012 15:32:02 +0200
- To: Felix Sasaki <fsasaki@w3.org>
- CC: Jirka Kosek <jirka@kosek.cz>, public-multilingualweb-lt@w3.org
- Message-ID: <5023BBD2.6030207@informatik.uni-leipzig.de>
Hi Felix,

there are some syntactic errors: <str:String>.

Maybe this helps:

curl -X POST --data-urlencode input="Apache Stanbol can detect entities." --data input-type=text --data format=turtle http://nlp2rdf.lod2.eu/demo/NIFStanfordCore

curl -X POST --data-urlencode input="Apache Stanbol can detect entities." --data input-type=text --data format=turtle --data-urlencode prefix="http://example.com/exampledoc.html#" http://nlp2rdf.lod2.eu/demo/NIFStanfordCore

curl -X POST --data-urlencode input="Apache Stanbol can detect entities." --data input-type=text --data format=turtle --data-urlencode prefix="urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#" http://nlp2rdf.lod2.eu/demo/NIFStanfordCore

I also attached the output. It is the Stanford POS tagger NIF 2.0 draft wrapper.
(Errata: Context uses anchorOf instead of isString.)

Normally, the prefix parameter is variable and set as a config option.
Please don't worry about UUIDs. NIF and ITS in NIF don't need them. The reason I included them is that I am writing a converter for Apache Stanbol to NIF and ITS, and Stanbol uses UUIDs. I removed them from the wiki page.

So here are some corrections:

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString "\r\n \r\n Welcome to Dublin in Ireland! \r\n \r\n";
  str:occursIn <http://example.com/exampledoc.html>;
  a <str:Context>.

Should be:

<http://example.com/exampledoc.html#0_54> str:isString "\r\n \r\n Welcome to Dublin in Ireland! \r\n \r\n";
  str:occursIn <http://example.com/exampledoc.html>;
  a str:Context .

A character length of 54 is correct, as this is based on Unicode Normal Form C, counted in code units:
http://unicode.org/faq/char_combmark.html#7

**************************

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString "Dublin";
  str:occursIn <http://example.com/exampledoc.html>;
  a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString "Ireland";
  str:occursIn <http://example.com/exampledoc.html>;
  a <str:Context>.

Should be:

<http://example.com/exampledoc.html#31_37> str:anchorOf "Dublin";
  str:occursIn <http://example.com/exampledoc.html>;
  a str:Context.
<http://example.com/exampledoc.html#41_48> str:anchorOf "Ireland";
  str:occursIn <http://example.com/exampledoc.html>;
  a str:Context.

The counts seem to be wrong. Other than that, it already looks quite close.

All the best,
Sebastian

On 09.08.2012 13:30, Felix Sasaki wrote:
> Hi Sebastian, all,
>
> I tried to create the NIF output (since we need two implementations) for
>
> <html xmlns:its="http://www.w3.org/2005/11/its">
> <body>
> <h2 its:translate="yes">Welcome to <span its:translate="no"
> >Dublin</span> in <b its:translate="no">Ireland</b>! </h2>
> </body>
> </html>
>
> (I used an XML input here, but otherwise this is the same like your example
> in the wiki.
>
> Does the below output make sense? I am sure that the uuid is wrong, but I
> don't know how to generate one.
>
>
> [
>
> @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
> @prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
> <http://example.com/exampledoc.html#offset_0_50> str:referenceContext
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50>;
> a <str:String>;
> itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
> <http://example.com/exampledoc.html#offset_14_44> str:referenceContext
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44>;
> a <str:String>;
> itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
> <http://example.com/exampledoc.html#offset_25_31> str:referenceContext
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31>;
> a <str:String>;
> itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
> <http://example.com/exampledoc.html#offset_25_32> str:referenceContext
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32>;
> a <str:String>;
> itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
> <http://example.com/exampledoc.html#offset_5_49> str:referenceContext
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49>;
> a <str:String>;
> itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString
> "\r\n \r\n Welcome to Dublin in Ireland! \r\n \r\n";
> str:occursIn <http://example.com/exampledoc.html>;
> a <str:Context>.
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44> str:isString
> "Welcome to Dublin in Ireland! ";
> str:occursIn <http://example.com/exampledoc.html>;
> a <str:Context>.
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString "Dublin";
> str:occursIn <http://example.com/exampledoc.html>;
> a <str:Context>.
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString "Ireland";
> str:occursIn <http://example.com/exampledoc.html>;
> a <str:Context>.
> <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49> str:isString
> "\r\n Welcome to Dublin in Ireland! \r\n ";
> str:occursIn <http://example.com/exampledoc.html>;
> a <str:Context>.
>
> ]
>
> Thanks,
>
> Felix
>
> 2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>
>> Hi Jirka,
>> thanks, for your feedback. I thought it was a requirement that the DOM
>> should not be touched. I really never had any whitespace problems in any
>> RDF serialization formats, so this was new to me. By the way, I can
>> understand now, what your problem with the bloated mapping is. We really
>> don't need to serialize it. Actually it can be kept in memory, which is
>> more efficient. I added serialization as optional. Also I made an XML
>> version, because for transferring such kind of data, XML is much better
>> suited. (Is the XML alright?)
>> I made all the changes you suggested, the
>> new version is online here:
>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=622#Example
>>
>> all the best,
>> Sebastian
>>
>>
>> On 09.08.2012 11:59, Jirka Kosek wrote:
>>
>>> On 9.8.2012 11:47, Sebastian Hellmann wrote:
>>>> you found an interesting point.
>>>> I wrote some notes on the optimization:
>>>> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
>>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>>>>
>>>> I think, it generally depends on the use case, whether you would
>>>> optimize. Do you think we should specify/limit what optimizations are
>>>> possible?
>>>> It might be easier to explain implications to help developers,
>>>> but leave the implementation under-specified.
>>>> Do you think I should remove them from the algorithm description and
>>>> move them to a completely different section? Would this help the
>>>> structure of the document?
>>>>
>>> I think that NIF mapping is so unnatural as is that optimization can
>>> make it really messy.
>>> If the goal of optimization was to create less
>>> complex RDF representation with not blank text nodes and trimmed text
>>> nodes with a lot of whitespace I can think that easier and workable
>>> approach would be to:
>>>
>>> - remove all whitespace optimization from mapping algorithm
>>>
>>> - saying that algorithm can produce a lot of "phantom" predicates from
>>> excessive whitespace
>>>
>>> - recommending to normalize whitespace in the input XML/HTML/DOM in
>>> order to minimize such phantom predicates
>>>
>>> This way each user/application can create custom whitespace
>>> normalization based on nature of input data and we don't have to care
>>> about it.
>>>
>>> For example for your sample document it is safe (knowing HTML whitespace
>>> handling rules) to normalize it to
>>>
>>> <html><body><h2 translate = "yes" >Welcome to <span
>>> its-disambig-ident-ref = "http://dbpedia.org/resource/Dublin"
>>> translate
>>> = "no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
>>>
>>> (Actually one line with no excessive whitespace.)
>>>
>>> Does this sounds reasonable to my SemWeb-educated friends?
>>>
>>> Jirka
>>>
>>>
>> --
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events:
>> * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>> * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>> Projects: http://nlp2rdf.org , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>>
>>
>

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
* http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
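
The offset corrections in the message (for example #31_37 for "Dublin") come from counting positions in the context string after normalizing it to Unicode Normal Form C. The following is only a rough Python sketch of that counting, not the nlp2rdf implementation. It assumes that "code units" means UTF-16 code units, it uses a simplified context string without the original leading whitespace (so its offsets differ from those in the message), and the helper names nfc, code_units and offset_uri are hypothetical.

```python
import unicodedata

# Assumed prefix, taken from the example document URI used in the thread.
PREFIX = "http://example.com/exampledoc.html#"
# Simplified context string (assumption): the original whitespace is omitted.
CONTEXT = "Welcome to Dublin in Ireland!"


def nfc(text: str) -> str:
    """Normalize text to Unicode Normal Form C."""
    return unicodedata.normalize("NFC", text)


def code_units(text: str) -> int:
    """Count UTF-16 code units (assumption: this is the unit intended)."""
    return len(text.encode("utf-16-le")) // 2


def offset_uri(prefix: str, context: str, anchor: str) -> str:
    """Build a prefix + begin_end URI for the first occurrence of anchor."""
    context, anchor = nfc(context), nfc(anchor)
    index = context.find(anchor)  # code point index within the context
    if index < 0:
        raise ValueError(f"{anchor!r} does not occur in the context")
    begin = code_units(context[:index])  # convert to code unit offset
    end = begin + code_units(anchor)
    return f"{prefix}{begin}_{end}"


if __name__ == "__main__":
    print(offset_uri(PREFIX, CONTEXT, "Dublin"))   # ...exampledoc.html#11_17
    print(offset_uri(PREFIX, CONTEXT, "Ireland"))  # ...exampledoc.html#21_28
    print(code_units(nfc(CONTEXT)))                # 29
```

With the real context string, including its leading line breaks and spaces, the same counting should give the begin and end values that go into the URIs above; for pure ASCII input, code points and code units coincide, so the UTF-16 assumption only matters for characters outside the Basic Multilingual Plane.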
Attachments
- text/plain attachment: stanford.example.ttl
- text/plain attachment: stanford.noprefix..ttl
- text/plain attachment: stanford.urn.ttl
Received on Thursday, 9 August 2012 13:32:30 UTC