Re: NIF testing (Re: NIF documentation and review of the last draft regarding NIF) from Sebastian Hellmann on 2013-05-26 (public-multilingualweb-lt@w3.org from May 2013)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Sun, 26 May 2013 20:27:34 +0200
To: Felix Sasaki <fsasaki@w3.org>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, Phil Ritchie <philr@vistatec.ie>, Leroy Finn <finnle@tcd.ie>
Message-ID: <51A25416.5080906@informatik.uni-leipzig.de>
Hi Felix,
a) nif:anchorOf will be optional (MAY) in the future, as it is 
redundant; nif:beginIndex and nif:endIndex are preferred (SHOULD).
Neither is strictly required; the rationale for preferring the indices 
is that they are more useful in the processing chain. anchorOf can 
easily be calculated in most programming languages:
String c = "abc".substring(2,3);
Actually, we should consider including beginIndex and endIndex in the 
example as well, although the example might get bloated.
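
To illustrate why anchorOf is redundant, here is a minimal sketch that derives it from the reference context and the two indices. The class and method names are made up for illustration and are not part of NIF. It assumes that one "character" is one Unicode code point, so the offsets are converted to Java's UTF-16 indices before calling substring.

```java
// Sketch only: AnchorOfSketch and anchorOf are illustrative names, not NIF API.
final class AnchorOfSketch {

    // Indices count the gaps between characters, starting at 0
    // (the position before the first character).
    // Assumption: a character is a Unicode code point, not a UTF-16 unit.
    static String anchorOf(String referenceContext, int beginIndex, int endIndex) {
        int length = referenceContext.codePointCount(0, referenceContext.length());
        if (beginIndex < 0 || endIndex < beginIndex || endIndex > length) {
            throw new IllegalArgumentException(
                "invalid range " + beginIndex + "," + endIndex);
        }
        // Translate code point offsets into UTF-16 offsets for substring.
        int from = referenceContext.offsetByCodePoints(0, beginIndex);
        int to = referenceContext.offsetByCodePoints(from, endIndex - beginIndex);
        return referenceContext.substring(from, to);
    }

    public static void main(String[] args) {
        // Index 2 is the gap between "Mr" and "." in "Mr. Sandman".
        System.out.println(anchorOf("Mr. Sandman", 0, 2));
        System.out.println(anchorOf("Mr. Sandman", 4, 11));
    }
}
```

For pure ASCII text this gives the same result as a plain substring(beginIndex, endIndex) call.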

b)  nif:beginIndex "11"
      nif:endIndex "17"
are intended to provide an explicit representation of #char=11,17.
This is good for querying (e.g. a SPARQL FILTER with <, >, =) and for 
processing.
The #char= scheme is taken from RFC 5147, which specifies exactly how 
the positions should be counted: http://tools.ietf.org/html/rfc5147#section-2.2.1
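
As a rough sketch of the correspondence, the following reads the two indices out of a #char= fragment. The class and method names are hypothetical. It only handles the simple "#char=begin,end" form used in this thread and ignores the other forms RFC 5147 allows (single positions, open-ended ranges, integrity checks). It uses long for the values, which is just a choice for the sketch and not a statement about the property range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: CharFragmentSketch is an illustrative name, not NIF API.
final class CharFragmentSketch {

    // Matches a trailing "#char=<begin>,<end>" fragment and nothing else.
    private static final Pattern CHAR_RANGE =
        Pattern.compile("#char=(\\d+),(\\d+)$");

    // Returns {beginIndex, endIndex} as found in the URI fragment.
    static long[] indices(String uri) {
        Matcher m = CHAR_RANGE.matcher(uri);
        if (!m.find()) {
            throw new IllegalArgumentException("no #char=begin,end fragment: " + uri);
        }
        long begin = Long.parseLong(m.group(1));
        long end = Long.parseLong(m.group(2));
        if (begin > end) {
            throw new IllegalArgumentException("begin is after end: " + uri);
        }
        return new long[] { begin, end };
    }

    public static void main(String[] args) {
        long[] range = indices("http://example.com/exampledoc.html#char=11,17");
        System.out.println("nif:beginIndex \"" + range[0] + "\" ;");
        System.out.println("nif:endIndex \"" + range[1] + "\" .");
    }
}
```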

beginIndex and endIndex have two open issues:

nif:beginIndex
     a owl:DatatypeProperty ;
     vs:term_status "testing" ;
     rdfs:label "begin index"@en ;
     rdfs:comment """The begin index of a character range as defined in http://tools.ietf.org/html/rfc5147#section-2.2.1 and http://tools.ietf.org/html/rfc5147#section-2.2.2, measured as the gap between two characters, starting to count from 0 (the position before the first character of a text).
     Example: Index "2" is the position between "Mr" and "."  in "Mr. Sandman".
     Note: RFC 5147 is re-used for the definition of character ranges. RFC 5147 is assuming a text/plain MIME type. NIF builds upon Unicode and is content agnostic.
     Requirement (1): This property has the same value as the "Character position" of RFC 5147 and it must therefore be an xsd:nonNegativeInteger .
     Requirement (2): The index of the subject string MUST be calculated relative to the nif:referenceContext of the subject. If available, this is the rdf:Literal of the nif:isString property.""" ;
     # still being discussed:
     # rdfs:subPropertyOf oa:start ;
     rdfs:range <http://www.w3.org/2001/XMLSchema#nonNegativeInteger> ;
     rdfs:domain nif:String .


Issue b1)
rdfs:subPropertyOf oa:start ;
The definition of Open Annotation is pretty weak at the moment. I am 
working together with them to clarify this[1].
You can safely ignore this, as we are focusing on RFC 5147. We will 
extend OA to match RFC 5147.

[1] 
http://lists.w3.org/Archives/Public/public-openannotation/2013May/0038.html

Issue b2)
can best be answered by you, I guess:

rdfs:range <http://www.w3.org/2001/XMLSchema#nonNegativeInteger> ;

We were also considering xsd:int or xsd:long or having no range at all.
nonNegativeInteger is unbounded, but it is derived from xsd:decimal.
For memory consumption xsd:int would be best, but this would limit it to 
2GB text files.
I lack experience with how well implementations optimize for this, or 
whether the range is just used for validation.
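
To make the trade-off concrete: xsd:int is a signed 32-bit integer with a maximum of 2147483647, which is where the roughly 2GB limit comes from, while xsd:nonNegativeInteger has no upper bound. The sketch below (class and method names are made up) checks whether an index given as a lexical value would still fit into xsd:int. It says nothing about how actual RDF stores represent these datatypes.

```java
import java.math.BigInteger;

// Sketch only: IndexRangeSketch is an illustrative name, not NIF API.
final class IndexRangeSketch {

    // True if the value is a non-negative integer that fits into a
    // signed 32-bit int, i.e. lies in 0..2147483647.
    static boolean fitsInXsdInt(String lexicalValue) {
        BigInteger value = new BigInteger(lexicalValue);
        // bitLength() excludes the sign bit, so 31 bits is the maximum.
        return value.signum() >= 0 && value.bitLength() <= 31;
    }

    public static void main(String[] args) {
        System.out.println("largest xsd:int: " + Integer.MAX_VALUE);
        System.out.println(fitsInXsdInt("17"));
        System.out.println(fitsInXsdInt("2147483648"));
    }
}
```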


c) nif:convertedFrom is now nif:wasConvertedFrom
-> easier to understand
-> matches prov:wasDerivedFrom
-> "Current state"  wasConvertedFrom "former state"
Correct in the examples


d) The prefix can be removed: @prefix its: <http://www.w3.org/2005/11/its> .

e) Blank nodes are not my favorite, but they are unavoidable in this 
scenario.
Ideally you can use a more elegant notation for writing them in Turtle:

<http://example.com/exampledoc.html#char=114,127>
     nif:anchorOf "tranport inc." ;
     nif:convertedFrom 
<http://example.com/exampledoc.html#xpath(/doc/para%5B1%5D/span%5B2%5D)> ;
     nif:referenceContext <http://example.com/exampledoc.html#char=0,180> ;
     a nif:RFC5147String ;
     itsrdf:hasLocQualityIssue [
         a itsrdf:LocQualityIssue ;
         itsrdf:locQualityIssueComment "should be 'transport include'" ;
         itsrdf:locQualityIssueProfileRef <http://example.org/qaMovel/v1> ;
         itsrdf:locQualityIssueSeverity "75"
     ] .


All the best,
Sebastian


On 26.05.2013 18:40, Felix Sasaki wrote:
> Hi Sebastian, all, putting Phil and Leroy into CC since they are 
> interested in the NIF testing topic,
>
> thank you for looking into the NIF section. This is now
> https://www.w3.org/International/multilingualweb/lt/track/issues/125
> Most of the issues look pretty clear. All: I will implement them in 
> the spec on Tuesday if there are no further comments.
>
> One question, though: are the properties
> nif:beginIndex
> and
> nif:endIndex
> stable, and do they express the same thing as "#char" in URIs? That is,
>      nif:beginIndex "11"
>      nif:endIndex "17"
> is equal to #char=11,17
> ?
>
> FYI, I added input and output files for testing the NIF conversion, see
> https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/nif-conversion
> To Leroy and Phil: since Phil asked for localization quality issue 
> (and XML) as test files and there was no other request, the input 
> files are all LQI. That also leads to blank nodes in the output, since 
> LQI in itsrdf requires these. Sebastian, the order of 
> "nif:convertedFrom" should be correct in the output
> https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/nif-conversion/expected
> Let me know if something is wrong.
>
> Best,
>
> Felix
>
> On 26.05.13 15:12, Sebastian Hellmann wrote:
>> Dear all,
>> We have recently produced a PDF document which gives a pretty good 
>> overview of NIF:
>> http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
>>
>> Furthermore, I have read the ITS 2.0  draft very closely once more 
>> and brushed up everything regarding NIF. There aren't any significant 
>> changes. Let's say we are going for "extra credit" ;)
>> Please find a list of issues here:
>> https://docs.google.com/document/d/1VagqM-Ty69mPYh0wHfkNTVUndOO34ub5cO4X9Eo572Q/edit#
>>
>> I will have a look at the other sections soon.
>>
>> All the best,
>> Sebastian
>>
>>
>> -- 
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
>> Deadline: *July 8th*)
>> Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>


Received on Sunday, 26 May 2013 18:28:08 UTC