NIF and annotation and nlp services - Re: Questions about TBX to RDF handling from Felix Sasaki on 2015-10-27 (public-bpmlod@w3.org from October 2015)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 27 Oct 2015 22:20:35 +0900
To: Peter Svanberg <Peter.Svanberg@tnc.se>
Cc: "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-Id: <19B19AC2-1B06-4974-AF7D-D7105178A432@w3.org>
Hi Peter,

> On 27.10.2015 at 20:25, Peter Svanberg <Peter.Svanberg@tnc.se> wrote:
> 
>> On 27 Oct 2015 at 09:59, Felix Sasaki <fsasaki@w3.org> wrote:
>> 
>>> So, a cross reference in text like this is never made into triples? Do you require the “consuming end” to parse the XML?
>> 
>> I would rather propose generating triples instead of using XMLLiteral for the items that you want to represent explicitly.
>> 
> 
> 
> I want to represent a text string in which some parts are cross references to other resources.
> 
> How do you do that if you don’t use rdf:Seq? Use rdf:List? Or use NIF (which I just learned about from the mail that just arrived on the list), combined with something that marks some of the words as a cross reference? My lack of RDF experience leaves me lost. Please point me to techniques or examples.


Yes, NIF would be an option. I am not sure how to work with NIF in the TBX context. Here is an example of how it works for HTML. See this input document:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>An HTML example document</title>
</head>
<body>
<p>I like <strong>Marlene Dietrich</strong>.</p>
</body>
</html>

And this RDF, represented in JSON-LD syntax:

{
  "@graph": [
    {
      "@id": "http://freme-project.eu/#char=0,49",
      "@type": [
        "nif:String",
        "nif:Context",
        "nif:RFC5147String"
      ],
      "nif:beginIndex": {
        "@type": "xsd:int",
        "@value": "0"
      },
      "nif:endIndex": {
        "@type": "xsd:int",
        "@value": "49"
      },
      "isString": "An HTML example document I like Marlene Dietrich."
    },
    {
      "@id": "http://freme-project.eu/#char=32,48",
      "@type": [
        "nif:Phrase",
        "nif:Word",
        "nif:String",
        "nif:RFC5147String"
      ],
      "anchorOf": "Marlene Dietrich",
      "nif:beginIndex": {
        "@type": "xsd:int",
        "@value": "32"
      },
      "nif:endIndex": {
        "@type": "xsd:int",
        "@value": "48"
      },
      "referenceContext": "http://freme-project.eu/#char=0,49",
      "taClassRef": "http://nerd.eurecom.fr/ontology#Person",
      "itsrdf:taConfidence": 0.9919178643771336,
      "taIdentRef": "http://dbpedia.org/resource/Marlene_Dietrich"
    }
  ],
  "@context": {
    "ReferenceContext": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#ReferenceContext",
    "anchorOf": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf",
    "beginIndex": {
      "@id": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex",
      "@type": "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
    },
    "endIndex": {
      "@id": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#endIndex",
      "@type": "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
    },
    "identifier": "http://purl.org/dc/elements/1.1/identifier",
    "taIdentRef": {
      "@id": "http://www.w3.org/2005/11/its/rdf#taIdentRef",
      "@type": "@id"
    },
    "referenceContext": {
      "@id": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#referenceContext",
      "@type": "@id"
    },
    "taClassRef": {
      "@id": "http://www.w3.org/2005/11/its/rdf#taClassRef",
      "@type": "@id"
    },
    "taConfidence": {
      "@id": "http://www.w3.org/2005/11/its/rdf#taConfidence",
      "@type": "http://www.w3.org/2001/XMLSchema#double"
    },
    "isString": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#isString",
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "itsrdf": "http://www.w3.org/2005/11/its/rdf#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}

This URI
      "@id": "http://freme-project.eu/#char=32,48",

allows you to identify “Marlene Dietrich”. The example is generated via the FREME API, see
http://api.freme-project.eu/doc/0.3/
using named entity recognition. See the related thread from Tatjana in response to Philipp here:
https://lists.w3.org/Archives/Public/public-bpmlod/2015Oct/0028.html
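
To make the role of the offsets concrete, here is a minimal Python sketch of how a consumer could resolve such a URI against the context string. The function name resolve_char_fragment is hypothetical and not part of NIF or FREME; the sketch only assumes that the fragment has the form #char=begin,end with a zero-based begin and an exclusive end, as in the example above.

```python
import re


def resolve_char_fragment(uri: str, context: str) -> str:
    """Return the part of `context` addressed by a '#char=begin,end' fragment."""
    match = re.search(r"#char=(\d+),(\d+)$", uri)
    if match is None:
        raise ValueError(f"no char=begin,end fragment in {uri!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if not 0 <= begin <= end <= len(context):
        raise ValueError("offsets fall outside the context string")
    return context[begin:end]


# Context string and URI taken from the JSON-LD example.
context = "An HTML example document I like Marlene Dietrich."
print(resolve_char_fragment("http://freme-project.eu/#char=32,48", context))
# prints: Marlene Dietrich
```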
 
But you could also use the above NIF representation to identify the substring without deploying NLP.
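
As a rough illustration of that, the following Python sketch builds the two NIF resources for a substring found by plain string search. The helper name nif_substring is hypothetical. Using itsrdf:taIdentRef to carry the link is an assumption borrowed from the example above; whether that is the right property for a TBX cross reference is an open question.

```python
import json

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def nif_substring(base_uri: str, context: str, phrase: str, target: str) -> dict:
    """Describe the first occurrence of `phrase` in `context` as NIF JSON-LD."""
    begin = context.find(phrase)
    if begin == -1:
        raise ValueError(f"{phrase!r} does not occur in the context string")
    end = begin + len(phrase)
    context_id = f"{base_uri}#char=0,{len(context)}"

    def index(value: int) -> dict:
        # Typed literal, mirroring the xsd:int values in the example output.
        return {"@type": "xsd:int", "@value": str(value)}

    return {
        "@context": {"nif": NIF, "itsrdf": ITSRDF, "xsd": XSD},
        "@graph": [
            {
                "@id": context_id,
                "@type": ["nif:Context", "nif:RFC5147String"],
                "nif:beginIndex": index(0),
                "nif:endIndex": index(len(context)),
                "nif:isString": context,
            },
            {
                "@id": f"{base_uri}#char={begin},{end}",
                "@type": ["nif:String", "nif:RFC5147String"],
                "nif:anchorOf": phrase,
                "nif:beginIndex": index(begin),
                "nif:endIndex": index(end),
                "nif:referenceContext": {"@id": context_id},
                # Assumption: the cross reference is expressed with taIdentRef.
                "itsrdf:taIdentRef": {"@id": target},
            },
        ],
    }


doc = nif_substring(
    "http://freme-project.eu/",
    "An HTML example document I like Marlene Dietrich.",
    "Marlene Dietrich",
    "http://dbpedia.org/resource/Marlene_Dietrich",
)
print(json.dumps(doc, indent=2))
```

The sketch only handles the first occurrence of the phrase; a real converter would take the offsets from the markup rather than from a string search.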

In the ITS 2.0 spec we described an algorithm to convert from markup to NIF
http://www.w3.org/TR/its20/#conversion-to-nif
and back
http://www.w3.org/TR/its20/#nif-backconversion
(see also the examples in the section above).
In the upcoming 0.4 version of the FREME API we will also support such roundtripping for HTML and the XLIFF format. It may make sense for TBX as well. But before doing that, the TBX>RDF conversion experts on this list should probably say how NIF should be used to meet your requirement in the TBX context.
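
To give an idea of the markup-to-NIF direction, here is a simplified Python sketch that strips the markup from the HTML example, collapses whitespace, and records the character offsets of one element type. It is not the ITS 2.0 algorithm and not what FREME implements. The class and function names are hypothetical, and the whitespace handling is an assumption chosen so that the result matches the context string above.

```python
from html.parser import HTMLParser


class OffsetCollector(HTMLParser):
    """Collect whitespace-normalized text and offsets of one element type."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag
        self.chars = []              # characters of the normalized text
        self.spans = []              # (begin, end) pairs for closed elements
        self._open = []              # begin offsets of elements still open
        self._pending_space = False  # whitespace seen, not yet emitted

    def handle_data(self, data):
        for ch in data:
            if ch.isspace():
                self._pending_space = True
                continue
            # Emit one space for a whitespace run, never at the very start.
            if self._pending_space and self.chars:
                self.chars.append(" ")
            self._pending_space = False
            self.chars.append(ch)

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            # The element's text starts after any space still pending.
            gap = 1 if self._pending_space and self.chars else 0
            self._open.append(len(self.chars) + gap)

    def handle_endtag(self, tag):
        if tag == self.tag and self._open:
            begin, end = self._open.pop(), len(self.chars)
            if end > begin:  # skip empty or whitespace-only elements
                self.spans.append((begin, end))


def text_and_spans(html: str, tag: str):
    parser = OffsetCollector(tag)
    parser.feed(html)
    parser.close()  # flush text buffered by the parser
    return "".join(parser.chars), parser.spans


html_doc = """<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>An HTML example document</title>
</head>
<body>
<p>I like <strong>Marlene Dietrich</strong>.</p>
</body>
</html>
"""

text, spans = text_and_spans(html_doc, "strong")
print(text)   # An HTML example document I like Marlene Dietrich.
print(spans)  # [(32, 48)]
```

The sketch ignores script and style content, attributes, and the information needed for back conversion, all of which a real roundtripping implementation would have to deal with.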

Best,

Felix

> 
> /Peter Svanberg
>
Received on Tuesday, 27 October 2015 13:20:58 UTC