Re: [ISSUE-131] update to NIF mapping section in spec re comments from RDF WG from Felix Sasaki on 2013-09-06 (public-multilingualweb-lt@w3.org from September 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 06 Sep 2013 11:46:16 +0200
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: Dave Lewis <dave.lewis@cs.tcd.ie>, public-multilingualweb-lt@w3.org
Message-ID: <5229A468.70401@w3.org>
Hi Sebastian,

On 06.09.13 11:39, Sebastian Hellmann wrote:
> Ok, here is the updated example file: 
> https://dl.dropboxusercontent.com/u/375401/tmp/EX-nif-conversion-output.xml

Thanks a lot for this. About the problem: why not just percent-escape 
"[" and "]"?

>
> There is a problem, however. [ and ] are not allowed in the query 
> component, so
> rdf:resource="http://example.com/myitsservice?informat=html&amp;intype=url&amp;input=http://example.com/doc.html&amp;xpath=/html/body[1]/h2[1]"
> violates https://tools.ietf.org/html/rfc3986#appendix-A
>
> pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
> query         = *( pchar / "/" / "?" )
> fragment      = *( pchar / "/" / "?" )
> pct-encoded   = "%" HEXDIG HEXDIG
> unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
> reserved      = gen-delims / sub-delims
> gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
> sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / 
> ";" / "="
>
> I am not an XPath expert. Could we do it like this?
> xpath=/html/body/1/h2/1

I would rather do the percent-escape approach.
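
To make the percent-escape idea concrete, here is a minimal Java sketch. It assumes that only "[" and "]" need escaping in the xpath value (a literal "%" or other disallowed characters would need the same treatment). The class and method names (XPathQueryEscaper, escapeBrackets, buildUri) are made up for illustration and are not part of NIF or ITS.

```java
final class XPathQueryEscaper {

    // Percent-encode "[" and "]", which the RFC 3986 query production does not allow.
    static String escapeBrackets(String xpath) {
        return xpath.replace("[", "%5B").replace("]", "%5D");
    }

    // Reverse step, for a service that needs the original XPath back.
    static String unescapeBrackets(String escaped) {
        return escaped.replace("%5B", "[").replace("%5D", "]");
    }

    // Append the escaped XPath as the "xpath" query parameter.
    // The prefix is expected to end in "&" (or "?"), as in the thread's examples.
    static String buildUri(String servicePrefix, String xpath) {
        return servicePrefix + "xpath=" + escapeBrackets(xpath);
    }

    public static void main(String[] args) {
        String prefix = "http://example.com/myitsservice?informat=html&intype=url"
                + "&input=http://example.com/doc.html&";
        // prints http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&xpath=/html/body%5B1%5D/h2%5B1%5D
        System.out.println(buildUri(prefix, "/html/body[1]/h2[1]"));
    }
}
```

In the RDF/XML file the "&" separators would still be written as "&amp;", as in the example above; the escaping shown here only concerns the brackets.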

>
>
> NIF 2.0 has become quite delayed. I am one of the main coordinators, 
> which makes the standardization process lightweight, but I am also a 
> bottleneck. One of the reasons for the delay is my timely contribution 
> here via email and meetings. I hope this is an acceptable trade-off. 
> The consequence is that most of NIF 2.0 is not yet well documented, 
> although it is getting better day by day.

Indeed. And I am sure that really soon you won't need to look into this 
anymore.

- Felix

>
> All the best,
> Sebastian
>
>
>
> On 06.09.2013 11:05, Sebastian Hellmann wrote:
>> Hi Dave,
>>
>> On 05.09.2013 13:19, Dave Lewis wrote:
>>> Following decision on the 4th December call to opt for a query style 
>>> URL for the NIF string in RDF (which will also be supported in NIF) 
>>> when defining the mapping the following need to be changed in the spec:
>>>
>>> 1) all occurrences of RDF URLs with #char or #xpath fragments to be 
>>> changed to a query style as suggested by the RDF group and expanded 
>>> on by Felix in 
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Sep/0000.html 
>>>
>>> i.e. all fragment identifiers for NIF strings in annex F and G 
>>> should be changed from, e.g.:
>>>
>>> One other associated question: as we are using the query type to get around the limitations of the RFC 5147
>>> char fragment in working with XML and HTML, is it still appropriate after the above change to type the NIF string in the example with
>>> the subclass nif:RFC5147String? Sebastian? E.g.
>>>
>>> http://example.com/myitsservice?input=http://example.com/exampldoc.html&char=0,11
>>>   rdf:type nif:RFC5147String;
>>
>> I am currently working on a formal ABNF definition for this, but you 
>> can consider it to be like this in Java:
>>
>> String prefix = "http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/exampldoc.html&";
>> String identifier = "char=0,11";
>> String uri = prefix + identifier;
>>
>> // only the identifier has to have the syntax given by the rdf:type
>> validate("nif:RFC5147String", identifier);
>>
>> So the syntax is only relevant for the identifier part. These would 
>> be alternative prefixes as well:
>> String prefixOption1 = "http://example.com/myitsservice/informat/html/intype/url/input/http://example.com/exampldoc.html/";
>> String prefixOption2 = "http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/exampldoc.html#";
>> String prefixOption3 = "http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/exampldoc.html&";
>>
>> It really doesn't matter and all three are valid RDF (The first one 
>> is a bit awkward, of course)
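
A rough, runnable version of the validate() idea quoted above might look like the following sketch. It rests on several assumptions: the class name NifIdentifierCheck is invented, the check covers only the simple "char=<begin>,<end>" form rather than the full RFC 5147 syntax, and it is not the formal ABNF definition mentioned in the quoted mail. The point it illustrates is that only the identifier is checked, so all three prefix options behave the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NifIdentifierCheck {

    // Simplified subset of the RFC 5147 char scheme: "char=<begin>,<end>".
    private static final Pattern CHAR_RANGE =
            Pattern.compile("char=(\\d{1,9}),(\\d{1,9})");

    // Returns true if the identifier matches the syntax implied by the type.
    // Only nif:RFC5147String is handled in this sketch.
    static boolean validate(String type, String identifier) {
        if (!"nif:RFC5147String".equals(type)) {
            throw new IllegalArgumentException("unsupported type: " + type);
        }
        Matcher m = CHAR_RANGE.matcher(identifier);
        if (!m.matches()) {
            return false;
        }
        // A range must not end before it begins.
        return Integer.parseInt(m.group(1)) <= Integer.parseInt(m.group(2));
    }

    public static void main(String[] args) {
        String identifier = "char=0,11";
        String[] prefixes = {
            "http://example.com/myitsservice/informat/html/intype/url/input/http://example.com/exampldoc.html/",
            "http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/exampldoc.html#",
            "http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/exampldoc.html&"
        };
        for (String prefix : prefixes) {
            // The prefix is never inspected; only the identifier is validated.
            String uri = prefix + identifier;
            System.out.println(uri + " -> " + validate("nif:RFC5147String", identifier));
        }
    }
}
```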
>>
>>
>>> 2) Once this is fixed we need to update the NIF part of the test suite and tests rerun by Felix, Leroy and Phil
>>
>> As written above, this is not strictly necessary, but it is nice to 
>> be consistent.
>>
>>> 3) Add the following suggested note wording to the end of Annex
>>> "Note: NIF allows the URL for a String resource to be referenced as a URI that is a fragment of the original document, in the form:
>>> http://example.com/exampledoc.html#char=0,11
>>> or
>>> http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[1])
>>>
>>> Though this offers a potentially convenient mechanism for linking NIF resources in RDF back to the original document, the char
>>> fragment is currently defined only for text/plain, while the xpath fragment is not defined for HTML. Therefore this URL
>>> recipe does not fulfil the ITS requirement to support both XML and HTML, nor the aim of this mapping to produce resources adhering
>>> to the Linked Data principle of dereferenceability. The future definition and registration of these fragment types, while a potentially
>>> attractive feature, is beyond the scope of this specification."
>>
>>
>> Let's change this like this:
>> http://example.com/doc.html#xpath(/html/body[1]/h2[1]/text()[1])
>> maps to
>> http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html#char=0,11
>>
>> Note that RDF is ok, with all Fragment Ids:
>> http://www.w3.org/TR/rdf-concepts/#section-fragID
>>
>> RFC 3986 as well: http://tools.ietf.org/html/rfc3986#page-24
>>
>>
>>> The semantics of a fragment identifier are defined by the set of
>>>    representations that might result from a retrieval action on the
>>>    primary resource.  The fragment's format and resolution is therefore
>>>    dependent on the media type [RFC2046] of a potentially retrieved
>>>    representation, even though such a retrieval is only performed if the
>>>    URI is dereferenced.  If no such representation exists, then the
>>>    semantics of the fragment are considered unknown and are effectively
>>>    unconstrained.
>>
>>
>>
>> The text could be like this:
>> "Note: NIF allows the URL for a String resource to be referenced as a 
>> URI that is a fragment of the original document, in the form:
>> http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html#char=0,11
>> or
>> http://example.com/doc.html#xpath(/html/body[1]/h2[1]/text()[1])
>>
>> This offers a convenient mechanism for linking NIF resources in RDF 
>> back to the original document. RDF treats URIs as opaque and does not 
>> impose any semantic constraints on the fragment identifiers used, 
>> thus enabling their usage in RDF in a consistent manner. However, 
>> fragment identifiers are interpreted according to the retrieved MIME 
>> type if a retrieval action occurs, as is the case in Linked Data. The 
>> char fragment is currently defined only for text/plain, while the 
>> xpath fragment is not defined for HTML. Therefore this URL recipe 
>> does not fulfil the ITS requirement to support both XML and HTML, nor 
>> the aim of this mapping to produce resources adhering to the Linked 
>> Data principle of dereferenceability. The future definition and 
>> registration of these fragment types, while a potentially attractive 
>> feature, is beyond the scope of this specification."
>>
>> I will try to update the example in the spec as well.
>>
>> All the best,
>> Sebastian
>>
>>> cheers,
>>> Dave
>>>
>>>
>>> On 03/09/2013 09:14, Felix Sasaki wrote:
>>>> 1) last call item "RDF - NIF conversion". See
>>>> https://www.w3.org/International/multilingualweb/lt/track/issues/131
>>>> and these mails
>>>> Phil 
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Sep/0001.html 
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Aug/0066.html 
>>>>
>>>>
>>>> Dave
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Aug/0067.html 
>>>>
>>>>
>>>> Felix
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Aug/0068.html 
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Sep/0000.html 
>>>>
>>>>
>>>> Goal: decide about the option 1) or 2) or something else (see a 
>>>> variation of option 2) in 
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Sep/0000.html 
>>>>
>>>> IMPORTANT: even if you are not implementing ITS <> NIF, please 
>>>> state your opinion, since tomorrow we want to form a working group 
>>>> opinion, to be able to move forward.
>>>
>>
>>
>> -- 
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events:
>> * NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended 
>> Deadline: *July 18th*)
>> * LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
>> Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
Received on Friday, 6 September 2013 09:46:50 UTC