Re: Fw: Re: Request for review from the RDF working group: ITS 2.0 from Felix Sasaki on 2013-09-03 (public-multilingualweb-lt@w3.org from September 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 03 Sep 2013 10:03:32 +0200
To: Phil Ritchie <philr@vistatec.ie>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, public-rdf-wg@w3.org
Message-ID: <522597D4.6060904@w3.org>
Hi Phil,

On 03.09.13 09:21, Phil Ritchie wrote:
> All
>
> Since my reply below I've been trying to see what it would take to 
> implement option 1 (char fragment identifier) and it would seem easier 
> to implement option 2. I wonder if this is why Sebastian implemented 
> this option.

By "implementing" you probably mean "consuming the URIs and 
'understanding' the offset information". Producing the URIs makes no 
big difference either way.
I agree, option 2 is easier to implement. You need to analyze the 
query part of the URI - that's it. To do that, your implementation 
does not need to "understand" what comes before the query part. So 
implementation-wise it makes no difference whether you have

http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29
or
http://www.w3.org/its.html?resource=http://example.com/exampldoc.html&char=0,29 

so in an .htaccess file you need a rewrite rule like this

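# The [P] (proxy) flag requires mod_proxy to be enabled
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} (.*)
# In .htaccess the matched path has no leading slash
RewriteRule ^its$ http://example.com/myservice?%1 [P,L]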

This will give all parameters to your application that processes the 
query parameters - wherever the application may be.
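
On the consuming side, analyzing the query part could look roughly like 
the following Python sketch. The function name parse_its_query_uri is 
made up for illustration, and the sketch assumes that both a "resource" 
and a "char" parameter are present; it is not taken from any existing 
implementation.

```python
from urllib.parse import urlsplit, parse_qs


def parse_its_query_uri(uri):
    """Return (resource, start, end) from a query-style URI.

    Hypothetical helper: only the query part is inspected, so whatever
    comes before "?" (its, its.html, ...) does not matter.
    """
    query = parse_qs(urlsplit(uri).query)
    resource = query["resource"][0]
    # "char=0,29" carries the start and end offsets, separated by a comma
    start, end = (int(n) for n in query["char"][0].split(","))
    return resource, start, end


uri = "http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29"
print(parse_its_query_uri(uri))
```

The same call works unchanged for the its.html variant of the URI, 
since the path is never looked at.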

With the fragment identifier this is different: it is defined in terms 
of the media type. So a browser assumes that
http://www.w3.org/its.html
served as text/html is an HTML document and tries to render it. You 
could serve
http://www.w3.org/its.html
as something else and tell your application to resolve the stuff after 
"#" to the related character offset - but it would look really strange.
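
To illustrate the difference: the part after "#" is split off by the 
client and interpreted against the retrieved representation, so the 
application has to decide itself what "char=0,29" means. A rough Python 
sketch follows, assuming the document text is already available as a 
plain string and that the offsets behave like a Python slice (positions 
between characters, as RFC 5147 defines them for text/plain). The 
helper resolve_char_fragment is hypothetical.

```python
from urllib.parse import urldefrag

uri = "http://example.com/exampledoc.html#char=0,29"

# The fragment is separated from the URI of the document itself;
# only document_uri would be used to retrieve the representation.
document_uri, fragment = urldefrag(uri)


def resolve_char_fragment(text, fragment):
    """Hypothetical helper: apply a char=start,end fragment to a string.

    Assumption: the offsets count positions in already extracted text,
    which is not something text/html itself defines.
    """
    scheme, _, value = fragment.partition("=")
    if scheme != "char":
        raise ValueError("unsupported fragment scheme: " + scheme)
    start, end = (int(n) for n in value.split(","))
    return text[start:end]


print(document_uri)
print(fragment)
```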

Best,

Felix

>
> Phil.
>
>
>
>
> -----Forwarded by Phil Ritchie/VISTATEC on 09/03/2013 08:18AM -----
> To: Ivan Herman <ivan@w3.org>
> From: Phil Ritchie/VISTATEC
> Date: 08/30/2013 01:42PM
> Cc: Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org, 
> W3C RDF WG <public-rdf-wg@w3.org>
> Subject: Re: Request for review from the RDF working group: ITS 2.0
>
> All
>
> I like option 1, registering the char fragment id.
>
> Phil.
>
>
>
>
> From: Ivan Herman <ivan@w3.org>
> To: Felix Sasaki <fsasaki@w3.org>,
> Cc: W3C RDF WG <public-rdf-wg@w3.org>, public-multilingualweb-lt@w3.org
> Date: 28/08/2013 17:03
> Subject: Re: Request for review from the RDF working group: ITS 2.0
> ------------------------------------------------------------------------
>
>
>
> Felix,
>
> this is the official review of the RDF WG on the ITS Draft, more 
> exactly the NIF conversion section[1]. The RDF WG discussed the issue 
> and took a resolution on this response[2]
>
> The problem we see in the conversion algorithm is the URI-s that the 
> algorithm generates, namely the URI-s of the form
>
> <http://example.com/exampledoc.html#char=0,29>
> <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])>
>
> Although it is quite obvious what these are for, we do sense a problem 
> with these nevertheless. Indeed
>
> - RDF Concepts 1.1 Last Call document[3] refers to IRI-s: RFC3987[4]
> - IRI-s map to URI-s: RFC3986[5]
> - What RFC3986 says about fragments is:
>
> [[[
> The fragment's format and resolution is therefore dependent on the 
> media type [RFC2046] of a potentially retrieved representation, even 
> though such a retrieval is only performed if the URI is dereferenced. 
>  If no such representation exists, then the semantics of the fragment 
> are considered unknown and are effectively unconstrained.
> ]]]
>
> Looking at the URI-s above:
>
> - The 'char' fragment id is defined by rfc 5147[6], but is defined for 
> text/plain only. ITS talks about XML and HTML, ie, talks about 
> resources whose media types are definitely _not_ text/plain
> - The 'xpath' fragment id is fine for XML. But it is not defined for 
> text/html
>
> In view of this, we do not feel comfortable with the choice of the 
> mapping; the resulting RDF triples will not be entirely correct 
> because these URI-s are not correct. Additionally, although that is 
> not an RDF requirement per se, the URI-s are not dereferenceable 
> (because they are incorrect) which is also in contradiction with 
> Linked Data Principles which are also prevalent in the community.
>
> We do see two ways around this issue
>
> 1. The WG registers the 'char' fragment id-s (see also [7] for 
> guidelines) through IETF for HTML and XML. (Actually, extending the 
> usage of 'char' to XML/HTML would be generally very useful). Also, the 
> WG registers 'xpath' for HTML (although we realize that this may be 
> difficult because it might not be acceptable for the HTML WG which 
> 'owns' the text/html media type)
>
> 2. The WG uses a different URI scheme, trying to avoid fragment ids. 
> Something like:
>
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&xpath=/html/body[1]/h2[1] 
>
>
> where, of course, the www.w3.org/its part can be some other URI and, 
> ideally, would refer to a service returning something feasible and 
> intelligent on the request there.
>
> However, we also recognize that the mapping in the ITS document is 
> _not_ normative. As a consequence, the ITS WG is perfectly in its 
> right to go ahead and not to follow the comments of the RDF Working 
> Group. In other words, the ITS Working Group does not have to ask 
> again for a formal approval of the RDF Working Group on any decision 
> it may take (although I would be interested in the decision :-))
>
> I hope this was helpful to you
>
> Sincerely, in the name of the RDF Working Group
>
> Ivan Herman (staff contact for the RDF WG)
>
> P.S. Note that there are similar efforts elsewhere, like the 
> string-range fragment id[8] or the work IDPF did for ebooks[9], but we 
> recognize none of these offer an alternative.
>
>
> [1] http://www.w3.org/TR/2013/WD-its20-20130820/#conversion-to-nif
> [2] https://www.w3.org/2013/meeting/rdf-wg/2013-08-28#resolution_1
> [3] http://www.w3.org/TR/2013/WD-rdf11-concepts-20130723/
> [4] http://tools.ietf.org/html/rfc3987
> [5] http://tools.ietf.org/html/rfc3986
> [6] http://tools.ietf.org/html/rfc5147
> [7] http://www.w3.org/TR/fragid-best-practices/
>
>
> On Aug 1, 2013, at 14:17 , Felix Sasaki <fsasaki@w3.org> wrote:
>
> > (Apologies for re-sending, I wasn't subscribed to the RDF WG list)
> >
> > Dear RDF Working Group (sending this also explicitly to Guus, David
> > and Sandro as co-chairs / staff contact, to raise their awareness),
> > with CC to the MultilingualWeb-LT Working Group,
> >
> > with this mail I am asking the RDF Working Group to review the ITS
> > 2.0 draft at [1]. The latest draft under TR space is a last call
> > draft [2]. A diff between the two drafts is here [3]. Note that
> > during last call we did a lot of changes to the informative sections
> > 1-2 (which are not relevant for the normative definition of ITS 2.0).
> >
> > ITS 2.0 provides metadata items ("data categories") to foster the
> > (automated) creation and processing of multilingual Web content:
> > mostly HTML and XML. What may be of special interest for you is the
> > ITS 2.0 approach to convert markup documents into RDF. This results
> > in triples that make use of the NIF ontology [4]. See the definition
> > of the NIF conversion algorithm at [5] and tests (= examples) from
> > our test suite in the implementation report [6]. Of course a general
> > review from the RDF WG would be nice, but I assume that this feature
> > of ITS 2.0 is of most interest for you.
> >
> > Our last call period already ended 11 June, and my apologies for
> > being late with this request. If you need more info to move this
> > forward please let me know.
> >
> > Best regards,
> >
> > Felix Sasaki (co-chair and staff contact for the MultilingualWeb-LT
> > Working Group)
> >
> > [1] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
> > [2] http://www.w3.org/TR/2013/WD-its20-20130521/
> > [3] http://tinyurl.com/k4duo76
> > [4] http://persistence.uni-leipzig.org/nlp2rdf/
> > [5] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conversion-to-nif
> > [6] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20-implementation-report.html#conformance-nif-conversion
> >
>
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
Received on Tuesday, 3 September 2013 08:04:07 UTC