- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 03 Sep 2013 10:03:32 +0200
- To: Phil Ritchie <philr@vistatec.ie>
- CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, public-rdf-wg@w3.org
- Message-ID: <522597D4.6060904@w3.org>
Hi Phil,
On 03.09.13 09:21, Phil Ritchie wrote:
> All
>
> Since my reply below I've been trying to see what it would take to
> implement option 1 (char fragment identifier) and it would seem easier
> to implement option 2. I wonder if this is why Sebastian implemented
> this option.
By "implementing" you probably mean consuming the URIs and
"understanding" the offset information. Producing the URIs is about
equally easy for both options.
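To illustrate that producing the URIs takes about the same effort either way, here is a minimal Python sketch. The function names are made up for this mail, and the service URI is just the placeholder used in this thread.

```python
from urllib.parse import urlencode


def fragment_uri(document: str, start: int, end: int) -> str:
    """Option 1: append a 'char' fragment identifier to the document URI."""
    return f"{document}#char={start},{end}"


def query_uri(service: str, document: str, start: int, end: int) -> str:
    """Option 2: pass the document URI and the offsets as query parameters."""
    # safe=":/," leaves the resource URI and the comma unescaped, matching
    # the examples in this thread; a stricter producer might percent-encode.
    query = urlencode({"resource": document, "char": f"{start},{end}"}, safe=":/,")
    return f"{service}?{query}"


DOC = "http://example.com/exampldoc.html"
print(fragment_uri(DOC, 0, 29))
print(query_uri("http://www.w3.org/its", DOC, 0, 29))
```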
I agree, option 2 is easier to implement. You need to analyze the
query part of the URI, and that's it. To do that, your implementation
does not need to "understand" what comes before the query part. So, as
far as the implementation goes, it makes no difference whether you have
http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29
or
http://www.w3.org/its.html?resource=http://example.com/exampldoc.html&char=0,29
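So in an .htaccess file you need a rewrite rule like this (with
example.com/myservice standing in for your own service):
RewriteEngine On
RewriteBase /
# Capture the whole query string as %1
RewriteCond %{QUERY_STRING} (.*)
# No leading slash in the pattern: .htaccess rules match a relative path
RewriteRule ^its$ http://example.com/myservice?%1 [P,L]
This passes all parameters on to the application that processes the
query parameters, wherever that application may be.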
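On the consuming side, the application only has to look at the query part. A rough Python sketch, assuming the parameter names "resource" and "char" from the example URIs above (the function name is hypothetical):

```python
from urllib.parse import parse_qs, urlsplit


def parse_offsets(uri: str) -> tuple[str, int, int]:
    """Return (resource URI, start offset, end offset) from an option-2 URI."""
    # Only the query part is inspected; whatever precedes "?" is ignored.
    query = parse_qs(urlsplit(uri).query)
    resource = query["resource"][0]
    start, end = (int(n) for n in query["char"][0].split(","))
    return resource, start, end


for service in ("http://www.w3.org/its", "http://www.w3.org/its.html"):
    uri = f"{service}?resource=http://example.com/exampldoc.html&char=0,29"
    print(parse_offsets(uri))
```

Both service URIs yield the same result, which is the point: the part before the query does not matter to the consumer.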
With the fragment identifier this is different: its meaning is defined
in terms of the media type. So a browser assumes that
http://www.w3.org/its.html
served as text/html is an HTML document and tries to render it. You
could serve
http://www.w3.org/its.html
as something else and tell your application to resolve the part after
"#" to the corresponding character offsets, but that would look really
strange.
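For comparison, a consumer of the fragment form has to take the media type into account before it can interpret what follows "#". A simplified Python sketch, assuming only the plain "char=start,end" range form and text/plain content as in RFC 5147 (the function name is made up, and other parts of that RFC's syntax, such as single positions or integrity checks, are not handled):

```python
from urllib.parse import urldefrag


def resolve_char_fragment(uri: str, media_type: str, text: str) -> str:
    """Return the substring addressed by a 'char=start,end' fragment."""
    _, fragment = urldefrag(uri)
    # The 'char' scheme is only defined for text/plain, so refuse anything else.
    if media_type != "text/plain":
        raise ValueError(f"'char' fragments are not defined for {media_type}")
    scheme, _, value = fragment.partition("=")
    if scheme != "char":
        raise ValueError(f"unsupported fragment: {fragment!r}")
    start, end = (int(n) for n in value.split(","))
    # Positions count the gaps between characters, so a plain slice fits.
    return text[start:end]


print(resolve_char_fragment("http://example.com/doc.txt#char=0,5",
                            "text/plain", "Hello world"))
```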
Best,
Felix
>
> Phil.
>
>
>
>
> -----Forwarded by Phil Ritchie/VISTATEC on 09/03/2013 08:18AM -----
> To: Ivan Herman <ivan@w3.org>
> From: Phil Ritchie/VISTATEC
> Date: 08/30/2013 01:42PM
> Cc: Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org,
> W3C RDF WG <public-rdf-wg@w3.org>
> Subject: Re: Request for review from the RDF working group: ITS 2.0
>
> All
>
> I like option 1. of registering the char fragment id.
>
> Phil.
>
>
>
>
> From: Ivan Herman <ivan@w3.org>
> To: Felix Sasaki <fsasaki@w3.org>,
> Cc: W3C RDF WG <public-rdf-wg@w3.org>, public-multilingualweb-lt@w3.org
> Date: 28/08/2013 17:03
> Subject: Re: Request for review from the RDF working group: ITS 2.0
> ------------------------------------------------------------------------
>
>
>
> Felix,
>
> this is the official review of the RDF WG on the ITS Draft, more
> exactly the NIF conversion section[1]. The RDF WG discussed the issue
> and took a resolution on this response[2]
>
> The problem we see in the conversion algorithm is the URI-s that the
> algorithm generates, namely the URI-s of the form
>
> <http://example.com/exampledoc.html#char=0,29>
> <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])>
>
> although it is quite obvious what these are for, we do sense a problem
> with these nevertheless. Indeed
>
> - RDF Concepts 1.1 Last Call document[3] refers to IRI-s: RFC3987[4]
> - IRI-s map to URI-s: RFC3986[5]
> - What RFC3986 says about fragments is:
>
> [[[
> The fragment's format and resolution is therefore dependent on the
> media type [RFC2046] of a potentially retrieved representation, even
> though such a retrieval is only performed if the URI is dereferenced.
> If no such representation exists, then the semantics of the fragment
> are considered unknown and are effectively unconstrained.
> ]]]
>
> Looking at the URI-s above:
>
> - The 'char' fragment id is defined by RFC 5147[6], but is defined for
> text/plain only. ITS talks about XML and HTML, i.e., about
> resources whose media types are definitely _not_ text/plain
> - The 'xpath' fragment id is fine for XML. But it is not defined for
> text/html
>
> In view of this, we do not feel comfortable with the choice of the
> mapping; the resulting RDF triples will not be entirely correct
> because these URI-s are not correct. Additionally, although that is
> not an RDF requirement per se, the URI-s are not dereferenceable
> (because they are incorrect) which is also in contradiction with
> Linked Data Principles which are also prevalent in the community.
>
> We do see two ways around this issue
>
> 1. The WG registers the 'char' fragment id-s (see also [7] for
> guidelines) through IETF for HTML and XML. (Actually, extending the
> usage of 'char' to XML/HTML would be generally very useful). Also, the
> WG registers 'xpath' for HTML (although we realize that this may be
> difficult because it might not be acceptable for the HTML WG which
> 'owns' the text/html media type)
>
> 2. The WG uses a different URI scheme, trying to avoid fragment ids.
> Something like:
>
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&xpath=/html/body[1]/h2[1]
>
>
> where, of course, the www.w3.org/its part can be some other URI and,
> ideally, would refer to a service returning something feasible and
> intelligent on the request there.
>
> However, we also recognize that the mapping in the ITS document is
> _not_ normative. As a consequence, the ITS WG is perfectly within its
> rights to go ahead and not follow the comments of the RDF Working
> Group. In other words, the ITS Working Group does not have to ask
> again for a formal approval of the RDF Working Group on any decision
> it may take (although I would be interested in the decision :-)
>
> I hope this was helpful to you
>
> Sincerely, in the name of the RDF Working Group
>
> Ivan Herman (staff contact for the RDF WG)
>
> P.S. Note that there are similar efforts elsewhere, like the
> string-range fragment id[8] or the work IDPF did for ebooks[9], but we
> recognize none of these offer an alternative.
>
>
> [1] http://www.w3.org/TR/2013/WD-its20-20130820/#conversion-to-nif
> [2] https://www.w3.org/2013/meeting/rdf-wg/2013-08-28#resolution_1
> [3] http://www.w3.org/TR/2013/WD-rdf11-concepts-20130723/
> [4] http://tools.ietf.org/html/rfc3987
> [5] http://tools.ietf.org/html/rfc3986
> [6] http://tools.ietf.org/html/rfc5147
> [7] http://www.w3.org/TR/fragid-best-practices/
>
>
> On Aug 1, 2013, at 14:17 , Felix Sasaki <fsasaki@w3.org> wrote:
>
> > (Apologies for re-sending, I wasn't subscribed to the RDF WG list)
> >
> > Dear RDF Working Group (sending this also explicitly to Guus, David
> > and Sandro as co-chairs / staff contact, to raise their awareness),
> > with CC to the MultilingualWeb-LT Working Group,
> >
> > with this mail I am asking the RDF Working Group to review the ITS
> > 2.0 draft at [1]. The latest draft under TR space is a last call draft
> > [2]. A diff between the two drafts is here [3]. Note that during last
> > call we did a lot of changes to the informative sections 1-2 (which
> > are not relevant for the normative definition of ITS 2.0).
> >
> > ITS 2.0 provides metadata items ("data categories") to foster the
> > (automated) creation and processing of multilingual Web content:
> > mostly HTML and XML. What may be of special interest for you is the
> > ITS 2.0 approach to convert markup documents into RDF. This results in
> > triples that make use of the NIF ontology [4]. See the definition of
> > the NIF conversion algorithm at [5] and tests (= examples) from our
> > test suite in the implementation report [6]. Of course a general
> > review from the RDF WG would be nice, but I assume that this feature
> > of ITS 2.0 is of most interest for you.
> >
> > Our last call period already ended 11 June, and my apologies for
> > being late with this request. If you need more info to move this
> > forward please let me know.
> >
> > Best regards,
> >
> > Felix Sasaki (co-chair and staff contact for the MultilingualWeb-LT
> > Working Group)
> >
> > [1] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
> > [2] http://www.w3.org/TR/2013/WD-its20-20130521/
> > [3] http://tinyurl.com/k4duo76
> > [4] http://persistence.uni-leipzig.org/nlp2rdf/
> > [5] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conversion-to-nif
> > [6] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20-implementation-report.html#conformance-nif-conversion
> >
>
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
Received on Tuesday, 3 September 2013 08:04:07 UTC