- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 03 Sep 2013 10:03:32 +0200
- To: Phil Ritchie <philr@vistatec.ie>
- CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, public-rdf-wg@w3.org
- Message-ID: <522597D4.6060904@w3.org>
Hi Phil,

On 03.09.13 09:21, Phil Ritchie wrote:
> All
>
> Since my reply below I've been trying to see what it would take to
> implement option 1 (char fragment identifier) and it would seem easier
> to implement option 2. I wonder if this is why Sebastian implemented
> this option.

By "implementing" you probably mean "consuming the URIs and 'understanding' the offset information". Producing the URIs is about equally easy with either option.

I agree, option 2 is easier to implement. You need to analyze the query part of the URI, and that's it. To do that, your implementation does not need to "understand" what comes before the query part. So implementation-wise it makes no difference whether you have

http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29

or

http://www.w3.org/its.html?resource=http://example.com/exampldoc.html&char=0,29

So in an .htaccess file you need a rewrite rule like this (example.com/myservice is just a placeholder for wherever your service lives):

    RewriteEngine On
    RewriteBase /
    # Capture the complete query string as %1
    RewriteCond %{QUERY_STRING} (.*)
    # In .htaccess context the pattern is matched without the leading slash
    RewriteRule ^its$ http://example.com/myservice?%1 [P,L]

This passes all parameters to the application that processes the query parameters, wherever that application may be.

With the fragment identifier this is different: it is defined in terms of the media type. So a browser assumes that http://www.w3.org/its.html served as text/html is an HTML document and tries to render it. You could serve http://www.w3.org/its.html as something else and tell your application to resolve the part after "#" to the related character offset, but it would look really strange.

Best,

Felix

>
> Phil.
>
> -----Forwarded by Phil Ritchie/VISTATEC on 09/03/2013 08:18AM -----
> To: Ivan Herman <ivan@w3.org>
> From: Phil Ritchie/VISTATEC
> Date: 08/30/2013 01:42PM
> Cc: Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org,
> W3C RDF WG <public-rdf-wg@w3.org>
> Subject: Re: Request for review from the RDF working group: ITS 2.0
>
> All
>
> I like option 1. of registering the char fragment id.
>
> Phil.
>
> From: Ivan Herman <ivan@w3.org>
> To: Felix Sasaki <fsasaki@w3.org>,
> Cc: W3C RDF WG <public-rdf-wg@w3.org>, public-multilingualweb-lt@w3.org
> Date: 28/08/2013 17:03
> Subject: Re: Request for review from the RDF working group: ITS 2.0
> ------------------------------------------------------------------------
>
> Felix,
>
> this is the official review of the RDF WG on the ITS Draft, more
> exactly the NIF conversion section[1]. The RDF WG discussed the issue
> and took a resolution on this response[2]
>
> The problem we see in the conversion algorithm is the URI-s that the
> algorithm generates, namely the URI-s of the form
>
> <http://example.com/exampledoc.html#char=0,29>
> <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])>
>
> although it is quite obvious what these are for, we do sense a problem
> with these nevertheless. Indeed
>
> - RDF Concepts 1.1 Last Call document[3] refers to IRI-s: RFC3987[4]
> - IRI-s map to URI-s: RFC3986[5]
> - What RFC3986 says about fragments is:
>
> [[[
> The fragment's format and resolution is therefore dependent on the
> media type [RFC2046] of a potentially retrieved representation, even
> though such a retrieval is only performed if the URI is dereferenced.
> If no such representation exists, then the semantics of the fragment
> are considered unknown and are effectively unconstrained.
> ]]]
>
> Looking at the URI-s above:
>
> - The 'char' fragment id is defined by rfc 5147[6], but is defined for
>   text/plain only. ITS talks about XML and HTML, ie, talks about
>   resources whose media types are definitely _not_ text/plain
> - The 'xpath' fragment id is fine for XML.
>   But it is not defined for text/html
>
> In view of this, we do not feel comfortable with the choice of the
> mapping; the resulting RDF triples will not be entirely correct
> because these URI-s are not correct. Additionally, although that is
> not an RDF requirement per se, the URI-s are not dereferenceable
> (because they are incorrect) which is also in contradiction with
> Linked Data Principles which are also prevalent in the community.
>
> We do see two ways around this issue
>
> 1. The WG registers the 'char' fragment id-s (see also [7] for
>    guidelines) through IETF for HTML and XML. (Actually, extending the
>    usage of 'char' to XML/HTML would be generally very useful). Also, the
>    WG registers 'xpath' for HTML (although we realize that this may be
>    difficult because it might not be acceptable for the HTML WG which
>    'owns' the text/html media type)
>
> 2. The WG uses a different URI scheme, trying to avoid fragment ids.
>    Something like:
>
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29
> http://www.w3.org/its?resource=http://example.com/exampldoc.html&xpath=/html/body[1]/h2[1]
>
> where, of course, the www.w3.org/its part can be some other URI and,
> ideally, would refer to a service returning something feasible and
> intelligent on the request there.
>
> However. We also recognize that the mapping in the ITS document is
> _not_ normative. As a consequence, the ITS WG is perfectly in its
> right to go ahead and not to follow the comments of the RDF Working
> Group. In other words, the ITS Working Group does not have to ask
> again for a formal approval of the RDF Working Group on any decision
> it may take (although I would be interested by the decision:-)
>
> I hope this was helpful to you
>
> Sincerely, in the name of the RDF Working Group
>
> Ivan Herman (staff contact for the RDF WG)
>
> P.S.
> Note that there are similar efforts elsewhere, like the
> string-range fragment id[8] or the work IDPF did for ebooks[9], but we
> recognize none of these offer an alternative.
>
> [1] http://www.w3.org/TR/2013/WD-its20-20130820/#conversion-to-nif
> [2] https://www.w3.org/2013/meeting/rdf-wg/2013-08-28#resolution_1
> [3] http://www.w3.org/TR/2013/WD-rdf11-concepts-20130723/
> [4] http://tools.ietf.org/html/rfc3987
> [5] http://tools.ietf.org/html/rfc3986
> [6] http://tools.ietf.org/html/rfc5147
> [7] http://www.w3.org/TR/fragid-best-practices/
>
> On Aug 1, 2013, at 14:17 , Felix Sasaki <fsasaki@w3.org> wrote:
>
> > (Apologies for re-sending, I wasn't subscribed to the RDF WG list)
> >
> > Dear RDF Working Group (sending this also explicitly to Guus, David
> > and Sandro as co-chairs / staff contact, to raise their awareness),
> > with CC to the MultilingualWeb-LT Working Group,
> >
> > with this mail I am asking the RDF Working Group to review the ITS
> > 2.0 draft at [1]. The latest draft under TR space is a last call draft
> > [2]. A diff between the two drafts is here [3]. Note that during last
> > call we did a lot of changes to the informative sections 1-2 (which
> > are not relevant for the normative definition of ITS 2.0).
> >
> > ITS 2.0 provides metadata items ("data categories") to foster the
> > (automated) creation and processing of multilingual Web content:
> > mostly HTML and XML. What may be of special interest for you is the
> > ITS 2.0 approach to convert markup documents into RDF. This results in
> > triples that make use of the NIF ontology [4]. See the definition of
> > the NIF conversion algorithm at [5] and tests (= examples) from our
> > test suite in the implementation report [6]. Of course a general
> > review from the RDF WG would be nice, but I assume that this feature
> > of ITS 2.0 is of most interest for you.
> >
> > Our last call period already ended 11 June, and my apologies for
> > being late with this request.
> > If you need more info to move this forward please let me know.
> >
> > Best regards,
> >
> > Felix Sasaki (co-chair and staff contact for the MultilingualWeb-LT
> > Working Group)
> >
> > [1] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
> > [2] http://www.w3.org/TR/2013/WD-its20-20130521/
> > [3] http://tinyurl.com/k4duo76
> > [4] http://persistence.uni-leipzig.org/nlp2rdf/
> > [5] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conversion-to-nif
> > [6] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20-implementation-report.html#conformance-nif-conversion
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
> ************************************************************
> VistaTEC Ltd. Registered in Ireland 268483.
> Registered Office, VistaTEC House, 700, South Circular Road,
> Kilmainham. Dublin 8. Ireland.
>
> The information contained in this message, including any accompanying
> documents, is confidential and is intended only for the addressee(s).
> The unauthorized use, disclosure, copying, or alteration of this
> message is strictly forbidden. If you have received this message in
> error please notify the sender immediately.
> ************************************************************
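
The point made above about option 2, that a consumer only has to analyze the query part and can ignore whatever comes before it, might be sketched roughly as follows. This is only an illustration in Python, not something taken from the ITS or NIF specifications: the function name parse_its_query_uri is hypothetical, and the "resource" and "char" parameter names are simply the ones used in the example URIs in this thread.

```python
from urllib.parse import urlsplit, parse_qs


def parse_its_query_uri(uri):
    """Extract (resource, start, end) from an option-2 style URI.

    Hypothetical helper: assumes the query carries a 'resource'
    parameter and a 'char' parameter of the form 'start,end',
    as in the example URIs quoted in this thread.
    """
    # Only the query component is inspected; scheme, host and path
    # (".../its" vs ".../its.html") play no role in the result.
    query = urlsplit(uri).query
    params = parse_qs(query)

    resource = params["resource"][0]
    start_text, end_text = params["char"][0].split(",")
    return resource, int(start_text), int(end_text)


# The two URI forms from the mail differ only before the "?".
uri_a = "http://www.w3.org/its?resource=http://example.com/exampldoc.html&char=0,29"
uri_b = "http://www.w3.org/its.html?resource=http://example.com/exampldoc.html&char=0,29"

print(parse_its_query_uri(uri_a))
print(parse_its_query_uri(uri_b))
```

Both calls yield the same resource and offsets, which is the sense in which the part before the query "makes no difference" to the consumer. A fragment-based URI such as http://example.com/exampledoc.html#char=0,29 could be split in a similar way (urlsplit exposes the part after "#" as the fragment), but what that fragment means would still depend on the media type of the retrieved representation, as the RDF WG review points out.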
Received on Tuesday, 3 September 2013 08:04:07 UTC