- From: Phil Ritchie <philr@vistatec.ie>
- Date: Fri, 30 Aug 2013 13:42:52 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org, W3C RDF WG <public-rdf-wg@w3.org>
- Message-ID: <OF5FA23FE0.488512CF-ON80257BD7.00454F9C-80257BD7.0045D7DE@vistatec.ie>
All,
I like option 1, registering the 'char' fragment id.
Phil.
From: Ivan Herman <ivan@w3.org>
To: Felix Sasaki <fsasaki@w3.org>
Cc: W3C RDF WG <public-rdf-wg@w3.org>, public-multilingualweb-lt@w3.org
Date: 28/08/2013 17:03
Subject: Re: Request for review from the RDF working group: ITS 2.0
Felix,
this is the official review by the RDF WG of the ITS 2.0 draft, more
precisely of its NIF conversion section [1]. The RDF WG discussed the
issue and adopted a resolution on this response [2].
The problem we see in the conversion algorithm lies in the URIs it
generates, namely URIs of the form
<http://example.com/exampledoc.html#char=0,29>
<http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])>
Although it is quite obvious what these are meant for, we nevertheless
see a problem with them. Indeed:
- The RDF 1.1 Concepts Last Call document [3] refers to IRIs, as defined
in RFC 3987 [4]
- IRIs map to URIs, as defined in RFC 3986 [5]
- RFC 3986 says the following about fragments:
[[[
The fragment's format and resolution is therefore dependent on the media
type [RFC2046] of a potentially retrieved representation, even though such
a retrieval is only performed if the URI is dereferenced. If no such
representation exists, then the semantics of the fragment are considered
unknown and are effectively unconstrained.
]]]
Looking at the URIs above:
- The 'char' fragment id is defined by RFC 5147 [6], but only for
text/plain. ITS deals with XML and HTML, i.e., with resources whose
media types are definitely _not_ text/plain.
- The 'xpath' fragment id is fine for XML, but it is not defined for
text/html.
In view of this, we are not comfortable with the chosen mapping: the
resulting RDF triples will not be entirely correct, because these URIs
are not correct. Additionally, although this is not an RDF requirement
per se, the URIs are not dereferenceable (because they are incorrect),
which also contradicts the Linked Data principles that are prevalent in
the community.
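To make the RFC 3986 argument concrete, here is a minimal Python sketch of how a client might resolve a 'char' fragment. The helper name `select_char_range` is hypothetical. The fragment is split off the URI (it is never sent to the server), and only for a text/plain representation does RFC 5147 give it a meaning; for text/html or XML the sketch returns None, mirroring "unknown and effectively unconstrained". It is a simplification: it handles only the `char=N` and `char=N,M` forms and does not support RFC 5147's open-ended ranges or integrity checks.

```python
import re
from typing import Optional
from urllib.parse import urldefrag

# Simplified RFC 5147 'char' scheme: only "char=N" and "char=N,M".
# Open-ended ranges and ";length=" / ";md5=" integrity checks are not handled.
CHAR_RANGE = re.compile(r"char=(\d+)(?:,(\d+))?")


def select_char_range(uri: str, media_type: str, text: str) -> Optional[str]:
    """Return the text a 'char' fragment selects, or None if undefined."""
    # The fragment is interpreted by the client, after retrieval.
    fragment = urldefrag(uri).fragment
    # Drop media type parameters such as "; charset=utf-8".
    base_type = media_type.split(";")[0].strip().lower()
    if base_type != "text/plain":
        # RFC 5147 defines 'char' for text/plain only; for other media
        # types the fragment's semantics are unknown (RFC 3986, 3.5).
        return None
    match = CHAR_RANGE.fullmatch(fragment)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else start
    # Positions count the gaps between characters, starting at 0, so
    # "char=0,29" covers the first 29 characters: a half-open slice.
    return text[start:end]
```

With the same URI, `select_char_range(uri, "text/plain", text)` yields a string, while `select_char_range(uri, "text/html", text)` yields None. That is exactly the gap the 'char' registration in option 1 would close.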
We see two ways around this issue:
1. The WG registers the 'char' fragment id (see also [7] for guidelines)
through the IETF for HTML and XML. (Actually, extending the usage of
'char' to XML/HTML would be very useful in general.) The WG also
registers 'xpath' for HTML (although we realize that this may be
difficult, because it might not be acceptable to the HTML WG, which
'owns' the text/html media type).
2. The WG uses a different URI structure that avoids fragment ids,
something like:
http://www.w3.org/its?resource=http://example.com/exampledoc.html&char=0,29
http://www.w3.org/its?resource=http://example.com/exampledoc.html&xpath=/html/body[1]/h2[1]
where, of course, the www.w3.org/its part could be some other URI and,
ideally, would refer to a service that returns something sensible and
useful for such a request.
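A minimal Python sketch of option 2, assuming the query parameter names `resource`, `char` and `xpath` from the examples above. `make_its_uri`, `parse_its_uri` and `ITS_SERVICE` are hypothetical names, and the base URI is only the placeholder from the proposal. The point it illustrates is that all selector information lives in the query, so no fragment id, and thus no media-type-dependent interpretation, is involved.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; as noted above, this can be some other URI.
ITS_SERVICE = "http://www.w3.org/its"


def make_its_uri(resource: str, char: Optional[Tuple[int, int]] = None,
                 xpath: Optional[str] = None) -> str:
    """Build a fragment-free URI in the style of option 2 (hypothetical)."""
    params = [("resource", resource)]
    if char is not None:
        params.append(("char", f"{char[0]},{char[1]}"))
    if xpath is not None:
        params.append(("xpath", xpath))
    # Keep '/', ':' and ',' readable (all legal in a query); everything else,
    # including '#', '?', '&', '[' and ']', is percent-encoded so a nested URI
    # or XPath expression cannot be misread as outer URI syntax.
    return f"{ITS_SERVICE}?{urlencode(params, safe='/:,')}"


def parse_its_uri(uri: str) -> Dict[str, str]:
    """Recover the parameters, as a service at ITS_SERVICE might."""
    query = parse_qs(urlsplit(uri).query)
    return {key: values[0] for key, values in query.items()}
```

For the char example this produces exactly the first URI listed above. For the xpath example the square brackets come out as `%5B` and `%5D`, since RFC 3986 does not allow raw '[' and ']' in a query component.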
However, we also recognize that the mapping in the ITS document is _not_
normative. As a consequence, the ITS WG is perfectly within its rights to
go ahead without following the comments of the RDF Working Group. In
other words, the ITS Working Group does not have to ask again for formal
approval from the RDF Working Group on any decision it may take (although
I would be interested in the decision :-)
I hope this is helpful.
Sincerely, on behalf of the RDF Working Group
Ivan Herman (staff contact for the RDF WG)
P.S. Note that there are similar efforts elsewhere, like the string-range
fragment id [8] or the work IDPF did for ebooks [9], but we recognize that
none of these offers an alternative.
[1] http://www.w3.org/TR/2013/WD-its20-20130820/#conversion-to-nif
[2] https://www.w3.org/2013/meeting/rdf-wg/2013-08-28#resolution_1
[3] http://www.w3.org/TR/2013/WD-rdf11-concepts-20130723/
[4] http://tools.ietf.org/html/rfc3987
[5] http://tools.ietf.org/html/rfc3986
[6] http://tools.ietf.org/html/rfc5147
[7] http://www.w3.org/TR/fragid-best-practices/
On Aug 1, 2013, at 14:17 , Felix Sasaki <fsasaki@w3.org> wrote:
> (Apologies for re-sending, I wasn't subscribed to the RDF WG list)
>
> Dear RDF Working Group (sending this also explicitly to Guus, David and
> Sandro as co-chairs / staff contact, to raise their awareness), with CC
> to the MultilingualWeb-LT Working Group,
>
> with this mail I am asking the RDF Working Group to review the ITS 2.0
> draft at [1]. The latest draft in TR space is a Last Call draft [2]. A
> diff between the two drafts is here [3]. Note that during Last Call we
> made a lot of changes to the informative sections 1-2 (which are not
> relevant for the normative definition of ITS 2.0).
>
> ITS 2.0 provides metadata items ("data categories") to foster the
> (automated) creation and processing of multilingual Web content: mostly
> HTML and XML. What may be of special interest for you is the ITS 2.0
> approach to converting markup documents into RDF. This results in
> triples that make use of the NIF ontology [4]. See the definition of
> the NIF conversion algorithm at [5] and tests (= examples) from our test
> suite in the implementation report [6]. Of course a general review from
> the RDF WG would be nice, but I assume that this feature of ITS 2.0 is
> of most interest for you.
>
> Our Last Call period already ended on 11 June; my apologies for being
> late with this request. If you need more information to move this
> forward, please let me know.
>
> Best regards,
>
> Felix Sasaki (co-chair and staff contact for the MultilingualWeb-LT
> Working Group)
>
> [1] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
> [2] http://www.w3.org/TR/2013/WD-its20-20130521/
> [3] http://tinyurl.com/k4duo76
> [4] http://persistence.uni-leipzig.org/nlp2rdf/
> [5] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conversion-to-nif
> [6] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20-implementation-report.html#conformance-nif-conversion
>
----
Ivan Herman, W3C
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 30 August 2013 12:43:24 UTC