Re: Draft Review of the ITS 2.0 draft document

From: Ivan Herman <ivan@w3.org>
Date: Sat, 24 Aug 2013 16:49:51 +0200
Cc: Pat Hayes <phayes@ihmc.us>, W3C RDF WG <public-rdf-wg@w3.org>, Felix Sasaki <fsasaki@w3.org>
Message-Id: <898EEEF1-9B7F-476B-821A-21E3A24B785F@w3.org>
To: David Wood <david@3roundstones.com>

Indeed, I thought about OA (and also talked with Felix about it). But, unless I am missing something,

http://www.openannotation.org/spec/core/core.html#FragmentURIs

simply sidesteps the issue and says: 'you guys, it is your responsibility to give the right URI'. That is fair enough for OA, but I am not sure it is o.k. for ITS; it remains too vague.

But, also looking at Jeni's draft[1], maybe the right approach for the ITS WG (if they want to stick to fragment IDs) is:

- Define that, for XML sources, they use an existing scheme; they may be able to use the 'string-range' XPointer scheme[2,3] instead of #char (using xpath is fine for XML).

- Go down the (possibly painful) road of formally registering the #char fragid for HTML per [1] (I think it would be pretty useful for the community anyway). This has to be done, I presume, in cooperation with the HTML WG, because they 'own' text/html.

- For xpath in HTML: it may be more difficult to get the HTML WG to accept that as a valid fragid and register it. But, who knows...

As far as the RDF WG is concerned, I would propose the following:

- in our comments we say that we do see a problem in this space, for the reasons I outlined in my original mail
- we also say that they should either use query URI-s or try to properly register #char and #xpath as fragment IDs
- finally, as this is not a normative part of the ITS spec, we, as a Working Group, leave it to the ITS WG to decide at their discretion (i.e., they do not need our formal approval for whatever they decide to do)

Ivan

B.t.w., I realized the other day that the IDPF has defined something similar for their own purposes[4], but that is of course for the EPUB media type...


[1] http://www.w3.org/TR/fragid-best-practices/
[2] http://www.w3.org/2005/04/xpointer-schemes/
[3] http://www.w3.org/2005/04/xpointer-schemes/string-range
[4] http://www.idpf.org/epub/linking/cfi/epub-cfi.html

On Aug 24, 2013, at 13:20, David Wood <david@3roundstones.com> wrote:

> The Open Annotation Community Group [1] is the best fit, I think. Section 2.1.4 of their spec [2] is entitled "Fragment URIs Identifying Body or Target" and attempts to define an RDF- and URI-friendly way to identify a particular part of a resource to annotate. 
> 
> Having worked with their spec, I don't think they have quite succeeded either. It may not be possible to do this cleanly given the state of the specs.
> 
> Regards,
> Dave
> --
> http://about.me/david_wood
> 
> [1] http://www.w3.org/community/openannotation/
> [2] http://www.openannotation.org/spec/core/20130208/core.html#FragmentURIs
> 
> 
> On Aug 24, 2013, at 2:10, Ivan Herman <ivan@w3.org> wrote:
> 
>> That is a good point, but it may still be good, for the records of the ITS WG, to at least share our opinion without requesting a change. The ITS WG may then decide to contact, e.g., the TAG if they want... The problem is that I do not really see which group owns this thing.
>> 
>> Actually... we are not completely out of this. After all, the concepts document does talk about fragments, i.e., we do go beyond a purely opaque IRI...
>> 
>> Not sure.
>> 
>> Ivan
>> 
>> ---
>> Ivan Herman
>> Tel:+31 641044153
>> http://www.ivan-herman.net
>> 
>> (Written on mobile, sorry for brevity and misspellings...)
>> 
>> On 24 Aug 2013, at 06:19, Pat Hayes <phayes@ihmc.us> wrote:
>> 
>>> Ivan
>>> 
>>> While I sympathise with, and share, your discomfort, I don't see that this is an issue particularly for RDF to comment upon. RDF, as you note, treats IRIs as opaque, so this entire discussion seems irrelevant to RDF-WG. Maybe some other WG, or the TAG, should be asked to take up this issue with the ITS WG?
>>> 
>>> Pat
>>> 
>>> 
>>> On Aug 23, 2013, at 5:57 AM, Ivan Herman wrote:
>>> 
>>>> As per my action (well, it was not actually recorded on the call, because the tracker got confused by several ivan-s :-), I reviewed the ITS 2.0 document, as requested by the ITS WG via Felix Sasaki[1]. The section that is relevant for this Working Group is the mapping to an external ontology, called NIF[2]. Actually, the details of that ontology are not relevant for this Working Group either; the issue is how to map the attributes set on the textual content of an HTML (or XML) document into RDF.
>>>> 
>>>> To take the example from the document:
>>>> 
>>>> <html><body><h2 translate="yes">Welcome to <span 
>>>> its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-within-text="yes"
>>>> translate="no">Dublin</span> in 
>>>> <b translate="no" its-within-text="yes">Ireland</b>!</h2></body></html>
>>>> 
>>>> the goal is to produce a set of RDF statements of the form:
>>>> 
>>>> <URI_TO_IDENTIFY_A_TEXT_PORTION>
>>>> nif:property1 value1;
>>>> nif:property2 value2;
>>>> nif:prop <URI_TO_IDENTIFY_A_TEXT_POSITION>
>>>> ...
>>>> 
>>>> The really interesting question is how to define the two URI-s <URI_TO_IDENTIFY_A_TEXT_PORTION> and <URI_TO_IDENTIFY_A_TEXT_POSITION>, where, say, the first should somehow refer to "Welcome to Dublin in Ireland!" and the other should tell the world that this text is within the <h2> element of the file.
>>>> 
>>>> The current mapping uses the following two URI-s:
>>>> 
>>>> <http://example.com/exampledoc.html#char=0,29>
>>>> <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])>
>>>> 
>>>> Although it is quite obvious what these are for, I sense some sort of problem with them. We may end up in a rathole, but...
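To make the offsets in `#char=0,29` concrete, here is a minimal Python sketch (standard library only) that strips the markup from the example and counts characters in the text content of the `<h2>`. The counting rule it uses (markup removed, character data concatenated as-is) is an assumption for illustration. It is exactly the kind of rule a registered `#char` fragment for text/html would have to pin down, since RFC 5147 only defines it for text/plain. The `TextOffsets` class and all variable names are hypothetical.

```python
from html.parser import HTMLParser

# Example document from the ITS 2.0 draft (as quoted above).
DOC = ('<html><body><h2 translate="yes">Welcome to <span '
       'its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-within-text="yes" '
       'translate="no">Dublin</span> in '
       '<b translate="no" its-within-text="yes">Ireland</b>!</h2></body></html>')
BASE = "http://example.com/exampledoc.html"


class TextOffsets(HTMLParser):
    """Collect the character data inside <h2> and the [start, end) offsets
    of every nested element that carries its-ta-ident-ref.

    Assumption: offsets count characters of the markup-free text only."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # text fragments seen inside <h2>
        self.pos = 0       # running character offset into that text
        self.in_h2 = False
        self.stack = []    # (ident_ref or None, start offset) per open element
        self.spans = []    # (ident_ref, start, end) for annotated elements

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif self.in_h2:
            self.stack.append((dict(attrs).get("its-ta-ident-ref"), self.pos))

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
        elif self.in_h2 and self.stack:
            ref, start = self.stack.pop()
            if ref is not None:
                self.spans.append((ref, start, self.pos))

    def handle_data(self, data):
        if self.in_h2:
            self.chunks.append(data)
            self.pos += len(data)


parser = TextOffsets()
parser.feed(DOC)
parser.close()
text = "".join(parser.chunks)

print(f"<{BASE}#char=0,{len(text)}>")        # the whole <h2> text
for ref, start, end in parser.spans:
    print(f"<{BASE}#char={start},{end}> {text[start:end]!r} -> {ref}")
```

Under this rule the sketch reproduces the draft's `0,29` range for "Welcome to Dublin in Ireland!" and gives `11,17` for "Dublin". A different but equally plausible rule (e.g. normalizing whitespace) would give different numbers for other documents.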
>>>> 
>>>> - We refer to IRI-s in our concept document: RFC3987
>>>> - IRI-s map to URI-s: RFC3987
>>>> - What RFC3986 (the generic URI syntax, section 3.5) says about fragments is:
>>>> 
>>>> "The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced.  If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained."
>>>> 
>>>> The way I read this is that if I want to have a proper URI, where I expect the media type to be BLA, then the fragment ID should somehow be defined for BLA. Although RDF regards IRI-s as opaque, I would still feel uneasy doing otherwise.
>>>> 
>>>> Looking at the URI-s above
>>>> 
>>>> - The 'char' fragment is defined by RFC 5147, but for text/plain only. ITS talks about XML and HTML, i.e., about resources whose media types are definitely _not_ text/plain.
>>>> - The xpath fragment ID is fine for XML. But it is not defined for text/html and, knowing how XML is frowned upon by the HTML WG, I do not expect that to ever change.
>>>> 
>>>> In view of this, I do not feel comfortable with the choice of the mapping. The URI-s are not dereferenceable, nor are they correct...
>>>> 
>>>> That being said, I may be too picky and we could let this go, also considering the fact that this section is _not_ normative in ITS.
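The objection in the two bullets above amounts to a lookup: the fragment syntax must be defined for the media type one expects to get back. A minimal Python sketch of that check follows. The `FRAGMENT_SCHEMES` table and both function names are hypothetical, and the table encodes only what the message states, not the contents of any registry.

```python
import re
from urllib.parse import urldefrag

# Hypothetical, deliberately partial table mirroring the claims in the message
# above (it is not copied from any registry): RFC 5147 defines char= and line=
# for text/plain; the xpath fragment ID is taken to be usable for XML types.
# No entry lists text/html, which is the point of the objection.
FRAGMENT_SCHEMES = {
    "char": {"text/plain"},
    "line": {"text/plain"},
    "xpath": {"application/xml", "text/xml"},
}


def fragment_scheme(uri):
    """Return the leading name of the fragment, e.g. 'char' for '#char=0,29'
    or 'xpath' for '#xpath(/html/body[1]/h2[1])'; None if there is none."""
    _, frag = urldefrag(uri)
    match = re.match(r"[A-Za-z][A-Za-z0-9]*", frag)
    return match.group(0) if match else None


def fragment_defined_for(uri, media_type):
    """True if the fragment syntax used by `uri` is defined for `media_type`
    according to the table above; a URI without a fragment always passes."""
    _, frag = urldefrag(uri)
    if not frag:
        return True
    scheme = fragment_scheme(uri)
    return media_type in FRAGMENT_SCHEMES.get(scheme, set())


char_uri = "http://example.com/exampledoc.html#char=0,29"
xpath_uri = "http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1])"
for uri in (char_uri, xpath_uri):
    for media_type in ("text/plain", "application/xml", "text/html"):
        print(fragment_scheme(uri), media_type, fragment_defined_for(uri, media_type))
```

Both URIs from the current mapping fail for text/html, which is the media type an HTML source would actually be served as.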
>>>> 
>>>> I had some discussion with Felix and also with Sebastian Hellmann, who is the author of NIF; one proposal I had was to use a URI of the form:
>>>> 
>>>> http://www.w3.org/its?resource=http://example.com/exampledoc.html&char=0,29
>>>> 
>>>> which, if a simple service were provided, could return some basic information, and is OK as a URI. I think that would be acceptable to them. But again, this WG may decide that I am just way too pedantic...
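The query-URI proposal can be sketched in a few lines of Python using only `urllib.parse`. The endpoint `http://www.w3.org/its` and the `resource`/`char` parameter names come from the proposal above. The two helper functions are hypothetical, and no such service is assumed to exist.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SERVICE = "http://www.w3.org/its"  # endpoint from the proposal; hypothetical service


def its_query_uri(resource, start, end):
    """Identify characters [start, end) of `resource` with a query URI.
    urlencode percent-encodes the embedded URI, so any '?', '&' or '#'
    it contains cannot leak into the outer URI's query or fragment."""
    return SERVICE + "?" + urlencode({"resource": resource, "char": f"{start},{end}"})


def parse_its_query_uri(uri):
    """Recover (resource, start, end) from a URI built by its_query_uri."""
    params = parse_qs(urlsplit(uri).query)
    start, end = (int(n) for n in params["char"][0].split(","))
    return params["resource"][0], start, end


uri = its_query_uri("http://example.com/exampledoc.html", 0, 29)
print(uri)
# http://www.w3.org/its?resource=http%3A%2F%2Fexample.com%2Fexampledoc.html&char=0%2C29
print(parse_its_query_uri(uri))
# ('http://example.com/exampledoc.html', 0, 29)
```

Encoding the embedded URI goes beyond the literal form in the proposal. The unencoded form works for this example, but a document URI with its own query string or fragment would otherwise be split at the wrong `&` or `#`.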
>>>> 
>>>> Ivan
>>>> 
>>>> P.S. It is of course possible to radically change the mapping with some blank nodes in the middle to avoid the issue...
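A minimal sketch of what the blank-node alternative in the P.S. could look like, in Python emitting Turtle. The text portion gets no IRI of its own. The document IRI and the character offsets are attached as plain properties instead, so no fragment syntax needs to be defined for any media type. The `ex:` namespace and all of its property names are hypothetical placeholders, not NIF or ITS vocabulary. The offsets used are those of "Dublin" in the example `<h2>` text.

```python
def blank_node_turtle(doc_uri, text, spans):
    """Describe each annotated span as a blank node in Turtle.

    spans: iterable of (ident_ref, start, end) character offsets into text.
    The ex: properties are hypothetical placeholders."""
    out = ["@prefix ex: <http://example.org/hypothetical#> ."]
    for ref, start, end in spans:
        # Escape only backslash and double quote; newlines would also
        # need escaping in a Turtle string literal in general.
        anchor = text[start:end].replace("\\", "\\\\").replace('"', '\\"')
        out.append(
            f"[] ex:sourceDocument <{doc_uri}> ;\n"
            f"   ex:beginIndex {start} ;\n"
            f"   ex:endIndex {end} ;\n"
            f'   ex:anchorOf "{anchor}" ;\n'
            f"   ex:identRef <{ref}> ."
        )
    return "\n".join(out)


print(blank_node_turtle(
    "http://example.com/exampledoc.html",
    "Welcome to Dublin in Ireland!",
    [("http://dbpedia.org/resource/Dublin", 11, 17)],
))
```

The trade-off is that such a text portion cannot be referred to from outside the graph, because a blank node has no global name.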
>>>> 
>>>> [1] http://lists.w3.org/Archives/Public/public-rdf-wg/2013Aug/0000.html
>>>> [2] http://www.w3.org/TR/2013/WD-its20-20130820/#conversion-to-nif
>>>> 
>>>> ----
>>>> Ivan Herman, W3C 
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> 
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 home
>>> 40 South Alcaniz St.            (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile (preferred)
>>> phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>> 
>> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Saturday, 24 August 2013 14:50:19 UTC