Re: Fragment issues in ITS/HTML/XML mapping to NIF from Sebastian Hellmann on 2013-09-11 (public-bpmlod@w3.org from September 2013)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Wed, 11 Sep 2013 12:45:41 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
CC: public-bpmlod@w3.org
Message-ID: <523049D5.4080509@informatik.uni-leipzig.de>
Hi Dave,
Yes, that would make sense. Some comments:

1. RFC 5147 (i.e. #char=x,y) is defined for text/plain as well as all RDF
MIME types (text/turtle, application/rdf+xml, ...), since a fragment
in RDF refers to part of the RDF graph.

As far as I know, all fragment IDs work well with RDF.

2. RFC 5147 for HTML technically works for small static documents, but
not on the Web, as HTML changes quite frequently. I guess this was the
reason why they didn't adopt it.

3. XPath/XPointer was already quite good; it could be improved further
and standardized (it fell just short of becoming a standard).

4. There are projects like Greasemonkey
(https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/) or
[CrowdsourcingTranslation] that are bound to break often, since there
is no well-researched mechanism for persistently anchoring annotations
to parts of HTML/XML documents.

5. Open Annotation did not solve this problem sufficiently either (in
my opinion) [oa].

6. The whole task is a really big problem of Web architecture, but I
think we can tackle it as a community.
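To make points 2, 4 and 5 concrete, here is a minimal Python sketch (an illustration only, not part of NIF or ITS 2.0) comparing an RFC 5147-style #char=x,y anchor with a quote-based anchor modelled loosely on the exact/prefix/suffix idea of the Open Annotation TextQuoteSelector [oa]. The helper functions and the sample sentence are hypothetical. After text is inserted at the start of the document, the character offsets silently select the wrong span while the quote anchor still finds it; the quote anchor in turn fails once the passage becomes ambiguous or is edited.

```python
import re
from typing import Optional, Tuple

def parse_char_fragment(fragment: str) -> Tuple[int, int]:
    """Parse an RFC 5147-style range such as 'char=22,36' into (start, end)."""
    m = re.fullmatch(r"char=(\d+),(\d+)", fragment)
    if m is None:
        raise ValueError(f"not a char range fragment: {fragment!r}")
    return int(m.group(1)), int(m.group(2))

def resolve_char_fragment(text: str, fragment: str) -> str:
    # RFC 5147 positions sit between characters and start at 0,
    # so char=x,y corresponds to the Python slice text[x:y].
    start, end = parse_char_fragment(fragment)
    return text[start:end]

def make_quote_selector(text: str, start: int, end: int, context: int = 5) -> dict:
    # Hypothetical helper: store the exact text plus a little surrounding
    # context, loosely following the TextQuoteSelector exact/prefix/suffix idea.
    return {
        "exact": text[start:end],
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }

def resolve_quote_selector(text: str, sel: dict) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the exact text if prefix+exact+suffix occurs
    exactly once in text; return None if it is missing or ambiguous."""
    needle = sel["prefix"] + sel["exact"] + sel["suffix"]
    hits = []
    i = text.find(needle)
    while i != -1:  # collect every (possibly overlapping) occurrence
        hits.append(i)
        i = text.find(needle, i + 1)
    if len(hits) != 1:
        return None
    start = hits[0] + len(sel["prefix"])
    return start, start + len(sel["exact"])

original = "The ITS 2.0 spec maps annotated text to NIF."
start = original.index("annotated text")
end = start + len("annotated text")
fragment = f"char={start},{end}"  # "char=22,36"
selector = make_quote_selector(original, start, end)

# Simulate the page changing: new text is inserted before the annotation.
edited = "Note: " + original

print(resolve_char_fragment(original, fragment))  # prints: annotated text
print(resolve_char_fragment(edited, fragment))    # prints " maps annotate": offsets now point at the wrong text
span = resolve_quote_selector(edited, selector)
print(edited[span[0]:span[1]])                    # prints: annotated text
# If the same passage appears twice, the quote anchor becomes ambiguous.
print(resolve_quote_selector(edited + " " + edited, selector))  # prints: None
```

Neither strategy is persistent in general: offsets break on any edit before the span, and quotes break on edits inside or around it.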

All the best,
Sebastian

[CrowdsourcingTranslation] http://www.w3.org/International/multilingualweb/limerick/slides/wasala.pdf
[oa] http://www.openannotation.org/spec/core/specific.html#Selectors

On 11.09.2013 11:27, Dave Lewis wrote:
> Hi all,
> I won't be able to make the call today, but as promised last week, here
> is a pointer to an issue raised by the RDF WG in relation to the use of
> fragment identifiers in NIF, as used in the ITS-NIF mapping in the ITS
> 2.0 specification.
>
> The issue is described at:
> https://www.w3.org/International/multilingualweb/lt/track/issues/131
>
> It basically points out that the 'char' media fragment used in the
> mapping to identify specific annotated text is only specified for
> plain text file types, not for HTML or XML. Also, the XPath option
> for fragment identifiers, while OK for XML files, does not apply to
> HTML files. The result is that if we use such a fragment URL in the
> RDF we generate from an XML+ITS or HTML+ITS mapping, the URL won't be
> dereferenceable, thereby violating this core Linked Data principle.
>
> The ideal solution would be to get these media types and their
> associated processing descriptions registered with the RDF. This
> wasn't an option for the MLW-LT working group due to our time
> constraints, so we went for a query-style URL instead, which is
> dereferenceable, and added a note about the issues around the
> fragment option. The agreed text is in the spec at:
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conversion-to-nif 
>
> Note that there is also a reverse mapping.
>
> This is obviously also an issue for NIF in the short term. It is
> acknowledged by the RDF group that registering those fragment types
> would be generally useful in tying together the Web document parsing
> world and the Linked Data world more clearly.
>
> Perhaps that is a task that we in this group could consider taking on? 
> This guide gives us a starting point:
> http://www.w3.org/TR/fragid-best-practices/
>
> Regards,
> Dave
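Dave's distinction between fragment-style and query-style URLs comes down to who resolves the span. A minimal Python sketch using the standard urllib.parse module: the fragment is split off before an HTTP request is made (RFC 3986, section 3.5), so #char=x,y never reaches the server and its meaning depends on the client and the rules of the media type, whereas a query string is sent to the server, which can decide what to return for it. Both URLs and the "nif" query parameter are hypothetical placeholders, not the syntax defined in the ITS 2.0 spec.

```python
from urllib.parse import parse_qs, urldefrag, urlsplit

# Hypothetical identifiers for the same span of an HTML page.
# The "nif" parameter name and its value format are placeholders,
# not the URL scheme actually defined in the ITS 2.0 spec.
fragment_uri = "http://example.com/page.html#char=22,36"
query_uri = "http://example.com/page.html?nif=char_22_36"

def request_target(uri: str) -> str:
    """Return the path (plus query) an HTTP client would send to the server.
    The fragment is removed first: per RFC 3986 it is interpreted only
    by the client, according to the rules of the retrieved media type."""
    without_fragment, _fragment = urldefrag(uri)
    parts = urlsplit(without_fragment)
    return parts.path + ("?" + parts.query if parts.query else "")

print(request_target(fragment_uri))         # prints: /page.html
print(request_target(query_uri))            # prints: /page.html?nif=char_22_36
print(parse_qs(urlsplit(query_uri).query))  # prints: {'nif': ['char_22_36']}
```

The server never sees which span the fragment URL refers to, while for the query URL it can parse the parameter and respond, which is what makes the query style dereferenceable.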


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended 
Deadline: *July 18th*)
* LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
Come to Germany for a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Wednesday, 11 September 2013 10:46:18 UTC