- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 21 Jun 2012 22:22:45 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: Jirka Kosek <jirka@kosek.cz>, MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czqF4tEndOuxzFsXLEYtxfNY+6EHkZE98J7hpNgK9zcEyg@mail.gmail.com>
2012/6/21 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Jirka,
>
> On 06/21/2012 03:40 PM, Jirka Kosek wrote:
>
>> On 21.6.2012 10:16, MultilingualWeb-LT Working Group Issue Tracker wrote:
>>
>>> Generally, we distinguish between solutions on three aspects:
>>>
>>> 1. ITS info may be in HTML or in RDF or both.
>>>    b. RDF coexist with ITS attributes in the HTML
>>>       - pros: the document is kept unchanged
>>>       - cons: redundancy
>>
>> I think that having the document unchanged is a non-goal. Having the same
>> information expressed in two different ways means that it is very
>> likely that such information can diverge and you will have to decide
>> what takes precedence then.
>>
>> I think that mapping from HTML+ITS to RDFa should strip all original
>> its-* attributes and global rules referenced by <link rel="itsrules">
>>
>>> 2. three possible ways to attach a set of triples to a document (may be
>>>    a combination of all three):
>>>    a. inline RDFa (RDFa is usually used to make text that's already
>>>       human-readable machine-readable too.) ;
>>>       - cons: We face lots of issues with RDFa inline annotation. For
>>>         instance, when we want to add RDFa inline progressively during the
>>>         process: we'd need to modify every NIF URIs computed so far.
>>
>> Maybe you can use something different than NIF then. For example XPath
>> location of element that has its-* attached to it?
>
> Well, your chainsaw picture still seems quite appropriate. For RDFa to
> work *at all*, you will need to create URIs. XPath is a candidate for that
> and it is straightforward to append after a '#', which is very similar to
> the way NIF uses URIs. What features would you like that XPath location to
> have? Felix mentioned that evaluating XPath can be quite expensive. Could
> you give us a figure for what would be feasible? Something like
> "#xpath_html/1/body/1/h2/1" would allow us to use SAX instead of DOM, but
> XPath would also not be 100% proof against changes, right?
Just FYI, without arguing for anything: for the ITS 1.0 test suite

http://www.w3.org/International/its/tests/

we created something like this

/{}myMetaDoc/{}body[1]/{}insert[1]/{myChineseMakupLanguage}书籍[1]

to identify each element and attribute node - taken from a "path" attribute at

http://www.w3.org/International/its/tests/test1/Translate1-result.xml

The format expands namespaces (if there is none, there are empty curly
brackets).

Felix

> The only 100% stable way to use RDFa (and also RDF) is to create URIs
> based on element ids. Still it remains impossible to mark up everything
> inline and you would need additional triples either in the script or
> parallel in DOM.
>
> <html><head>
> <!-- option 1-->
> <script type="text/turtle">
> @prefix my:<http://domain.org/doc.html#>
> :id_XX its:translate "false" .
> </script></head>
> <body> <h2>How can I get my<span id="id_XX" about="my:id_XX"
> property="str:anchorOf">dataset</span> into the diagram?</h2>
> <!-- option 2 -->
> <span about="my:id_XX" property="translate" content="true" />
>
> My personal opinion on this is that RDFa inline is possible, but might not
> be worth the trouble. Having RDF would be feasible, however.
>
>>>    b. in the head: using a script element and a media type of text/n3,
>>>       text/turtle or application/rdf+xml ;
>>>    c. in the head: using a link element to refer to another document.
>>>       - pros of 2.b. and 2.c.: NIF works well
>>>       - The offset-based recipe only needs to have a tiny extension so
>>>         that offset 0 stands just before the '<' of "<body... "
>>>       - The hash-based recipe only needs to be tweaked a little.
>>
>> The more I think about NIF, which references locations based on character
>> offset (please correct me if I'm wrong), the more I think it's a solution
>> that will not work. Even a simple edit in an underlying HTML document will
>> break all existing annotations.
>
> The use case for this is not to be stable during editing.
> It rather defines a transition from the RDF/NIF World to ITS. So, if you
> have tools such as NERD, Open Calais, DBpedia Spotlight, you can define a
> transition from their NIF-RDF output to ITS inline markup. I think the
> conversion to ITS should be trivial.
>
>>> We discussed a lot about this 3.a., 3.b. and 3.c., ... any comment/idea
>>> for the specific use case of ITS 2.0 is welcome
>>> Whatever case we choose (String annotation or DOM node annotation), we
>>> are discussing about a new NIF recipe inspired by XPath 1.0 to create a
>>> URI for that fragment.
>>
>> ITS 1.0 is used to attach data categories to elements and attributes. So
>> anything that will be able to address DOM nodes of type Element and
>> Attribute should work. Supporting just DOM Text node is not necessary as
>> there is no corresponding XML representation.
>
> Ok, so all ITS annotations are either referring to Element or Attribute.
> This should be feasible with a computationally efficient subset of XPath.
> The NIF offset solution is built upon RFC 5147 [1] in a way that
> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>
> If you were to include a URI scheme (using id and/or subset of XPath) for
> RDF and RDFa in MLW-LT and ITS, we could extend NIF by building upon it.
> I would assume this would be a legit way to do it. IIRC there were some
> use cases for RDF (especially for provenance and for harvesting NER tool
> output).
>
> All the best,
> Sebastian
>
> [1] http://tools.ietf.org/html/rfc5147
>
>> Jirka
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org

--
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 21 June 2012 20:23:12 UTC
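
Illustration of the "path" format quoted in the message above. This is a
minimal sketch, not the code used for the ITS 1.0 test suite: the function
names and the sample document are hypothetical, and the choice to number each
child among same-named siblings (and to leave the root unnumbered) is an
assumption read off the single example path in the mail. Attribute nodes are
left out because the mail does not show how they are written.

```python
import xml.etree.ElementTree as ET


def expanded_name(tag):
    # ElementTree already writes namespaced tags as "{uri}local";
    # prefix "{}" when the element has no namespace.
    return tag if tag.startswith("{") else "{}" + tag


def element_paths(root):
    """Return one path string per element, in document order."""
    paths = []

    def walk(elem, path):
        paths.append(path)
        counts = {}  # per-parent counter for each expanded child name
        for child in elem:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            name = expanded_name(child.tag)
            counts[name] = counts.get(name, 0) + 1
            walk(child, f"{path}/{name}[{counts[name]}]")

    walk(root, "/" + expanded_name(root.tag))
    return paths


# Hypothetical sample shaped after the path in the mail.
SAMPLE = (
    "<myMetaDoc><body><insert>"
    '<书籍 xmlns="myChineseMakupLanguage"/>'
    "</insert></body></myMetaDoc>"
)

if __name__ == "__main__":
    for p in element_paths(ET.fromstring(SAMPLE)):
        print(p)
```

For the sample above, the last path produced is the one quoted in the mail.
Like any positional scheme, these paths change when elements are inserted or
removed before the addressed node, which is the stability concern raised in
the thread.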