- From: Tadej Štajner <tadej.stajner@ijs.si>
- Date: Wed, 27 Jun 2012 16:28:10 +0200
- To: Felix Sasaki <fsasaki@w3.org>
- CC: public-multilingualweb-lt@w3.org
- Message-ID: <4FEB187A.10800@ijs.si>
Hi,

there's a slight distinction between 'entity disambiguation' and 'word sense disambiguation', but I think this is well addressed by the markup. I agree with the rest.

I thought about the consolidated markup a bit more and have some examples. Something like this will go into the July spec:

* Entity:
** Word sense disambiguation
<span entityRel="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
** Named entity disambiguation
<span entityRel="ned" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" entityResource="http://dbpedia.org/">Mike Jones</span>
** Named entity type
<span entityRel="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
* Term
<span entityRel="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>

With regard to the term data category, is it necessary to use the same markup as in ITS 1.0? For instance, what used to be its:term="yes" is now its:entityRel="term", etc.

-- Tadej

On 25. 06. 2012 10:32, Felix Sasaki wrote:
> Hi Tadej,
>
> sorry for the late reply. So this sounds like we would have an
> "entity" data category instead of "disambiguation". Disambiguation
> would then be one usage scenario for "entity".
>
> I had proposed at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0133.html
> that you, Tadej, write a "disambiguation" section, but maybe it makes
> sense to have an "entity" section with use cases (and markup) for
> "named entity" and "word sense disambiguation". The "terminology"
> aspect (linking to a term lexicon) could be realized by updating the
> existing terminology data category with a lexicon link.
>
> What do you or others think?
>
> Best,
>
> Felix
>
>
> 2012/6/21 Tadej Štajner <tadej.stajner@ijs.si>
>
> Hi,
> this is feasible.
> The rationale behind my decision was that having
> individual attributes for different relationships is less verbose,
> at the expense of having more attributes in the spec. If
> minimising the latter is higher priority, then I agree with this way.
>
> Some points: in example 2, this syntax has no way to
> simultaneously express that the "Mike Jones" can actually be
> described with a pointer to a resource (let's say,
> http://dbpedia.org/resource/Mike_Jones_(poet)). So,
> basically, saying both that he is a Person and that he's actually
> some concrete person. This entails introducing this distinction:
>
> for unknown but detected entities:
> <span entityType="ne-type" entityIdent="Person"
> entityResource="http://www.schema.org/">Mike Jones</span>
>
> for known entities:
> <span entityType="ne-ref" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
> entityResource="http://dbpedia.org/">Mike Jones</span>
>
> which is not ideal and reduces expressivity, since we're unable to
> assert both at the same time within the same element. I guess
> nesting the elements could work, but that's introducing
> complexities in markup. In a global selector setting, it's
> probably fine.
>
> And re your comments.
> - that's the current state of the software, yes. Automation of 3)
> is possible provided that a term lexicon is specified.
> - agree, but there can be a pretty big number of such rules
> following this example, especially since we'd have to explicitly
> state every type mapping, since the selector doesn't reason that an
> itemtype=Musician (for example) is also a Person. Is this
> something that is worth maintaining?
>
> -- Tadej
>
>
> On 20. 06. 2012 20:41, Felix Sasaki wrote:
>> Tadej, all,
>>
>> I was looking at
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>> and I'm wondering whether your proposal can be merged.
>> Let me start with examples bottom-up
>>
>> 1)
>> <span entityType="wsd" entityIdent="synsets-836"
>> entityResource="http://example.com/myWordnet">bank</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>>
>> 2)
>> <span entityType="ne" entityIdent="Person"
>> entityResource="http://www.schema.org/">Mike Jones</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#namedEntity
>>
>> 3)
>> <span entityType="term" entityIdent="lexEntry473"
>> entityResource="http://example.com/myLexion">language
>> technology</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#terminology_2
>>
>> Does the above merging make sense? One motivation for me is to
>> propose as few attributes as possible - in that way we can
>> Also, some general questions / comments:
>> - I assume that 1) and 2) could be automatically generated by
>> tools, but 3) not?
>> - to allow people to re-use existing annotations (e.g. from
>> schema.org), we could define global rules like this:
>> <its:entityRule selector="//div[@itemtype='Person']"
>> entityResource="http://www.schema.org/" entityType="ne"/>
>>
>> Felix
>>
>>
>> 2012/6/19 Tadej Stajner <tadej.stajner@ijs.si>
>>
>> Hi, Felix,
>> I've cleaned up the Terminology section in the requirements
>> document with regard to recent discussions on the list and in
>> Dublin. What kind of workflow do we have in order to update
>> the draft, to post recommendations, examples, etc? Is the
>> Requirements wiki page the right place for this?
>>
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>
>> -- Tadej
>>
>>
>> On 6/19/2012 12:09 PM, Maxime Lefrançois wrote:
>>> Hi,
>>>
>>> The taskforce is on the HTML to RDFa algorithm.
>>> It should be ready by tomorrow afternoon for review.
>>>
>>> Maxime
>>>
>>> ------------------------------------------------------------------------
>>>
>>> From: "Felix Sasaki" <fsasaki@w3.org>
>>> To: "Jirka Kosek" <jirka@kosek.cz>
>>> Cc: public-multilingualweb-lt@w3.org
>>> Sent: Tuesday 19 June 2012 12:00:25
>>> Subject: Re: [All] ITS 2.0 first draft, please review by Thursday
>>>
>>>
>>> 2012/6/19 Jirka Kosek <jirka@kosek.cz>
>>>
>>> On 19.6.2012 5:48, Felix Sasaki wrote:
>>>
>>> > Thanks for the reminder - just changed this.
>>> >
>>> > I also created a section including examples
>>> > http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#usage-in-html5
>>> > and
>>> > http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-global-html5
>>> > please have a look.
>>>
>>> Looks good. Except small typo:
>>>
>>> <link href="EX-translateRule-html5-1.xml" type="itsRules"/>
>>>
>>> Should read as:
>>>
>>> <link href="EX-translateRule-html5-1.xml" rel="itsRules"/>
>>>
>>> Also I think that for consistency we should use lower-case
>>> letters in rel value, either rel="itsrules" or rel="its-rules".
>>>
>>>
>>> Thanks, fixed.
>>>
>>> Felix
>>>
>>>
>>> Jirka
>>> --
>>> ------------------------------------------------------------------
>>> Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
>>> ------------------------------------------------------------------
>>> Professional XML consulting and training services
>>> DocBook customization, custom XSLT/XSL-FO document processing
>>> ------------------------------------------------------------------
>>> OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>>> ------------------------------------------------------------------
>>>
>>>
>>> --
>>> Felix Sasaki
>>> DFKI / W3C Fellow
>>>
>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
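
As an illustration of the consolidated markup proposed at the top of this message, here is a minimal sketch of how a consumer might read the entityRel / entityIdent / entityResource attributes from annotated spans. It is only a sketch under stated assumptions: the attribute names are the ones proposed in this thread (not a final ITS 2.0 syntax), the wrapping <p> element and the helper name extract_entities are hypothetical, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Sample input reusing three of the span examples from this thread.
# The wrapping <p> element is an assumption made so the fragment parses as XML.
SAMPLE = """<p>
<span entityRel="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
<span entityRel="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
<span entityRel="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>
</p>"""


def extract_entities(xml_text):
    """Collect every <span> carrying an entityRel attribute (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter("span"):
        rel = el.get("entityRel")
        if rel is None:
            continue  # span without entity annotation, skip it
        found.append({
            "text": "".join(el.itertext()),
            "rel": rel,                           # wsd, ne or term in the sample
            "ident": el.get("entityIdent"),       # identifier within the resource
            "resource": el.get("entityResource"), # lexicon, ontology or knowledge base
        })
    return found


entities = extract_entities(SAMPLE)
for e in entities:
    print(e["rel"], e["ident"], e["resource"], repr(e["text"]))
```

Because the relationship is carried in a single attribute value rather than in one attribute per relationship, a consumer like this needs only one code path; the trade-off discussed in the thread (not being able to assert both a type and a concrete reference on the same element) shows up here as one record per span.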
Received on Wednesday, 27 June 2012 14:28:45 UTC