Re: Terminology merging ? (Re: [All] ITS 2.0 first draft, please review by Thursday)

From: Tadej Štajner <tadej.stajner@ijs.si>
Date: Wed, 27 Jun 2012 16:28:10 +0200
Message-ID: <4FEB187A.10800@ijs.si>
To: Felix Sasaki <fsasaki@w3.org>
CC: public-multilingualweb-lt@w3.org
Hi,
there's a slight distinction between 'entity disambiguation' and 'word 
sense disambiguation' but I think this is well-addressed with the 
markup. The rest, I agree with.

I thought about the consolidated mark-up a bit more, and have some 
examples. Something like this will go in the July spec:

* Entity:
    ** Word sense disambiguation
<span entityRel="wsd" entityIdent="synsets-836" 
entityResource="http://example.com/myWordnet">bank</span>

     ** Named entity disambiguation
<span 
entityRel="ned" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" 
entityResource="http://dbpedia.org/">Mike Jones</span>

     ** Named entity type
<span entityRel="ne" entityIdent="Person" 
entityResource="http://www.schema.org/">Mike Jones</span>

* Term
<span entityRel="term" entityIdent="lexEntry473" 
entityResource="http://example.com/myLexion">language technology</span>
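
To make the consolidated markup concrete, here is a minimal sketch of how a consumer might read these annotations. It assumes the attribute names entityRel, entityIdent and entityResource are used on every span and that the input is well-formed XML; the wrapping <p> element, the SAMPLE string and the collect_annotations helper are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: three of the example spans wrapped in a root element
# so that the fragment parses as well-formed XML.
SAMPLE = """<p>
<span entityRel="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
<span entityRel="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
<span entityRel="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>
</p>"""


def collect_annotations(markup):
    """Return (relation, identifier, resource, text) for each annotated span."""
    root = ET.fromstring(markup)
    found = []
    for el in root.iter("span"):
        rel = el.get("entityRel")
        if rel is None:
            # Spans without the (assumed) entityRel attribute carry no annotation.
            continue
        text = "".join(el.itertext())
        found.append((rel, el.get("entityIdent"), el.get("entityResource"), text))
    return found


annotations = collect_annotations(SAMPLE)
for annotation in annotations:
    print(annotation)
```

A single relation attribute lets one loop handle word senses, named entity types and terms alike, which is the main appeal of the merged form.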

With regard to the term data category, is it necessary to use the same 
markup as in ITS 1.0? For instance, what used to be its:term="yes" is now 
its:entityRel="term", etc.

-- Tadej


On 25. 06. 2012 10:32, Felix Sasaki wrote:
> Hi Tadej,
>
> sorry for the late reply. So this sounds like we would have an 
> "entity" data category instead of "disambiguation". Disambiguation 
> would then be one usage scenario for "entity".
>
> I had proposed at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0133.html
> that you, Tadej, write a "disambiguation" section, but maybe it makes 
> sense to have an "entity" section with use cases (and markup) for 
> "named entity" and "word sense disambiguation". The "terminology" 
> aspect (linking to a term lexicon) could be realized by updating the 
> existing terminology data category with a lexicon link.
>
> What do you or others think?
>
> Best,
>
> Felix
>
>
> 2012/6/21 Tadej Štajner <tadej.stajner@ijs.si>
>
>     Hi,
>     this is feasible. The rationale behind my decision was that having
>     individual attributes for different relationships is less verbose,
>     at the expense of having more attributes in the spec. If
>     minimising the latter is higher priority, then I agree with this way.
>
>     Some points: in example 2, this syntax has no way to
>     simultaneously express that "Mike Jones" can actually be
>     described with a pointer to a resource (let's say,
>     http://dbpedia.org/resource/Mike_Jones_(poet)). So,
>     basically, saying both that he is a Person and that he's actually
>     some concrete person. This entails introducing this distinction:
>
>     for unknown but detected entities:
>     <span entityType="ne-type" entityIdent="Person"
>     entityResource="http://www.schema.org/">Mike Jones</span>
>
>     for known entities:
>     <span
>     entityType="ne-ref" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
>     entityResource="http://dbpedia.org/">Mike Jones</span>
>
>     which is not ideal and reduces expressivity, since we're unable to
>     assert both at the same time within the same element. I guess
>     nesting the elements could work, but that's introducing
>     complexities in markup. In a global selector setting, it's
>     probably fine.
>
>     And re your comments.
>     - that's the current state of the software, yes. Automation of 3)
>     is possible provided that a term lexicon is specified.
>     - agree, but there can be a pretty big number of such rules
>     following this example, especially since we'd have to explicitly
>     state every type mapping, since the selector doesn't reason that an
>     itemtype=Musician (for example) is also a Person. Is this
>     something that is worth maintaining?
>
>     -- Tadej
>
>
>     On 20. 06. 2012 20:41, Felix Sasaki wrote:
>>     Tadej, all,
>>
>>     I was looking at
>>     http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>     and I'm wondering whether your proposal can be merged. Let me
>>     start with examples bottom-up
>>
>>     1)
>>     <span entityType="wsd" entityIdent="synsets-836"
>>     entityResource="http://example.com/myWordnet">bank</span>
>>     tries to capture
>>     http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>>
>>     2)
>>     <span entityType="ne" entityIdent="Person"
>>     entityResource="http://www.schema.org/">Mike Jones</span>
>>     tries to capture
>>     http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#namedEntity
>>
>>     3)
>>     <span entityType="term" entityIdent="lexEntry473"
>>     entityResource="http://example.com/myLexion">language
>>     technology</span>
>>     tries to capture
>>     http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#terminology_2
>>
>>     Does the above merging make sense? One motivation for me is to
>>     propose as few attributes as possible - in that way we can
>>     Also, some general questions / comments:
>>     - I assume that 1) and 2) could be automatically generated by
>>     tools, but 3) not?
>>     - to allow people to re-use existing annotations (e.g. from
>>     schema.org), we could define global rules
>>     like this:
>>     <its:entityRule selector="//div[@itemtype='Person']"
>>     entityResource="http://www.schema.org/" entityType="ne"/>
>>
>>     Felix
>>
>>
>>     2012/6/19 Tadej Stajner <tadej.stajner@ijs.si>
>>
>>         Hi, Felix,
>>         I've cleaned up the Terminology section in the requirements
>>         document with regard to recent discussions on the list and in
>>         Dublin. What kind of workflow do we have in order to update
>>         the draft, to post recommendations, examples, etc? Is the
>>         Requirements wiki page the right place for this?
>>
>>         http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>
>>          -- Tadej
>>
>>
>>
>>
>>         On 6/19/2012 12:09 PM, Maxime Lefrançois wrote:
>>>         Hi,
>>>
>>>         The taskforce is on the HTML to RDFa algorithm.
>>>         It should be ready by tomorrow afternoon for review.
>>>
>>>         Maxime
>>>
>>>         ------------------------------------------------------------------------
>>>
>>>             *From: *"Felix Sasaki" <fsasaki@w3.org>
>>>             *To: *"Jirka Kosek" <jirka@kosek.cz>
>>>             *Cc: *public-multilingualweb-lt@w3.org
>>>             *Sent: *Tuesday, 19 June 2012 12:00:25
>>>             *Subject: *Re: [All] ITS 2.0 first draft, please review by
>>>             Thursday
>>>
>>>
>>>
>>>             2012/6/19 Jirka Kosek <jirka@kosek.cz>
>>>
>>>                 On 19.6.2012 5:48, Felix Sasaki wrote:
>>>
>>>                 > Thanks for the reminder  - just changed this.
>>>                 >
>>>                 > I also created a section including examples
>>>                 >
>>>                 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#usage-in-html5
>>>                 > and
>>>                 >
>>>                 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-global-html5
>>>                 > please have a look.
>>>
>>>                 Looks good. Except small typo:
>>>
>>>                 <link href="EX-translateRule-html5-1.xml"
>>>                 type="itsRules"/>
>>>
>>>                 Should read as:
>>>
>>>                 <link href="EX-translateRule-html5-1.xml"
>>>                 rel="itsRules"/>
>>>
>>>                 Also I think that for consistency we should use
>>>                 lower-case letters in
>>>                 the rel value, either rel="itsrules" or rel="its-rules".
>>>
>>>
>>>             Thanks, fixed.
>>>
>>>             Felix
>>>
>>>
>>>                                        Jirka
>>>                 --
>>>                 ------------------------------------------------------------------
>>>                  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
>>>                 ------------------------------------------------------------------
>>>                       Professional XML consulting and training services
>>>                  DocBook customization, custom XSLT/XSL-FO document
>>>                 processing
>>>                 ------------------------------------------------------------------
>>>                  OASIS DocBook TC member, W3C Invited Expert, ISO
>>>                 JTC1/SC34 member
>>>                 ------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>             -- 
>>>             Felix Sasaki
>>>             DFKI / W3C Fellow
>>>
>>>
>>
>>
>>
>>
>>     -- 
>>     Felix Sasaki
>>     DFKI / W3C Fellow
>>
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Wednesday, 27 June 2012 14:28:45 UTC
