- From: Tadej Štajner <tadej.stajner@ijs.si>
- Date: Wed, 27 Jun 2012 16:28:10 +0200
- To: Felix Sasaki <fsasaki@w3.org>
- CC: public-multilingualweb-lt@w3.org
- Message-ID: <4FEB187A.10800@ijs.si>
Hi,
there's a slight distinction between 'entity disambiguation' and 'word
sense disambiguation', but I think the markup addresses it well. I agree
with the rest.
I thought about the consolidated markup a bit more and have some
examples. Something like this will go into the July spec:
* Entity:
** Word sense disambiguation
<span entityRel="wsd" entityIdent="synsets-836"
entityResource="http://example.com/myWordnet">bank</span>
** Named entity disambiguation
<span
entityRel="ned" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
entityResource="http://dbpedia.org/">Mike Jones</span>
** Named entity type
<span entityRel="ne" entityIdent="Person"
entityResource="http://www.schema.org/">Mike Jones</span>
* Term
<span entityRel="term" entityIdent="lexEntry473"
entityResource="http://example.com/myLexion">language technology</span>
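A minimal sketch of how a consumer might read the consolidated markup above, using Python's standard html.parser. The class name EntitySpanParser and the SAMPLE string are hypothetical; the attribute names entityRel, entityIdent and entityResource come from the proposal and are not a finalized vocabulary. Note that html.parser lower-cases attribute names, so entityRel arrives as "entityrel".

```python
from html.parser import HTMLParser

# The four examples from the proposal (all written with entityRel).
SAMPLE = """<p>
<span entityRel="wsd" entityIdent="synsets-836"
      entityResource="http://example.com/myWordnet">bank</span>
<span entityRel="ned" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
      entityResource="http://dbpedia.org/">Mike Jones</span>
<span entityRel="ne" entityIdent="Person"
      entityResource="http://www.schema.org/">Mike Jones</span>
<span entityRel="term" entityIdent="lexEntry473"
      entityResource="http://example.com/myLexion">language technology</span>
</p>"""


class EntitySpanParser(HTMLParser):
    """Collect (text, rel, ident, resource) for every annotated <span>."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._stack = []  # one entry per open <span>: dict or None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        a = dict(attrs)  # names are already lower-cased by html.parser
        if "entityrel" in a:
            self._stack.append({"rel": a["entityrel"],
                                "ident": a.get("entityident"),
                                "resource": a.get("entityresource"),
                                "text": ""})
        else:
            self._stack.append(None)  # plain span, keep nesting balanced

    def handle_data(self, data):
        # Text counts for every annotated span that is currently open.
        for entry in self._stack:
            if entry is not None:
                entry["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            entry = self._stack.pop()
            if entry is not None:
                self.annotations.append((entry["text"].strip(), entry["rel"],
                                         entry["ident"], entry["resource"]))


parser = EntitySpanParser()
parser.feed(SAMPLE)
parser.close()
for annotation in parser.annotations:
    print(annotation)
```

Because all four cases share the same three attributes, one parser handles word senses, named entities, entity types and terms alike; only the entityRel value differs.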
With regard to the term data category, is it necessary to use the same
markup as in ITS 1.0? For instance, what used to be its:term="yes" is now
its:entityRel="term", etc.
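If the old markup is not kept, existing ITS 1.0 documents would need converting. A minimal sketch of that conversion with Python's xml.etree.ElementTree, assuming the proposed attribute lives in the ITS namespace as its:entityRel (it is not part of ITS 1.0). The function name upgrade_terms is hypothetical, and other ITS 1.0 term attributes such as its:termInfoRef are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)

TERM = f"{{{ITS_NS}}}term"              # its:term (ITS 1.0)
ENTITY_REL = f"{{{ITS_NS}}}entityRel"   # its:entityRel (proposed, hypothetical)


def upgrade_terms(root):
    """Rewrite its:term="yes" as its:entityRel="term"; return how many changed."""
    changed = 0
    for elem in root.iter():
        if elem.get(TERM) == "yes":
            del elem.attrib[TERM]
            elem.set(ENTITY_REL, "term")
            changed += 1
    return changed


DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
<p>We use <span its:term="yes">language technology</span> daily.</p>
</doc>"""

root = ET.fromstring(DOC)
print(upgrade_terms(root))
print(ET.tostring(root, encoding="unicode"))
```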
-- Tadej
On 25. 06. 2012 10:32, Felix Sasaki wrote:
> Hi Tadej,
>
> sorry for the late reply. So this sounds like we would have an
> "entity" data category instead of "disambiguation". Disambiguation
> would then be one usage scenario for "entity".
>
> I had proposed at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0133.html
> that you, Tadej, write a "disambiguation" section, but maybe it makes
> sense to have an "entity" section with use cases (and markup) for
> "named entity" and "word sense disambiguation". The "terminology"
> aspect (linking to a term lexicon) could be realized by updating the
> existing terminology data category with a lexicon link.
>
> What do you or others think?
>
> Best,
>
> Felix
>
>
> 2012/6/21 Tadej Štajner <tadej.stajner@ijs.si>
>
> Hi,
> this is feasible. The rationale behind my decision was that having
> individual attributes for different relationships is less verbose,
> at the expense of having more attributes in the spec. If
> minimising the latter is a higher priority, then I agree with this approach.
>
> Some points: in example 2, this syntax has no way to
> simultaneously express that "Mike Jones" can also be
> described with a pointer to a resource (let's say,
> http://dbpedia.org/resource/Mike_Jones_(poet)). That is,
> saying both that he is a Person and that he is a specific,
> concrete person. This entails introducing the following distinction:
>
> for unknown but detected entities:
> <span entityType="ne-type" entityIdent="Person"
> entityResource="http://www.schema.org/">Mike Jones</span>
>
> for known entities:
> <span
> entityType="ne-ref" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
> entityResource="http://dbpedia.org/">Mike Jones</span>
>
> which is not ideal and reduces expressivity, since we're unable to
> assert both at the same time within the same element. I guess
> nesting the elements could work, but that introduces
> complexity into the markup. In a global selector setting, it's
> probably fine.
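A minimal sketch of what the nesting option means for a consumer: each piece of text inherits the annotations of every enclosing span, so "Mike Jones" ends up with both the ne-ref and the ne-type assertion. It uses Python's xml.etree.ElementTree on an XHTML-style fragment; the entityType values come from the examples above, while the function name collect and the FRAGMENT string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Outer span: the concrete entity; inner span: its type.
FRAGMENT = """<p>
  <span entityType="ne-ref"
        entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)"
        entityResource="http://dbpedia.org/"><span entityType="ne-type"
        entityIdent="Person"
        entityResource="http://www.schema.org/">Mike Jones</span></span>
</p>"""


def collect(elem, inherited=()):
    """Yield (text, annotations) where annotations include all enclosing spans."""
    own = inherited
    if elem.tag == "span" and "entityType" in elem.attrib:
        own = inherited + ((elem.get("entityType"), elem.get("entityIdent")),)
    if elem.text and elem.text.strip():
        yield elem.text.strip(), list(own)
    for child in elem:
        yield from collect(child, own)
        # A child's tail text sits inside elem, so it gets elem's annotations.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), list(own)


for text, annotations in collect(ET.fromstring(FRAGMENT)):
    print(text, annotations)
```

The cost mentioned above shows up here too: the consumer has to walk the ancestor chain instead of reading a single element's attributes.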
>
> And re your comments.
> - that's the current state of the software, yes. Automating 3)
> is possible provided that a term lexicon is specified.
> - agreed, but following this example there could be a large number
> of such rules, especially since we'd have to state every type
> mapping explicitly: the selector doesn't infer that an
> itemtype=Musician (for example) is also a Person. Is this
> something that is worth maintaining?
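A minimal sketch of why the rule count grows: since the selector does no type reasoning, one rule has to be generated for every subtype of Person. The PARENT table is a hypothetical, hand-written fragment of a type hierarchy (it is not loaded from schema.org and is assumed to be acyclic), and the rule text follows the global-rule example in this thread with the element written as its:entityRule, which is still only a proposal.

```python
# Hypothetical child -> parent table; illustrative only.
PARENT = {
    "Musician": "Person",
    "Poet": "Person",
    "Person": "Thing",
    "Organization": "Thing",
}


def subtypes_of(root_type, parent=PARENT):
    """Return root_type plus every type whose ancestor chain reaches it."""
    result = {root_type}
    for t in parent:
        p = t
        while p in parent:  # walk up; assumes the hierarchy has no cycles
            p = parent[p]
            if p == root_type:
                result.add(t)
                break
    return sorted(result)


def entity_rules(root_type, resource="http://www.schema.org/", rel="ne"):
    """Emit one global rule per (sub)type, because selectors cannot infer types."""
    return [
        f'<its:entityRule selector="//div[@itemtype=\'{t}\']" '
        f'entityResource="{resource}" entityType="{rel}"/>'
        for t in subtypes_of(root_type)
    ]


for rule in entity_rules("Person"):
    print(rule)
```

Every new subtype in the vocabulary means another rule, which is the maintenance burden in question.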
>
> -- Tadej
>
>
> On 20. 06. 2012 20:41, Felix Sasaki wrote:
>> Tadej, all,
>>
>> I was looking at
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>> and I'm wondering whether your proposal can be merged. Let me
>> start bottom-up with some examples:
>>
>> 1)
>> <span entityType="wsd" entityIdent="synsets-836"
>> entityResource="http://example.com/myWordnet">bank</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>>
>> 2)
>> <span entityType="ne" entityIdent="Person"
>> entityResource="http://www.schema.org/">Mike Jones</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#namedEntity
>>
>> 3)
>> <span entityType="term" entityIdent="lexEntry473"
>> entityResource="http://example.com/myLexion">language
>> technology</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#terminology_2
>>
>> Does the above merging make sense? One motivation for me is to
>> propose as few attributes as possible - in that way we can
>> Also, some general questions / comments:
>> - I assume that 1) and 2) could be generated automatically by
>> tools, but 3) could not?
>> - to allow people to re-use existing annotations (e.g. from
>> schema.org), we could define global rules like this:
>> <its:entityRule selector="//div[@itemtype='Person']"
>> entityResource="http://www.schema.org/" entityType="ne"/>
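A minimal sketch of how a processor might apply such a global rule, using Python's xml.etree.ElementTree on an XHTML-style page. The PAGE content, the RULE dict and the function apply_rule are hypothetical. ElementTree implements only a subset of XPath and needs a relative path, so the sketch just prefixes the absolute "//" selector with "."; a real ITS processor would evaluate full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical page. With real schema.org microdata the itemtype value is
# normally a full URL such as http://schema.org/Person.
PAGE = """<html><body>
<div itemscope="" itemtype="Person"><span itemprop="name">Mike Jones</span></div>
<div itemscope="" itemtype="Organization">IJS</div>
</body></html>"""

# The rule from the message, as plain data.
RULE = {
    "selector": "//div[@itemtype='Person']",
    "entityResource": "http://www.schema.org/",
    "entityType": "ne",
}


def apply_rule(root, rule):
    """Return (element, entity info) pairs for every node the rule selects."""
    # Only works for selectors starting with '//': './/' is the
    # ElementTree equivalent relative to the root element.
    path = "." + rule["selector"]
    info = {k: v for k, v in rule.items() if k != "selector"}
    return [(elem, dict(info)) for elem in root.findall(path)]


root = ET.fromstring(PAGE)
for elem, info in apply_rule(root, RULE):
    print("".join(elem.itertext()), info)
```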
>>
>> Felix
>>
>>
>> 2012/6/19 Tadej Stajner <tadej.stajner@ijs.si>
>>
>> Hi, Felix,
>> I've cleaned up the Terminology section in the requirements
>> document in light of recent discussions on the list and in
>> Dublin. What workflow do we have for updating the draft and
>> posting recommendations, examples, etc.? Is the
>> Requirements wiki page the right place for this?
>>
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>
>> -- Tadej
>>
>>
>>
>>
>> On 6/19/2012 12:09 PM, Maxime Lefrançois wrote:
>>> Hi,
>>>
>>> The task force is working on the HTML-to-RDFa algorithm.
>>> It should be ready for review by tomorrow afternoon.
>>>
>>> Maxime
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From: *"Felix Sasaki" <fsasaki@w3.org>
>>> *To: *"Jirka Kosek" <jirka@kosek.cz>
>>> *Cc: *public-multilingualweb-lt@w3.org
>>> *Sent: *Tuesday, 19 June 2012 12:00:25
>>> *Subject: *Re: [All] ITS 2.0 first draft, please review by
>>> Thursday
>>>
>>>
>>>
>>> 2012/6/19 Jirka Kosek <jirka@kosek.cz>
>>>
>>> On 19.6.2012 5:48, Felix Sasaki wrote:
>>>
>>> > Thanks for the reminder - just changed this.
>>> >
>>> > I also created a section including examples
>>> >
>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#usage-in-html5
>>> > and
>>> >
>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-global-html5
>>> > please have a look.
>>>
>>> Looks good, except for a small typo:
>>>
>>> <link href="EX-translateRule-html5-1.xml"
>>> type="itsRules"/>
>>>
>>> Should read as:
>>>
>>> <link href="EX-translateRule-html5-1.xml"
>>> rel="itsRules"/>
>>>
>>> Also, I think that for consistency we should use
>>> lower-case letters in the
>>> rel value, either rel="itsrules" or rel="its-rules".
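A minimal sketch of how a tool might discover linked ITS rules in an HTML5 page, assuming the lower-case rel="its-rules" spelling (one of the two options above; pass rel_value="itsrules" for the other). It uses Python's standard html.parser; the class name ItsRulesLinkFinder is hypothetical. It also shows why the original typo matters: a link that carries the value in type= instead of rel= is simply not found.

```python
from html.parser import HTMLParser


class ItsRulesLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel contains rel_value."""

    def __init__(self, rel_value="its-rules"):
        super().__init__()
        self.rel_value = rel_value.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> also ends up here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "link":
            return
        a = dict(attrs)
        # rel is a space-separated token list; HTML link types are
        # case-insensitive, so compare in lower case.
        tokens = (a.get("rel") or "").lower().split()
        if self.rel_value in tokens and a.get("href"):
            self.hrefs.append(a["href"])


HEAD = """<head>
<link href="EX-translateRule-html5-1.xml" type="itsRules"/>
<link href="EX-translateRule-html5-1.xml" rel="its-rules"/>
</head>"""

finder = ItsRulesLinkFinder()
finder.feed(HEAD)
print(finder.hrefs)  # only the link using rel= is picked up
```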
>>>
>>>
>>> Thanks, fixed.
>>>
>>> Felix
>>>
>>>
>>> Jirka
>>> --
>>> ------------------------------------------------------------------
>>> Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
>>> ------------------------------------------------------------------
>>> Professional XML consulting and training services
>>> DocBook customization, custom XSLT/XSL-FO document
>>> processing
>>> ------------------------------------------------------------------
>>> OASIS DocBook TC member, W3C Invited Expert, ISO
>>> JTC1/SC34 member
>>> ------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> --
>>> Felix Sasaki
>>> DFKI / W3C Fellow
>>>
>>>
>>
>>
>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
>
>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Wednesday, 27 June 2012 14:28:45 UTC