
Re: Terminology merging ? (Re: [All] ITS 2.0 first draft, please review by Thursday)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 28 Jun 2012 09:46:05 +0200
Message-ID: <CAL58czpMEpXmv_08S_ZDoAcJWHU4_mn8Hf+P3UJ0sk4mTiooEQ@mail.gmail.com>
To: Tadej Štajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Hi Tadej, all,

thanks a lot for this; let's discuss it today too. The bottom line is
that we cannot change the existing markup of the ITS "Terminology" data
category, so we need to somehow integrate or relate your (very useful)
differentiations to ITS 1.0.
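
One way to relate the two could be sketched in Python as follows: a span using the proposed entityRel="term" markup is rewritten to the ITS 1.0 local attributes its:term and its:termInfoRef. The helper name to_its10_term and the way the termInfoRef URI is built (resource, then "#", then identifier) are assumptions for illustration only, not something agreed on this list.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI


def to_its10_term(span):
    """Rewrite a proposed entityRel="term" span to ITS 1.0 term attributes.

    Hypothetical helper: building termInfoRef as resource + "#" + ident
    is only an assumption made for this sketch.
    """
    if span.get("entityRel") != "term":
        return span  # other entity kinds are left untouched here
    ident = span.attrib.pop("entityIdent", None)
    resource = span.attrib.pop("entityResource", None)
    del span.attrib["entityRel"]
    span.set(f"{{{ITS_NS}}}term", "yes")
    if ident and resource:
        span.set(f"{{{ITS_NS}}}termInfoRef", f"{resource}#{ident}")
    return span


span = ET.fromstring(
    '<span entityRel="term" entityIdent="lexEntry473" '
    'entityResource="http://example.com/myLexion">language technology</span>'
)
converted = to_its10_term(span)
```

With a mapping of this kind, existing ITS 1.0 consumers would keep seeing its:term, while the finer distinctions live in the new attributes.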

Felix

2012/6/27 Tadej Štajner <tadej.stajner@ijs.si>

>  Hi,
> there's a slight distinction between 'entity disambiguation' and 'word
> sense disambiguation' but I think this is well-addressed with the markup.
> The rest, I agree with.
>
> I thought about the consolidated mark-up a bit more, and have some
> examples. Something like this will go in the July spec:
>
> * Entity:
>    ** Word sense disambiguation
> <span entityRel="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
>
>     ** Named entity disambiguation
> <span entityType="ned" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" entityResource="http://dbpedia.org/">Mike Jones</span>
>
>     ** Named entity type
> <span entityRel="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
>
> * Term
> <span entityRel="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>
>
> With regard to the term data category, is it necessary to use the same
> markup as in ITS 1.0? For instance, what used to be its:term="yes" is now
> its:entityRel="term", etc.
>
> -- Tadej
>
>
>
> On 25. 06. 2012 10:32, Felix Sasaki wrote:
>
> Hi Tadej,
>
>  sorry for the late reply. So this sounds like we would have an "entity"
> data category instead of "disambiguation". Disambiguation would then be one
> usage scenario for "entity".
>
>  I had proposed at
>
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0133.html
> that you, Tadej, write a "disambiguation" section, but maybe it makes
> sense to have an "entity" section with use cases (and markup) for "named
> entity" and "word sense disambiguation". The "terminology" aspect (linking
> to a term lexicon) could be realized by updating the existing terminology
> data category with a lexicon link.
>
>  What do you or others think?
>
>  Best,
>
>  Felix
>
>
>  2012/6/21 Tadej Štajner <tadej.stajner@ijs.si>
>
>>  Hi,
>> this is feasible. The rationale behind my decision was that having
>> individual attributes for different relationships is less verbose, at the
>> expense of having more attributes in the spec. If minimising the latter is
>> higher priority, then I agree with this way.
>>
>> Some points: in example 2, this syntax has no way to simultaneously
>> express that "Mike Jones" can actually be described with a pointer to
>> a resource (let's say, http://dbpedia.org/resource/Mike_Jones_(poet)),
>> that is, to say both that he is a Person and that he is actually some
>> concrete person. This entails introducing this distinction:
>>
>> for unknown but detected entities:
>> <span entityType="ne-type" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
>>
>> for known entities:
>> <span entityType="ne-ref" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" entityResource="http://dbpedia.org/">Mike Jones</span>
>>
>> which is not ideal and reduces expressivity, since we're unable to assert
>> both at the same time within the same element. I guess nesting the elements
>> could work, but that introduces complexity in the markup. In a global
>> selector setting, it's probably fine.
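
To make the nesting option concrete, here is a minimal Python sketch: an outer span carries the type ("ne-type") and an inner span carries the reference ("ne-ref"), and a consumer collects both. The function name collect_annotations and the choice of which span is the outer one are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Nested spans: the outer one carries the type, the inner one the reference.
markup = (
    '<span entityType="ne-type" entityIdent="Person" '
    'entityResource="http://www.schema.org/">'
    '<span entityType="ne-ref" '
    'entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" '
    'entityResource="http://dbpedia.org/">Mike Jones</span></span>'
)


def collect_annotations(root):
    """Return {entityType: entityIdent} for root and all its descendants."""
    return {
        el.get("entityType"): el.get("entityIdent")
        for el in root.iter()
        if el.get("entityType") is not None
    }


annotations = collect_annotations(ET.fromstring(markup))
```

The cost is visible in the markup itself: two elements are needed for one piece of text, and a consumer has to walk the nesting to see both assertions.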
>>
>> And re your comments:
>> - that's the current state of the software, yes. Automation of 3) is
>> possible provided that a term lexicon is specified.
>> - agreed, but there can be a pretty big number of such rules following
>> this example, especially since we'd have to explicitly state every type
>> mapping, since the selector doesn't reason that an itemtype=Musician (for
>> example) is also a Person. Is this something that is worth maintaining?
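
The maintenance cost can be illustrated with a short Python sketch that writes out one global rule per itemtype that should count as a Person. The SUBTYPES table is entirely hypothetical (a real one would have to be derived from the vocabulary in use), and the element name its:entityRule is assumed from the naming of other ITS rule elements.

```python
# Hypothetical subtype table; the selector itself knows nothing about it.
SUBTYPES = {"Person": ["Musician", "Poet"]}


def rules_for(entity_type_name, resource="http://www.schema.org/"):
    """Build one global rule per itemtype that should count as entity_type_name."""
    itemtypes = [entity_type_name] + SUBTYPES.get(entity_type_name, [])
    return [
        f'<its:entityRule selector="//div[@itemtype=\'{t}\']" '
        f'entityResource="{resource}" entityType="ne"/>'
        for t in itemtypes
    ]


rules = rules_for("Person")
for rule in rules:
    print(rule)
```

Every new subtype adds another entry to the table and another rule to the output, which is the growth the second comment is worried about.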
>>
>> -- Tadej
>>
>>
>> On 20. 06. 2012 20:41, Felix Sasaki wrote:
>>
>>  Tadej, all,
>>
>>  I was looking at
>>
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>> and I'm wondering whether your proposal can be merged. Let me start with
>> examples bottom-up
>>
>>  1)
>> <span entityType="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>>
>>  2)
>> <span entityType="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#namedEntity
>>
>>  3)
>> <span entityType="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>
>> tries to capture
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#terminology_2
>>
>>  Does the above merging make sense? One motivation for me is to propose as
>> few attributes as possible - in that way we can
>> Also, some general questions / comments:
>> - I assume that 1) and 2) could be automatically generated by tools, but
>> 3) not?
>> - to allow people to re-use existing annotations (e.g. from schema.org),
>> we could define global rules like this:
>> <its:entityRule selector="//div[@itemtype='Person']" entityResource="http://www.schema.org/" entityType="ne"/>
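
How such a global rule might be applied can be sketched in Python with the standard library. The rule is held as a plain dictionary and its attributes are copied onto every matching element; both the dictionary representation and the function name apply_rule are assumptions for illustration, and the selector is written as a relative path because ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    '<div itemtype="Person">Mike Jones</div>'
    '<div itemtype="Musician">a musician</div>'
    "</body>"
)

# The rule from the example above, as plain data (hypothetical representation).
rule = {
    "selector": ".//div[@itemtype='Person']",  # relative path for ElementTree
    "entityResource": "http://www.schema.org/",
    "entityType": "ne",
}


def apply_rule(root, rule):
    """Copy the rule's entity attributes onto every element the selector matches."""
    matched = root.findall(rule["selector"])
    for el in matched:
        el.set("entityResource", rule["entityResource"])
        el.set("entityType", rule["entityType"])
    return matched


matched = apply_rule(doc, rule)
```

Only the div with itemtype="Person" is annotated; the second div is left alone because the selector compares attribute values literally.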
>>
>>  Felix
>>
>>
>>  2012/6/19 Tadej Stajner <tadej.stajner@ijs.si>
>>
>>>  Hi, Felix,
>>> I've cleaned up the Terminology section in the requirements document
>>> with regard to recent discussions on the list and in Dublin. What kind of
>>> workflow do we have in order to update the draft, to post recommendations,
>>> examples, etc? Is the Requirements wiki page the right place for this?
>>>
>>>
>>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>>
>>>  -- Tadej
>>>
>>>
>>>
>>>
>>> On 6/19/2012 12:09 PM, Maxime Lefrançois wrote:
>>>
>>> Hi,
>>>
>>>  The taskforce is on the HTML to RDFa algorithm.
>>> It should be ready by tomorrow afternoon for review.
>>>
>>>  Maxime
>>>
>>> ------------------------------
>>>
>>> *From: *"Felix Sasaki" <fsasaki@w3.org>
>>> *To: *"Jirka Kosek" <jirka@kosek.cz>
>>> *Cc: *public-multilingualweb-lt@w3.org
>>> *Sent: *Tuesday, 19 June 2012 12:00:25
>>> *Subject: *Re: [All] ITS 2.0 first draft, please review by Thursday
>>>
>>>
>>>
>>> 2012/6/19 Jirka Kosek <jirka@kosek.cz>
>>>
>>>> On 19.6.2012 5:48, Felix Sasaki wrote:
>>>>
>>>> > Thanks for the reminder  - just changed this.
>>>> >
>>>> > I also created a section including examples
>>>> >
>>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#usage-in-html5
>>>> > and
>>>> >
>>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-global-html5
>>>> > please have a look.
>>>>
>>>>  Looks good, except for a small typo:
>>>>
>>>> <link href="EX-translateRule-html5-1.xml" type="itsRules"/>
>>>>
>>>> Should read as:
>>>>
>>>> <link href="EX-translateRule-html5-1.xml" rel="itsRules"/>
>>>>
>>>> Also I think that for consistency we should use lower-case letters in
>>>> the rel value, either rel="itsrules" or rel="its-rules".
>>>>
>>>
>>>  Thanks, fixed.
>>>
>>>  Felix
>>>
>>>
>>>>
>>>>                        Jirka
>>>> --
>>>> ------------------------------------------------------------------
>>>>  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
>>>> ------------------------------------------------------------------
>>>>       Professional XML consulting and training services
>>>>  DocBook customization, custom XSLT/XSL-FO document processing
>>>> ------------------------------------------------------------------
>>>>  OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>>>> ------------------------------------------------------------------
>>>>
>>>>
>>>
>>>
>>>  --
>>> Felix Sasaki
>>> DFKI / W3C Fellow


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 28 June 2012 07:46:35 UTC
