Re: Terminology merging ? (Re: [All] ITS 2.0 first draft, please review by Thursday)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 25 Jun 2012 10:32:45 +0200
Message-ID: <CAL58czqYgoDNewDbbinbXsu7s9JTXWnSZwa-Rh=0+6FSza1fTA@mail.gmail.com>
To: Tadej Štajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Hi Tadej,

sorry for the late reply. So this sounds like we would have an "entity"
data category instead of "disambiguation". Disambiguation would then be one
usage scenario for "entity".

I had proposed at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0133.html
that you, Tadej, write a "disambiguation" section, but maybe it makes sense
to have an "entity" section with use cases (and markup) for "named entity"
and "word sense disambiguation". The "terminology" aspect (linking to a
term lexicon) could be realized by updating the existing terminology data
category with a lexicon link.
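
To make the two use cases a bit more concrete, here is a rough Python sketch
of how a consumer might read such markup. It is only an illustration: the
attribute names (entityType, entityIdent, entityResource) are the ones
proposed in this thread and not agreed syntax, and the USE_CASES table and
the read_entities helper are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Markup copied from the examples quoted in this thread. The attribute names
# are proposals under discussion, not final ITS 2.0 syntax.
SAMPLE = """<p>
<span entityType="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
<span entityType="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
</p>"""

# Hypothetical mapping from entityType values to the two use cases.
USE_CASES = {"wsd": "word sense disambiguation", "ne": "named entity"}


def read_entities(markup):
    """Return one dict per element that carries an entityType attribute."""
    root = ET.fromstring(markup)
    found = []
    for el in root.iter():
        kind = el.get("entityType")
        if kind is None:
            continue  # element is not annotated with the proposed category
        found.append({
            "text": "".join(el.itertext()),
            "use_case": USE_CASES.get(kind, "unknown"),
            "ident": el.get("entityIdent"),
            "resource": el.get("entityResource"),
        })
    return found


for entity in read_entities(SAMPLE):
    print(entity)
```

The point of the sketch is only that a single attribute set can carry both
use cases, with entityType telling the consumer how to interpret entityIdent.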

What do you or others think?

Best,

Felix


2012/6/21 Tadej Štajner <tadej.stajner@ijs.si>

>  Hi,
> this is feasible. The rationale behind my decision was that having
> individual attributes for different relationships is less verbose, at the
> expense of having more attributes in the spec. If minimising the latter is
> higher priority, then I agree with this way.
>
> Some points: in example 2, this syntax has no way to simultaneously
> express that "Mike Jones" can actually be described with a pointer to
> a resource (let's say, http://dbpedia.org/resource/Mike_Jones_(poet)).
> So, basically, saying both that he is a Person and that he's actually some
> concrete person. This entails introducing this distinction:
>
> for unknown but detected entities:
> <span entityType="ne-type" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
>
> for known entities:
> <span entityType="ne-ref" entityIdent="http://dbpedia.org/resource/Mike_Jones_(poet)" entityResource="http://dbpedia.org/">Mike Jones</span>
>
> which is not ideal and reduces expressivity, since we're unable to assert
> both at the same time within the same element. I guess nesting the elements
> could work, but that's introducing complexities in markup. In a global
> selector setting, it's probably fine.
>
> And re your comments.
> - that's the current state of the software, yes. Automation of 3) is
> possible provided that a term lexicon is specified.
> - agree, but there can be a pretty big number of such rules following this
> example, especially since we'd have to explicitly state every type mapping,
> since the selector doesn't reason that an itemtype=Musician (for example) is
> also a Person. Is this something that is worth maintaining?
>
> -- Tadej
>
>
> On 20. 06. 2012 20:41, Felix Sasaki wrote:
>
>  Tadej, all,
>
>  I was looking at
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
> and I'm wondering whether your proposal can be merged. Let me start with
> examples bottom-up
>
>  1)
> <span entityType="wsd" entityIdent="synsets-836" entityResource="http://example.com/myWordnet">bank</span>
> tries to capture
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>
>  2)
> <span entityType="ne" entityIdent="Person" entityResource="http://www.schema.org/">Mike Jones</span>
> tries to capture
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#namedEntity
>
>  3)
> <span entityType="term" entityIdent="lexEntry473" entityResource="http://example.com/myLexion">language technology</span>
> tries to capture
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#terminology_2
>
>  Does the above merging make sense? One motivation for me is to propose as
> few attributes as possible - in that way we can
> Also, some general questions / comments:
> - I assume that 1) and 2) could be automatically generated by tools, but
> 3) not?
> - to allow people to re-use existing annotations (e.g. from schema.org),
> we could define global rules like this:
> <its:entityRule selector="//div[@itemtype='Person']" entityResource="http://www.schema.org/" entityType="ne"/>
>
>  Felix
>
>
>  2012/6/19 Tadej Stajner <tadej.stajner@ijs.si>
>
>>  Hi, Felix,
>> I've cleaned up the Terminology section in the requirements document with
>> regard to recent discussions on the list and in Dublin. What kind of
>> workflow do we have in order to update the draft, to post recommendations,
>> examples, etc? Is the Requirements wiki page the right place for this?
>>
>>
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Terminology
>>
>>  -- Tadej
>>
>>
>>
>>
>> On 6/19/2012 12:09 PM, Maxime Lefrançois wrote:
>>
>> Hi,
>>
>>  The taskforce is on the HTML to RDFa algorithm.
>> It should be ready by tomorrow afternoon for review.
>>
>>  Maxime
>>
>> ------------------------------
>>
>> *From: *"Felix Sasaki" <fsasaki@w3.org>
>> *To: *"Jirka Kosek" <jirka@kosek.cz>
>> *Cc: *public-multilingualweb-lt@w3.org
>> *Sent: *Tuesday, 19 June 2012 12:00:25
>> *Subject: *Re: [All] ITS 2.0 first draft, please review by Thursday
>>
>>
>>
>> 2012/6/19 Jirka Kosek <jirka@kosek.cz>
>>
>>> On 19.6.2012 5:48, Felix Sasaki wrote:
>>>
>>> > Thanks for the reminder  - just changed this.
>>> >
>>> > I also created a section including examples
>>> >
>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#usage-in-html5
>>> > and
>>> >
>>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-global-html5
>>> > please have a look.
>>>
>>>  Looks good, except for a small typo:
>>>
>>> <link href="EX-translateRule-html5-1.xml" type="itsRules"/>
>>>
>>> Should read as:
>>>
>>> <link href="EX-translateRule-html5-1.xml" rel="itsRules"/>
>>>
>>> Also I think that for consistency we should use lower-case letters in the
>>> rel value, either rel="itsrules" or rel="its-rules".
>>>
>>
>>  Thanks, fixed.
>>
>>  Felix
>>
>>
>>>
>>>                        Jirka
>>> --
>>> ------------------------------------------------------------------
>>>  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
>>> ------------------------------------------------------------------
>>>       Professional XML consulting and training services
>>>  DocBook customization, custom XSLT/XSL-FO document processing
>>> ------------------------------------------------------------------
>>>  OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>>> ------------------------------------------------------------------
>>>
>>>
>>
>>
>>  --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Monday, 25 June 2012 08:33:17 UTC
