
Re: mlw-lt-track-ISSUE-35 (named-entity-syntax-could-use-rdfa-and-microdata): Named entity syntax could use microdata and RDFa in HTML5, and a dedicated syntax in XML only [MLW-LT Standard Draft]

From: Tadej Stajner <tadej.stajner@ijs.si>
Date: Tue, 10 Jul 2012 10:37:11 +0200
Message-Id: <85A1B38E-4489-41CB-AF79-4B2A7A85C1F6@ijs.si>
Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
To: MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
Hi, the reasoning for this particular syntax was to separate the entity and the annotation of the content into two distinct elements. I like the idea of keeping the type info in ITS XML space; it would make for cleaner transforms.

-- Tadej

On 5. jul. 2012, at 23:48, MultilingualWeb-LT Working Group Issue Tracker <sysbot+tracker@w3.org> wrote:

> mlw-lt-track-ISSUE-35 (named-entity-syntax-could-use-rdfa-and-microdata): Named entity syntax could use microdata and RDFa in HTML5, and a dedicated syntax in XML only  [MLW-LT Standard Draft]
> 
> http://www.w3.org/International/multilingualweb/lt/track/issues/35
> 
> Raised by: Felix Sasaki
> On product: MLW-LT Standard Draft
> 
> Hi Tadej esp. and all,
> 
> today I looked at some automatic annotation output I got from Michael. It was created by Enrycher. As I understand it this is experimental, but I wanted to bring one aspect to your attention. The below is simplified:
> 
> Input:
> 
>            <p>After a century of near domination from the likes of Italy and Germany, international soccer is entering the era of the Cinderella. Russia's Yuri Zhirkov ...</p>
> 
> Output:
> 
>            <p>After a century of near domination from the likes of <span itsx-lexicalizes="dbr:Italy" itsx-entity-type="http://schema.org/Place">Italy</span> and <span itsx-lexicalizes="dbr:Germany" itsx-entity-type="http://schema.org/Place">Germany</span>, international soccer is entering the era of the Cinderella. Russia's <span itsx-lexicalizes="dbr:Yuri_Zhirkov" itsx-entity-type="http://schema.org/Person">Yuri Zhirkov</span> ...</p>
> 
> 
> It strikes me that we are probably re-inventing the wheel: large parts of the web community are now heading towards RDFa (light) and microdata for named entities, and we are inventing a new syntax.
> 
> So I am wondering whether we shouldn't just describe a best practice to create something like this out of an automatic annotation process:            
> 
>            <p>After a century of near domination from the likes of <span itemscope='' itemtype="http://schema.org/Place" itemprop="name">Italy</span> and <span itemscope='' itemtype="http://schema.org/Place" itemprop="name">Germany</span>, international soccer is entering the era of the Cinderella. Russia's <span itemscope='' itemtype="http://schema.org/Person" itemprop="name">Yuri Zhirkov</span> ...</p>    
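
A minimal sketch of how the experimental output could be rewritten into the microdata form above, using only Python's standard library. The helper name `to_microdata` is hypothetical, the input is a shortened copy of the example, and dropping `itsx-lexicalizes` is an assumption made only because the microdata example carries no counterpart for it:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the experimental output quoted in the message.
SOURCE = (
    "<p>After a century of near domination from the likes of "
    '<span itsx-lexicalizes="dbr:Italy" '
    'itsx-entity-type="http://schema.org/Place">Italy</span> and '
    '<span itsx-lexicalizes="dbr:Germany" '
    'itsx-entity-type="http://schema.org/Place">Germany</span>, '
    "international soccer ...</p>"
)


def to_microdata(root):
    """Rewrite itsx-entity-type annotations as microdata attributes, in place.

    Hypothetical helper, not part of any ITS tooling.
    """
    for el in root.iter():
        entity_type = el.attrib.pop("itsx-entity-type", None)
        if entity_type is None:
            continue  # element carries no entity annotation
        # Assumption: the entity link is dropped, since the microdata
        # example in the message has no attribute for it.
        el.attrib.pop("itsx-lexicalizes", None)
        el.set("itemscope", "")
        el.set("itemtype", entity_type)
        el.set("itemprop", "name")
    return root


root = to_microdata(ET.fromstring(SOURCE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Only attributes are touched, so the text content and element structure of the paragraph stay as they were.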
> 
> 
> For this, we can already expect uptake from search engines, and there are lots of tools: http://schema.rdfs.org/tools.html
> 
> I still see a use case for a dedicated "named entity" data category, but rather in a localization chain and in XML, in a workflow like this:
> 
> 1) HTML is enriched with the microdata result described above, or its RDFa Lite 1.1 counterpart.
> 
> 2) We specify dedicated local markup for entities only in XML, e.g. its:entityType
> 
> 3) To "glue" 1) and 2) together, we then have a mapping rule like
> 
> <its:namedEntityRule selector="//*[@itemtype]" entityTypePointer="@itemtype"/>
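
To make the effect of such a rule concrete, here is a rough sketch that emulates it with Python's standard library. It is not an ITS processor: the rule is represented as a plain dict, the function name `apply_named_entity_rule` is hypothetical, and only the simple `@attribute` form of the pointer is handled.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the microdata example from the message.
DOC = (
    "<p>After a century of near domination from the likes of "
    "<span itemscope='' itemtype=\"http://schema.org/Place\" "
    "itemprop=\"name\">Italy</span>, international soccer is entering "
    "the era of the Cinderella. Russia's "
    "<span itemscope='' itemtype=\"http://schema.org/Person\" "
    "itemprop=\"name\">Yuri Zhirkov</span> ...</p>"
)

# The two attributes of the proposed rule, held as plain data.
RULE = {"selector": "//*[@itemtype]", "entityTypePointer": "@itemtype"}


def apply_named_entity_rule(root, rule):
    """Return (text, entity type) pairs for the nodes the rule selects.

    Hypothetical helper; assumes the pointer has the form "@name".
    """
    # ElementTree accepts relative paths only, so "//" becomes ".//".
    # This form matches descendants of the root, not the root itself.
    path = "." + rule["selector"]
    attr = rule["entityTypePointer"].lstrip("@")
    return [("".join(el.itertext()), el.get(attr)) for el in root.findall(path)]


entities = apply_named_entity_rule(ET.fromstring(DOC), RULE)
for text, entity_type in entities:
    print(text, "->", entity_type)
```

The point of the sketch is that the rule adds no new markup to the HTML: the entity type is read from the `itemtype` attribute that is already there.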
> 
> No. 1) would also help us with our charter issue, btw.
> 
> This approach would also relate to
> 
> ISSUE-2 microdata mapping, since we won't need a mapping from named entities to microdata and RDFa: they would be available in those formats from the beginning.
> ISSUE-18 dropping RDFa, since we won't drop it but actually use it, at least RDFa Lite 1.1.
> ISSUE-29 ITS and RDF, since we do 
> 
> Thoughts?
> 
> 
> 
Received on Tuesday, 10 July 2012 08:34:42 UTC
