Re: URL for chart of workflow and metadata from Tadej Stajner on 2012-04-12 (public-multilingualweb-lt@w3.org from April 2012)

From: Tadej Stajner <tadej.stajner@ijs.si>
Date: Thu, 12 Apr 2012 15:33:53 +0200
To: public-multilingualweb-lt@w3.org
Message-ID: <4F86D9C1.6050209@ijs.si>

On 4/12/2012 1:10 PM, Dave Lewis wrote:
> Arle,
> This is a great tool for helping us organise and consolidate the data categories. The
> hierarchical structure is helpful also.
>
> Some initial comments:
>
>> A few notes:
>>
>>   1. I think that the scenario here is probably missing a few steps
>>      (like integration of QA/revision with existing linguistic
>>      resources). I did add a few that weren't covered, but we may want
>>      to add more.
>>
> Yes, I think we need to add at least:
> - an MT engine training step. Declan, Dan, Pedro, any thoughts on the
> best name for this? Also should we distinguish between SMT training and
> EBMT 'training'/configuration?
> -  a postediting step under 'translation', since this may have some
> distinct meta-data requirements from human- and machine-only translation.
>
> We could also consider:
> - a text analytics training/configuration step - Tadej, any thoughts on
> this?

I think that TA training has a similar role in the architecture to MT
training: both make sense as auxiliary workflow steps (whose output
is some model for annotation/translation).

The usual best practice in TA training is to train on the same format
that you would normally output, whatever that may be. The data
categories' data model can therefore stay the same, and the additional
column would be simple: whatever the training can consume, the
generation produces. So we could just say that the same component
should also consume the term/named entity/genre/domain/etc. annotation in
order to train itself.
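
To illustrate the symmetry, here is a minimal sketch in Python. It is
only an assumption of how such a component could look, not an existing
tool: the names Annotation, AnnotatedText and DictionaryAnnotator are
hypothetical, and the "model" is a plain lookup table standing in for
whatever a real TA system would learn. The point is just that train()
accepts exactly the structure that annotate() returns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One annotation on a span of text (hypothetical data model)."""
    category: str  # e.g. "term", "named-entity", "domain"
    start: int     # character offset, inclusive
    end: int       # character offset, exclusive
    value: str     # e.g. an entity type or a term identifier


@dataclass
class AnnotatedText:
    """Text plus its annotations: both the training input and the output."""
    text: str
    annotations: list = field(default_factory=list)


class DictionaryAnnotator:
    """Toy component that consumes and generates the same format."""

    def __init__(self):
        # Maps a lowercased surface form to (category, value).
        self.model = {}

    def train(self, examples):
        """Learn surface forms from already annotated texts."""
        for example in examples:
            for ann in example.annotations:
                surface = example.text[ann.start:ann.end].lower()
                self.model[surface] = (ann.category, ann.value)

    def annotate(self, text):
        """Annotate every occurrence of a known surface form."""
        lowered = text.lower()
        found = []
        for surface, (category, value) in self.model.items():
            pos = lowered.find(surface)
            while pos != -1:
                found.append(
                    Annotation(category, pos, pos + len(surface), value))
                pos = lowered.find(surface, pos + 1)
        return AnnotatedText(text, sorted(found, key=lambda a: a.start))


if __name__ == "__main__":
    training = [AnnotatedText(
        "Ljubljana is in Slovenia",
        [Annotation("named-entity", 0, 9, "place")])]
    annotator = DictionaryAnnotator()
    annotator.train(training)
    print(annotator.annotate("I visited Ljubljana"))
```

Because the output of annotate() is an AnnotatedText, it can be passed
straight back into train() of another instance, which is the "same
format in, same format out" property described above.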

-- Tadej

>>   1. I'm really quite uncertain about what to put in these two
>>      sections: Annotate w/process & qual data, Annotate provenance of
>>      lang. res. Anyone who knows more than I do, please feel free to
>>      fix those columns
>>
> Declan, Pedro, Phil, could you provide some guidance here?
>
>>   1. I know I really should use the wiki for this, but editing and
>>      maintaining the content as HTML code in the wiki would be almost
>>      impossible. When we have a final working version, I will be happy
>>      to do a *ONE-TIME* conversion into HTML code for the wiki, but I
>>      do not want to make a habit of working with that HTML code or
>>      cleaning it up if it gets broken (which it would, given the
>>      complexity). If there is another easy-to-use resource for
>>      maintaining this that I should use (other than a Google doc),
>>      please let me know.
>>
> I tend to agree that maintaining this in wikitext could be tricky, but we
> should try to use some automated conversion so the public wiki has an
> up-to-date version, and people can comment via the mailing list for us
> to make changes.
>
> Let's discuss this further on Thursday's WG call.
> Regards,
> Dave

Received on Thursday, 12 April 2012 13:34:43 UTC