Re: LD4LT Teleconference, Today 3pm CET, on Linghub

John,
PROV-O would be one way of capturing this, using prov:wasDerivedFrom and
optionally additional activity metadata about the harvesting process
itself.
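A minimal sketch of this suggestion in Python, using only the standard library. It emits N-Triples stating that a Linghub record prov:wasDerivedFrom the upstream record it was harvested from and prov:wasGeneratedBy a harvesting prov:Activity. It also builds an "X (via Y)" label of the kind John proposes below from the chain of sources. The Linghub record URI is taken from this thread. The CLARIN VLO, ELRA catalogue and activity URIs, and the function names, are hypothetical placeholders.

```python
# Minimal sketch: record harvesting provenance with PROV-O terms as N-Triples.
# Function names and all example.org URIs are hypothetical placeholders.

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri: str) -> str:
    """Wrap a URI in angle brackets for N-Triples."""
    return f"<{uri}>"


def harvest_provenance(record, upstream, activity, agent=None):
    """Return N-Triples lines saying `record` was derived from `upstream`
    and generated by the harvesting `activity` (optionally run by `agent`)."""
    triples = [
        (record, PROV + "wasDerivedFrom", upstream),
        (record, PROV + "wasGeneratedBy", activity),
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, PROV + "used", upstream),
    ]
    if agent is not None:
        triples.append((activity, PROV + "wasAssociatedWith", agent))
    return [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples]


def via_label(chain):
    """Build a label such as 'ELRA (via CLARIN VLO)' from source names
    ordered from the original provider to the aggregator we harvested."""
    origin, *intermediaries = chain
    if not intermediaries:
        return origin
    return f"{origin} (via {', '.join(intermediaries)})"


# Linghub record URI from the thread; the other URIs are hypothetical.
LINGHUB = ("http://linghub.org/clarin/European_Language_Resources_Association/"
           "oai_catalogue_elra_info_ELRA_S0288")
VLO = "http://example.org/clarin-vlo/ELRA_S0288"
ELRA = "http://example.org/elra-catalogue/ELRA_S0288"
HARVEST = "http://example.org/activity/clarin-vlo-harvest"

lines = harvest_provenance(LINGHUB, VLO, HARVEST)
# The upstream record is itself derived from the ELRA catalogue entry.
lines.append(f"{iri(VLO)} {iri(PROV + 'wasDerivedFrom')} {iri(ELRA)} .")
print("\n".join(lines))
print(via_label(["ELRA", "CLARIN VLO"]))  # ELRA (via CLARIN VLO)
```

Keeping the whole derivation chain as PROV-O triples, rather than only in a free-text dc:source string, would let a consumer see that ELRA is the original provider even when the record reached Linghub through CLARIN.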

cheers,
Dave


On 20/03/2015 13:18, Khalid Choukri wrote:
> Hi John,
> Thanks for the clarification.
> This is an essential and tricky issue: we should insist that harvested
> data be labeled as such, and hence prevent people from harvesting
> things from secondary sources.
> I am not sure you can, at this stage, filter duplicate records.
>
> For us (ELRA), all the records were provided to you, including via
> META-SHARE. In the worst case, something like "ELRA (via META-SHARE)"
> could be OK (though we think ELRA should be the sole source for
> ELRA-catalogued resources).
>
> Best regards
> Khalid
>
>
> On 20/03/2015 12:41, John P. McCrae wrote:
>> Hi Khalid,
>>
>> The source property is intended to indicate where *we* got the record
>> from, in this case actually from CLARIN! Would it be clearer if I
>> changed it to something like "Meertens Institute (via CLARIN VLO)"
>> or "ELRA (via CLARIN VLO)"?
>>
>> Regards,
>> John
>>
>> On Thu, Mar 19, 2015 at 7:03 PM, Khalid Choukri <choukri@elda.org> wrote:
>>
>>     Hi John
>>
>>     I am so sorry I missed the telco.
>>     I managed to review the slides and the website, and I realise that
>>     you harvested many sources which themselves harvest other sources,
>>     in a very long loop, leading to a lot of wrong labelling (but at
>>     least I understand why you mention that our community has over
>>     100K resources).
>>
>>     For example, searching titles for "Speecon", a resource
>>     available only from the ELRA catalogue, I see that it is labeled
>>     as "Source: CLARIN":
>>
>>>     Thai Speecon database
>>>     <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0288>
>>>     Description <http://purl.org/dc/elements/1.1/description>
>>>     Desktop/Microphone
>>>     Language <http://purl.org/dc/terms/language>  Thai
>>>     <http://www.lexvo.org/id/iso639-3/tha>
>>>     Source <http://purl.org/dc/elements/1.1/source>  CLARIN
>>>     Title <http://purl.org/dc/elements/1.1/title>  Thai Speecon
>>>     database
>>>
>>>     Czech Speecon database
>>>     <http://linghub.org/lremap/efd68ccbda3ae46c3f4c04db49d34989>
>>>     Language <http://purl.org/dc/terms/language>  Czech
>>>     <http://www.lexvo.org/id/iso639-3/ces>
>>>     Title <http://purl.org/dc/elements/1.1/title>  Czech Speecon
>>>     database
>>>     Type <http://purl.org/dc/terms/type>  Corpus
>>>     <http://babelnet.org/rdf/s00022825n>
>>>
>>>     Czech Speecon database
>>>     <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0298>
>>>     Description <http://purl.org/dc/elements/1.1/description>
>>>     Desktop/Microphone
>>>     Language <http://purl.org/dc/terms/language>  Czech
>>>     <http://www.lexvo.org/id/iso639-3/ces>
>>>     Source <http://purl.org/dc/elements/1.1/source>  CLARIN
>>>     Title <http://purl.org/dc/elements/1.1/title>  Czech Speecon
>>>     database
>>>
>>
>>
>>     The same applies to Eurom1:
>>>     EUROM1_fr
>>>     <http://linghub.org/clarin/Speech_and_Language_Data_Repository/oai_sldr_org_sldr000035>
>>>     Contributor <http://purl.org/dc/elements/1.1/contributor>  SAM_A
>>>     European project
>>>     Creator <http://purl.org/dc/elements/1.1/creator>  Institut de
>>>     la communication parlée (ICP, Grenoble FR)
>>>     Description <http://purl.org/dc/elements/1.1/description>  The
>>>     EUROM1 database contains recordings of 60 speakers in eleven
>>>     European Languages: Danish, Dutch, British English, French,
>>>     German, Norwegian, Swedish, Dutch, Greek, Portuguese and
>>>     Spanish. It was explicitly designed to aid the phonetic
>>>     comparison of languages, with similar materials and recording
>>>     protocols in all languages. Only French EUROM1 is
>>>     accessible here. It was used as a resource for the MULTEXT
>>>     project. This version has been reformatted for compliance
>>>     with long-term preservation specifications.
>>>     Rights <http://purl.org/dc/elements/1.1/rights>
>>>     info:eu-repo/date/submitted/2008-09-01
>>>     Source <http://purl.org/dc/elements/1.1/source>  CLARIN
>>>     Subject <http://purl.org/dc/elements/1.1/subject>  
>>>     Title <http://purl.org/dc/elements/1.1/title>  EUROM1_fr
>>>
>>
>>     And to others:
>>>     GlobalPhone Korean
>>>     <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0200>
>>>     Description <http://purl.org/dc/elements/1.1/description>
>>>     Desktop/Microphone
>>>     Language <http://purl.org/dc/terms/language>  Korean
>>>     <http://www.lexvo.org/id/iso639-3/kor>
>>>     Source <http://purl.org/dc/elements/1.1/source>  CLARIN
>>>     Title <http://purl.org/dc/elements/1.1/title>  GlobalPhone Korean
>>>
>>
>>
>>     As you can imagine, this is misleading, and I am wondering whether
>>     we can help you correct it.
>>
>>     All the best
>>     Khalid
>>
>>     On 19/03/2015 15:08, John P. McCrae wrote:
>>>     For those who have not yet joined, the link to the GoToMeeting is
>>>     here:
>>>
>>>     https://global.gotomeeting.com/join/360074461
>>>
>>>     Regards,
>>>     John
>>>
>>>     On Thu, Mar 19, 2015 at 2:16 PM, John P. McCrae
>>>     <jmccrae@cit-ec.uni-bielefeld.de> wrote:
>>>
>>>         Dear all,
>>>
>>>         In the teleconference this afternoon we will present Linghub
>>>         <http://linghub.org/>, the work of several members of this
>>>         group and the LIDER project.
>>>
>>>         Here are some slides I will present to start the discussion:
>>>
>>>         https://docs.google.com/presentation/d/1ZDzHYcgHvqzp_zK77vGFZ36kEMEmNt9rw7kBJmrhetQ/edit?usp=sharing
>>>
>>>         Regards,
>>>         John P. McCrae
>>>
>>>
>>
>>     -- 
>>
>>     *************************************************
>>     *Khalid CHOUKRI *
>>     ELRA General Secretary & ELDA CEO
>>     email: choukri@elda.org ; Web: www.elra.info www.elda.org
>>     Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
>>     ***************************************************
>>     ** *Info on LREC: www.lrec-conf.org *
>>     **************************************************** **
>>
>>     ****
>>
>>
>

-- 
Director - Knowledge and Data Engineering Group
The CNGL Centre for Global Intelligent Content
School of Computer Science and Statistics
Trinity College Dublin

Received on Wednesday, 25 March 2015 00:50:15 UTC