- From: Khalid Choukri <choukri@elda.org>
- Date: Fri, 20 Mar 2015 14:18:31 +0100
- To: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>
- CC: public-ld4lt@w3.org
- Message-ID: <550C1E27.8010403@elda.org>
Hi John,

Thanks for the clarification.

This is an essential and tricky issue. We should insist that harvested
data is labeled as such, and hence prevent people from harvesting things
from secondary sources. I am not sure you can, at this stage, filter
duplicate records.

For us (ELRA), all the records were provided to you, including via
META-SHARE. In the worst case, something like "ELRA (via META-SHARE)"
could be OK (though we think ELRA should be the sole source for
ELRA-catalogued resources).

Best regards,
Khalid

On 20/03/2015 12:41, John P. McCrae wrote:
> Hi Khalid,
>
> The source property is intended to indicate where *we* got the record
> from, in this case actually from CLARIN! Would it be better if I
> clarify it by making it something like, for example "Meertens
> Institute (via CLARIN VLO)", or "ELRA (via CLARIN VLO)"?
>
> Regards,
> John
>
> On Thu, Mar 19, 2015 at 7:03 PM, Khalid Choukri <choukri@elda.org> wrote:
>
> Hi John
>
> I am so sorry I missed the telco.
> I managed to review the slides and the web site, and I realise that
> you harvested many sources which themselves harvest other sources in
> a very long loop, leading to many wrong labels (but at
> least I understand why you mention that our community has over
> 100K resources).
>
> As an example, searching the titles for "Speecon", a resource
> available only from the ELRA catalogue, I see that it is labeled as:
> Source: CLARIN
>
>> Thai Speecon database
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0288>
>> Description <http://purl.org/dc/elements/1.1/description>
>> Desktop/Microphone
>> Language <http://purl.org/dc/terms/language> Thai
>> <http://www.lexvo.org/id/iso639-3/tha>
>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>> Title <http://purl.org/dc/elements/1.1/title> Thai Speecon database
>>
>> Czech Speecon database
>> <http://linghub.org/lremap/efd68ccbda3ae46c3f4c04db49d34989>
>> Language <http://purl.org/dc/terms/language> Czech
>> <http://www.lexvo.org/id/iso639-3/ces>
>> Title <http://purl.org/dc/elements/1.1/title> Czech Speecon
>> database
>> Type <http://purl.org/dc/terms/type> Corpus
>> <http://babelnet.org/rdf/s00022825n>
>>
>> Czech Speecon database
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0298>
>> Description <http://purl.org/dc/elements/1.1/description>
>> Desktop/Microphone
>> Language <http://purl.org/dc/terms/language> Czech
>> <http://www.lexvo.org/id/iso639-3/ces>
>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>> Title <http://purl.org/dc/elements/1.1/title> Czech Speecon
>> database
>>
>
> The same applies to Eurom1:
>
>> EUROM1_fr
>> <http://linghub.org/clarin/Speech_and_Language_Data_Repository/oai_sldr_org_sldr000035>
>> Contributor <http://purl.org/dc/elements/1.1/contributor> SAM_A
>> European project
>> Creator <http://purl.org/dc/elements/1.1/creator> Institut de la
>> communication parlée (ICP, Grenoble FR)
>> Description <http://purl.org/dc/elements/1.1/description> The
>> EUROM1 database contains recordings of 60 speakers in eleven
>> European Languages: Danish, Dutch, British English, French,
>> German, Norwegian, Swedish, Dutch, Greek, Portuguese and Spanish.
>> It was explicitly designed to aid the phonetic comparison of
>> languages, with similar materials and recording protocols in all
>> languages.<br />Only French EUROM1 is accessible here. It was
>> used as a resource for the MULTEXT project.<br />This version has
>> been reformatted for compliance with long-term preservation
>> specifications.
>> Rights <http://purl.org/dc/elements/1.1/rights>
>> info:eu-repo/date/submitted/2008-09-01
>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>> Subject <http://purl.org/dc/elements/1.1/subject>
>> Title <http://purl.org/dc/elements/1.1/title> EUROM1_fr
>>
>
> and to others ....
>
>> GlobalPhone Korean
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0200>
>> Description <http://purl.org/dc/elements/1.1/description>
>> Desktop/Microphone
>> Language <http://purl.org/dc/terms/language> Korean
>> <http://www.lexvo.org/id/iso639-3/kor>
>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>> Title <http://purl.org/dc/elements/1.1/title> GlobalPhone Korean
>>
>
> As you can imagine, this is misleading, and I am wondering if we
> can help you correct this.
>
> All the best
> Khalid
>
> On 19/03/2015 15:08, John P. McCrae wrote:
>> For those who have not yet joined the link to the GotoMeeting is here
>>
>> https://global.gotomeeting.com/join/360074461
>>
>> Regards,
>> John
>>
>> On Thu, Mar 19, 2015 at 2:16 PM, John P. McCrae
>> <jmccrae@cit-ec.uni-bielefeld.de> wrote:
>>
>> Dear all,
>>
>> In the teleconference this afternoon we will present Linghub
>> <http://linghub.org/>, the work of several members of this
>> group and the LIDER project.
>>
>> Here are some slides I will present to start the discussion:
>>
>> https://docs.google.com/presentation/d/1ZDzHYcgHvqzp_zK77vGFZ36kEMEmNt9rw7kBJmrhetQ/edit?usp=sharing
>>
>> Regards,
>> John P. McCrae
>>
>
> --
> *************************************************
> Khalid CHOUKRI
> ELRA General Secretary & ELDA CEO
> email: choukri@elda.org ; Web: www.elra.info www.elda.org
> Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
> *************************************************
> Info on LREC: www.lrec-conf.org
> *************************************************

--
*************************************************
Khalid CHOUKRI
ELRA General Secretary & ELDA CEO
email: choukri@elda.org ; Web: www.elra.info www.elda.org
Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
*************************************************
Info on LREC: www.lrec-conf.org
*************************************************
Received on Monday, 23 March 2015 12:21:21 UTC