- From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
- Date: Fri, 20 Mar 2015 15:33:18 +0100
- To: Khalid Choukri <choukri@elda.org>
- Cc: public-ld4lt@w3.org
- Message-ID: <CAC5njqr6-sSk6NW6L6ZBJHKvZrrqTucwuwL61FBz8TS0KRDR9w@mail.gmail.com>
Hi Khalid,

We will actually pick up duplicates; this should be pushed to production in the next few weeks.

I will mark the records from the CLARIN VLO as "ELRA (via CLARIN VLO)" (Tracker: https://github.com/liderproject/linghub/issues/18).

For META-SHARE we don't really have any information for most records as to where they came from before META-SHARE, e.g.,

http://metashare.elda.org/repository/browse/eurom1e-english/f01a4c96de6811e2b1e400259011f6eaf6ec06978e9b4d5e89cd122f3f96961a/
http://linghub.org/metashare/f01a4c96de6811e2b1e400259011f6eaf6ec06978e9b4d5e89cd122f3f96961a

Regards,
John

On Fri, Mar 20, 2015 at 2:18 PM, Khalid Choukri <choukri@elda.org> wrote:

> Hi John
> Thanks for the clarification,
> This is an essential and tricky issue, we should insist that harvested
> data is labeled as such and hence prevent people from harvesting things
> from secondary sources.
> I am not sure you can, at this stage, filter duplicate records.
>
> For us (ELRA) all the records were provided to you including via
> Meta-share, in the worst case something like "ELRA (via META-SHARE)" could
> be OK (though we think ELRA should be the sole source for ELRA
> catalogued resources).
>
> Best regards
> Khalid
>
> On 20/03/2015 12:41, John P. McCrae wrote:
>
> Hi Khalid,
>
> The source property is intended to indicate where *we* got the record
> from, in this case actually from CLARIN! Would it be better if I clarify it
> by making it something like, for example "Meertens Institute (via CLARIN
> VLO)", or "ELRA (via CLARIN VLO)"?
>
> Regards,
> John
>
> On Thu, Mar 19, 2015 at 7:03 PM, Khalid Choukri <choukri@elda.org> wrote:
>
>> Hi John
>>
>> I am so sorry I missed the telco.
>> I managed to review the slides and the web site and I realise that you
>> harvested so many sources which also harvest other sources in a very long
>> loop, leading to many wrong labels (but at least I understand why
>> you mention that our community has over 100K resources).
>>
>> As an example, searching the titles for "Speecon", a resource available
>> only from the ELRA catalogue, I see that it is labeled as: Source: CLARIN
>>
>> Thai Speecon database
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0288>
>> Description <http://purl.org/dc/elements/1.1/description>: Desktop/Microphone
>> Language <http://purl.org/dc/terms/language>: Thai <http://www.lexvo.org/id/iso639-3/tha>
>> Source <http://purl.org/dc/elements/1.1/source>: CLARIN
>> Title <http://purl.org/dc/elements/1.1/title>: Thai Speecon database
>>
>> Czech Speecon database
>> <http://linghub.org/lremap/efd68ccbda3ae46c3f4c04db49d34989>
>> Language <http://purl.org/dc/terms/language>: Czech <http://www.lexvo.org/id/iso639-3/ces>
>> Title <http://purl.org/dc/elements/1.1/title>: Czech Speecon database
>> Type <http://purl.org/dc/terms/type>: Corpus <http://babelnet.org/rdf/s00022825n>
>>
>> Czech Speecon database
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0298>
>> Description <http://purl.org/dc/elements/1.1/description>: Desktop/Microphone
>> Language <http://purl.org/dc/terms/language>: Czech <http://www.lexvo.org/id/iso639-3/ces>
>> Source <http://purl.org/dc/elements/1.1/source>: CLARIN
>> Title <http://purl.org/dc/elements/1.1/title>: Czech Speecon database
>>
>> The same applies to Eurom1:
>>
>> EUROM1_fr
>> <http://linghub.org/clarin/Speech_and_Language_Data_Repository/oai_sldr_org_sldr000035>
>> Contributor <http://purl.org/dc/elements/1.1/contributor>: SAM_A European project
>> Creator <http://purl.org/dc/elements/1.1/creator>: Institut de la communication parlée (ICP, Grenoble FR)
>> Description <http://purl.org/dc/elements/1.1/description>: The EUROM1 database
>> contains recordings of 60 speakers in eleven European Languages: Danish,
>> Dutch, British English, French, German, Norwegian, Swedish, Dutch, Greek,
>> Portuguese and Spanish. It was explicitly designed to aid the phonetic
>> comparison of languages, with similar materials and recording protocols in
>> all languages. Only French EUROM1 is accessible here. It was used as a
>> resource for the MULTEXT project. This version has been reformatted
>> for compliance with long-term preservation specifications.
>> Rights <http://purl.org/dc/elements/1.1/rights>: info:eu-repo/date/submitted/2008-09-01
>> Source <http://purl.org/dc/elements/1.1/source>: CLARIN
>> Subject <http://purl.org/dc/elements/1.1/subject>
>> Title <http://purl.org/dc/elements/1.1/title>: EUROM1_fr
>>
>> and to others ....
>>
>> GlobalPhone Korean
>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0200>
>> Description <http://purl.org/dc/elements/1.1/description>: Desktop/Microphone
>> Language <http://purl.org/dc/terms/language>: Korean <http://www.lexvo.org/id/iso639-3/kor>
>> Source <http://purl.org/dc/elements/1.1/source>: CLARIN
>> Title <http://purl.org/dc/elements/1.1/title>: GlobalPhone Korean
>>
>> As you can imagine this is misleading, and I am wondering if we can help
>> you correct this.
>>
>> All the best
>> Khalid
>>
>> On 19/03/2015 15:08, John P. McCrae wrote:
>>
>> For those who have not yet joined, the link to the GotoMeeting is here:
>>
>> https://global.gotomeeting.com/join/360074461
>>
>> Regards,
>> John
>>
>> On Thu, Mar 19, 2015 at 2:16 PM, John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de> wrote:
>>
>>> Dear all,
>>>
>>> In the teleconference this afternoon we will present Linghub
>>> <http://linghub.org/>, the work of several members of this group and
>>> the LIDER project.
>>>
>>> Here are some slides I will present to start the discussion:
>>>
>>> https://docs.google.com/presentation/d/1ZDzHYcgHvqzp_zK77vGFZ36kEMEmNt9rw7kBJmrhetQ/edit?usp=sharing
>>>
>>> Regards,
>>> John P. McCrae
>>>
>>
>> --
>>
>> *************************************************
>> * Khalid CHOUKRI *
>> ELRA General Secretary & ELDA CEO
>> email: choukri@elda.org ; Web: www.elra.info www.elda.org
>> Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
>> *************************************************
>> * Info on LREC: www.lrec-conf.org <http://www.lrec-conf.org>
>> *************************************************
Received on Friday, 20 March 2015 14:33:50 UTC