- From: <dave.lewis@cs.tcd.ie>
- Date: Tue, 24 Mar 2015 16:48:25 +0000
- To: public-ld4lt@w3.org
- Message-ID: <55119559.1090205@cs.tcd.ie>
John,

PROV-O would be one way of capturing this, using prov:wasDerivedFrom and optionally additional activity metadata about the harvesting process itself.

cheers,
Dave

On 20/03/2015 13:18, Khalid Choukri wrote:
> Hi John
> Thanks for the clarification,
> This is an essential and tricky issue, we should insist that harvested
> data is labeled as such and hence prevent people from harvesting
> things from secondary sources.
> I am not sure you can, at this stage, filter duplicate records.
>
> For us (ELRA) all the records were provided to you including via
> Meta-share, in the worst case something like "ELRA (via META-SHARE)"
> could be OK (though we think ELRA should be the sole source for
> ELRA catalogued resources).
>
> Best regards
> Khalid
>
>
> On 20/03/2015 12:41, John P. McCrae wrote:
>> Hi Khalid,
>>
>> The source property is intended to indicate where *we* got the record
>> from, in this case actually from CLARIN! Would it be better if I
>> clarify it by making it something like, for example "Meertens
>> Institute (via CLARIN VLO)", or "ELRA (via CLARIN VLO)"?
>>
>> Regards,
>> John
>>
>> On Thu, Mar 19, 2015 at 7:03 PM, Khalid Choukri <choukri@elda.org> wrote:
>>
>>     Hi John
>>
>>     I am so sorry I missed the telco
>>     I managed to review the slides and the web site and I realise that
>>     you harvested so many sources which also harvest other sources in
>>     a very long loop, leading to many wrong labels (but at
>>     least I understand why you mention that our community has over
>>     100K resources)
>>
>>     As examples, searching the titles with "Speecon", a resource
>>     available only from the ELRA catalogue, I see that it is labeled as:
>>     Source: CLARIN
>>
>>> Thai Speecon database
>>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0288>
>>> Description <http://purl.org/dc/elements/1.1/description>
>>> Desktop/Microphone
>>> Language <http://purl.org/dc/terms/language> Thai
>>> <http://www.lexvo.org/id/iso639-3/tha>
>>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>>> Title <http://purl.org/dc/elements/1.1/title> Thai Speecon
>>> database
>>>
>>> Czech Speecon database
>>> <http://linghub.org/lremap/efd68ccbda3ae46c3f4c04db49d34989>
>>> Language <http://purl.org/dc/terms/language> Czech
>>> <http://www.lexvo.org/id/iso639-3/ces>
>>> Title <http://purl.org/dc/elements/1.1/title> Czech Speecon
>>> database
>>> Type <http://purl.org/dc/terms/type> Corpus
>>> <http://babelnet.org/rdf/s00022825n>
>>>
>>> Czech Speecon database
>>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0298>
>>> Description <http://purl.org/dc/elements/1.1/description>
>>> Desktop/Microphone
>>> Language <http://purl.org/dc/terms/language> Czech
>>> <http://www.lexvo.org/id/iso639-3/ces>
>>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>>> Title <http://purl.org/dc/elements/1.1/title> Czech Speecon
>>> database
>>>
>>
>>
>>     The same applies to Eurom1:
>>> EUROM1_fr
>>> <http://linghub.org/clarin/Speech_and_Language_Data_Repository/oai_sldr_org_sldr000035>
>>> Contributor <http://purl.org/dc/elements/1.1/contributor> SAM_A
>>> European project
>>> Creator <http://purl.org/dc/elements/1.1/creator> Institut de
>>> la communication parlée (ICP, Grenoble FR)
>>> Description <http://purl.org/dc/elements/1.1/description> The
>>> EUROM1 database contains recordings of 60 speakers in eleven
>>> European Languages: Danish, Dutch, British English, French,
>>> German, Norwegian, Swedish, Dutch, Greek, Portuguese and
>>> Spanish. It was explicitly designed to aid the phonetic
>>> comparison of languages, with similar materials and recording
>>> protocols in all languages.<br />Only French EUROM1 is
>>> accessible here. It was used as a resource for the MULTEXT
>>> project.<br />This version has been reformatted for compliance
>>> with long-term preservation specifications.
>>> Rights <http://purl.org/dc/elements/1.1/rights>
>>> info:eu-repo/date/submitted/2008-09-01
>>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>>> Subject <http://purl.org/dc/elements/1.1/subject>
>>> Title <http://purl.org/dc/elements/1.1/title> EUROM1_fr
>>>
>>
>>     and to others ....
>>> GlobalPhone Korean
>>> <http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0200>
>>> Description <http://purl.org/dc/elements/1.1/description>
>>> Desktop/Microphone
>>> Language <http://purl.org/dc/terms/language> Korean
>>> <http://www.lexvo.org/id/iso639-3/kor>
>>> Source <http://purl.org/dc/elements/1.1/source> CLARIN
>>> Title <http://purl.org/dc/elements/1.1/title> GlobalPhone Korean
>>>
>>
>>
>>     As you can imagine this is misleading, and I am wondering if we
>>     can help you correct this.
>>
>>     All the best
>>     Khalid
>>
>>
>>     On 19/03/2015 15:08, John P. McCrae wrote:
>>>     For those who have not yet joined, the link to the GotoMeeting is
>>>     here
>>>
>>>     https://global.gotomeeting.com/join/360074461
>>>
>>>     Regards,
>>>     John
>>>
>>>     On Thu, Mar 19, 2015 at 2:16 PM, John P. McCrae
>>>     <jmccrae@cit-ec.uni-bielefeld.de> wrote:
>>>
>>>         Dear all,
>>>
>>>         In the teleconference this afternoon we will present Linghub
>>>         <http://linghub.org/>, the work of several members of this
>>>         group and the LIDER project.
>>>
>>>         Here are some slides I will present to start the discussion:
>>>
>>>         https://docs.google.com/presentation/d/1ZDzHYcgHvqzp_zK77vGFZ36kEMEmNt9rw7kBJmrhetQ/edit?usp=sharing
>>>
>>>         Regards,
>>>         John P. McCrae
>>>
>>>
>>
>>     --
>>
>>     *************************************************
>>     *Khalid CHOUKRI *
>>     ELRA General Secretary & ELDA CEO
>>     email: choukri@elda.org ; Web: www.elra.info www.elda.org
>>     Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
>>     ***************************************************
>>     ** *Info on LREC: www.lrec-conf.org *
>>     **************************************************** **
>>
>>     ****
>>
>>
>
> --
>
> *************************************************
> *Khalid CHOUKRI *
> ELRA General Secretary & ELDA CEO
> email: choukri@elda.org ; Web: www.elra.info www.elda.org
> Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
> ***************************************************
> ** *Info on LREC: www.lrec-conf.org *
> **************************************************** **
>
> ****

--
Director - Knowledge and Data Engineering Group
The CNGL Centre for Global Intelligent Content
School of Computer Science and Statistics
Trinity College Dublin
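
To make the PROV-O suggestion in this thread concrete, here is a minimal sketch of how a label such as "ELRA (via CLARIN VLO)" might be expressed as a chain of prov:wasDerivedFrom statements, with an optional harvesting activity. It only builds Turtle text and is not how Linghub stores its records. The helper name `derivation_chain_turtle` and every `example.org` URI are hypothetical, since the thread does not give URIs for the ELRA or CLARIN VLO records; only the `prov:` terms come from the PROV-O vocabulary.

```python
# Sketch only. The helper name and all example.org URIs are hypothetical.
# prov:wasDerivedFrom, prov:wasGeneratedBy, prov:Activity and prov:endedAtTime
# are terms defined by the W3C PROV-O vocabulary.

PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def derivation_chain_turtle(chain, activity=None, ended_at=None):
    """Return Turtle for a provenance chain.

    chain    -- record URIs ordered from the aggregated record back to the
                original source, e.g. [aggregated, intermediate, original]
    activity -- optional URI of the harvesting run that produced chain[0]
    ended_at -- optional xsd:dateTime string for when that run finished
    """
    out = [f"@prefix prov: <{PROV}> .", f"@prefix xsd: <{XSD}> .", ""]
    # One prov:wasDerivedFrom statement per hop in the harvesting chain.
    for derived, source in zip(chain, chain[1:]):
        out.append(f"<{derived}> prov:wasDerivedFrom <{source}> .")
    if activity is not None:
        # Optional metadata about the harvesting process itself.
        out.append(f"<{chain[0]}> prov:wasGeneratedBy <{activity}> .")
        out.append(f"<{activity}> a prov:Activity .")
        if ended_at is not None:
            out.append(
                f'<{activity}> prov:endedAtTime "{ended_at}"^^xsd:dateTime .'
            )
    return "\n".join(out)


if __name__ == "__main__":
    print(
        derivation_chain_turtle(
            [
                # Record URI quoted in the thread.
                "http://linghub.org/clarin/European_Language_Resources_Association/oai_catalogue_elra_info_ELRA_S0288",
                # Hypothetical URIs for the intermediate and original records.
                "http://example.org/clarin-vlo/ELRA_S0288",
                "http://example.org/elra/S0288",
            ],
            activity="http://example.org/harvest/run-1",  # hypothetical
            ended_at="2015-03-19T00:00:00Z",  # illustrative timestamp
        )
    )
```

Keeping each hop as its own statement means the original catalogue stays discoverable however many aggregators sit in between, which is the concern raised about records labeled only "CLARIN".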
Received on Wednesday, 25 March 2015 00:50:15 UTC