
Re: CfP: Semantic Web Challenge (ISWC2020) - Mining the Web of HTML-embedded Product Data

From: Phil Archer <phil.archer@gs1.org>
Date: Wed, 4 Mar 2020 14:16:15 +0000
To: Anna Primpeli <anna@informatik.uni-mannheim.de>, "semantic-web@w3.org" <semantic-web@w3.org>, "public-vocabs@w3.org" <public-vocabs@w3.org>, "public-schemaorg@w3.org" <public-schemaorg@w3.org>
Message-ID: <DM6PR08MB4972BBC8F041FF2F73D0211CB7E50@DM6PR08MB4972.namprd08.prod.outlook.com>
I'm trying to follow up on this as GS1 has some code lists and our
schema.org extension (https://gs1.org/voc) that might be of direct
interest (and we may even have some ground truth as well but it's tied
up in licensing so may not be useable here).

But https://ir-ischool-uos.github.io/mwpd/ returns a security warning.
Can you fix that please Anna?

We have more than a passing interest in seeing what comes out of this.



Phil Archer
Director, Web Solutions, GS1


+44 (0)7887 767755
Skype: philarcher

On 04/03/2020 12:33, Anna Primpeli wrote:
> Dear Colleagues,
> you are cordially invited to participate in the Semantic Web Challenge: Mining
> the Web of HTML-embedded Product Data, co-located with ISWC 2020. Please also
> kindly forward the following CfP to your professional network.
> *Call for Participation:*
> *Mining the Web of HTML-embedded Product Data*
> *(co-located with ISWC2020)*
> *1. Overview*
> The Semantic Web Challenge on Mining the Web of HTML-embedded Product Data is
> co-located with the 19th International Semantic Web Conference
> (https://iswc2020.semanticweb.org/, 2-6 Nov 2020 at Athens, Greece). The
> challenge organises two shared tasks related to product data mining on the Web:
> (1) product matching and (2) product classification. This event is organised by
> The University of Sheffield, The University of Mannheim and Amazon, and is open
> to anyone. Teams whose systems beat the baseline of the respective task
> will be invited to write a paper describing their method and system and to
> present the method as a poster (and potentially also a short talk) at the
> ISWC2020 conference. The winner of each task will be awarded a prize of 500
> euro (partly sponsored by Peak Indicators, https://www.peakindicators.com/).
> *2. Challenge website*
> For details of the challenge please visit https://ir-ischool-uos.github.io/mwpd/
> *3. Important dates*
> 02 March 2020: Google support group open. Please join the group at
> https://groups.google.com/forum/#!forum/mwpd2020 if you wish to take part in
> this event
> 16 March 2020: Release of the training and validation sets
> 01 June 2020: Release of the test set (without ground truth)
> 15 June 2020: Submission of system output
> 08 July 2020: Publication of system results and notification of acceptance for
> presentation
> *4. Task and dataset brief*
> The challenge organises two tasks, product matching and product classification.
> *i) Product matching* deals with identifying product offers on different websites
> that refer to the same real-world product (e.g., the same iPhone X model offered
> using different names/offer titles as well as different descriptions on various
> websites). A multi-million product offer corpus (16M) containing product offer
> clusters is released for the generation of training data. A validation set
> containing 1.1K offer pairs and a test set of 600 offer pairs will also be
> released. The goal of this task is to classify each offer pair in these
> datasets as a match (i.e., both offers refer to the same product) or a non-match.
> *ii) Product classification* deals with assigning predefined product category
> labels (which can span multiple levels) to product instances (e.g., iPhone X is a
> ‘SmartPhone’, and also ‘Electronics’). A training dataset containing 10K product
> offers, a validation set of 3K product offers and a test set of 3K product
> offers will be released. Each dataset contains product offers with their
> metadata (e.g., name, description, URL) and three classification labels each
> corresponding to a level in the GS1 Global Product Classification taxonomy. The
> goal is to classify these product offers into the pre-defined category labels.
> All datasets are built based on structured data that was extracted from the
> Common Crawl (https://commoncrawl.org/) by the Web Data Commons project
> (http://webdatacommons.org/).
> *5. Resources and tools*
> The challenge will also release utility code (in Python) for processing the
> above datasets and scoring the system outputs. In addition, the following
> language resources for product-related data mining tasks will be released:
> · A text corpus of 150 million product offer descriptions
> · Word embeddings trained on the above corpus
> *6. Organizing committee*
> · Dr Ziqi Zhang (Information School, The University of Sheffield)
> · Prof. Christian Bizer (Institute of Computer Science and Business Informatics,
> The Mannheim University)
> · Dr Haiping Lu (Department of Computer Science, The University of Sheffield)
> · Dr Jun Ma (Amazon Inc. Seattle, US)
> · Prof. Paul Clough (Information School, The University of Sheffield & Peak
> Indicators)
> · Ms Anna Primpeli (Institute of Computer Science and Business Informatics, The
> Mannheim University)
> · Mr Ralph Peeters (Institute of Computer Science and Business Informatics, The
> Mannheim University)
> · Mr Abdulkareem Alqusair (Information School, The University of Sheffield)
> *7. Contact*
> To contact the organising committee please use the Google discussion group
> https://groups.google.com/forum/#!forum/mwpd2020

Received on Wednesday, 4 March 2020 14:16:36 UTC
