- From: Madalina Croitoru <madalina.croitoru@lirmm.fr>
- Date: Sun, 30 Jun 2024 21:44:39 +0200
- To: semantic-web@w3.org
- Cc: Konstantin Todorov <konstantin.todorov@lirmm.fr>, Madalina Croitoru <croitoru@lirmm.fr>
- Message-Id: <212BFB6A-197D-42A7-9321-9A94A73AD594@lirmm.fr>
LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods Beyond the Temporal Borders of Training Data
https://llmsbeyondthecutoff2024.wordpress.com

Collocated with CIKM 2024
October 25, 2024, Boise (Idaho), USA

* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera-ready version submission
* October 25, 2024: Workshop date

SUMMARY

LLMs are trained on large amounts of web data that extend up to a specific moment in time. For instance, ChatGPT's LLM "knows" the world before May 2023 and has no real-time access to information beyond this limit, other than a browsing tool that enables simple lookups, similar to a search engine. However, in many scenarios, the ability to analyze and reason about newly emerging events and topics is crucial for keeping up with rapidly evolving information landscapes.

The workshop provides an interdisciplinary forum for discussing the temporal limitations of LLMs and for proposing technical solutions that apply and develop LLMs beyond their cutoff dates. We explore two prominent scenarios in which contexts tend to evolve faster than the LLMs used to analyze them: (1) journalism and (2) industry.

For (1), the goal is to propose methods for detecting, classifying and reasoning about emerging topics that permeate public discourse on social or mainstream media. An example of such a topic is COVID-19 at the dawn of the pandemic. Downstream tasks of interest are fake news detection and fact-checking on novel topics, including claim analysis, opinion mining and narrative extraction.

For (2), the goal is to shed light on the limits of LLMs for companies in sectors such as international geopolitical monitoring and corporate intelligence, finance and stock market trading, or insurance, where companies need to track their interests and products in real time.
This scenario does not address the inclusion of corporate data in LLMs; instead, it calls for solutions that use publicly available and constantly growing data.

An overarching problem to be studied is the cross-language and cross-country specificity of emerging data: novel information in underrepresented languages or contexts may be more challenging to analyze. We welcome insights and parallels from the field of knowledge representation, where a similar problem, the cutoff dates of knowledge graphs (their dynamics and regular updates), is well understood.

The expected outcomes are:
1) insights into the temporal limitations of LLMs, with the workshop outlining concrete challenges and bottlenecks in the identified scenarios;
2) novel methodological and technical solutions based on (incremental) machine learning models for handling information beyond the cutoff dates of current LLMs (reasoning about it, extracting it and classifying it).

TOPICS OF INTEREST

* Methods for few-shot or zero-shot learning
* Analysis of emerging topics and events, including counterfactual/what-if reasoning
* Large language models for online discourse
* Large language models for near real-time corporate data analysis
* Large language models for multimodal understanding and generation
* Multilingual and cross-country extraction of emerging information
* Computational journalism, disinformation spread, fact-checking and fake news detection
* Stance and viewpoint discovery for novel information
* Detection and classification of claims within emerging narratives
* Social, ethical and legal aspects of LLM up-to-dateness
* Interpretability / explainability of computational methods beyond the cutoff
* Linking and enrichment of data beyond the LLM cutoff
* Foundation models for knowledge graph building and entity alignment
* Recommender systems for novel information
* Quality, provenance, uncertainty and trust of emerging information and data
* Use cases, applications and cross-community interfaces
* Evaluation frameworks and benchmarks

SUBMISSION

We welcome the following types of contributions:

* Full papers (up to 12 pages including references): original research.
* Short papers (up to 8 pages including references): original research in progress.
* Demo papers (up to 8 pages including references): descriptions of prototypes, demos or software systems.
* Data papers (up to 8 pages including references): descriptions of resources related to the workshop topics, such as datasets, knowledge graphs, corpora, annotation protocols, etc.
* Position papers (up to 8 pages including references): vision statements or research directions.

Workshop papers must be self-contained and written in English. They must not have been previously published, and must not be under consideration for publication or under review at another workshop, conference, or journal. Manuscripts should be submitted in PDF format, using the Springer LNCS template, via the CIKM 2024 EasyChair site (link to follow in the next CFP). The review process will be double-blind. Papers will be evaluated on their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted contribution must register for the workshop and present the paper. Preprints of all contributions will be made available during the conference.

For any enquiries, please email the workshop organizers: todorov@lirmm.fr, rettinger@uni-trier.de, jmgomez@expert.ai, croitoru@lirmm.fr

IMPORTANT DATES

* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera-ready version submission
* October 25, 2024: Workshop date

All submission deadlines are end of day, Anywhere on Earth (AoE).
KEYNOTES

* TBA

AWARD

* All contributions are eligible for the "Best Paper" award.

ORGANIZING COMMITTEE

* Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France)
* José Manuel Gómez Pérez (Expert.ai, Spain)
* Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France)
* Achim Rettinger (University of Trier, Germany)

PROGRAM COMMITTEE

* Preslav Nakov, MBZUAI, United Arab Emirates
* Serena Villata, I3S, CNRS, France
* Ronald Denaux, Amazon, USA
* Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
* Sandra Bringay, University Paul Valéry, France
* Ioana Manolescu, Inria Saclay, France
* Dino Ienco, INRAE, France
* Colin Porlezza, Univ. della Svizzera Italiana, Switzerland
* Katarina Boland, Heinrich Heine Universität, Germany
* Gabriella Lapesa, GESIS, Germany
* Jonas Fegert, FZI, Germany
* Michael Färber, TU Dresden, Germany
* Salim Hafid, University of Montpellier, France
* Pavlos Fafalios, FORTH, Greece
* Sarah Labelle, University Paul Valéry, France

--
Prof. Madalina Croitoru, LIRMM, Faculty of Science, University of Montpellier
Received on Sunday, 30 June 2024 19:44:56 UTC