- From: W3C Community Development Team <team-community-process@w3.org>
- Date: Thu, 9 Apr 2015 14:34:54 +0000
- To: public-omn@w3.org
The post describes the approach, methodology and main software components of an Adaptive Semantic Publishing Platform for digital media, applied previously to numerous use cases and publishers such as the BBC, EuroMoney and Press Association. Semantic publishing relies on the interaction among the common-sense model in ontologies, the world knowledge in Linked Open Data (LOD), named entity categorization and the set of domain-specific keywords. Hence, the contribution of the related LOD datasets is briefly considered. Adaptive publishing relies on the user’s requirements (interests, searches, activities), provided as summaries of articles on selected topics (sports, politics, society, etc.). Also, approaches to gold standard data are presented, which enable fast, high-quality clustering of numerous information streams per topic.

The shift to digital has presented publishers and information providers with exciting opportunities as well as an entirely new set of customer expectations. The traditional rules of engagement are changing fast, and finding new and effective ways to compete is essential. In this context, semantic content enrichment has evolved from a game-changing capability into a de facto requirement for competing effectively.

What is dynamic semantic publishing?

The phrase "dynamic semantic publishing" was probably first coined by the BBC to describe their metadata-driven publishing platform. In summary, the novel features of the publishing platform are:

- A lot of the content is automatically generated based on the metadata stored in the RDF database, i.e. a SPARQL query about a topic will get the relevant aggregated metadata for the web page content (as opposed to manually authored web pages).
- The underlying domain model is an ontology (as opposed to a relational schema).
- Automated text analysis is used for the journalist-authored content (blogs, news articles), so that tags and topics are extracted and stored as metadata for the article in the RDF database.
- Data from additional data sources is also RDF-ized and stored in the metadata repository (RDF database).
- Inference of new facts derives additional metadata in the RDF database (with respect to the RDF or OWL semantics).
- The content (journalist-authored or dynamically generated) is enriched with external data from the Linked Open Data cloud (DBpedia, Freebase, etc.).

BBC News, BBC Sport and a large number of other web sites across the BBC are authored and published using an in-house bespoke content management/production system ("CPS") with an associated static publishing delivery chain. Journalists are able to author stories, manage indices and edit audio/video assets in the CPS and then publish them pre-baked as static assets to the BBC's Apache web server farm. In addition, journalists can edit and manage content in the CPS for distribution to the BBC Mobile and Interactive TV services, and to IP-connected TV services. The CPS has been evolving constantly since it was developed to publish the BBC News website, which launched in November 1997, and the latest version (v6) underpins the summer 2010 redesign of the BBC News site that won the .net "Redesign of the Year" award.

In recent years, semantic publishing applications have become increasingly user-oriented in several respects, among them: customization and repurposing of data and content to reflect user needs; focused summaries with respect to user interests; and high relevance of the retrieved information with minimal effort in receiving it. Various works explore the relation between publishing and Linked Open Data.
For example, some authors present a life cycle model (specification, modeling, generation, linking, publication, exploitation) and demonstrate its application within various domains. A DBpedia service called DBpedia Spotlight has also been presented, which automatically annotates text documents with DBpedia URIs using the DBpedia in-house ontology. Similarly, Zemanta provides a plug-in for content creators, which recommends links to relevant content (articles, keywords, tags). Its application can be seen online.

Ben Chromsky, who works as a DB manager for Layr (and has also worked for Yellowpages and Yelp), said: "Our approach is generally in line with these ideas and services: domain-specific applications, automatic semantic annotation, adding relevant linked content. However, our focus is preferably on: the trade-off between the semantic knowledge holders (ontologies, linked data) and their language reflection (domain texts), mediated by the linguistic processing pipelines; the adaptive flexibility of the constructed applications; and the efficient storage and publishing of large data."

Within Ontotext, examples of mass-media semantic publishing web sites, such as the BBC’s sport site and the official site of the London 2012 Olympics, have proven to attract multi-million user bases. Behind such applications, as revealed by lead engineers at the BBC, lies a complex architecture of state-of-the-art semantic and text analytics technologies, such as the in-house fast RDF database management system OWLIM and the knowledge management platform KIM, used for robust semantic annotation and search as well as for text analytics applications. Both platforms are incorporated into numerous successful semantic publishing solutions (including BBC Sport, Press Association, Newz, EuroMoney, Fixithere, etc.).
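
As an illustration of the metadata-driven page generation and inference listed among the platform features above, here is a minimal pure-Python sketch. It is not the BBC's or Ontotext's implementation: the triples, the `ex:` vocabulary, and the helper names `match`, `infer_types` and `topic_page` are all assumptions made for this example, and a plain set of tuples stands in for an RDF database and its SPARQL engine.

```python
# Hypothetical metadata as (subject, predicate, object) triples.
# All identifiers below are invented for illustration only.
TRIPLES = {
    ("article:1", "ex:about", "team:Arsenal"),
    ("article:2", "ex:about", "team:Arsenal"),
    ("article:3", "ex:about", "team:Chelsea"),
    ("article:1", "ex:title", "Arsenal win at home"),
    ("article:2", "ex:title", "Injury update"),
    ("article:3", "ex:title", "Chelsea sign striker"),
    ("team:Arsenal", "rdf:type", "ex:FootballTeam"),
    ("team:Chelsea", "rdf:type", "ex:FootballTeam"),
    ("ex:FootballTeam", "rdfs:subClassOf", "ex:SportsTeam"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def infer_types(triples):
    """Forward-chain one RDFS-style rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D) => (x rdf:type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for x, _, c in match(inferred, p="rdf:type"):
            for _, _, d in match(inferred, s=c, p="rdfs:subClassOf"):
                new = (x, "rdf:type", d)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


def topic_page(triples, topic):
    """Aggregate page content for a topic from metadata alone,
    instead of reading a manually authored page."""
    articles = sorted(s for s, _, _ in match(triples, p="ex:about", o=topic))
    headlines = []
    for article in articles:
        headlines += [o for _, _, o in match(triples, s=article, p="ex:title")]
    types = sorted(o for _, _, o in match(triples, s=topic, p="rdf:type"))
    return {"topic": topic, "types": types, "headlines": headlines}


if __name__ == "__main__":
    store = infer_types(TRIPLES)
    print(topic_page(store, "team:Arsenal"))
```

In a real deployment the pattern matching and the rule would be delegated to the RDF database (a SPARQL query and the store's reasoner); the sketch only shows why a topic page needs no hand-written content once the tags and the ontology are in place.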
This paper aims to describe the approach, main software components, information architecture, text analytics, and semantic annotation and indexing that have been used successfully for more than 5 years to build semantic publishing solutions. Our approach relies on the calibration between the RDF semantic repository OWLIM, the semantic resources in KIM and the optimized text analytics techniques, including methodologies for fast creation of gold data in the selected domain, focused curation of the automatically analyzed data, and the application of advanced machine learning algorithms in data clustering. Thus, the success of our solutions lies in the customization of the advanced semantic technologies in combination with text analytics techniques, tuned to the needs of publishers and adapted to the requested domains.

To be continued soon (Part II: Overall Architecture of Semantic Publishing System).

----------

This post was sent on the Federated Infrastructures Community Group

'Adaptive Semantic Publishing Platform for digital medias'

https://www.w3.org/community/omn/2015/04/09/adaptive-semantic-publishing-platform-for-digital-medias/

Learn more about the Federated Infrastructures Community Group: https://www.w3.org/community/omn
Received on Thursday, 9 April 2015 14:34:59 UTC