Adaptive Semantic Publishing Platform for digital medias [via Federated Infrastructures Community Group]

This post describes the approach, methodology and main software components of an
Adaptive Semantic Publishing Platform for digital media, applied previously to
numerous use cases and publishers such as the BBC, EuroMoney and Press
Association.
Semantic publishing relies on the interaction among the common-sense model in
ontologies, the world knowledge in Linked Open Data (LOD), named entity
categorization and a set of domain-specific keywords.
Hence, the contribution of the related LOD datasets is briefly considered.
Adaptive publishing relies on the user's requirements (interests, searches,
activities), provided as summaries of articles on selected topics (sports,
politics, society, etc.). Approaches to gold standard data are also presented,
which enable fast, high-quality clustering of numerous information streams per
topic.
The shift to digital has presented publishers and information providers with
exciting opportunities as well as an entirely new set of customer expectations.
The traditional rules of engagement are changing fast and finding new and
effective ways to compete is essential. In this context, semantic content
enrichment has evolved from a game-changing capability into a de facto
requirement for competing effectively.






What is dynamic semantic publishing?
The phrase "dynamic semantic publishing" was probably first coined by the BBC to
describe their metadata-driven publishing platform. In summary, the novel
features of the publishing platform are:








 much of the content is automatically generated from the metadata stored in
the RDF database, i.e. a SPARQL query about a topic retrieves the relevant
aggregated metadata for the web page content (as opposed to manually authored
web pages)
 the underlying domain model is an ontology (as opposed to a relational schema)
 automated text analysis is applied to journalist-authored content (blogs,
news articles), so that tags and topics are extracted and stored as metadata for
the article in the RDF database
 data from additional data sources is also RDF-ized and stored in the metadata
repository (RDF database)
 inference of new facts derives additional metadata in the RDF database (with
respect to the RDF or OWL semantics)
 the content (journalist-authored or dynamically generated) is enriched with
external data from the Linked Open Data cloud (DBpedia, Freebase, etc.)
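
The features listed above can be illustrated with a minimal sketch. It is not
the BBC's or Ontotext's implementation: a plain Python set of triples stands in
for the RDF database, a pattern-matching helper stands in for a SPARQL query,
and a single RDFS-style rule stands in for inference. All identifiers (the
`ex:` names, `match`, `infer_types`, `articles_about`) and the sample data are
assumptions made for illustration only.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def infer_types(store: Set[Triple]) -> Set[Triple]:
    """Apply one RDFS-style rule until no new facts appear:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)."""
    closed = set(store)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, _, c) in match(closed, p=RDF_TYPE)
            for (_, _, d) in match(closed, s=c, p=SUBCLASS_OF)
        } - closed
        if not new:
            return closed
        closed |= new


def articles_about(store: Set[Triple], topic: str) -> List[str]:
    """Aggregate the titles of all articles tagged with `topic`
    (a stand-in for the SPARQL query that would build a topic page)."""
    titles = []
    for (article, _, _) in match(store, p="ex:about", o=topic):
        for (_, _, title) in match(store, s=article, p="ex:title"):
            titles.append(title)
    return sorted(titles)


# Hypothetical sample data: a tiny ontology plus article metadata (tags).
store: Set[Triple] = {
    ("ex:Footballer", SUBCLASS_OF, "ex:Athlete"),
    ("ex:Athlete", SUBCLASS_OF, "ex:Person"),
    ("ex:player1", RDF_TYPE, "ex:Footballer"),
    ("ex:article1", "ex:about", "ex:player1"),
    ("ex:article1", "ex:title", "Late goal settles the derby"),
    ("ex:article2", "ex:about", "ex:player1"),
    ("ex:article2", "ex:title", "Transfer window round-up"),
    ("ex:article3", "ex:about", "ex:team9"),
    ("ex:article3", "ex:title", "Season preview"),
}

closed = infer_types(store)
print(articles_about(closed, "ex:player1"))
```

In this sketch the topic page for `ex:player1` is never authored by hand; it is
assembled from whatever article metadata is in the store at query time, and the
inferred facts (for example, that `ex:player1` is also an `ex:Person`) become
available to queries without being stated explicitly.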









BBC News, BBC Sport and a large number of other web sites across the BBC are
authored and published using an in-house bespoke content management/production
system ("CPS") with an associated static publishing delivery chain. Journalists
are able to author stories, manage indices and edit audio/video assets in the
CPS and then publish them pre-baked as static assets to the BBC's Apache web
server farm. In addition, journalists can edit and manage content in the CPS for
distribution to the BBC Mobile and Interactive TV services, and IP-connected TV
services. The CPS has been constantly evolving since it was developed to publish
the BBC News website, which launched in November 1997, and the latest version
(v6) underpins the summer 2010 redesign of the BBC News site that won the .net
"Redesign of the Year".




In recent years, semantic publishing applications have become increasingly
user-oriented in several respects, including: customization and re-purposing of
data and content to reflect user needs; focused summaries based on user
interests; high relevance of the retrieved information; and minimal effort in
receiving it.
Various works explore the relation between publishing and Linked Open Data. For
example, one presents a life cycle model (specification, modeling, generation,
linking, publication, exploitation) and demonstrates its application within
various domains. Another presents a DBpedia service, called DBpedia Spotlight,
which automatically annotates text documents with DBpedia URIs using the DBpedia
in-house ontology. Similarly, Zemanta provides a plug-in for content creators,
which recommends links to relevant content (articles, keywords, tags). Its
application can be seen online. Ben Chromsky, who works as a DB manager for layr
(and has also worked for Yellowpages and Yelp), said: "Our approach is generally
in line with these ideas and services: domain-specific applications,
automatic semantic annotation, adding relevant linked content. However, our
focus is preferably on: the trade-off between the semantic knowledge holders
(ontologies, linked data) and their language reflection (domain texts), mediated
by the linguistic processing pipelines; the adaptive flexibility of the
constructed applications and the efficient storage and publishing of large
data."
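
To make the idea of automatic semantic annotation more concrete, here is a
minimal sketch of dictionary-based annotation: known surface forms in a text
are linked to DBpedia-style resource URIs. It does not reproduce how DBpedia
Spotlight, Zemanta or the platform described in this post work (those involve
disambiguation and full linguistic pipelines). The gazetteer contents and the
`annotate` function are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased surface form -> resource URI.
GAZETTEER: Dict[str, str] = {
    "london": "http://dbpedia.org/resource/London",
    "bbc": "http://dbpedia.org/resource/BBC",
    "olympic games": "http://dbpedia.org/resource/Olympic_Games",
}

Annotation = Tuple[int, int, str, str]  # (start, end, surface text, URI)


def annotate(text: str, gazetteer: Dict[str, str]) -> List[Annotation]:
    """Link gazetteer entries found in `text` to their URIs.

    Longer surface forms are tried first, and a match is skipped if it
    overlaps a span that has already been annotated."""
    annotations: List[Annotation] = []
    taken: List[Tuple[int, int]] = []
    for surface in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            annotations.append(
                (m.start(), m.end(), m.group(0), gazetteer[surface])
            )
    return sorted(annotations)


result = annotate("The BBC covered the Olympic Games in London.", GAZETTEER)
for start, end, surface, uri in result:
    print(start, end, surface, uri)
```

The resulting (span, URI) pairs are the kind of metadata that could then be
stored alongside the article, so that queries over the metadata repository can
retrieve it by topic.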
Within Ontotext, examples of mass media semantic publishing web sites, such as
the BBC's sport site and the official site of the London 2012 Olympics, have
proven to attract multi-million user bases. Behind such applications, as
revealed by lead engineers at the BBC, lies a complex architecture of
state-of-the-art semantic and text analytics technologies, such as the in-house
fast RDF database management system OWLIM and the knowledge management platform
KIM, used for robust semantic annotation and search as well as for text
analytics applications.
Both platforms are incorporated into numerous successful semantic publishing
solutions (including BBC Sport, Press Association, Newz, EuroMoney,
Fixithere, etc.). This paper aims to describe the approach, main software
components, information architecture, text analytics, and semantic annotation
and indexing that have been used successfully for more than 5 years to build
semantic publishing solutions.
Our approach relies on the calibration between the RDF semantic repository
OWLIM, the semantic resources in KIM and optimized text analytics techniques,
including methodologies for fast creation of gold data in the selected domain,
focused curation of the automatically analyzed data, and the application of
advanced machine learning algorithms to data clustering. Thus, the success of
our solutions lies in the customization of advanced semantic technologies in
combination with text analytics techniques, tuned to the needs of publishers and
adapted to the requested domains.
To be continued soon (Part II: Overall Architecture of Semantic Publishing
System).



----------

This post sent on Federated Infrastructures Community Group



'Adaptive Semantic Publishing Platform for digital medias'

https://www.w3.org/community/omn/2015/04/09/adaptive-semantic-publishing-platform-for-digital-medias/



Learn more about the Federated Infrastructures Community Group: 

https://www.w3.org/community/omn

Received on Thursday, 9 April 2015 14:34:59 UTC