ABSTRACT

In the modern digital age, methodologies for professional Big Data analytics are a strategic necessity: they make information resources available to professional journalists and media producers more effectively and efficiently, and they enable new forms of production such as data-driven journalism.

The challenge lies in the ability to collect, connect and present heterogeneous content streams that are accessible through different sources, such as digital TV, the Internet, social networks and media archives, and published in different media modalities, such as audio, speech, text and video, in an organic and semantics-driven way.

Rai Active News is a portal for professional information services that addresses these challenges with a uniform and holistic approach.

At the core of the system there is a set of artificial intelligence techniques and advanced statistical tools to automate tasks such as information extraction and multimedia content analysis targeted at discovering semantic links between resources, providing users with text, graphics and video news organized according to their individual interests.

INTRODUCTION

The exponential growth in the availability of digital resources is enabling new forms of content creation, sharing, and delivery.

Methodologies for aggregating and presenting heterogeneous content are needed to make these resources useful and easily available to end users.

Here, the challenge lies in the ability to collect, connect and present data streams from different media sources (e.g. television, press, the Internet) and of different media types, such as audio, speech, text and video.

Rai Active News is a portal for professional information services that addresses these challenges with a uniform and holistic approach.

At the core of the system there is a set of artificial intelligence techniques and advanced statistical tools to automate tasks such as information extraction and multimedia content analysis targeted at discovering semantic links between resources, providing users with text, graphics and video news organized according to their individual interests.

The system lets users define customized search profiles that are automatically and dynamically updated with relevant content found in the monitored information sources, which include Web feeds, television channels, specialized circuits such as the Eurovision News Exchange Network (EVN), and legacy archives.

The system also provides a recommendation service that analyzes social activity in blogs to dynamically model the user's interests and uses that model to recommend appropriate media content.

This paper presents the underlying infrastructure and technology on which Rai Active News is built.

The paper is organized as follows. First, we give an overview of the challenges and opportunities for knowledge integration in modern news production workflows. Then we describe the architecture developed to address these needs, as well as the services built on top of it. Finally, we conclude with considerations for future work.

CHALLENGES FOR KNOWLEDGE INTEGRATION IN NEWS PRODUCTION

News production is a complex and dynamic process which requires creating content in a fast-paced way and using a wide variety of media, including written texts, images, speech and videos.

Contrary to the past, when the life cycle of news items (from sourcing of content to distribution and consumption of products) was typically linear and isolated, we are nowadays dealing with more dynamic and interactive ways to produce, publish and consume news items.

Thanks to the proliferation of social networks and open data sources, not only professional journalists but also private individuals are now taking part in the so-called “data journalism” phenomenon.

Data journalism is the process of collecting, filtering and structuring big data for storytelling and reporting.

In this context the “event”, i.e. any relevant fact happening at some time and place, and the “topic”, i.e. a real-world entity or thing such as a person, organization, place or theme, become the basic units around which content is produced and organized.

A topic can include both events that are closed in time and events that are still open. As an example, a topic about a naval disaster might contain events about the shipwreck, the rescue operations or the ship demolition.
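The relationship between topics and events described above can be illustrated with a minimal sketch. The class names, fields and methods below (`Event`, `Topic`, `is_open`, `timeline`) are assumptions made for illustration and are not taken from the Rai Active News system; the dates and places are invented placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Event:
    """A relevant fact happening at some time and place (hypothetical model)."""
    title: str
    place: str
    start: date
    end: Optional[date] = None  # None means the event is still open

    @property
    def is_open(self) -> bool:
        return self.end is None


@dataclass
class Topic:
    """A real-world entity or theme that groups related events (hypothetical model)."""
    name: str
    events: List[Event] = field(default_factory=list)

    def open_events(self) -> List[Event]:
        # Events without an end date are considered still open.
        return [e for e in self.events if e.is_open]

    def timeline(self) -> List[Event]:
        # Order events chronologically by their start date.
        return sorted(self.events, key=lambda e: e.start)


# Placeholder data mirroring the naval disaster example; dates are invented.
topic = Topic("naval disaster")
topic.events.append(Event("ship demolition", "harbour", date(2020, 6, 1)))
topic.events.append(Event("shipwreck", "coast", date(2020, 1, 10), date(2020, 1, 10)))
topic.events.append(Event("rescue operations", "coast", date(2020, 1, 11), date(2020, 2, 1)))

for event in topic.timeline():
    status = "open" if event.is_open else "closed"
    print(f"{event.start} {event.title} ({status})")
```

Modelling an open event as one with no end date keeps the distinction between closed and open events explicit without a separate status flag; other representations are equally plausible.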

The task of data journalism is to extract, track and visualize such hidden information from the available data. Machine learning, data mining and semantic Web techniques can be used to gather and analyze data in an unsupervised fashion, thus greatly improving productivity and efficiency through all the steps of the process.

As an example, the News Storyline Ontology¹ is a generic model for describing and organizing the stories told by news organizations.
