The second Social News On the Web (SNOW) workshop took place this week (Tue 8 April) in Seoul, as part of the WWW 2014 conference. The workshop featured several very insightful talks revolving around the increasingly interesting topic of online social news. It is noteworthy that SNOW was also featured on seen.co.
The workshop was structured around two main parts: the research paper session and the SNOW Data Challenge. After the opening by Luca Aiello, the first research paper focused on the topic of news diffusion in social media. In particular, Minkyoung Kim discussed how user behaviour across multiple social media platforms collectively forms complex information pathways on the Web. Minkyoung provided insights into the interplay between news sites, Social Network Sites (SNS) and blogs. A first key observation was that SNS and blog users are less active but more reactive for real-world news than for other arbitrary topics. Moreover, active news media turned out to be tightly connected, increasing the likelihood that their content is exposed to other social systems. In addition, it was found that the most active news category in each system corresponds to the most reactive news category, and that larger diffusion exhibits higher heterogeneity.
Next, Eva Jaho presented the Alethiometer, a framework for assessing trustworthiness and content validity in social media (in particular Twitter). According to Eva, the Alethiometer is organized around three axes: Contributor, Content and Context. Contributor analysis concerns parameters such as the trust, reputation and influence of an information source (i.e. a Twitter account). Content validity is expressed through parameters such as the language used, the content's history and possible manipulations performed on it. Finally, Context analysis examines whether the 'what', 'when' and 'where' of an online publication concur with each other. For each framework axis, Eva and her colleagues defined a set of related parameters, termed modalities. Modalities concerning a contributor include reputation, history of valid contributions, popularity, influence, and account validity. Modalities referring to the posted content include the importance and reputation of the contained web links, the content's popularity, influence, originality, authenticity and objectivity. Finally, context analysis refers to cross-checking for similar reports in different social media, the coherence between the content and its tags, attached links and multimedia, and the coherence between reference location/time and publication location/time.
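The three-axis structure described above could be sketched roughly as follows. This is a hypothetical illustration only: the modality names, the [0, 1] score range, and the weighted-average aggregation are assumptions for the sketch, not the published Alethiometer scoring scheme.

```python
from dataclasses import dataclass

@dataclass
class AxisScore:
    """One Alethiometer axis (Contributor, Content, or Context)
    with its set of modality scores. Modality names below are
    illustrative, taken loosely from the description above."""
    name: str
    modalities: dict  # modality name -> score, assumed in [0, 1]

    def score(self) -> float:
        # Unweighted mean of the axis' modality scores (an assumption).
        return sum(self.modalities.values()) / len(self.modalities)

def trust_score(contributor, content, context, weights=(1/3, 1/3, 1/3)):
    """Combine the three axes into a single trust estimate
    via a weighted average (also an assumption of this sketch)."""
    axes = (contributor, content, context)
    return sum(w * a.score() for w, a in zip(weights, axes))

contributor = AxisScore("contributor",
                        {"reputation": 0.8, "popularity": 0.6, "account_validity": 1.0})
content = AxisScore("content",
                    {"link_reputation": 0.7, "originality": 0.5})
context = AxisScore("context",
                    {"cross_check": 0.9, "location_time_coherence": 0.8})

overall = trust_score(contributor, content, context)
```

Any monotone aggregation would fit the framework equally well; the weighted mean is just the simplest choice to illustrate how per-modality evidence rolls up into a single assessment.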
Next, Raphael Troncy presented some recent advances of the LinkedTV project on describing and contextualizing events in TV news shows. In particular, the presented framework made use of textual metadata around a streaming TV show (including subtitles) with the help of a number of Named Entity extraction APIs along with a sophisticated query mechanism to Wikipedia, with the goal of enriching the viewed TV news content with structured information that can be conveniently accessible by means of a second screen.
Then, Jochen Spangenberg from Deutsche Welle gave a talk on the recent phenomenon of grassroots and collaborative journalism. In particular, Jochen took a closer look at two new forms of audience involvement and their impact on news and information production, which he labelled (1) grassroots journalism and (2) collaborative journalism. Grassroots journalism is defined as the collection, dissemination and analysis of news and information by the general public, especially by means of the Internet. Collaborative journalism, in turn, denotes the ways in which media organizations and professional journalists involve external parties in the production of information, thereby making audience contributions part of the storytelling process or the story itself. Recent investigations at Deutsche Welle showed that grassroots and collaborative journalism will continue to grow: it can be expected that accelerating technological developments, the audience's eagerness to "get involved" and increasing Internet access will motivate even more people to participate in the process of news gathering and information dissemination. At the same time, further strategies that meet the emerging challenges need to be developed in order to maintain (or improve) the quality of grassroots/collaborative news coverage. All this is of great importance for the prospering of the media landscape as a whole, and thereby the functioning of democratic societies.
The first keynote speaker was Alejandro Jaimes from Yahoo! Research, who presented an integrated view of the research conducted by his team and him on the topic of Data-driven Journalism, with an emphasis on leveraging Big Data to transform news narratives and to increase readers' engagement. The proposed framework was structured on top of three main pillars: user-centered design, multimedia analysis and data mining.
The last research paper of SNOW, presented by Symeon Papadopoulos, focused on the problem of fake image verification in social media. In particular, Symeon emphasized that previous works tend to overestimate the performance of supervised learning approaches by training models that unintentionally contain information from the test set, as a result of the large amount of redundancy in social media. Symeon proposed two stricter modes of supervised learning that ensure full separation between training and test set and that simulate a real-world image verification setting more realistically. The proposed framework was complemented by an empirical study on two datasets collected around the Hurricane Sandy and Boston Marathon bombing incidents.
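The core idea of the stricter evaluation modes can be illustrated with an event-disjoint split: all items from a given event go entirely to either training or testing, so near-duplicate posts about the same incident can never leak across the split. This is a minimal sketch in that spirit, not the paper's actual protocol; the event identifiers and feature values are made up.

```python
def event_disjoint_split(items, test_events):
    """Split labelled items so that no event appears on both sides.

    items: list of (event_id, features, label) tuples
    test_events: set of event ids reserved for evaluation
    """
    train = [it for it in items if it[0] not in test_events]
    test = [it for it in items if it[0] in test_events]
    return train, test

# Toy dataset: redundant posts cluster around two incidents.
items = [
    ("sandy",  [0.1, 0.9], "fake"),
    ("sandy",  [0.2, 0.8], "fake"),
    ("boston", [0.7, 0.3], "real"),
    ("boston", [0.6, 0.4], "fake"),
]

# Hold out one whole event for testing: a model trained on "sandy"
# posts is then evaluated on a genuinely unseen incident.
train, test = event_disjoint_split(items, test_events={"boston"})
```

Compared with a random per-item split, this makes the measured accuracy reflect generalization to new events rather than memorization of near-duplicates.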
The second workshop keynote speaker was Jure Leskovec, who focused on the problem of creating, placing, and presenting social media content in order to maximize user engagement. In addition to the quality of the content itself, Jure pointed out that several factors, such as the way the content is presented (the title), the community it is posted to, whether it has been seen before, and the time it is posted, determine its success. Jure discussed how a computational perspective can be applied to questions involving the dynamics of information flows through such networks, including the analysis of massive data as well as mathematical models that seek to abstract some of the underlying phenomena.
The next session of SNOW focused on the topic detection Data Challenge, which attracted the interest of several researchers working on the problem of topic detection on noisy streams of data. The Data Challenge organizers provided participants with a Twitter dataset and a specific problem definition, and then systematically evaluated their submissions according to a variety of criteria, such as the recall of newsworthy topics, the readability of the produced headlines, the coherence of topic elements (tweets, tags) and the diversity of the topic tweets.
Carlos Martin-Dancausa presented a method based on bursty n-grams, i.e. sets of keywords whose frequency suddenly increases in the current time interval. Olfa Nasraoui then presented an approach grounded on distributed LDA topic modelling and topic agglomeration in a latent space. Steven van Canneyt presented his team's approach, which was based on newsworthy content classification. Georgiana Ifrim proposed an event detection method using aggressive filtering and hierarchical tweet clustering. Dimitris Milioris detailed an information-theoretic approach for topic detection based on joint complexity. Finally, Symeon Papadopoulos presented an approach based on two-level message clustering.
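The bursty n-gram idea mentioned first can be sketched with a simple ratio test between a term's frequency in the current time window and its smoothed historical frequency. The threshold and smoothing constant below are illustrative choices, not the parameters of the actual challenge submission.

```python
from collections import Counter

def bursty_ngrams(current_window, history, min_ratio=3.0, smoothing=1.0):
    """Return n-grams whose frequency in the current window is at least
    min_ratio times their historical frequency (add-one smoothing so
    previously unseen terms don't divide by zero)."""
    cur = Counter(current_window)
    hist = Counter(history)
    return {gram for gram, count in cur.items()
            if count / (hist[gram] + smoothing) >= min_ratio}

# Toy streams: "earthquake" never appeared before but dominates the
# current window, so it is flagged as bursty; "weather" is routine.
history = ["election", "weather", "weather", "sports"]
current = ["earthquake", "earthquake", "earthquake", "weather"]
bursts = bursty_ngrams(current, history)  # -> {"earthquake"}
```

In a full system the flagged n-grams would then be grouped into topics (e.g. by co-occurrence), and the windows would slide over the tweet stream; this sketch only shows the burst-detection step.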
The winners of the challenge were the teams of Georgiana Ifrim, Carlos Martin-Dancausa and Dimitris Milioris.
The Challenge proceedings are available on ceur-ws.org.