This module collects Items (tweets, posts, etc.) and MediaItems from online social networks in two ways:
- Using the streaming API of Twitter, the stream-manager continuously monitors a set of Twitter accounts (e.g. lists of newshounds), producing a real-time stream of Items as input to the system. In V1, a single static list of Twitter accounts was monitored. In V2, multiple lists are supported and provenance information (the list each Item came from) is maintained. V2 will also support dynamic updates of the monitored lists, with appropriate UI controls for end users.
- Using the REST APIs of Twitter, Facebook and other media sharing platforms (Flickr, Tumblr, YouTube, Google+ and Instagram), the stream-manager collects MediaItems in a targeted way after a DySCO is created. It uses DySCO fields such as entities and keywords to form appropriate queries, and associates the collected MediaItems with the input DySCO. In V2, the stream-manager will also collect Items (from Twitter and Facebook) using the REST API and provide them as input to the DySCO generation process.
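
The query formation step in the second bullet can be pictured with a minimal Java sketch. It is an illustration under stated assumptions, not the project's implementation: `DyscoSketch` and `QuerySketch` are hypothetical names, only the two DySCO fields mentioned above (entities and keywords) are modelled, and combining terms with `OR` while quoting entities as phrases is an assumed convention that may differ from what the stream-manager actually sends to each platform.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DySCO; only the two fields named in the text.
final class DyscoSketch {
    final List<String> entities;
    final List<String> keywords;

    DyscoSketch(List<String> entities, List<String> keywords) {
        this.entities = entities;
        this.keywords = keywords;
    }
}

// Hypothetical helper that turns DySCO fields into a single search string.
final class QuerySketch {
    // Assumption: entities are quoted as exact phrases, keywords are used
    // as-is, and all terms are joined with OR.
    static String buildQuery(DyscoSketch dysco) {
        List<String> terms = new ArrayList<>();
        for (String entity : dysco.entities) {
            terms.add("\"" + entity + "\"");
        }
        terms.addAll(dysco.keywords);
        return String.join(" OR ", terms);
    }
}
```

With one entity `Barack Obama` and keywords `election` and `debate`, `buildQuery` returns `"Barack Obama" OR election OR debate`. In practice each platform's REST API has its own query syntax, so a per-platform translation layer would likely be needed.
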
The module architecture is illustrated below. The Item Collector handles Item collection from the Streaming API of Twitter, while the Search Manager performs the targeted collection of content once DySCOs have been generated. The MediaItem Extractor extracts MediaItems from the collected Items. The stream-manager stores the metadata of the collected Items and MediaItems in MongoDB. In V1, the stream-manager was also responsible for invoking their indexing in Solr. In V2, this responsibility was moved to the orchestrators for efficiency reasons.
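
The hand-off from collected Items to the MediaItem Extractor can be sketched in Java as follows. This is a simplified illustration, not the project's code: `ItemSketch`, `MediaItemSketch` and `MediaItemExtractorSketch` and all of their fields are hypothetical, and the sketch assumes each Item already carries its media URLs and the name of the monitored list it came from (the V2 provenance information). Storage in MongoDB is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Item: an id, the monitored list it came from, and media URLs.
final class ItemSketch {
    final String id;
    final String sourceList;
    final List<String> mediaUrls;

    ItemSketch(String id, String sourceList, List<String> mediaUrls) {
        this.id = id;
        this.sourceList = sourceList;
        this.mediaUrls = mediaUrls;
    }
}

// Hypothetical MediaItem: keeps a reference to its parent Item and list.
final class MediaItemSketch {
    final String url;
    final String itemId;
    final String sourceList;

    MediaItemSketch(String url, String itemId, String sourceList) {
        this.url = url;
        this.itemId = itemId;
        this.sourceList = sourceList;
    }
}

// Hypothetical extractor: one MediaItem per media URL found in the Item.
final class MediaItemExtractorSketch {
    static List<MediaItemSketch> extract(ItemSketch item) {
        List<MediaItemSketch> result = new ArrayList<>();
        for (String url : item.mediaUrls) {
            result.add(new MediaItemSketch(url, item.id, item.sourceList));
        }
        return result;
    }
}
```

Carrying the parent Item id and source list on each MediaItem is one way to preserve provenance after extraction; the actual data model may record it differently.
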
The project source code is available in the socialsensor-stream-manager GitHub project.