storage-indexing


The storage-indexing components of SocialSensor include the following:

  • MongoDB: This stores the metadata of Items, MediaItems and WebPages, as well as auxiliary data such as the Twitter accounts to monitor and the URLs to fetch.
  • Solr: This hosts the Items, populated with their metadata, and the DySCOs in two separate, fully searchable collections. A third collection stores MediaItems together with a set of properties so that they can be retrieved by full-text queries. Finally, the TopicDetectionItems collection supports the n-gram analysis involved in DySCO creation; it is not permanent storage but a temporary repository for the processed Items of each timeslot.
  • mm-index: This index is dedicated to the indexing of image features to enable fast and scalable similarity-based search. The underlying indexing mechanism is documented in D4.2 and thorough evaluation results are available in D4.3. Its source code is also available in the multimedia-indexing GitHub project.
  • infotainmentDB: This is a database dedicated to the infotainment use case. It stores the schema and content around an event of interest (e.g. for a film festival: film program, film details, directors, etc.).

These components can be accessed through methods of the socialsensor-framework-client as well as through a REST API (e.g. the infotainment API).
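
As a rough illustration of the full-text search that the Solr MediaItems collection is meant to support, the sketch below builds a query URL for Solr's standard /select handler. It does not use the socialsensor-framework-client; the host, port and collection name are assumptions made for the example, and the helper `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed values: the real host, port and collection name depend on the deployment.
SOLR_BASE = "http://localhost:8983/solr"
MEDIA_ITEMS_COLLECTION = "MediaItems"


def build_search_url(text, rows=10):
    """Return a Solr /select URL for a full-text query (hypothetical helper)."""
    # q, rows and wt are standard Solr query parameters.
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{MEDIA_ITEMS_COLLECTION}/select?{urlencode(params)}"


print(build_search_url("film festival"))
```

Sending an HTTP GET to the resulting URL against a running Solr instance would return the matching documents as JSON; which fields are searched depends on the collection's schema, which is not described here.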