Enhancing User Rating Database Consistency through Pruning

Dionisis Margaris and Costas Vassilakis
Transactions on Large-Scale Data- and Knowledge-Centered Systems, special issue on Consistency and Inconsistency in Data-centric Applications, Springer
Abstract:

Recommender systems rely on information about users' past behavior to formulate recommendations about their future actions. However, people's interests and tastes may change over time: they listen to different singers or even different types of music, watch different types of movies, read different types of books, and so on. Such changes introduce inconsistency into the database, because part of it no longer reflects the user's current preferences, which is its intended purpose.
In this paper, we present a pruning technique that removes aged user behavior data, which are bound to correspond to invalidated preferences of the user, from the ratings database. Through pruning, (1) inconsistencies are removed and data quality is improved, (2) rating predictions are generated faster, and (3) the ratings database size is reduced. We also propose an algorithm for determining the amount of pruning that should be performed, allowing the pruning algorithm to be tuned and operated in an unsupervised fashion.
The proposed technique is evaluated and compared against seven aging algorithms, which reduce the importance of aged ratings, and a state-of-the-art pruning algorithm, using datasets with varying characteristics. It is also validated using two distinct rating prediction computation strategies, namely collaborative filtering and matrix factorization. The proposed technique needs no extra information about the items' characteristics (e.g. the categories they belong to or their attribute values), can be used with any rating database that includes timestamps, and has proved effective for user-item databases of any size and under both rating prediction computation strategies.
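
The general idea of timestamp-based pruning can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the pruning criterion or how the pruning amount is determined, so the function name `prune_old_ratings`, the `(user, item, rating, timestamp)` tuple layout, and the per-user `keep_fraction` parameter are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import ceil


def prune_old_ratings(ratings, keep_fraction):
    """Keep each user's newest ratings and drop the oldest ones.

    ratings: iterable of (user, item, rating, timestamp) tuples (assumed layout).
    keep_fraction: share of each user's ratings to retain (hypothetical knob,
    standing in for whatever pruning amount the paper's algorithm selects).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Group ratings by user so pruning is applied per user.
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    kept = []
    for user_rows in by_user.values():
        # Sort newest first, then keep the leading share (at least one rating).
        user_rows.sort(key=lambda r: r[3], reverse=True)
        n_keep = max(1, ceil(len(user_rows) * keep_fraction))
        kept.extend(user_rows[:n_keep])
    return kept


ratings = [
    ("u1", "i1", 4, 100), ("u1", "i2", 2, 200),
    ("u1", "i3", 5, 300), ("u1", "i4", 3, 400),
    ("u2", "i1", 5, 150), ("u2", "i3", 1, 250),
]
pruned = prune_old_ratings(ratings, keep_fraction=0.5)
```

In this sketch the pruned list would then feed any downstream predictor (for instance collaborative filtering or matrix factorization) in place of the full ratings database; only the timestamp column is needed, which matches the abstract's claim that no item metadata is required.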

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
