Improving Collaborative Filtering’s Rating Prediction Coverage in Sparse Datasets through the Introduction of Virtual Near Neighbors

Title: Improving Collaborative Filtering’s Rating Prediction Coverage in Sparse Datasets through the Introduction of Virtual Near Neighbors
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Margaris D, Vasilopoulos D, Vassilakis C, Spiliotopoulos D
Conference Name: Proceedings of the 10th International Conference on Information, Intelligence, Systems and Applications (IISA2019)
Date Published: July
Keywords: Collaborative Filtering, Cosine Similarity, Evaluation, Pearson Correlation Coefficient, Sparse Datasets, Virtual Near Neighbors
Abstract: Collaborative filtering creates personalized recommendations by considering ratings entered by users. Collaborative filtering algorithms first detect users with similar tastes by examining the similarity between the ratings submitted so far. Users whose ratings exhibit a high degree of similarity are termed near neighbors; to formulate a recommendation for a user, her near neighbors’ ratings are extracted and form the basis for the recommendation. Collaborative filtering algorithms, however, exhibit the problem commonly referred to as “gray sheep”: for some users no near neighbors can be identified, and hence no personalized recommendations can be computed. The “gray sheep” problem is more severe in sparse datasets, i.e. datasets where the number of ratings is small compared to the number of items and users. In this paper, we address the “gray sheep” problem by introducing the concept of virtual near neighbors and a related algorithm for creating them on the basis of the existing ones. We evaluate the proposed algorithm, termed CFVNN, using eight widely used datasets and two correlation metrics that are widely used in collaborative filtering research, namely the Pearson Correlation Coefficient and the Cosine Similarity. The results show that the proposed algorithm considerably increases the capability of a collaborative filtering system to compute personalized recommendations in sparse datasets, thus effectively tackling the “gray sheep” problem. At the same time, the CFVNN algorithm improves rating prediction quality, as measured by the Mean Absolute Error and the Root Mean Square Error metrics.
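
The baseline setting the abstract describes (near neighbors, the two similarity metrics, and prediction coverage) can be illustrated with a minimal Python sketch. This is not the CFVNN algorithm, whose details the abstract does not give; it only shows plain user-based collaborative filtering and how a "gray sheep" user ends up without a prediction. The function names, the similarity threshold of 0, the choice to compute Pearson means over co-rated items, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt

def pearson(ru, rv):
    """Pearson correlation over co-rated items; None when undefined."""
    common = set(ru) & set(rv)
    if len(common) < 2:
        return None
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = (sqrt(sum((ru[i] - mu) ** 2 for i in common))
           * sqrt(sum((rv[i] - mv) ** 2 for i in common)))
    return num / den if den else None

def cosine(ru, rv):
    """Cosine similarity over co-rated items; None when undefined."""
    common = set(ru) & set(rv)
    if not common:
        return None
    num = sum(ru[i] * rv[i] for i in common)
    den = (sqrt(sum(ru[i] ** 2 for i in common))
           * sqrt(sum(rv[i] ** 2 for i in common)))
    return num / den if den else None

def near_neighbors(ratings, user, sim=pearson, threshold=0.0):
    """Users whose similarity to `user` exceeds the (assumed) threshold."""
    result = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        s = sim(ratings[user], other_ratings)
        if s is not None and s > threshold:
            result[other] = s
    return result

def predict(ratings, user, item, sim=pearson, threshold=0.0):
    """Mean-centered weighted prediction; None if no near neighbor rated the item."""
    nns = near_neighbors(ratings, user, sim, threshold)
    raters = {v: s for v, s in nns.items() if item in ratings[v]}
    if not raters:
        return None  # no usable near neighbors: the "gray sheep" case
    mean = lambda r: sum(r.values()) / len(r)
    num = sum(s * (ratings[v][item] - mean(ratings[v])) for v, s in raters.items())
    return mean(ratings[user]) + num / sum(raters.values())

def coverage(ratings, pairs, sim=pearson):
    """Fraction of (user, item) pairs for which a prediction can be computed."""
    done = sum(predict(ratings, u, i, sim) is not None for u, i in pairs)
    return done / len(pairs)

# Toy data (hypothetical): carol shares no items with anyone else.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"e": 1, "f": 2},
}
print(predict(ratings, "alice", "d"))   # a number: bob is a near neighbor
print(predict(ratings, "carol", "a"))   # None: no near neighbors
print(coverage(ratings, [("alice", "d"), ("carol", "a")]))
```

In this toy dataset only one of the two requested predictions can be computed, which is the coverage gap that grows as a dataset becomes sparser. Prediction quality on the computable cases would then be measured with metrics such as the Mean Absolute Error and the Root Mean Square Error.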