| Title | A Machine Learning Framework for Harvesting and Harmonizing Cultural and Touristic Data |
| Publication Type | Journal Article |
| Year of Publication | 2025 |
| Authors | Deligiannis K, Tryfonopoulos C, Raftopoulou P, Vassilakis C, Kaffes V, Skiadopoulos S |
| Journal | Information |
| Volume | 16 |
| Pagination | 1038 |
| ISSN | 2078-2489 |
| Keywords | cultural heritage, data augmentation, data homogenization, digital heritage, machine learning, named entity recognition, social media analysis, tourism analytics, trajectory extraction, web scraping |
| Abstract | Cultural and touristic information is increasingly available through a multitude of heterogeneous sources, including official repositories, community platforms, and open data initiatives. While prominent landmarks are typically covered across sources, less-known attractions are also documented with varying degrees of detail, resulting in fragmented, overlapping, or complementary content. To enable integrated access to this wealth of information, harvesting and consolidation mechanisms are required to collect, reconcile, and unify distributed content referring to the same entities. This paper presents a machine learning-driven framework for harvesting, homogenizing, and augmenting cultural and touristic data across multilingual sources. Our approach addresses entity resolution, duplication detection, and content harmonization, laying the foundation for enriched, unified representations of attractions and points of interest. The framework is designed to support scalable integration pipelines and can be deployed in applications aimed at tourism promotion, digital heritage, and smart travel services. |
| URL | https://www.mdpi.com/2078-2489/16/12/1038 |
| DOI | 10.3390/info16121038 |