| Title | A crawler architecture for harvesting the clear, social, and dark web for IoT-related cyber-threat intelligence |  
 | Publication Type | Conference Paper |  
 | Year of Publication | 2019 |  
 | Authors | Koloveas P, Chantzios T, Tryfonopoulos C, Skiadopoulos S |  
 | Conference Name | In Proceedings of the IEEE Workshop on Cyber Security & Resilience in the Internet of Things (CSRIoT @ IEEE Services) |  
 | Date Published | July 2019 |  
 | Publisher | IEEE |  
 | Conference Location | Milan, Italy |  
 | Keywords | crawling architecture, cyber-security, cyber-threat intelligence, IoT, language models, machine learning |  
 | Abstract | The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially, a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness. |  
 | URL | http://users.uop.gr/~trifon/papers/pdf/csriot19-KCTS.pdf |