Phishing detection on webpages in European non-English languages based on machine learning

  • Department of Telecommunications, Brno University of Technology, Brno, Czech Republic. komosny@vut.cz.

Summary

This summary is machine-generated.

This study enhances machine-learning phishing detection for minor European languages. It achieves 99% accuracy on local webpages in 12 of the 16 European countries tested and reduces the false positive rate on local-language webpages by up to a factor of 10. The new method strengthens cybersecurity for underserved language communities.

Area of Science

  • Cybersecurity
  • Machine Learning
  • Natural Language Processing

Background

  • Current machine-learning phishing detection is effective for English webpages but less accurate for webpages in minor languages.
  • Zero-day phishing attacks pose a significant threat, necessitating robust detection methods.

Purpose of the Study

  • To improve phishing detection accuracy for webpages in minor European languages.
  • To reduce the false positive rate in detecting phishing attempts on local language websites.

Main Methods

  • Development of a novel language-based phishing detection model.
  • Testing the model on approximately two million local webpages from 16 European countries.
  • Statistical validation using Shapiro-Wilk and paired t-tests to assess robustness.
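
A paired t-test compares two methods evaluated on the same webpage sets. It tests whether the mean of the per-set score differences is zero. Below is a minimal sketch of the test statistic in plain Python. It assumes each method yields one score (for example, accuracy) per webpage set. The function name `paired_t_statistic` and the example scores are hypothetical and are not taken from the study. The p-value would then come from a t-distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. The Shapiro-Wilk test, for example `scipy.stats.shapiro`, is the usual check that the differences are roughly normal before relying on the t-test.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, df) for a paired t-test on matched samples a and b.

    Hypothetical helper for illustration; a[i] and b[i] are the scores of
    two methods on the same webpage set i.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]       # per-set differences
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)              # sample std. dev. (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))          # t = mean(d) / (s_d / sqrt(n))
    return t, n - 1                             # degrees of freedom = n - 1


# Hypothetical per-set accuracies for a proposed and a baseline method.
proposed = [0.99, 0.98, 0.99, 0.97]
baseline = [0.95, 0.94, 0.96, 0.93]
t, df = paired_t_statistic(proposed, baseline)  # t ≈ 15.0, df = 3
print(f"t = {t:.2f}, df = {df}")
```

Working on the differences, rather than on the two samples separately, removes the between-set variation that both methods share. This is why the paired form suits comparisons on identical webpage sets.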

Main Results

  • Achieved 99% accuracy for phishing detection on local webpages in 12 European countries.
  • Reduced the false positive rate by up to a factor of 10 for minor language webpages.
  • Demonstrated statistically significant and robust performance across diverse webpage sets.
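
Under the standard definition, the false positive rate is the share of legitimate pages wrongly flagged as phishing: FPR = FP / (FP + TN). Accuracy is (TP + TN) divided by all pages. The minimal sketch below uses hypothetical counts that are not from the study. It shows how a drop from 500 to 50 false alarms on 10,000 legitimate pages corresponds to a factor-of-10 reduction.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of legitimate (negative) pages wrongly flagged as phishing."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no legitimate pages to evaluate")
    return fp / negatives


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all pages (phishing and legitimate) classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty evaluation set")
    return (tp + tn) / total


# Hypothetical counts on 10,000 legitimate local-language pages.
baseline_fpr = false_positive_rate(fp=500, tn=9_500)   # 0.05
improved_fpr = false_positive_rate(fp=50, tn=9_950)    # 0.005
reduction = baseline_fpr / improved_fpr                # about 10

# Hypothetical mixed set: 1,000 phishing and 1,000 legitimate pages.
acc = accuracy(tp=990, tn=990, fp=10, fn=10)           # 0.99
print(f"FPR {baseline_fpr:.3f} -> {improved_fpr:.3f}, "
      f"reduction x{reduction:.1f}, accuracy {acc:.2%}")
```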

Conclusions

  • The proposed language-based phishing detection significantly outperforms existing methods for minor languages.
  • This advancement strengthens cybersecurity by extending effective protection to users of webpages in minor languages.
  • Publicly releasing code and data promotes reproducibility and further research.