Fake news detection using text similarity approach

G.B. Pant University of Agriculture and Technology, Pantnagar - 263145 (Uttarakhand)
The present work proposes a methodology for detecting fake news in online news media using a text similarity approach. In this era of digitization, most people get their news from the internet, and it is often difficult to tell whether stories are credible. Information overload and a general lack of understanding of how the internet works have also contributed to an increase in fake news and hoax stories. Traditionally, news came from trusted sources: journalists and media outlets that are required to follow strict codes of practice. However, the internet has enabled a completely new way to publish, share and consume information and news with very little regulation or editorial standards. The present work aims to develop an automatic fake news detection system for analysing the credibility of online news, so that readers become aware of news that is factually incorrect and optimized for sharing. News articles are essentially pieces of text. Hence, the proposed work can be divided into two subtasks: text analysis and performance evaluation. Text analysis transforms text into numerical features, which are then used to measure the similarity between the queried article and other articles. For article similarity, a hybrid of three text similarity approaches is used: two lexical similarity methods, N-grams (character based) and cosine similarity (corpus based), and one semantic similarity method, Explicit Semantic Analysis (ESA) with TF*IDF (term-based similarity). Python 3.5 is used for programming. The system was tested on 100 news articles; the analysis found that if more than three articles match the input article with a matching value ≥ 0.70 and < 0.80, the input article is judged to be true. The proposed system achieved an accuracy of 91.67%.
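The two lexical measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several details are assumptions made only for illustration: Jaccard overlap as the character n-gram measure, whitespace tokenisation, a smoothed IDF, simple averaging of the two scores, and the helper names `ngram_similarity`, `tfidf_vectors`, `cosine`, `count_matches` and `looks_credible`. The ESA component is omitted because it needs an external concept corpus, and only the lower bound of the reported matching range (0.70) is used as the threshold.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-gram sets (assumed measure, not from the source)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document, using a smoothed IDF."""
    tokenised = [d.lower().split() for d in docs]
    doc_freq = Counter(term for toks in tokenised for term in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def count_matches(query, references, threshold=0.70):
    """Count reference articles whose averaged score reaches the threshold."""
    vectors = tfidf_vectors([query] + list(references))
    q_vec = vectors[0]
    matches = 0
    for ref, r_vec in zip(references, vectors[1:]):
        # Averaging the two scores is an assumption; the abstract does not
        # say how the individual measures are combined.
        score = (ngram_similarity(query, ref) + cosine(q_vec, r_vec)) / 2
        if score >= threshold:
            matches += 1
    return matches


def looks_credible(query, references, threshold=0.70, min_matches=3):
    """Apply the 'more than three matching articles' rule from the abstract."""
    return count_matches(query, references, threshold) > min_matches
```

Under these assumptions, `looks_credible(article, reference_articles)` returns `True` only when more than three reference articles score at least 0.70 against the queried article. A practical system would also need a trusted reference corpus and the semantic (ESA) score alongside the lexical ones.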