Fuzzy based semantic clustering of news articles

dc.contributor.advisor: Joshi, Sanjay
dc.contributor.author: Priyanka
dc.date.accessioned: 2019-01-21T09:13:02Z
dc.date.available: 2019-01-21T09:13:02Z
dc.date.issued: 2018-10
dc.description.abstract: Text mining uses data mining approaches to extract valuable information hidden in textual data. This work proposes a framework for fuzzy clustering of news articles that originate on different news portals on the web. The data sets are fetched from the archives of two Indian news portals, The Hindu and Times of India. Six data sets are used for implementation and evaluation: 4, 150, and 1000 news articles from Times of India, and 4, 150, and 1000 news articles from The Hindu. The fetched data is stored in a central database and then preprocessed to reduce noise. Tokenization splits the text content into separate words. Stop words are removed because they have no significance for cluster discrimination, and lemmatization is then applied. Tf-idf is calculated for each data set and saved in word frequency vectors. A distance or similarity measure is applied to these vectors to find the similarity between articles; tf-idf with the cosine similarity measure gives the semantic similarity between articles. Because one article may belong to more than one cluster, fuzzy membership values must be generated. The articles are clustered using two algorithms, k-means clustering and fuzzy c-means clustering, so that similar documents are grouped into the same cluster and dissimilar documents are placed in different clusters. The proposed framework shows that fuzzy clustering does not restrict each news article to exactly one cluster. Therefore, when this framework is applied to information retrieval systems or other application systems, the system gives better performance and relevance to users. [en_US]
dc.identifier.uri: http://krishikosh.egranth.ac.in/handle/1/5810091836
dc.keywords: fuzzy logic, semantics, clustering [en_US]
dc.language.iso: en [en_US]
dc.pages: 115 [en_US]
dc.publisher: G.B. Pant University of Agriculture and Technology, Pantnagar - 263145 (Uttarakhand) [en_US]
dc.research.problem: Fuzzy Logic [en_US]
dc.sub: Information Technology [en_US]
dc.subject: null [en_US]
dc.theme: Semantics [en_US]
dc.these.type: M.Tech. [en_US]
dc.title: Fuzzy based semantic clustering of news articles [en_US]
dc.type: Thesis [en_US]
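
The pipeline outlined in the abstract (tokenization, stop-word removal, tf-idf weighting, cosine similarity, fuzzy c-means) can be illustrated with a minimal sketch. This is not the thesis implementation. The stop-word list, the smoothed idf formula, the fuzzifier m = 2, the three sample headlines, and all function names are assumptions made for illustration. Lemmatization is omitted to keep the sketch dependency-free.

```python
import math
import random
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the thesis).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, list of unit-length tf-idf vectors)."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    df = Counter(w for t in tokens for w in set(t))  # document frequency
    vectors = []
    for t in tokens:
        tf = Counter(t)
        # Smoothed idf (assumed variant): log((1 + n) / (1 + df)) + 1
        vec = [tf[w] / max(len(t), 1) * (math.log((1 + n) / (1 + df[w])) + 1)
               for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(a * b for a, b in zip(u, v))

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means with Euclidean distance.

    For unit vectors, squared Euclidean distance equals 2 - 2 * cosine,
    so nearer points are also the more cosine-similar ones.
    Returns (memberships, centers); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, normalised per point
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / total
                            for k in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            d = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            u[i] = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)]
    return u, centers

# Hypothetical headlines standing in for fetched news articles.
docs = [
    "Cricket team wins the match",
    "Cricket match won by the team",
    "Stock market falls sharply",
]
vocab, vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # the two cricket headlines
sim_02 = cosine(vectors[0], vectors[2])  # cricket vs. stock market
memberships, centers = fuzzy_c_means(vectors, c=2)
```

Each row of `memberships` holds one article's degree of membership in every cluster, which is how an article can belong to more than one cluster at once. Hard k-means would instead assign each article to the single nearest center.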
Files
Original bundle
Name:
Priyanka.pdf
Size:
4.9 MB
Format:
Adobe Portable Document Format