Integrating OCR and NLP Techniques for Accurate Text Extraction and Plagiarism Detection in Image-Based Content

Dr. Palvadi Srinivas Kumar, Dr. Krishna Prasad

doi:10.48165/bapas.2024.44.2.1

Published: Sep 20, 2024

Keywords:

Optical Character Recognition (OCR), Natural Language Processing (NLP), Image-Based Text Extraction, Plagiarism Detection, Text Plagiarism in Images, Automated Content Verification, Image Analysis, Document Authenticity, Content Originality, Image-to-Text Conversion

Abstract

In the digital age, images often contain valuable text-based information, including numbers, symbols, and other data. Efficient extraction and verification of this content is critical, particularly in academic and content-driven domains where originality is paramount. This paper presents a novel approach to detecting plagiarism in text embedded within images. The proposed method uses Optical Character Recognition (OCR) to extract text from images and applies Natural Language Processing (NLP) techniques to evaluate the originality of the extracted content. By comparing the text against a comprehensive database of existing sources, the system identifies potential plagiarism and distinguishes original from copied content. This approach ensures that text in images, and not only text in conventional documents, is scrutinized for authenticity, which improves the reliability of plagiarism detection across diverse content formats. The proposed solution offers an efficient, automated pipeline for image-based text extraction and plagiarism detection, applicable in educational, legal, and content-creation environments.
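
The two stages described in the abstract (OCR extraction, then comparison against known sources) can be illustrated with a minimal sketch. The abstract does not name an OCR engine or a similarity measure, so everything here is an assumption made for illustration: the OCR step assumes the pytesseract and Pillow packages, the comparison uses Jaccard similarity over word trigrams, and the function names, the in-memory `sources` dictionary, and the 0.5 threshold are all hypothetical stand-ins for the paper's actual components.

```python
import re
from typing import Dict, List, Set, Tuple


def extract_text(image_path: str) -> str:
    """OCR step. Assumes pytesseract and Pillow are installed;
    the paper does not state which OCR engine it uses."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Lowercase the text, keep alphanumeric tokens, return word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two shingle sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_matches(
    extracted: str,
    sources: Dict[str, str],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Return (source name, score) pairs at or above the threshold,
    highest score first."""
    query = shingles(extracted, n)
    scored = [(name, jaccard(query, shingles(body, n)))
              for name, body in sources.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

In use, the string returned by `extract_text` would be passed to `flag_matches` together with the reference texts. A production system would likely need OCR error correction and a scalable index in place of the in-memory dictionary, which this sketch does not attempt.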

Issue

Vol. 44 No. 3 (2024): LIB PRO. 44(3), JUL-DEC 2024

Section

Articles
