Tuesday, February 28, 2023

Fetch Text from Image & PDF Using Selenium Java

In this blog, we will learn how to fetch text from images and PDFs in a Selenium Java project.

This Blog Contains:

Read Text From Image Using OCR with Tesseract (tess4j)
Reading PDF Text Using PDFUtil
Save PDF as Image Using PDFUtil
Extract Images From PDF Using PDFUtil

Fetch Text From Image In Selenium

To get text from an image in Selenium, we use Optical Character Recognition (OCR) with Tesseract through its Java wrapper, Tess4J (tess4j). Tesseract supports UTF-8 Unicode.

First, create a folder named "tesseract" in the project and put the trained data in that folder. Trained data for each supported language is available at the URL below:

https://github.com/tesseract-ocr/tessdata

Download eng.traineddata for the English language and put it into the tesseract folder of your project.
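A missing or misplaced trained data file is an easy mistake to make at this step, so it can help to check the folder before running OCR. The sketch below is only an illustration: the class name TessDataCheck and the method hasTrainedData are hypothetical, and it assumes the folder is called "tesseract" as described above and that the file name follows the <language>.traineddata pattern.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j.
class TessDataCheck {

    // Returns true when <dataDir>/<language>.traineddata exists as a regular file.
    static boolean hasTrainedData(Path dataDir, String language) {
        return Files.isRegularFile(dataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Assumption: the folder name used in this post, relative to the project root.
        Path dataDir = Paths.get("tesseract");
        if (hasTrainedData(dataDir, "eng")) {
            System.out.println("Trained data found.");
        } else {
            System.out.println("eng.traineddata not found in " + dataDir.toAbsolutePath());
        }
    }
}
```

The same check can be repeated for any other language code whose trained data you download.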

Add the Maven dependency below for Tesseract (tess4j):

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.4</version>
</dependency>

Below is the Java code to fetch text from the image:

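import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;

ITesseract image = new Tesseract();
image.setDatapath("Location for TessData Folder"); // folder that holds eng.traineddata
image.setLanguage("eng");
String str1 = image.doOCR(new File("Location Of Image")); // throws TesseractException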
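OCR output often carries extra line breaks and spacing, so a direct equals check against the expected text can fail even when the words were read correctly. One possible way to compare is to normalize both sides first. This is a minimal sketch, not part of tess4j: the class OcrTextMatcher and its methods are hypothetical, and the sample string only stands in for the value returned by doOCR.

```java
import java.util.Locale;

// Hypothetical helper for comparing OCR output in a test.
class OcrTextMatcher {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space,
    // trims both ends, and lower-cases the result.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // True when the normalized OCR output contains the normalized expected phrase.
    static boolean containsText(String ocrOutput, String expected) {
        return normalize(ocrOutput).contains(normalize(expected));
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by doOCR; not real OCR output.
        String str1 = "Welcome to\n  Selenium   Testing \n";
        System.out.println(containsText(str1, "welcome to selenium testing")); // prints true
    }
}
```

Ignoring case is a design choice here; drop the toLowerCase call if the test needs a case-sensitive match.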


Fetch Text From PDF

Add the Maven dependency below for PDFUtil:

<dependency>
    <groupId>com.testautomationguru.pdfutil</groupId>
    <artifactId>pdf-util</artifactId>
    <version>0.0.3</version>
</dependency>

The Java code below reads text from a PDF:

import com.testautomationguru.utility.PDFUtil;

String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
String text = pdfUtil.getText(pdfLocation); // throws IOException
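Before handing a path to PDFUtil, it can be useful to confirm that the file exists and really is a PDF, for example when the file was just downloaded by the browser under test. PDF files begin with the marker %PDF-, so a quick header check is one option. The sketch below is an assumption-laden illustration: the class PdfFileCheck and the method looksLikePdf are hypothetical names, and readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of pdf-util.
class PdfFileCheck {

    // Returns true when the file exists and its first five bytes are "%PDF-".
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        byte[] header = new byte[5];
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes (Java 9+) keeps reading until 5 bytes arrive or the file ends.
            int read = in.readNBytes(header, 0, header.length);
            return "%PDF-".equals(new String(header, 0, read, StandardCharsets.US_ASCII));
        }
    }
}
```

A call such as PdfFileCheck.looksLikePdf(Paths.get(pdfLocation)) could then guard the getText call shown above.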

The Java code below saves the PDF pages as images:

String folderLocation = "Location Where we need to save Image";
String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setImageDestinationPath(folderLocation); // folder that receives the image files
pdfUtil.savePdfAsImage(pdfLocation); // throws IOException

The Java code below extracts the images embedded in a PDF:

String folderLocation = "Location Where we need to save Image";
String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setImageDestinationPath(folderLocation); // folder that receives the image files
pdfUtil.extractImages(pdfLocation); // throws IOException
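Both snippets above write their output into the folder passed to setImageDestinationPath, so a test usually wants to see what actually landed there. The following sketch lists the image files in a folder using only the standard library. The class SavedImageLister and the method listImages are hypothetical, and the accepted extensions (png, jpg, jpeg) are an assumption about the output format rather than something the post states.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of pdf-util.
class SavedImageLister {

    // Returns the sorted file names in the folder that end in .png, .jpg or .jpeg.
    static List<String> listImages(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(SavedImageLister::hasImageExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Case-insensitive check on the assumed image extensions.
    private static boolean hasImageExtension(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".png") || lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }
}
```

Calling SavedImageLister.listImages(Paths.get(folderLocation)) after savePdfAsImage or extractImages gives a list that a test can check for being non-empty.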

Original Source: https://www.devstringx.com/using-selenium-java
