How to Get Text from an Image Using Tesseract?
- Tesseract is an OCR engine that recognizes characters in various types of images (PNG, JPG, etc.).
- Tesseract is considered one of the most accurate open source OCR engines.
- Google provides a free and easy way to convert image and PDF files into editable text using its built-in OCR feature.
Now let's see how to do this using Tesseract with Java.
First, add Tesseract for Java (Tess4J) as a Maven dependency to your project:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.4.1</version>
</dependency>
After adding the Maven dependency, we can use the code below:
import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
public class CaptchaTesseract {
    public static void main(String[] args) throws Exception {
        // Create the Tesseract OCR engine through the Tess4J wrapper
        ITesseract tesseract = new Tesseract();
        // Run OCR on the input image and collect the recognized text
        String text = tesseract.doOCR(new File("D:\\Imagepath\\image_text.png"));
        System.out.println("Extract Text from Image using Tesseract:");
        System.out.println(text);
    }
}
Here image_text.png is my input image, from which the text is extracted.
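As an optional step that is not part of the program above, the input image can be converted to grayscale before it is handed to Tesseract, which may help when the text sits on a colored or noisy background. The sketch below uses only the standard java.awt and javax.imageio classes. The class name GrayscalePrep and the method toGrayscale are hypothetical names chosen for illustration, and it assumes the result would then be passed to the doOCR overload of Tess4J that accepts a BufferedImage instead of a File.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tess4J.
class GrayscalePrep {

    // Returns a grayscale copy of the given image; the source is left unchanged.
    static BufferedImage toGrayscale(BufferedImage source) {
        BufferedImage gray = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Drawing onto a TYPE_BYTE_GRAY image converts colors to gray levels.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the image path is passed as the first command-line argument.
        BufferedImage input = ImageIO.read(new File(args[0]));
        if (input == null) {
            // ImageIO.read returns null when no reader supports the file format.
            throw new IllegalArgumentException("Unsupported image file: " + args[0]);
        }
        BufferedImage gray = toGrayscale(input);
        System.out.println("Grayscale image: " + gray.getWidth() + "x" + gray.getHeight());
    }
}
```

Whether this improves recognition depends on the image, so it is worth comparing the OCR output with and without the conversion.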
Running CaptchaTesseract prints output like this:
Extract Text from Image using Tesseract:
You are always stronger than you think you are
Note: We can also achieve this using Tesseract with Python.
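The string returned by doOCR may contain blank lines or stray leading and trailing whitespace, depending on the image layout. A minimal sketch of tidying it with plain Java follows. The class OcrTextCleanup, the method normalize, and the sample raw string (including its line breaks) are all assumptions made for illustration, not output captured from Tesseract.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tess4J.
class OcrTextCleanup {

    // Trims every line, drops blank lines, and joins the rest with single newlines.
    static String normalize(String raw) {
        return Arrays.stream(raw.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Assumed sample input standing in for the text returned by doOCR.
        String raw = "You are always stronger\n\n  than you think you are \n\n";
        System.out.println(normalize(raw));
    }
}
```

In the program above, the same call would wrap the OCR result before it is printed.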
Please comment below to give feedback or ask questions.