Get text from Image using Tesseract

How to get text from an image using Tesseract?
  • Tesseract is an OCR engine that recognizes characters in various types of images (PNG, JPG, etc.).
  • Tesseract is considered one of the most accurate open-source OCR engines.
  • Google provides an easy, free way to convert image and PDF files into editable text using its built-in OCR feature.
Now let's look at a program that does this using Tesseract with Java:

First, add Tesseract for Java (Tess4J) as a Maven dependency to your project.
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.4.1</version>
</dependency>
After adding the Maven dependency, we can use the code below:
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;

public class CaptchaTesseract {
    public static void main(String[] args) throws Exception {
        // Create the OCR engine through the Tess4J wrapper.
        ITesseract tesseract = new Tesseract();
        // Run OCR on the input image and collect the recognized text.
        String text = tesseract.doOCR(new File("D:\\Imagepath\\image_text.png"));
        System.out.println("Extract Text from Image using Tesseract:");
        System.out.println(text);
    }
}
Here, image_text.png is my input image, from which the text will be extracted.
Run the code and we see output like the following:
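Since the program above uses a hard-coded path, it can help to check the input file before handing it to Tesseract. The following is only a minimal sketch using the standard Java library. The class name InputImageCheck, its helper methods, and the list of accepted extensions are assumptions made for illustration and are not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tess4J: checks the input file before OCR.
class InputImageCheck {
    // Assumed list of extensions; adjust it to the formats your setup can read.
    private static final List<String> EXTENSIONS = Arrays.asList("png", "jpg", "jpeg");

    // Returns true when the file name ends in one of the assumed extensions.
    static boolean hasImageExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    // Returns true when the path is a readable regular file with an image extension.
    static boolean isUsableImage(Path path) {
        return Files.isRegularFile(path)
                && Files.isReadable(path)
                && hasImageExtension(path.getFileName().toString());
    }

    public static void main(String[] args) {
        // Take the path from the command line, falling back to the sample file name.
        Path input = Paths.get(args.length > 0 ? args[0] : "image_text.png");
        if (isUsableImage(input)) {
            System.out.println("Ready for OCR: " + input.toAbsolutePath());
        } else {
            System.out.println("Not a readable image file: " + input);
        }
    }
}
```

If the check passes, input.toFile() can be passed to doOCR in place of the hard-coded File.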
Extract Text from Image using Tesseract:
You are always
stronger than
you think you
are
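The recognized text keeps the line breaks of the image, so the sentence above is spread over four lines. If a single line is preferred, the string can be cleaned up afterwards. The following is a small sketch using only the standard Java library. OcrTextCleaner and joinLines are hypothetical names chosen for this example, and the sample string simply mirrors the output shown above.

```java
// Hypothetical helper, not part of Tess4J: tidies up the text returned by OCR.
class OcrTextCleaner {
    // Collapses every run of whitespace (spaces, tabs, line breaks)
    // into a single space and trims both ends.
    static String joinLines(String ocrText) {
        if (ocrText == null) {
            return "";
        }
        return ocrText.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Sample input that mirrors the OCR output shown above.
        String raw = "You are always\nstronger than\nyou think you\nare\n";
        System.out.println(joinLines(raw));
        // prints: You are always stronger than you think you are
    }
}
```

In the program above, this would mean printing joinLines(text) instead of text.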
Note: We can also achieve this using Tesseract with Python.
Please comment below with feedback or questions.
