Tesseract

Extract Text from an Image

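  • cd into the folder containing the image or screenshots you want to extract text from.
  • Run tesseract on a single image as a test. Here we extract the text of a screenshot to a file called test.txt (tesseract adds the .txt extension to the output name test itself) and tell tesseract that the image has a resolution of 150 DPI: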
    tesseract Screenshot_2022-08-16-21-27-12-14_1ce46c7c043b13bd654694576893861e.jpg test --dpi 150
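If you'd rather not choose the output name by hand, the same step can be wrapped in a small Bash function. This is a minimal sketch, not something that ships with tesseract: the function name ocr_one and the TESSERACT override variable are made up for illustration. It names the output after the image, relies on tesseract appending .txt to the output base, and prints where the text ended up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR a single image and print the path of the text file it wrote.
# TESSERACT is an assumed override variable; when unset, the real tesseract binary runs.
ocr_one() {
    local img="$1"
    local base="${img%.*}"   # output base = image name minus its extension
    "${TESSERACT:-tesseract}" "$img" "$base" --dpi 150 || return 1
    # tesseract adds .txt to the output base, so check for that file
    if [ -f "${base}.txt" ]; then
        printf '%s\n' "${base}.txt"
    else
        echo "no text file written for $img" >&2
        return 1
    fi
}

# Example (run from the screenshot folder):
# ocr_one Screenshot_2022-08-16-21-27-12-14_1ce46c7c043b13bd654694576893861e.jpg
```

The text lands next to the image, with the same name and a .txt extension.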
    

Batch Extract Text

  • If extracting text from a single image worked, you're ready to batch process the rest.
  • First, cd into the folder containing the images or screenshots you want to extract text from.
  • Now run the following commands:

    mkdir -p text
    for f in *.jpg
    do
        BN="${f%.*}"    # file name without the .jpg extension
        tesseract "$f" "./text/${BN}" --dpi 150
    done
    
  • The above commands create a folder called text, then loop over every .jpg file in the current folder. For each file, ${f%.*} strips the .jpg extension and stores the result in BN; tesseract then processes the image, treating it as 150 DPI, and writes the recognized text to text/${BN}.txt.
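A slightly more defensive take on the same loop, as a minimal Bash sketch. The function name batch_ocr and the TESSERACT override variable are made up for illustration. It globs *.jpg directly so file names with spaces survive, turns on nullglob so an empty folder doesn't hand the literal pattern *.jpg to tesseract, and skips images that already have a text file, so re-running after adding new screenshots only processes the new ones. Note that ${f%.*} removes only the last extension (a.b.jpg becomes a.b), whereas ${f%%.*} would cut at the first dot.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper: OCR every .jpg in the current folder into ./text/.
# TESSERACT is an assumed override variable; when unset, the real tesseract binary runs.
batch_ocr() {
    local f bn restore
    restore=$(shopt -p nullglob || true)   # remember the caller's nullglob setting
    shopt -s nullglob                       # if no .jpg exists, the loop simply doesn't run
    mkdir -p text
    for f in *.jpg; do                      # glob directly: safe with spaces in names
        bn="${f%.*}"                        # drop only the last extension: a.b.jpg -> a.b
        if [ -f "text/${bn}.txt" ]; then
            continue                        # already extracted on an earlier run
        fi
        "${TESSERACT:-tesseract}" "$f" "text/${bn}" --dpi 150
    done
    eval "$restore"                         # put nullglob back the way it was
}

# Example (run from the screenshot folder):
# batch_ocr
```

Delete a file from text/ if you want that image processed again.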