Vis2 - OCR(), ImageIdentify()

renmacro
Posts: 15
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 15:53

iseahound wrote:Training is possible, but I haven't included the necessary command-line utilities. You're right that training may help, but unless you have some wacky, specific font, training is unlikely to beat the pre-trained tesseract_best model. Since your 0 just has a line through it, which is pretty common for a zero, it's probably the upscaling step that's ruining it. I suspect it reads as 0 or 8 about half the time? The image is upscaled 3.5x before processing; changing that to 2 or 2.5 may help. I think you can Ctrl+F for 3.5 to find it. Finally, there are a lot of improvements that can be made, and I intend to make them sometime in the future.


Thank you for the reply! Would changing static scaleFactor := 3.5 (line 2169) be all that is needed to reduce the magnification? I will give this a try, and I am also going to mess with the application's scaling/font size to see if it helps.

EDIT: On the first run it read with 100% accuracy at 2.5. I'm going to have it do a ton of random reads and check the logs for variance. Thank you again!

EDIT2: Still getting some bad reads, but a lot better. Playing with the numbers and UI settings.

EDIT3: Works well enough to just sample a bunch of times and use majority consensus. Thank you!
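
A minimal sketch of the majority-consensus idea, written in Python rather than AutoHotkey and not taken from the actual macro. The function name `majority_vote` and the sample reads are made up for illustration. Note that it picks the most frequent read (a plurality), which is not necessarily more than half of the samples.

```python
from collections import Counter

def majority_vote(reads):
    """Return (value, count) for the most common OCR read, or (None, 0) if there are none."""
    # Ignore empty reads and surrounding whitespace before counting.
    counts = Counter(r.strip() for r in reads if r and r.strip())
    if not counts:
        return None, 0
    value, count = counts.most_common(1)[0]
    return value, count

# Ten hypothetical reads of the same on-screen number.
reads = ["180,000.21"] * 7 + ["180,008.21"] * 2 + ["18O,000.21"]
print(majority_vote(reads))  # ('180,000.21', 7)
```

Returning the count alongside the value makes it easy to reject a result when the winning read only barely beats the runner-up.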
iseahound
Posts: 222
Joined: 13 Aug 2016, 21:04
GitHub: iseahound


15 May 2018, 18:52

static scaleFactor := 3.5 (line 2169) is all that is needed. You can use Control + Space in Advanced UI mode to bring up the preprocessed image preview. If you get good results with specific settings, let me know so I can improve Vis2 in the future.
renmacro

15 May 2018, 19:16


At this rate, with the testing code I made, I can keep a running percentage of successful vs. unsuccessful reads, let it run for a while, and adjust the values. Then I can test scale factors from 2.0 to 3.5 in 0.1 increments and let you know which one my specific test found most accurate.

This assumes my application's values stay at roughly 180,000.XX. I can continue testing tomorrow, when I have time to run this for you.
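
The sweep described above could be sketched roughly like this, in Python rather than AutoHotkey. Everything here is an assumption for illustration: `sweep_scale_factors` is a made-up name, and `fake_read` is a stand-in for whatever actually performs the OCR at a given scale factor (it is not a Vis2 call).

```python
import random

def sweep_scale_factors(read_at_scale, expected, factors, samples=150):
    """Return {factor: fraction of reads that matched `expected`}."""
    results = {}
    for factor in factors:
        hits = sum(1 for _ in range(samples) if read_at_scale(factor) == expected)
        results[factor] = hits / samples
    return results

# 2.0, 2.1, ..., 3.5, built from integers to avoid floating-point drift.
factors = [n / 10 for n in range(20, 36)]

# Stand-in for the real OCR call: a fake reader that is right about 90% of the time.
rng = random.Random(0)
def fake_read(factor):
    return "180,000.21" if rng.random() < 0.9 else "180,008.21"

accuracy = sweep_scale_factors(fake_read, "180,000.21", factors)
best = max(accuracy, key=accuracy.get)
print(best, accuracy[best])
```

Passing the reader in as a function keeps the bookkeeping separate from the OCR itself, so the same loop works whether the read comes from a script, a log file, or a test stub.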
renmacro

15 May 2018, 20:34

OK, so I just threw it together. Each test ran over about 1,000 samples, unless it failed miserably, in which case I stopped it after a couple hundred.
3.5 - failed hard, around 50% accuracy; would fail my majority consensus from time to time
3.1 - failed like 2.8
3.0 - ~85.5%, never failed majority
2.9 - ~90%, never failed majority
2.8 - about 68% accuracy; passed consensus every time except 2 or 3
2.75 - failed like 3.5
2.5 - ~80.5%, never failed majority
2.0 - failed consensus 100% of the time, but very consistently: it read one 0 as a 9 in about 76% of samples, so it never once reached the right consensus

I thought these results were very... odd, so I reran some of them and they came in at essentially the same percentages. My font is very crisp and basic, about 10px tall, and should be easily readable. I'm not sure whether the different magnifications bring the OCR into different "recognition points", or whether the scaling is simply crisper at certain factors.

I also ran out of time but kind of want to run more readings.
iseahound

15 May 2018, 21:18

It binarizes first (converts to a black & white image) and then upscales that, for speed. I don't like this because it causes "jagged edges" similar to an aliased font. The upscaling algorithm is bad and is a core weakness at this point. When the image is scaled by a whole factor like 1x, 2x, or 4x, the upscaling isn't really doing anything. With any other factor, the upscaling "adds" information to the image, which can be good or bad depending on the algorithm.
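
To illustrate the whole-factor versus fractional-factor point, here is a small Python sketch using nearest-neighbour scaling on one row of a binarized image. This is an assumption chosen for simplicity, not the scaler Vis2 actually uses, and `upscale_row_nearest` is a made-up name.

```python
def upscale_row_nearest(row, factor):
    """Nearest-neighbour upscale of one row of pixels by `factor`."""
    out_width = int(len(row) * factor)
    # Each output pixel copies the source pixel it falls on.
    return [row[min(int(i / factor), len(row) - 1)] for i in range(out_width)]

row = [0, 1, 0, 1]  # 0 = white, 1 = black

# Whole factor: every source pixel becomes exactly 2 pixels wide.
print(upscale_row_nearest(row, 2))    # [0, 0, 1, 1, 0, 0, 1, 1]

# Fractional factor: source pixels become 3 or 2 pixels wide, unevenly.
print(upscale_row_nearest(row, 2.5))  # [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
```

At 2x the stroke proportions are unchanged, while at 2.5x equal-width strokes come out with different widths, so the shapes handed to the OCR engine really do differ from one factor to the next.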

The data is informative. Apologies if this response feels like a run-on sentence. For your information, 10px text is known to be too small for Tesseract to read accurately. Google Cloud Vision can handle everything better, but it's a paid service. It's available for you to try if you download the meta branch at https://github.com/iseahound/Vis2/archive/meta.zip . Note that you'll call Tesseract via Vis2.provider.Tesseract.TextRecognize() and Google via Vis2.provider.Google.TextRecognize(). Some features, like a convenient OCR() wrapper, aren't available yet on the meta branch.
renmacro

15 May 2018, 23:07


Interesting! I did give Capture2Text a try, and it seemed to use a similar process. Google Cloud Vision would be VERY expensive for what I am doing :shock:

For my use, setting it to 2.9, sampling 10 times, and going with the majority result will work perfectly. Great software, thanks again for making and sharing it!
renmacro

16 May 2018, 14:08

So I modified my macro to run through a stepping of the upscaling and here are the results I got after about 150 runs of each (ended up giving me a roughly accurate number quicker):

Spoiler


So I think it may be a neat feature to add a function that lets you draw a box around an example of text you will be OCRing often, then runs through this stepping process and reports which scale factor is most accurate, with the user confirming that the OCR was correct. Here is the basic code I used to accomplish this (I am not an experienced programmer :) ). I just used a file on my desktop to pass the values back and forth with your Vis2 script. I'm sure you would have a more elegant way.

I also threw in a random +/- 1 pixel offset on x/y and w/h, because I noticed Tesseract would sometimes give different reads when the box was slightly different. I wanted to account for that in the majority vote (instead of getting the same wrong read all 10 times because the capture area was identical).
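
The +/- 1 pixel jitter could look something like the following Python sketch. It is not the original macro, and the function name `jitter_rect` and the example coordinates are made up for illustration.

```python
import random

def jitter_rect(x, y, w, h, rng=random, max_offset=1):
    """Return (x, y, w, h) with each value shifted by -max_offset..+max_offset pixels."""
    def delta():
        return rng.randint(-max_offset, max_offset)
    # Width and height are clamped so the capture area never collapses to zero.
    return x + delta(), y + delta(), max(1, w + delta()), max(1, h + delta())

# Ten slightly different capture rectangles around a hypothetical text box.
rng = random.Random(42)
for _ in range(10):
    print(jitter_rect(500, 300, 120, 14, rng))
```

Each sample then reads a slightly different crop, so a misread caused by one unlucky box edge is less likely to repeat in every sample of the vote.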

Code:



My log output would look like this (the value I was OCRing was 180,000.21):


I also changed your script to use the text document value at line 2169:

Code:

; Read the scale factor written by the test macro (replaces the hard-coded 3.5).
FileRead, tempSCALEFACTOR, <your user>\Desktop\scalefactor.txt
scaleFactor := Trim(tempSCALEFACTOR, " `t`r`n")  ; strip stray whitespace/newlines
iseahound

16 May 2018, 18:53

Fantastic. This confirms what I've long suspected: small variances like 2.05x vs 2x give the best results. I'll implement a Lanczos (3-tap) image upscaler soon, while only using Leptonica for image binarization. Lanczos3 uses a sinc(x) function and is commonly used to upscale SD to HD video. Windows has a built-in bicubic scaler, but Lanczos is much better. Keep in mind that OCR still has a long way to go... but the best use of OCR right now is to copy on-screen text when one of those websites prevents you from highlighting it.
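
For reference, the Lanczos kernel mentioned above is the windowed sinc L(x) = sinc(x) * sinc(x/a) for |x| < a and 0 otherwise, with sinc(x) = sin(pi x) / (pi x). A minimal Python sketch follows, assuming a = 3 is what "Lanczos3" refers to. It shows only the weighting function, not a full image resampler, and is not code from Vis2.

```python
import math

def lanczos_kernel(x, a=3):
    """Lanczos weight for a sample at distance x from the output position."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    # sinc(x) * sinc(x / a), written as a single fraction.
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# Weights for source samples at various distances from the output pixel.
for distance in (0, 0.5, 1, 1.5, 2.5, 3):
    print(distance, round(lanczos_kernel(distance), 4))
```

The weight is 1 at the centre and essentially 0 at every other whole-pixel distance, so source pixels that line up exactly with an output pixel pass through unchanged. The negative lobes between whole distances are what give Lanczos its sharper result compared with bilinear scaling.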
