Vis2 - Image to Text OCR()

renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 15:53

iseahound wrote:Training is possible, but I haven't included the necessary command line utilities. You're right in that training may help, but unless you have some wacky, specific font, training is unlikely to beat the pre-trained model that is tesseract_best. Since your 0 just has a line through it which is pretty common for a zero, it's probably the upscaling step that's ruining it. I suspect it's 0 or 8 about half the time? The image is upscaled 3.5x before processing, maybe changing it to 2 or 2.5 may help. I think you can control F for 3.5. Finally there's a lot of improvements that can be made. I intend to improve it sometime in the future.
Thank you for the reply! Would changing static scaleFactor := 3.5 (line 2169) be all that is needed to reduce the magnification? I will give this a try, and I am also going to experiment with the application's scaling/font size to see if it helps.

EDIT: On the first run it read 100% accurately at 2.5. I'm going to have it do a ton of random reads and check the logs for variance. Thank you again!

EDIT2: Still getting some bad reads, but a lot better. Playing with the numbers and UI settings.

EDIT3: Works well enough to just sample a bunch of times and use majority consensus. Thank you!
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 18:52

static scaleFactor := 3.5 (line 2169) is all that is needed. You can use Control + Space in Advanced UI mode to bring up the preprocessed image preview. If you get good results with specific settings, let me know so I can improve Vis2 in the future.
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 19:16

iseahound wrote:static scaleFactor := 3.5 (line 2169) is all that is needed. You can use Control + Space in Advanced UI mode to bring up the preprocessed image preview. If you get good results with specific settings, let me know so I can improve Vis2 in the future.
At this rate, with the testing code I made, I am sure I can keep a running percentage of successful vs unsuccessful reads, let it run for a while, and adjust the values. Then I can test across roughly 2 to 3.5 in 0.1 increments and let you know which value my specific test found most accurate.

This assumes my application's values stay at roughly 180,000.XX. I can continue testing tomorrow and run this for you.
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 20:34

OK, so I just threw it together. This was over about 1,000 samples per test, unless a setting failed miserably, in which case I stopped it after a couple hundred.
3.5 - failed hard, something around 50%, would fail my majority consensus from time to time
3.1 - failing like 2.8
3.0 - ~85.5%, never failed majority
2.9 - ~90%, never failed majority
2.8 - passed consensus every time but like 2 or 3 times, otherwise about 68% accuracy
2.75 - failed like 3.5
2.5 - ~80.5%, never failed majority
2.0 - failed 100%, but was very consistent. It consistently interpreted one 0 as a 9, and was about 76% accurate at it. It didn't get the right consensus once.

I thought these results were very... odd, so I reran some of them and they came in at essentially the same percentages... My font is very crisp and basic, should be very easily readable, and is about 10px tall. I'm not sure if the different magnification brings the OCR into different "recognition points" or what? Or possibly the scaling is crisper at different points?

I also ran out of time but kind of want to run more readings.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 21:18

It binarizes first (producing a black & white image), then upscales that for speed. I don't like this because it causes "jagged edges" similar to an aliased font. The upscaling algorithm is bad and is a core weakness at this point. When the image is scaled by a factor like 1x, 2x, or 4x, it's not really doing anything. But at any other factor, the upscaling "adds" information to the image, which can be good or bad depending on the algorithm.

The data is informative. Apologies if this response feels like a run-on sentence. For your information, text that is 10px tall is known to be too small for Tesseract to read accurately. Google Cloud Vision can handle everything better, but it's a paid service. It's available for you to try if you download the meta branch at https://github.com/iseahound/Vis2/archive/meta.zip . Note that you'll call Tesseract via Vis2.provider.Tesseract.TextRecognize() and Google via Vis2.provider.Google.TextRecognize(). Some features, like a convenient OCR() wrapper, aren't available yet on the meta branch.
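
To illustrate the point about whole factors versus fractional ones, here is a rough sketch. It assumes plain nearest-neighbour scaling of a single row of binarized pixels, which is not necessarily what Vis2 does internally, and UpscaleRow is just a made-up helper name.

```autohotkey
; Nearest-neighbour upscale of one row of binarized pixels (a string of 0s and 1s).
; Assumption: this only illustrates the idea, it is not the Vis2 upscaler.
UpscaleRow(row, factor) {
    srcLen := StrLen(row)
    dstLen := Round(srcLen * factor)
    out := ""
    Loop % dstLen
    {
        ; Map the centre of each destination pixel back to a source pixel (1-based).
        srcIndex := Floor((A_Index - 0.5) / factor) + 1
        if (srcIndex > srcLen)
            srcIndex := srcLen
        out .= SubStr(row, srcIndex, 1)
    }
    return out
}

row := "0110100"
MsgBox % "2x:   " UpscaleRow(row, 2) "`n2.5x: " UpscaleRow(row, 2.5)
```

At 2x every source pixel is simply repeated twice, so no new shape information appears. At 2.5x some pixels are repeated twice and others three times, so stroke widths become uneven, which is one way a fractional factor can change what the OCR engine sees.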
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 23:07

iseahound wrote:It binarizes first (producing a black & white image), then upscales that for speed. I don't like this because it causes "jagged edges" similar to an aliased font. The upscaling algorithm is bad and is a core weakness at this point. When the image is scaled by a factor like 1x, 2x, or 4x, it's not really doing anything. But at any other factor, the upscaling "adds" information to the image, which can be good or bad depending on the algorithm.

The data is informative. Apologies if this response feels like a run-on sentence. For your information, text that is 10px tall is known to be too small for Tesseract to read accurately. Google Cloud Vision can handle everything better, but it's a paid service. It's available for you to try if you download the meta branch at https://github.com/iseahound/Vis2/archive/meta.zip . Note that you'll call Tesseract via Vis2.provider.Tesseract.TextRecognize() and Google via Vis2.provider.Google.TextRecognize(). Some features, like a convenient OCR() wrapper, aren't available yet on the meta branch.
Interesting! I did give Capture2Text a try and it seemed like that was a similar process. The Google Cloud Vision would be VERY expensive for what I am doing :shock:

For my use, setting it to 2.9 and then sampling 10 times and going with the majority result will work perfect for what I am doing. Great software, thanks again for making and sharing it!
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

16 May 2018, 14:08

So I modified my macro to run through a stepping of the upscaling and here are the results I got after about 150 runs of each (ended up giving me a roughly accurate number quicker):
Spoiler
So I think it may be a neat feature to add in a function to draw a box around an example of text you will be OCRing often and then have it go through this stepping process and spit out the results of what would be most accurate, with the user confirming that the OCR was correct. Here is my basic code I used to accomplish this (I am not an experienced programmer :) ). I just used a desktop file to transfer the values back and forth between your Vis2 script. I'm sure you would have a more elegant way.

I also threw in some random +/- 1 pixel x/y and w/h because I noticed tesseract would sometimes give different reads when the box was slightly different, so I wanted to factor for that in the majority (instead of just getting the wrong read all 10 times because the image reading area was the same).

Code: Select all

	scaleFACTORTEXT := 1.5
	FileDelete, <your user>\Desktop\scalefactor.txt
	FileAppend, %scaleFACTORTEXT%, <your user>\Desktop\scalefactor.txt
	Loop
	{
		loopCOUNT := 0
		runningPERCENT := 0
		Loop, 15
		{
			ocrONE := 0
			ocrONESUCCESS := 0
			ocrTWO := 0
			ocrTWOSUCCESS := 0
			ocrTHREE := 0
			ocrTHREESUCCESS := 0
			ocrFOUR := 0
			ocrFOURSUCCESS := 0
			ocrFIVE := 0
			ocrFIVESUCCESS := 0
			loopCOUNT += 1
			Loop, 10
			{
				Random, randYESNO, 0, 3
				Random, randYESNO2, 0, 1
				Random, randYESNO3, 0, 1
				Random, randYESNO4, 0, 1
				Random, randYESNO5, 0, 1
				if randYESNO = 0
				{
					ocrTEXT := OCR([10 + randYESNO2, 10 + randYESNO3, 50 + randYESNO4, 20 + randYESNO5])
				}
				if randYESNO = 1
				{
					ocrTEXT := OCR([10 + randYESNO2, 10 - randYESNO3, 50 + randYESNO4, 20 - randYESNO5])
				}
				if randYESNO = 2
				{
					ocrTEXT := OCR([10 - randYESNO2, 10 + randYESNO3, 50 - randYESNO4, 20 + randYESNO5])
				}
				if randYESNO = 3
				{
					ocrTEXT := OCR([10 - randYESNO2, 10 - randYESNO3, 50 - randYESNO4, 20 - randYESNO5])
				}
				if ocrONE = 0
				{
					ocrONE := ocrTEXT
					ocrONESUCCESS += 1
				}
				else if (ocrTEXT = ocrONE)
				{
					ocrONESUCCESS += 1
				}
				else if (ocrTWO = 0)
				{
					ocrTWO := ocrTEXT
					ocrTWOSUCCESS += 1
				}
				else if (ocrTEXT = ocrTWO)
				{
					ocrTWOSUCCESS += 1
				}
				else if (ocrTHREE = 0)
				{
					ocrTHREE := ocrTEXT
					ocrTHREESUCCESS += 1
				}
				else if (ocrTEXT = ocrTHREE)
				{
					ocrTHREESUCCESS += 1
				}
				else if (ocrFOUR = 0)
				{
					ocrFOUR := ocrTEXT
					ocrFOURSUCCESS += 1
				}
				else if (ocrTEXT = ocrFOUR)
				{
					ocrFOURSUCCESS += 1
				}
				else if (ocrFIVE = 0)
				{
					ocrFIVE := ocrTEXT
					ocrFIVESUCCESS += 1
				}
				else if (ocrTEXT = ocrFIVE)
				{
					ocrFIVESUCCESS += 1
				}
				if (ocrONESUCCESS > ocrTWOSUCCESS) && (ocrONESUCCESS > ocrTHREESUCCESS) && (ocrONESUCCESS > ocrFOURSUCCESS) && (ocrONESUCCESS > ocrFIVESUCCESS)
				{
					majorityVALUE := ocrONE
					correctPERCENT := ocrONESUCCESS / 10
				}
				if (ocrTWOSUCCESS > ocrONESUCCESS) && (ocrTWOSUCCESS > ocrTHREESUCCESS) && (ocrTWOSUCCESS > ocrFOURSUCCESS) && (ocrTWOSUCCESS > ocrFIVESUCCESS)
				{
					majorityVALUE := ocrTWO
					correctPERCENT := ocrTWOSUCCESS / 10
				}
				if (ocrTHREESUCCESS > ocrTWOSUCCESS) && (ocrTHREESUCCESS > ocrONESUCCESS) && (ocrTHREESUCCESS > ocrFOURSUCCESS) && (ocrTHREESUCCESS > ocrFIVESUCCESS)
				{
					majorityVALUE := ocrTHREE
					correctPERCENT := ocrTHREESUCCESS / 10
				}
				if (ocrFOURSUCCESS > ocrTWOSUCCESS) && (ocrFOURSUCCESS > ocrTHREESUCCESS) && (ocrFOURSUCCESS > ocrONESUCCESS) && (ocrFOURSUCCESS > ocrFIVESUCCESS)
				{
					majorityVALUE := ocrFOUR
					correctPERCENT := ocrFOURSUCCESS / 10
				}
				if (ocrFIVESUCCESS > ocrTWOSUCCESS) && (ocrFIVESUCCESS > ocrTHREESUCCESS) && (ocrFIVESUCCESS > ocrFOURSUCCESS) && (ocrFIVESUCCESS > ocrONESUCCESS)
				{
					majorityVALUE := ocrFIVE
					correctPERCENT := ocrFIVESUCCESS / 10
				}
			}
			runningPERCENT := ((runningPERCENT * (loopCOUNT-1)) + correctPERCENT) / loopCOUNT
			SendLOG(scaleFACTORTEXT " - "runningPERCENT "% - "majorityVALUE " from "ocrONE "-" ocrONESUCCESS " " ocrTWO "-" ocrTWOSUCCESS " " ocrTHREE "-" ocrTHREESUCCESS " " ocrFOUR "-" ocrFOURSUCCESS " " ocrFIVE "-" ocrFIVESUCCESS)
		}
		scaleFACTORTEXT += 0.05
		FileDelete, <your user>\Desktop\scalefactor.txt
		FileAppend, %scaleFACTORTEXT%, <your user>\Desktop\scalefactor.txt
	}
My log output would look like this, the value of course I was OCRing was 180,000.21:
I also changed your script to use the text document value at line 2169:

Code: Select all

		  FileRead, tempSCALEFACTOR, <your user>\Desktop\scalefactor.txt
		  scaleFactor := tempSCALEFACTOR
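
For anyone adapting this, the five ocrONE..ocrFIVE slots can probably be collapsed into a tally keyed by the OCR text. The following is only a minimal sketch of that idea, not the code I actually ran: MajorityVote is a hypothetical helper, blank reads are ignored, and the sample values are made up for the demo.

```autohotkey
; Tally OCR reads and return the most frequent one.
; reads: a simple array of strings, e.g. the results of 10 OCR() calls.
; Note: AHK v1 object keys are case-insensitive, so "O" and "o" share a tally.
MajorityVote(reads) {
    counts := {}                 ; OCR text -> number of times it was seen
    best := ""
    bestCount := 0
    total := 0
    for index, text in reads {
        if (text = "")           ; assumption: blank reads are skipped
            continue
        total += 1
        counts[text] := counts.HasKey(text) ? counts[text] + 1 : 1
        if (counts[text] > bestCount) {
            best := text
            bestCount := counts[text]
        }
    }
    share := total ? Round(100 * bestCount / total, 1) : 0
    return {value: best, hits: bestCount, percent: share}
}

; Made-up sample reads for illustration only.
reads := ["180,000.21", "180,000.21", "180,008.21", "180,000.21"]
result := MajorityVote(reads)
MsgBox % result.value " won with " result.hits " of " reads.Length() " reads (" result.percent "%)"
```

On a tie, the value that reached the winning count first is kept, and the percentage here is the share of agreeing reads, not a measure of whether the winning value is actually correct.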
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

16 May 2018, 18:53

Fantastic. This confirms what I've long suspected: small variances like 2.05x vs 2x give the best results. I'll implement a Lanczos (3-tap) image upscaler soon, while only using Leptonica for image binarization. Lanczos3 uses a sinc(x) function and is commonly used to upscale SD to HD video. Windows has a built-in bicubic scaler, but Lanczos is much better. Keep in mind that OCR still has a long way to go... but the best usage of OCR right now is to copy on-screen text when one of those websites prevents you from highlighting it.
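
For reference, the Lanczos kernel is usually written as L(x) = sinc(x) * sinc(x/a) for |x| < a and 0 otherwise, with sinc(x) = sin(pi*x)/(pi*x) and a = 3 for Lanczos3. Below is a minimal AHK v1 sketch of just the kernel and the weights for one sample position. It is not the upscaler planned for Vis2, and Sinc and Lanczos are illustrative names.

```autohotkey
; Normalized sinc: sin(pi*x) / (pi*x), with the limit value 1 at x = 0.
Sinc(x) {
    static pi := 3.141592653589793
    if (x = 0)
        return 1.0
    return Sin(pi * x) / (pi * x)
}

; Lanczos kernel with window size a (a = 3 gives Lanczos3).
Lanczos(x, a := 3) {
    if (Abs(x) >= a)
        return 0.0
    return Sinc(x) * Sinc(x / a)
}

; Weights of the six neighbouring source pixels for a sample 0.4 px past pixel 0.
t := 0.4
sum := 0
weights := []
Loop 6
{
    k := A_Index - 3             ; neighbour offsets -2, -1, 0, 1, 2, 3
    w := Lanczos(t - k)
    weights.Push(w)
    sum += w
}

report := ""
for i, w in weights
    report .= "offset " (i - 3) ": " Round(w / sum, 4) "`n"
MsgBox % report
```

The weights are divided by their sum so they add up to 1. Some of them are negative, which is what lets Lanczos keep edges sharper than a plain average of neighbouring pixels.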
num2005

Re: Vis2 - OCR(), ImageIdentify()

27 May 2018, 15:16

Hello,

Total noob here.

I really like your software, but I need to copy the image text to my clipboard without selecting it with the mouse. The text is always in the same place.

So I read about this: text := OCR([0, 0, 430, 150]), but I have no idea where to put it. Do I add it to demo.ahk or Vis2.ahk, and where?
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

27 May 2018, 16:02

In demo.ahk write

OCR([0, 0, 430, 150]).clipboard()

There will be no pop-up saying "Saved to clipboard", but the text really is saved to your clipboard.

AHK has a clipboard variable as well, so this works too:

clipboard := OCR([0, 0, 430, 150])
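
If you would rather trigger it with a key, here is a minimal sketch. It assumes the lines go at the end of demo.ahk (so Vis2 is already loaded) and that F8 is free on your system; the hotkey choice is only an example, not something Vis2 defines.

```autohotkey
; Press F8 to OCR the fixed screen region and copy the result to the clipboard.
; Assumption: Vis2 is already loaded by the script this is pasted into.
F8::
    text := OCR([0, 0, 430, 150])   ; same fixed region as in the line above
    clipboard := text
return
```

Putting the hotkey at the bottom of the script keeps it out of the way of whatever demo.ahk runs on startup.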
Cerberus
Posts: 172
Joined: 12 Jan 2016, 15:46

Re: Vis2 - OCR(), ImageIdentify()

28 May 2018, 15:22

renmacro wrote: Interesting! I did give Capture2Text a try and it seemed like that was a similar process. The Google Cloud Vision would be VERY expensive for what I am doing :shock:

For my use, setting it to 2.9 and then sampling 10 times and going with the majority result will work perfect for what I am doing. Great software, thanks again for making and sharing it!
This script is very interesting indeed! I'll try it for sure. Is it more accurate / better than Capture2Text, in your experience?
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

30 May 2018, 23:54

Cerberus wrote:
renmacro wrote: Interesting! I did give Capture2Text a try and it seemed like that was a similar process. The Google Cloud Vision would be VERY expensive for what I am doing :shock:

For my use, setting it to 2.9 and then sampling 10 times and going with the majority result will work perfect for what I am doing. Great software, thanks again for making and sharing it!
This script is very interesting indeed! I'll try it for sure. Is it more accurate / better than Capture2Text, in your experience?
I never used Capture2Text enough to really say; I didn't get past testing. With the consensus I can tweak it to be accurate enough to do the work I need.
iseahound wrote:Fantastic. This confirms what I've long suspected: small variances like 2.05x vs 2x give the best results. I'll implement a Lanczos (3-tap) image upscaler soon, while only using Leptonica for image binarization. Lanczos3 uses a sinc(x) function and is commonly used to upscale SD to HD video. Windows has a built-in bicubic scaler, but Lanczos is much better. Keep in mind that OCR still has a long way to go... but the best usage of OCR right now is to copy on-screen text when one of those websites prevents you from highlighting it.
Is there any way to get it to read single digits? It seems to just return nothing when reading single digits?
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

31 May 2018, 02:16

You must set the "page segmentation mode" to "single char".

-psm 10

Old

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Modified (untested)

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd .= " -psm 10 "                                                                         ; ADD THIS
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
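
If you want to switch the mode without editing the string each time, here is a minimal sketch of the same command-line construction with the page segmentation mode as an optional parameter. BuildTesseractCmd and the example paths are hypothetical and not part of Vis2. It keeps the -psm spelling used above; newer Tesseract releases document the flag as --psm, so check which one your bundled tesseract.exe accepts.

```autohotkey
; Hypothetical helper: builds a Tesseract command line like the snippet above,
; with the page segmentation mode (psm) as an optional parameter.
BuildTesseractCmd(exe, tessdata, inFile, outBase, language := "", psm := "") {
    static q := Chr(0x22)        ; double-quote character
    cmd := q exe q " --tessdata-dir " q tessdata q " " q inFile q " " q outBase q
    if (language != "")
        cmd .= " -l " q language q
    if (psm != "")
        cmd .= " -psm " psm      ; 10 = treat the image as a single character
    return ComSpec " /C " q cmd q
}

; Example paths are placeholders, not real Vis2 locations.
MsgBox % BuildTesseractCmd("C:\tesseract\tesseract.exe", "C:\tessdata", "in.tif", "out", "eng", 10)
```

Leaving psm blank produces the original command, so the default behaviour stays unchanged for multi-character reads.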
Cerberus
Posts: 172
Joined: 12 Jan 2016, 15:46

Re: Vis2 - OCR(), ImageIdentify()

31 May 2018, 13:25

renmacro wrote: I never used Capture2Text enough to really say, I didn't get past testing. With the consensus I can tweak it to be accurate enough to do the work I need
Ah, OK, neither did I. By consensus you mean, when using a method where several attempts agree about the result, Vis2 works well enough for your needs?
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

31 May 2018, 18:55

iseahound wrote:You must set the "page segmentation mode" to "single char".

-psm 10

Old

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Modified (untested)

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd .= " -psm 10 "                                                                         ; ADD THIS
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Works amazingly, ty again! What is your default?
Cerberus wrote:
renmacro wrote: I never used Capture2Text enough to really say, I didn't get past testing. With the consensus I can tweak it to be accurate enough to do the work I need
Ah, OK, neither did I. By consensus you mean, when using a method where several attempts agree about the result, Vis2 works well enough for your needs?
Yes, I make a loop that scans and records how many times each result comes up and goes with the most popular result across three different scaling values. I'm using I think 2.05, 2.3 and 2.9, as they each had different... "personalities"
Cerberus
Posts: 172
Joined: 12 Jan 2016, 15:46

Re: Vis2 - OCR(), ImageIdentify()

31 May 2018, 21:34

Sounds good!
Archandrion
Posts: 31
Joined: 26 May 2018, 22:23

Re: Vis2 - OCR(), ImageIdentify()

02 Jun 2018, 20:54

ISO6393LanguageCode := "chi_sim"
How do I change it so that fast is also used for TextToTranslate := OCR([TLX, TLY, BW, BH],ISO6393LanguageCode)

I saw that you stated this is only for the GUI, but I'm trying to use it as part of a script for OCR on subtitles in live video. With the GUI I noticed that keeping the right mouse button down after selecting the region did what I wanted, quickly doing OCR on the text when it showed up. In my script, DefineBox(TLX, TLY, BRX, BRY, BW, BH) is used once to store the subtitle region, and a couple of PixelSearches return when a new subtitle shows up.

How can I retrieve results just as fast as the GUI demo? That could possibly allow some time to translate and update a separate GUI text variable displayed below the original. The script can detect the exact point the subtitle shows up, so I don't really need to continuously monitor the area, but just be very fast at doing the OCR when the subtitle is actually detected.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04

Re: Vis2 - OCR(), ImageIdentify()

03 Jun 2018, 14:26

Archandrion wrote:How do I change it so that fast is also used for TextToTranslate := OCR([TLX, TLY, BW, BH],ISO6393LanguageCode)
At the moment it is not possible with Vis2. If you really want to do this, put the fast chi_sim.traineddata in the tessdata_best folder.

Archandrion wrote:I saw that you stated this is only for the GUI, but I'm trying to use it as part of a script for OCR on subtitles in live video. With the GUI I noticed that keeping the right mouse button down after selecting the region did what I wanted, quickly doing OCR on the text when it showed up. In my script, DefineBox(TLX, TLY, BRX, BRY, BW, BH) is used once to store the subtitle region, and a couple of PixelSearches return when a new subtitle shows up.
You can press Ctrl to enter advanced mode to save you from holding the mouse button down... I have considered adding a subtitle recording function, but that would take a lot of time and effort on my part. You will be able to access the text shown in the GUI subtitle box using Vis2.obj.data when I release the next version.

Archandrion wrote:How can I retrieve results just as fast as the GUI demo? That could possibly allow some time to translate and update a separate GUI text variable displayed below the original. The script can detect the exact point the subtitle shows up, so I don't really need to continuously monitor the area, but just be very fast at doing the OCR when the subtitle is actually detected.
If real-time performance is not needed, you can save screenshots of the video and process them later. This is the best solution in my opinion.

Finally, I do have many ideas regarding this project. However, it is probably not the best use of my time. I do intend to release a version that is much faster and better, with more features, without making anything slower or different. But it may be a year before this happens.
