Easy OCR

Posted: 23 Apr 2023, 23:18
by Descolada
This is a UWP OCR library in AHK v2. No extra installations are needed (except perhaps Windows language packs).
Credit to malcev, whose OCR function I relied on heavily, and special thanks to feiyue, whose FindText library has been of great help.


The library is available here: https://github.com/Descolada/OCR

Some examples

Example 1
Displays all text found on the desktop, then highlights the results line by line.

Code: Select all

#Requires AutoHotkey v2
#include OCR.ahk

result := OCR.FromDesktop()
MsgBox "All text from desktop: `n" result.Text

MsgBox "Press OK to highlight all found lines for 3 seconds."
for line in result.Lines
    result.Highlight(line, -3000)
ExitApp

Example 2
Finds some text in Notepad and selects it with MouseClickDrag.

Code: Select all

#include OCR.ahk

Run "notepad.exe"
WinWaitActive "ahk_exe notepad.exe"
Send "Lorem ipsum "
Sleep 40

result := OCR.FromWindow("A",,2)
try found := result.FindString("Lorem")
if !IsSet(found) {
    MsgBox '"Lorem" was not found in Notepad!'
    ExitApp
}

result.Highlight(found)

CoordMode "Mouse", "Window"
MouseClickDrag("Left", found.x, found.y, found.x + found.w, found.y + found.h)

Example 3
Reads text from under the cursor and displays it in real time.

Code: Select all

#Requires AutoHotkey v2
#include OCR.ahk

CoordMode "Mouse", "Screen"
CoordMode "ToolTip", "Screen"

DllCall("SetThreadDpiAwarenessContext", "ptr", -3) ; Needed for multi-monitor setups with differing DPIs

global w := 150, h := 50, minsize := 5, step := 3
Loop {
    MouseGetPos(&x, &y)
    Highlight(x-w//2, y-h//2, w, h)
    ToolTip(OCR.FromRect(x-w//2, y-h//2, w, h, "en-us").Text, , y+h//2+10)
}

Right::global w+=step
Left::global w-=(w < minsize ? 0 : step)
Up::global h+=step
Down::global h-=(h < minsize ? 0 : step)

Highlight(x?, y?, w?, h?, showTime:=0, color:="Red", d:=2) {
    static guis := []

    if !IsSet(x) {
        for _, r in guis
            r.Destroy()
        guis := []
        return
    }
    if !guis.Length {
        Loop 4
            guis.Push(Gui("+AlwaysOnTop -Caption +ToolWindow -DPIScale +E0x08000000"))
    }
    Loop 4 {
        i:=A_Index
        , x1:=(i=2 ? x+w : x-d)
        , y1:=(i=3 ? y+h : y-d)
        , w1:=(i=1 or i=3 ? w+2*d : d)
        , h1:=(i=2 or i=4 ? h+2*d : d)
        guis[i].BackColor := color
        guis[i].Show("NA x" . x1 . " y" . y1 . " w" . w1 . " h" . h1)
    }
    if showTime > 0 {
        Sleep(showTime)
        Highlight()
    } else if showTime < 0
        SetTimer(Highlight, -Abs(showTime))
}

Example 4
Tries to find a search phrase from the active window.

Code: Select all

#Requires AutoHotkey v2
#include OCR.ahk

CoordMode "Mouse", "Window"
Loop {
    ib := InputBox("Insert search phrase to find from active window: ", "OCR")
    Sleep 100 ; Small delay to wait for the InputBox to close
    if ib.Result != "OK"
        ExitApp
    result := OCR.FromWindow("A",,2)
    try found := result.FindString(ib.Value)
    catch {
        MsgBox 'Phrase "' ib.Value '" not found!'
        continue
    }
    ; MouseMove is set to CoordMode Window, so no coordinate conversion necessary
    MouseMove found.x, found.y
    result.Highlight(found)
    break
}

Example 5
Shows how to wait for text, search keywords in the results object, and click results.

Code: Select all

#Requires AutoHotkey v2
#include OCR.ahk

Run "https://www.w3schools.com/tags/att_input_type_checkbox.asp"
WinWaitActive "HTML input type",,10
if !WinActive("HTML input type") {
    MsgBox "Failed to find test window!"
    ExitApp
}

; Wait for text "Yourself" to appear, case-insensitive search, indefinite wait. Search only the active window.
result := OCR.WaitText("Yourself",, OCR.FromWindow.Bind(OCR, "A"))
; Find the Word for "Yourself" in the result, and click it.
result.Click(result.FindString("Yourself"))
; Wait for text to appear, that matches RegExMatch with needle "I have a bike(\s|$)". 
; RegEx matching is used here to accept either a space at the end or the end of string, because
; it might be in the middle of the found text or at the end.
; Search only the active window.
result := OCR.WaitText("I have a bike(\s|$)",, OCR.FromWindow.Bind(OCR,"A"),,RegExMatch)
; Here we don't have to use RegEx, because the string will be split by spaces and compared word-by-word.
result.Click(result.FindString("I have a bike"))

If you have any suggestions or comments about what should be changed or improved, please leave a comment. Also feel free to create pull requests on GitHub.
There are probably multiple improvements to be made in the FromWindow method, since I'm not too familiar with GDI+...

Edit history:

Code: Select all

13.08.2024: updated examples

Re: Easy OCR alpha

Posted: 24 Apr 2023, 16:50
by flyingDman
Thank you @Descolada. I am glad that the OCR script made it to v2. You gave us that and then some... Great work :thumbup:
So to get the text with the original line breaks you would do something like:

Code: Select all

res := ""   ; initialize first: appending to an unset variable raises an error in v2
for x, y in OCR.FromRect(X1, Y1, W1, H1, "en").Lines
    res .= y.text "`n"
MsgBox res
rather than:

Code: Select all

msgbox OCR.FromRect(X1, Y1, W1, H1, "en").Text
?
Or is there a better way?

Re: Easy OCR alpha

Posted: 24 Apr 2023, 18:44
by pedro45_vs
This is Amazing :dance: It's great to see so many useful tools with v2

Re: Easy OCR alpha

Posted: 24 Apr 2023, 23:01
by Descolada
@flyingDman, that is exactly correct. And to enumerate words, either loop over Result.Words for all words, or Line.Words for only a certain line.
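A minimal sketch of both loops, assuming that word objects expose a Text property the same way line objects do, and using FromDesktop as in Example 1:

```autohotkey
#Requires AutoHotkey v2
#include OCR.ahk

result := OCR.FromDesktop()

; All words of the whole result, one per row.
allWords := ""
for word in result.Words
    allWords .= word.Text "`n"

; Words grouped by the line they belong to.
report := ""
for i, line in result.Lines {
    report .= "Line " i ": "
    for word in line.Words
        report .= "[" word.Text "] "
    report .= "`n"
}

MsgBox "Words by line:`n" report
```

The allWords string is built only to show the first loop; display or process it as needed.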

Re: Easy OCR alpha

Posted: 03 May 2023, 12:56
by FanaticGuru
OCR appears to have a limit that the image cannot be smaller than 40 pixels in any dimension whether as a bitmap or file.

Is this a limitation of the base API?

If so, could an option to fill smaller images with white space or scale up to the minimum be added to FromFile and FromBitmap?

FG

Re: Easy OCR alpha

Posted: 04 May 2023, 23:34
by Descolada
@FanaticGuru, yes, this appears to be a limitation of the base Windows.Media.Ocr. I've implemented a fix for FromBitmap so the bitmap is resized to at least 40x40 if necessary, and filled with a solid background color. Changing the size of a RandomAccessStream proved to be too difficult for me, so I added a fix to scale the stream appropriately (though this slows down the script noticeably). Perhaps somebody knows how to enlarge the bounds of a RandomAccessStream? Or convert it to hBitmap?

Re: Easy OCR alpha

Posted: 05 May 2023, 03:17
by malcev
You can try like this:

Code: Select all

ScaledWidth := 100
ScaledHeight := 100

CreateClass("Windows.Graphics.Imaging.BitmapTransform",, BitmapTransform)
DllCall(NumGet(NumGet(BitmapTransform+0)+7*A_PtrSize), "ptr", BitmapTransform, "int", ScaledWidth)   ; put_ScaledWidth
DllCall(NumGet(NumGet(BitmapTransform+0)+9*A_PtrSize), "ptr", BitmapTransform, "int", ScaledHeight)   ; put_ScaledHeight

DllCall(NumGet(NumGet(BitmapFrame+0)+8*A_PtrSize), "ptr", BitmapFrame, "uint*", BitmapPixelFormat)   ; get_BitmapPixelFormat
DllCall(NumGet(NumGet(BitmapFrame+0)+9*A_PtrSize), "ptr", BitmapFrame, "uint*", BitmapAlphaMode)   ; get_BitmapAlphaMode
DllCall(NumGet(NumGet(BitmapFrameWithSoftwareBitmap+0)+8*A_PtrSize), "ptr", BitmapFrameWithSoftwareBitmap, "uint", BitmapPixelFormat, "uint", BitmapAlphaMode, "ptr", BitmapTransform, "uint", IgnoreExifOrientation := 0, "uint", DoNotColorManage := 0, "ptr*", SoftwareBitmap)   ; GetSoftwareBitmapTransformedAsync

CreateClass(string, interface := "", ByRef Class := "")
{
   CreateHString(string, hString)
   if (interface = "")
      result := DllCall("Combase.dll\RoActivateInstance", "ptr", hString, "ptr*", Class, "uint")
   else
   {
      VarSetCapacity(GUID, 16)
      DllCall("ole32\CLSIDFromString", "wstr", interface, "ptr", &GUID)
      result := DllCall("Combase.dll\RoGetActivationFactory", "ptr", hString, "ptr", &GUID, "ptr*", Class, "uint")
   }
   if (result != 0)
   {
      if (result = 0x80004002)
         msgbox No such interface supported
      else if (result = 0x80040154)
         msgbox Class not registered
      else
         msgbox error: %result%
      ExitApp
   }
   DeleteHString(hString)
}

Re: Easy OCR alpha

Posted: 05 May 2023, 08:14
by Descolada
@malcev, thanks, that works much more smoothly. I totally missed that GetSoftwareBitmapAsync has such an overload :P

Re: Easy OCR alpha

Posted: 05 May 2023, 12:58
by malcev
You are welcome.
I think that instead of this

Code: Select all

 this.BitmapEncoderStatics := OCR.CreateClass("Windows.Graphics.Imaging.BitmapEncoder", IBitmapEncoderStatics := "{A74356A7-A4E4-4EB9-8E40-564DE7E1CCB2}")
You can write

Code: Select all

this.BitmapTransform := OCR.CreateClass("Windows.Graphics.Imaging.BitmapTransform")
and use it as static.

Re: Easy OCR alpha

Posted: 05 May 2023, 15:47
by FanaticGuru
Descolada wrote:
04 May 2023, 23:34
@FanaticGuru, yes, this appears to be a limitation of the base Windows.Media.Ocr. I've implemented a fix for FromBitmap so the bitmap is resized to at least 40x40 if necessary, and filled with a solid background color. Changing the size of a RandomAccessStream proved to be too difficult for me, so I added a fix to scale the stream appropriately (though this slows down the script noticeably). Perhaps somebody knows how to enlarge the bounds of a RandomAccessStream? Or convert it to hBitmap?

Seems to have fixed the problem well. I just snipped some 6-point text that I could barely read on my 3840x2160 monitor into an 18-pixel-high image, and it OCR'd to text fine.

Thanks for providing this native OCR for AHK v2.

FG

Re: Easy OCR

Posted: 12 Jul 2023, 07:42
by Krd
How can I limit the search area to part of a window?
For example, to limit the search to 1600, 200, 1800, 600.

Code: Select all

    result := OCR.FromWindow("A",,2)
    try found := result.FindString('Search Word')
    catch
    {
        MsgBox 'No match'
    }
    Click found.x, found.y
    break

Re: Easy OCR

Posted: 12 Jul 2023, 10:28
by Descolada
@Krd, one way would be to use FromRect and provide coordinates calculated from WinGetPos.
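A rough sketch of the FromRect approach, assuming that FromRect expects screen coordinates (as its use with CoordMode "Screen" in Example 3 suggests) and treating 1600, 200, 1800, 600 as x1, y1, x2, y2 relative to the window. The variable names are only illustrative:

```autohotkey
#Requires AutoHotkey v2
#include OCR.ahk

; Area of interest relative to the window's top-left corner.
; x1=1600, y1=200, x2=1800, y2=600 gives a 200x400 rectangle.
relX := 1600, relY := 200, relW := 200, relH := 400

; WinGetPos returns the window position in screen coordinates.
WinGetPos(&winX, &winY, , , "A")

; Shift the window-relative rectangle to screen coordinates and OCR only that part.
result := OCR.FromRect(winX + relX, winY + relY, relW, relH, "en-us")
MsgBox "Text in the selected area:`n" result.Text
```

Only the small rectangle is captured and recognized here, which is why this tends to be quicker than OCR-ing the whole window and filtering afterwards.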

Another way would be to crop the result object with something like this:

Code: Select all

result := CropResult(OCR.FromDesktop(), 0, 0, 500, 80)
for line in result.Lines
    ToolTip(line.Text), result.Highlight(line)

CropResult(result, x, y, w, h) {
    result := result.Clone()
    croppedLines := [], croppedWords := [], text := ""
    for line in result.Lines {
        croppedWords := [], lineText := ""
        for word in line.Words {
            if word.x >= x && word.y >= y && (word.x+word.w) <= (x+w) && (word.y+word.h) <= (y+h)
                croppedWords.Push(word), lineText .= word.Text " ", ObjAddRef(word.ptr)
        }
        if croppedWords.Length {
            line := {Text:Trim(lineText), Words:croppedWords}
            line.base.__Class := "OCR.OCRLine"
            croppedLines.Push(line)
            text .= lineText
        }
    }
    result.DefineProp("Lines", {Value:croppedLines})
    result.DefineProp("Text", {Value:Trim(text)})
    result.DefineProp("Words", OCR.Prototype.GetOwnPropDesc("Words"))
    return result
}
EDIT: I added a searchArea argument to FindString, so now you can use try found := result.FindString('Search Word',,,,{x1:1600, y1:200, x2:1800, y2:600}).

Re: Easy OCR

Posted: 12 Jul 2023, 11:00
by Krd
Thank you for the reply.

Image search is faster, as this is more complicated and slow.

Are there any plans to develop a simpler solution similar to FromWindow in the future? :D

Re: Easy OCR

Posted: 12 Jul 2023, 11:17
by Descolada
@Krd, I added a searchArea parameter to FindString so you can confine the search to a specified area relative to the result object (in this case, relative to the window). That solution should be faster, but using FromRect should be faster still (unless you are doing multiple searches in multiple areas)...

Re: Easy OCR

Posted: 12 Jul 2023, 11:40
by Krd
Edit:

I notice that it skips the word I'm searching for, which could be a limitation due to the minimum size requirement.

Re: Easy OCR

Posted: 12 Jul 2023, 11:55
by Descolada
@Krd, x1 and y1 should be the coordinates of the top-left corner of the search area, and x2 and y2 the coordinates of the bottom-right corner. This means that, for example, {x1:0,y1:0,x2:200,y2:200} would only search a 200x200 area in the top-left corner of the window. Also, as I said, the coordinates are relative to the window, not to the screen. {x1:1600, y1:900, x2:1800, y2:900} defines a line, not a rectangular search area, because y1 and y2 are the same.
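To make the corner convention concrete, here is a small sketch. RectToSearchArea is a hypothetical helper written for this post (it is not part of OCR.ahk); it converts an x, y, width, height rectangle into the corner form, so a zero-height area is harder to produce by accident:

```autohotkey
#Requires AutoHotkey v2
#include OCR.ahk

; Hypothetical helper, not part of OCR.ahk: turns a window-relative
; x/y/width/height rectangle into the {x1, y1, x2, y2} corner form.
RectToSearchArea(x, y, w, h) {
    return {x1: x, y1: y, x2: x + w, y2: y + h}
}

; A 200x400 area starting at 1600, 200 (relative to the window).
area := RectToSearchArea(1600, 200, 200, 400)

CoordMode "Mouse", "Window"
result := OCR.FromWindow("A",,2)
try found := result.FindString("Search Word",,,, area)
catch {
    MsgBox "No match inside the search area"
    ExitApp
}
Click found.x, found.y
```

With these values the helper yields the same corners as {x1:1600, y1:200, x2:1800, y2:600}.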

Re: Easy OCR

Posted: 12 Jul 2023, 12:01
by Krd
My bad. Yes, that is what I wanted to use here. But it still jumps over the match... By jumping I mean that it actually finds it, but skips the click and goes to catch.

But this is a fantastic tool. I use it a lot when I have only one match for my strings! Much better than any other way I know of. :)

So many edits. Is it possible to search from the bottom right instead in FromWindow?

Descolada, let's move on until it receives an update, when and if you have time to add more features. I hope for something as simple as FromWindow. :)


One more thing: How can I use this, similar to VIS2, to press a hotkey and then drag a rectangle for OCR to text?

Thank you for the replies, much appreciated!

Re: Easy OCR

Posted: 12 Jul 2023, 12:02
by RaptorX
Awesome library, will take a look later on. :)

Re: Easy OCR

Posted: 12 Jul 2023, 12:23
by Descolada
@Krd, I think Snipper with the OCR extension does what you want.
You should remove the "try-catch" block to figure out what exactly is going wrong: perhaps the error message will help you out. I think this will not get easier than FromWindow, because what you are asking inherently requires some thinking and an understanding of relative coordinates. The same applies to, for example, ImageSearch, where you need to think about whether you are searching relative to the window, client, or screen.
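As an alternative to removing the block entirely, a sketch along these lines keeps the catch but shows what was thrown, using the standard Message, What and Line properties of AHK v2 error objects. The search word and window are placeholders:

```autohotkey
#Requires AutoHotkey v2
#include OCR.ahk

CoordMode "Mouse", "Window"
result := OCR.FromWindow("A",,2)

try {
    found := result.FindString("Search Word")
    Click found.x, found.y
} catch as err {
    ; Show the actual error instead of a generic "No match" message.
    MsgBox "Search failed.`n`nMessage: " err.Message "`nWhat: " err.What "`nLine: " err.Line
}
```

Because the Click sits inside the try block, it only runs when FindString succeeded.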

Thanks RaptorX, it was fun to implement too :)

Re: Easy OCR

Posted: 13 Jul 2023, 02:44
by Krd
Since I have already included Easy OCR in my scripts, I would prefer to use it directly instead of Snipper, which I currently run when needed. But I will ask FG anyway. :)

I will do more testing on the other thing to see if my pro skills can get it working.
Regarding the recent code adjustment you made, do I still need to incorporate portions of the code I posted previously?
Please note that I lack knowledge in many aspects, so please provide thorough examples. :D

When you say that, it makes me think about how lucky we are to have built-in libraries. :)