Vis2 - Image to Text OCR()

Archandrion
Posts: 31
Joined: 26 May 2018, 22:23

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 00:54

Thanks for the reply. I had actually played around with it before seeing your reply. I found that passing the internal dialogue variable to the translator, at the point where it would normally be shown in the OCR language, was still too slow for the speed of the subtitles in the video once the time taken by translation was added. There was also a problem in that some languages translate rather poorly without human input, as too many phrases carry cultural context that subtitle groups usually find a way to convey.

A tool that attempts to translate subtitles in real time for live video with OCR, however imperfect, would still be extremely useful. I've seen Word Lens, but there does not appear to be a good desktop alternative. I'm looking into OpenCV, Tesseract and FFmpeg to do OCR on the video while it plays with audio, as the native imshow appears to be for video frames only, but that's Python, so it's outside this forum's scope. Another question, probably not specific to the main function of Vis2: is there a way to make a Vis2.Graphics.Subtitle.Render last only until it is called again, maybe some subtitle.destroy option? Just removing the duration results in overlapping subtitles. I can't tell beforehand how long a particular subtitle will last, so I can't use the duration option.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 02:31

Just curious, are you subtitle ripping anime? I'm assuming your source is hardcoded, which is quite rare these days. There are a few points I'd like to make:

1) The input image is upscaled by 3.5x during the preprocessing step. Some users, like you, want speed and others want accuracy, so this is an okay balance for small fonts. Your hardcoded subtitles should be quite large, so if you Ctrl+F the script and search for 3.5, you can decrease it to 2, or even just 1. This should decrease processing times by 5x.

2) Using Python is not outside of Vis2's scope. I'd thought about adding OpenCV, but that requires my users to install a 138+ MB binary. On the bright side, maximally stable extremal regions can be detected quickly for full screen OCR.

3) Vis2.Graphics.Subtitle.Render has been released separately on this forum. https://autohotkey.com/boards/viewtopic.php?t=36384 (Check that for documentation.) First a subtitle object is initialized: Vis2.obj.Subtitle := new Vis2.Graphics.Subtitle(). Then each time .Render() is called, what was previously rendered to the screen is overwritten. .Destroy() is a valid method, but it completely destroys the object and doesn't clear it off the screen. Perhaps you are looking for .Hide() and its siblings .Show() and .ToggleVisible()? You can also set a time parameter to have the subtitle time out and self-destruct, as in Vis2.Graphics.Subtitle.Render("This will last 5 seconds", "time: 5000")
In case you were wondering, I just copied and pasted the Subtitle class, so both versions are the same.
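
On point 1, a rough way to estimate what a smaller scale factor buys you: the number of pixels handed to the OCR engine grows with the square of the scale factor. The sketch below only does that arithmetic. PixelRatio is a made-up helper name, not part of Vis2, and it assumes OCR time is roughly proportional to pixel count, which is a simplification.

```autohotkey
; Hypothetical helper, not part of Vis2.
; Returns how many times fewer pixels the OCR engine receives when the
; preprocessing scale factor drops from oldScale to newScale.
; Assumption: OCR time is roughly proportional to the pixel count.
PixelRatio(oldScale, newScale) {
    return (oldScale / newScale) ** 2
}

ratioTo2 := PixelRatio(3.5, 2)   ; (1.75)^2, about 3.06x fewer pixels
ratioTo1 := PixelRatio(3.5, 1)   ; (3.5)^2, about 12.25x fewer pixels
```

Actual timings depend on the engine and the image, so measure before settling on a value.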
Archandrion
Posts: 31
Joined: 26 May 2018, 22:23

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 08:25

To answer your question: in part, yes. It is animation, although not Japanese but the Chinese variety. Since I can't tell the duration beforehand and it varies a lot, I can't really use the time option. With a GUI, for example, GuiControl,,Var, %TranslatedText% basically does what I need. I am trying to make the following work correctly:

Code:

if !(bypass)
    Vis2.obj.Subtitle.Render(Vis2.obj.dialogue, Vis2.obj.style1_back, Vis2.obj.style1_text)
TranslatedSubtitle.Render(A_TickCount, "xCenter y760" Vis2.obj.style1_back, Vis2.obj.style1_text)



A_TickCount would be replaced by the actual translated text, but for this example it shows the overlay when OCR() is called from the demo in the GUI interactive mode. Where would I need to place TranslatedSubtitle := new Vis2.Graphics.Subtitle() in order to have the render update rather than overlay?
renmacro
Posts: 20
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 08:55

So I ended up making a Tesseract training file, as it was now messing up 1/2/7's, and even with majority consensus it still wouldn't arrive at the correct number. And yes, training Tesseract was as bad as people make it out to be =\. But the trained one would now fail on 0/6/8/9's.

I then changed my loop to run verifications with my own training data for half of the passes and the original eng_best for the other half, and now it seems to be very good for what I'm trying to do. My next step was going to be making a font out of screenshots converted to vectors, training Tesseract on it, and seeing how that went. I hope my next post here isn't me doing that...
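
For anyone wanting to try the same majority-consensus idea, here is a minimal sketch of the voting step only. MajorityVote is a hypothetical name, and the readings array stands in for the strings returned by repeated OCR passes (for example, half with a custom traineddata file and half with eng_best). It is not the loop used above, just one way to pick the most common result.

```autohotkey
; Hypothetical helper: return the most frequent string in an array of OCR readings.
; Ties go to the reading that reached the top count first.
MajorityVote(readings) {
    counts := {}
    best := ""
    bestCount := 0
    for index, reading in readings {
        key := "k" reading            ; prefix so numeric-looking readings stay string keys
        counts[key] := (counts.HasKey(key) ? counts[key] : 0) + 1
        if (counts[key] > bestCount) {
            bestCount := counts[key]
            best := reading
        }
    }
    return best
}

; Example: five passes over the same number, three of which agree.
winner := MajorityVote(["17", "11", "17", "77", "17"])
```

Voting only helps when the errors differ between passes. If every pass makes the same mistake, mixing two differently trained models as described above is what adds the diversity.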
Last edited by renmacro on 04 Jun 2018, 11:42, edited 1 time in total.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 09:50

You can place it with the other constructors. It's in a function called start(), under the line Vis2.obj.Subtitle := New Vis2.Graphics.Subtitle()

You should hit up the anime scene. They OCR subtitles from TV rips, but since it's a specialized process they might not have released their code on github.

@renmacro if your text is fixed width and height and numbers/letters only, it might be time to pick up TensorFlow. You should be able to get >95% easily.
Archandrion
Posts: 31
Joined: 26 May 2018, 22:23

Re: Vis2 - OCR(), ImageIdentify()

04 Jun 2018, 21:21

iseahound wrote: You can place it with the other constructors. It's in a function called start(), under the line Vis2.obj.Subtitle := New Vis2.Graphics.Subtitle()


Thanks for the help, it worked.
euras
Posts: 335
Joined: 05 Nov 2015, 12:56

Re: Vis2 - OCR(), ImageIdentify()

23 Jun 2018, 10:03

iseahound wrote: You can place it with the other constructors. It's in a function called start(), under the line Vis2.obj.Subtitle := New Vis2.Graphics.Subtitle()


Hi iseahound, wonderful tool first of all! I'm trying to understand a couple of things here.
First: how do I specify both the language I want to use and the screen coordinates? I have tried it this way, but it doesn't work:

Code:

txt := OCR([232, 411, 640, 40, "nor"]).clipboard()

Second: does the coordinates method share the same functions as the mouse click-and-drag method? When I use coordinates mode to get the text, the text is recognized incorrectly, but when I use the click-and-drag method, the text is converted without errors...

Code:

MsgBox % OCR("https://i.stack.imgur.com/sFPWe.png", , [0,330,999,400])

I get: prown dog jumped over the lazy Tox.

But coordinate mode would be much more efficient if it worked perfectly...

I'm using the demo.ahk file.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

24 Jun 2018, 23:47

Which link?

Code:

txt := OCR([232, 411, 640, 40], "nor").clipboard()
r2997790
Posts: 22
Joined: 02 Feb 2017, 02:46

Re: Vis2 - OCR(), ImageIdentify()

15 Aug 2018, 07:24

This script continues to be an enormous help to me. Thank you, iseahound!

I use it to 'read' text in a foreign language (and characters) and copy the content to the clipboard, as I'd never have any hope of finding the same character keys on the keyboard. It's a work of genius.

Question:

Photoshop has some evil pointer handling (e.g., when you use different tools it forces a pointer change) and quite often it makes the Vis2 OCR script play up.

Is there any way to make the script force an override of the Photoshop pointer handling, which does not always play nice with it?

Thanks.
R
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

15 Aug 2018, 09:15

https://superuser.com/questions/1031356 ... se-pointer

To summarize, it looks like Photoshop is hiding the windows system cursor, and drawing its own! If I tried to fix it, it would probably only flicker more. (In fact, I only change the cursor once during the initial hotkey press. So I'm not even sure why the cursor flickers, maybe Photoshop is having trouble overriding the system cursor.)

If it's really annoying, you can just comment out the cursor change Vis2.stdlib.setSystemCursor(32515) with a semicolon.
paulpma
Posts: 6
Joined: 08 Sep 2018, 22:05

Re: Vis2 - OCR(), ImageIdentify()

08 Sep 2018, 22:18

Thank you iseahound, very useful script. I have been trying to implement it to speed up certain tasks. I ran into trouble OCRing certain parts of my screen: I need to read certain numbers and the script fails. I have cropped out the numbers that need reading:
30.jpg (487 Bytes)
and
17.jpg (489 Bytes)


I have tried changing static scaleFactor to 4, and it works for 17.jpg but not 30.jpg. I went as far as a scale factor of 20 and it still didn't work for 30.jpg.

This is my testing code:

Code:

text := OCR("30.jpg")
MsgBox % text


Any suggestions are appreciated. Thank you.

Paul.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

09 Sep 2018, 09:28

If you look closely at your number 30, there is a line or "artifact" between the 3 and the 0.
paulpma
Posts: 6
Joined: 08 Sep 2018, 22:05

Re: Vis2 - OCR(), ImageIdentify()

09 Sep 2018, 23:23

5.jpg (393 Bytes)


I have tried with single digits as well. No luck here. I have no control over the image size or the background, but they stay constant: same size and same color.

Thank you in advance.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

10 Sep 2018, 06:56

Set the page segmentation mode to single character. I believe this question has been asked before. You will need to modify how the script calls Tesseract. Page segmentation is set to multi line by default, detecting two or more characters. Single characters tend to give more false positives.
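
As a rough illustration of what that change amounts to, here is a sketch that builds a Tesseract command line with an explicit page segmentation mode. BuildTesseractCommand and its parameters are hypothetical and are not how Vis2 assembles its command internally. The --psm flag is Tesseract 4 syntax (3.x used -psm), and per Tesseract's own help text, 10 means single character, 8 single word and 7 single text line.

```autohotkey
; Hypothetical helper, not taken from Vis2: assemble a Tesseract command line
; with an explicit page segmentation mode (psm).
BuildTesseractCommand(exe, imagePath, outBase, lang := "eng", psm := 10, whitelist := "") {
    q := Chr(0x22)                                   ; double quote
    cmd := q exe q " " q imagePath q " " q outBase q ; tesseract <image> <outputbase>
    cmd .= " -l " lang                               ; recognition language
    cmd .= " --psm " psm                             ; 10 = single character
    if (whitelist != "")
        cmd .= " -c tessedit_char_whitelist=" q whitelist q
    return cmd
}

; Example: treat 5.jpg as a single character, writing the result to out.txt.
cmd := BuildTesseractCommand("tesseract.exe", "5.jpg", "out", "eng", 10)
```

The resulting string could be passed to RunWait. For two-digit crops such as 30.jpg, single word (8) or single line (7) may be worth comparing against 10.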

Also, you may find that Vis2 is not the solution for your workflow. Please consider using Google's Cloud Vision API or a similar paid service. Vis2 is designed for users who may occasionally copy text off the screen, such as graphic designers and manga translators. For large scale optical recognition, you may consider customizing the open-source Tesseract project to suit your needs.
carno
Posts: 120
Joined: 20 Jun 2014, 16:48

Re: Vis2 - OCR(), ImageIdentify()

12 Sep 2018, 05:42

Could this work with Japanese characters?
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

12 Sep 2018, 07:00

Yes!
paulpma
Posts: 6
Joined: 08 Sep 2018, 22:05

Re: Vis2 - OCR(), ImageIdentify()

12 Sep 2018, 08:48

Thank you, iseahound. I have set the segmentation mode to single character and it now works on all the examples above. Thank you for suggesting other services, but so far Vis2 is the best solution that I have found and can implement.
paulpma
Posts: 6
Joined: 08 Sep 2018, 22:05

Re: Vis2 - OCR(), ImageIdentify()

12 Sep 2018, 12:31

Iseahound, I'd like to optimize the use of Vis2 when OCR() is called multiple times (10x+) on the same screen, each time on a specific area. With a small amount of text and a lot of calls, Vis2 can be slow. I don't know how Vis2 works in the background, but I imagine that for each OCR() call it takes a screenshot, crops it to the specified area and then processes the image via Tesseract. Then it does the same thing for the next OCR() call: capture the screenshot, crop, send. I could be wrong here. I was thinking of two options to speed up the process; maybe there are better ones.

1. Via AHK, take a screenshot (screen.jpg), save it temporarily, run ocr([x,y,w,h]) 10 times on different areas of screen.jpg, then delete the temporary screen.jpg.

2. I am not sure if this is supported by ocr(), but pass it a multidimensional array, so the call would be ocr([[x,y,w,h], [x,y,w,h], [x,y,w,h], ...]). I am not sure about this.

Maybe there is a better method? Once again, thank you!
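
To make option 2 concrete, here is a small sketch of a wrapper that loops over an array of rectangles and collects one result per region. OcrRegions is a hypothetical helper, not something Vis2 provides. The OCR function is passed in as a function object so the wrapper does not depend on Vis2 itself, and it is assumed to accept a single [x, y, w, h] array as in the OCR([232, 411, 640, 40]) call earlier in the thread.

```autohotkey
; Hypothetical wrapper, not part of Vis2: run an OCR function over several
; screen rectangles and return the results in the same order.
; regions - array of [x, y, w, h] arrays
; ocrFunc - function object that takes one [x, y, w, h] array and returns text
OcrRegions(regions, ocrFunc) {
    results := []
    for index, rect in regions
        results.Push(ocrFunc.Call(rect))
    return results
}

; Intended use with Vis2 loaded (assumption: OCR() accepts a coordinate array):
;   texts := OcrRegions([[232, 411, 640, 40], [232, 460, 640, 40]], Func("OCR"))
```

This only tidies up the calling code. Each region still costs one full OCR pass, so it does not reduce the total time by itself.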

Paul.
iseahound
Posts: 272
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

Re: Vis2 - OCR(), ImageIdentify()

12 Sep 2018, 18:39

Hey paulpma, taking a screenshot is very fast. The slow part is actually the OCR engine. I think if you reduce the scale factor to 2x then the image that is sent to Tesseract will be smaller and thus faster. Since your image is already so small, this would not be a big improvement.

Also, there are two folders in the download, tessdata_fast and tessdata_best. Replace the eng.traineddata in tessdata_best with the one in tessdata_fast. That should speed it up.

Also try adding this line below:

Code:

static q := Chr(0x22)
_cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
_cmd .= (this.language) ? " -l " q this.language q : ""
_cmd .= " -c tessedit_char_whitelist=" q "0123456789-." q ; THIS LINE
_cmd := ComSpec " /C " q _cmd q
RunWait % _cmd,, Hide


It should whitelist only the digits (plus the minus sign and period) and ignore English letters. I don't know if that will make it faster, but it will make it more accurate.
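
If letters still show up in the output despite the whitelist (some Tesseract 4.0 builds using the LSTM engine reportedly do not honor tessedit_char_whitelist), one fallback is to filter the returned text afterwards. The sketch below is a plain string filter. KeepAllowedChars is a hypothetical name and nothing in it is specific to Vis2 or Tesseract.

```autohotkey
; Hypothetical post-filter: keep only the characters listed in "allowed"
; and drop everything else from the OCR result.
KeepAllowedChars(text, allowed := "0123456789-.") {
    result := ""
    Loop, Parse, text
    {
        ; Case-sensitive lookup of the current character in the allowed set.
        if (InStr(allowed, A_LoopField, true))
            result .= A_LoopField
    }
    return result
}

; Example: a letter O misread in place of a zero is dropped, not corrected.
clean := KeepAllowedChars("3O.5 kg")
```

Note that this removes misread characters and does not repair them, so it suits fields where a missing digit is easier to detect than a wrong one.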
paulpma
Posts: 6
Joined: 08 Sep 2018, 22:05

Re: Vis2 - Image to Text OCR()

15 Sep 2018, 00:29

Dear iseahound,

Thank you for your reply. Your suggestion of whitelisting digits made it faster. However, I was thinking that it would then work only with digits, but apparently it still reads English characters. This is good, because not all of my fields are numbers. Also, I will try to whitelist letters, since I don't use special characters, and see what happens. Bad idea?

For me accuracy is more important than speed, so I will stay with tessdata_best. I have also changed the scale factor to the value that reads most accurately, and I will keep it that way.

Thanks again. I will post updates.
