get text from .docx files

euras
Posts: 249
Joined: 05 Nov 2015, 12:56

19 May 2017, 03:04

Let's say there are many .docx files. I want to read them with AHK, get the text, and put it somewhere else (let's keep the text in a variable). How can I do that? I searched for an answer, but none of the ones I found works for me.

Maybe it is possible to convert the .docx file into a temporary .txt file and read the content from that? Or is there any other solution that works without extra libraries?

Code:

Loop, read, http://team/Administrasjon/`%20tasks/`%20rapporter.docx?Web=1
last_line := A_LoopReadLine
MsgBox %last_line% ; gives nothing
return
euras
Posts: 249
Joined: 05 Nov 2015, 12:56

Re: get text from .docx files

19 May 2017, 05:32

I tried this code. If the .docx file is on my C: drive, everything works fine, but if the file is in an external location such as http://, I get an error message and the code doesn't work. Why? The path looks good.

Code:

MsgBox, % ComObjGet("C:\Users\Desktop\test.docx").Range.Text ; this one shows the text of the Word file
MsgBox, % ComObjGet("http://team/Administrasjon/`%20til/`%20rapporter.docx").Range.Text ; this one gives the error message "fail syntax in ComObjGet line"
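One possible workaround, offered as an untested sketch and not a confirmed fix: since ComObjGet worked on a local path, the file could be downloaded to a temporary local copy first and read from there. The URL and the temporary file name below are hypothetical placeholders, and the sketch assumes the server allows a plain download without a login.

```autohotkey
; Sketch for AutoHotkey v1.1; requires Microsoft Word to be installed.
url       := "http://example.invalid/report.docx"   ; hypothetical address
localCopy := A_Temp "\report_copy.docx"             ; hypothetical temp file

UrlDownloadToFile, %url%, %localCopy%
if ErrorLevel
{
    MsgBox, Download failed.
    ExitApp
}

doc  := ComObjGet(localCopy)    ; same call that worked for the file on C:
text := doc.Range.Text
doc.Close(0)                    ; 0 = wdDoNotSaveChanges
MsgBox, % text
```

If the server requires authentication, UrlDownloadToFile may save a login page instead of the document, so it is worth checking the downloaded file before trusting the result.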
IMEime
Posts: 411
Joined: 20 Sep 2014, 06:15

Re: get text from .docx files

19 May 2017, 05:48

If you use the COM approach, MS Office has to be installed on every machine that runs the script.
euras
Posts: 249
Joined: 05 Nov 2015, 12:56

Re: get text from .docx files

19 May 2017, 06:01

IMEime wrote: If you use the COM approach, MS Office has to be installed on every machine that runs the script.

So that means the .docx file can sit in an external location and be opened from there, but MS Office must be installed to use the COM approach?
euras
Posts: 249
Joined: 05 Nov 2015, 12:56

Re: get text from .docx files

19 May 2017, 06:07

If I open the Word file from the external location and run this code, it doesn't work either... :/

Code:

WordDoc := ComObjActive("Word.Application")
MsgBox, % WordDoc.Range.text
Blackholyman
Posts: 1236
Joined: 29 Sep 2013, 22:57

Re: get text from .docx files

19 May 2017, 06:18

ComObjGet returns the document object, but ComObjActive returns the Word application object, so you need to tell it which document to get the range from.

Code:

F2::
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return
euras
Posts: 249
Joined: 05 Nov 2015, 12:56

Re: get text from .docx files

19 May 2017, 07:14

:HeHe:
Blackholyman wrote: ComObjGet returns the document object, but ComObjActive returns the Word application object, so you need to tell it which document to get the range from.

Code:

F2::
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return



Thank you, it works, but now the code is very clumsy. I need a sleep of almost 10 seconds, and the Word document has to be visible before I can read it... Is there another solution that avoids this?
My code now:

Code:

runwait, http://team/`%20rapporter.docx
sleep 10000
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return


I want to have something like this (doesn't work...):

Code:

oWord := ComObjCreate("Word.Application")
oWord.Visible := false
oWord.Navigate("http://team/`%20rapporter.docx")
msgbox % oWord.ActiveDocument.range.text
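A likely reason the attempt above fails is that Navigate is a method of the Internet Explorer COM object, not of Word.Application; Word opens files through Documents.Open. The following is a minimal sketch under that assumption, with a hypothetical URL, and it has not been tested against the server from this thread.

```autohotkey
; Sketch for AutoHotkey v1.1; requires Microsoft Word to be installed.
url := "http://example.invalid/report.docx"   ; hypothetical address

oWord := ComObjCreate("Word.Application")      ; new, separate Word instance
oWord.Visible := false
try
{
    ; Documents.Open(FileName, ConfirmConversions, ReadOnly)
    oDoc := oWord.Documents.Open(url, false, true)
    text := oDoc.Range.Text
    oDoc.Close(0)                              ; 0 = wdDoNotSaveChanges
}
catch e
{
    text := ""
    MsgBox, % "Could not open the document: " e.Message
}
oWord.Quit()                                   ; do not leave a hidden Word running
MsgBox, % text
```

Because Documents.Open returns only after the document is loaded, the fixed Sleep should not be needed, and the window can stay hidden.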
Guest

Re: get text from .docx files

19 May 2017, 09:30

COM FREE

NON AHK

Not AutoHotkey, but if you have Perl you can use this script to get the text from DOCX files: https://github.com/remonk/linuxsleuthin ... pen_xml.pl (the source here is not the author)

If you have DOC files (the older Office format), you can get the text using a free utility called antiword: http://www.winfield.demon.nl/

AHK

There is a script in the Scripts section that uses a technique similar to the Perl script, but I can't find it atm.
A DOCX file is just a bunch of zipped files, and you can use AutoHotkey to unzip it. You need "wordfile.docx\word\document.xml"; once you have document.xml, just use regex to get the text by stripping all the "tags".

you're welcome :D
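The unzip-and-strip idea described above could look roughly like this in AutoHotkey. This is only a sketch: the paths are hypothetical, unzipping through the Windows shell (Shell.Application) is one option among several, and the tag stripping is deliberately naive (tabs, line breaks inside a paragraph, tables, headers and footnotes are not handled).

```autohotkey
; Sketch for AutoHotkey v1.1. All paths are hypothetical placeholders.
docxPath := A_ScriptDir "\test.docx"
zipPath  := A_Temp "\docx_extract.zip"
outDir   := A_Temp "\docx_extract"
xmlPath  := outDir "\word\document.xml"

; The Windows shell only treats the archive as a folder if it ends in .zip.
FileCopy, %docxPath%, %zipPath%, 1
FileRemoveDir, %outDir%, 1
FileCreateDir, %outDir%

shell := ComObjCreate("Shell.Application")
; 4 = no progress dialog, 16 = answer "Yes to All" to any prompt.
shell.Namespace(outDir).CopyHere(shell.Namespace(zipPath).Items, 4|16)

; CopyHere may return before extraction finishes, so wait up to about 5 s.
Loop, 100
{
    if FileExist(xmlPath)
        break
    Sleep, 50
}

FileRead, xml, *P65001 %xmlPath%           ; document.xml is UTF-8
if ErrorLevel
{
    MsgBox, Could not read %xmlPath%
    ExitApp
}
text := RegExReplace(xml, "</w:p>", "`n")  ; paragraph ends become line breaks
text := RegExReplace(text, "<[^>]+>")      ; drop every remaining tag
for entity, char in {"&lt;": "<", "&gt;": ">", "&quot;": """", "&apos;": "'"}
    text := StrReplace(text, entity, char)
text := StrReplace(text, "&amp;", "&")     ; decode &amp; last
MsgBox, % text
```

This needs neither Word nor any extra library, but it only works on a local file, and the wait loop is a crude guard that may need adjusting for large documents.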
IMEime
Posts: 411
Joined: 20 Sep 2014, 06:15

Re: get text from .docx files

19 May 2017, 09:44

For the .docx format there is a "COM FREE" method.
They say it is the "Office Open XML" format.

Introducing the Office (2007) Open XML File Formats
https://msdn.microsoft.com/en-us/library/aa338205(v=office.12).aspx

It looks very easy to use, because it is a plain text file.
But I'd recommend you never use it.
It is simply a waste of precious time.

If it is the .xlsx format, Open XML could be somewhat useful.
...and the Perl script is too simplistic. Open XML for Word is a very complex format.
Guest

Re: get text from .docx files

19 May 2017, 10:07

@IMEime I've been using that Perl script for years; it works like a charm for me, and the same goes for antiword. Especially in batch files they work very, very fast. But if you don't like it, that is fine of course :D
IMEime
Posts: 411
Joined: 20 Sep 2014, 06:15

Re: get text from .docx files

19 May 2017, 11:38

perl ?
Good for you.

If you want to talk about it any further, say it with AHK.

Regards
Guest

Re: get text from .docx files

19 May 2017, 13:53

@IMEime If you insist: I knew it was you who made that AHK script (read the XML, strip the tags using regex): https://autohotkey.com/boards/viewtopic.php?f=6&t=29423 :D
