
get text from .docx files

Posted: 19 May 2017, 03:04
by euras
Let's say there are many .docx files. I want to read them with AHK, get the text, and put it somewhere else (let's keep the text in a variable). How can I do that? I searched for an answer, but none of the solutions I found works for me.

Maybe it is possible to convert the .doc file into a temporary .txt file and read the content from that? Or is there any other solution that works without extra libraries?

Code: Select all

Loop, read, http://team/Administrasjon/`%20tasks/`%20rapporter.docx?Web=1
	last_line := A_LoopReadLine
MsgBox %last_line% ; gives nothing
return

Re: get text from .docx files

Posted: 19 May 2017, 04:55
by Guest
You need COM - see simple example here
https://autohotkey.com/board/topic/7338 ... ntry494436

Re: get text from .docx files

Posted: 19 May 2017, 05:32
by euras
I tried this code. If the .doc file is on my C: drive, everything works fine, but if the file is in an external location such as http://, I get an error message and the code doesn't work. Why? The path looks good...

Code: Select all

MsgBox, % ComObjGet("C:\Users\Desktop\test.docx").Range.Text ; this one returns the text of the Word file
MsgBox, % ComObjGet("http://team/Administrasjon/`%20til/`%20rapporter.docx").Range.Text ; this one gives the error message "fail syntax in ComObjGet line"

Re: get text from .docx files

Posted: 19 May 2017, 05:48
by IMEime
If you use COM style, you have to have MS Office here and there.

Re: get text from .docx files

Posted: 19 May 2017, 06:01
by euras
IMEime wrote:If you use COM style, you have to have MS Office here and there.
So it means that the .doc file can sit in an external directory and be opened from there, but MS Office needs to be installed to use the COM approach?

Re: get text from .docx files

Posted: 19 May 2017, 06:07
by euras
If I open the Word file from the external directory and then run this code, it doesn't work either... :/

Code: Select all

WordDoc := ComObjActive("Word.Application")
MsgBox, % WordDoc.Range.text

Re: get text from .docx files

Posted: 19 May 2017, 06:18
by Blackholyman
ComObjGet returns the document object, but ComObjActive returns the Word application object, so you need to tell it which document to get the range from.

Code: Select all

F2::
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return

Re: get text from .docx files

Posted: 19 May 2017, 07:14
by euras
:HeHe:
Blackholyman wrote: ComObjGet returns the document object, but ComObjActive returns the Word application object, so you need to tell it which document to get the range from.

Code: Select all

F2::
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return

Thank you, it works, but now the code is very clumsy. I need to sleep for almost 10 seconds and have the Word document visible before I can read it... Is there any way to avoid that?
My code now:

Code: Select all

runwait, http://team/`%20rapporter.docx
sleep 10000
oWord := ComObjActive("Word.Application")
msgbox % oWord.ActiveDocument.range.text
return

I want to have something like this (doesn't work...):

Code: Select all

oWord := ComObjCreate("Word.Application")
oWord.Visible := false 
oWord.Navigate("http://team/`%20rapporter.docx")
msgbox % oWord.ActiveDocument.range.text
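
One possible reason the attempt above fails is that the Word application object has no Navigate method; documents are normally opened through its Documents collection. A minimal sketch for AutoHotkey v1.1, assuming MS Office is installed and that Word itself is able to open the URL from the post (that depends on the server and your permissions, which this thread does not confirm):

```autohotkey
; Start a hidden Word instance instead of attaching to a visible one.
oWord := ComObjCreate("Word.Application")
oWord.Visible := false

; Documents.Open(FileName, ConfirmConversions, ReadOnly)
; The URL is the one from the post above; `% is a literal percent sign.
oDoc := oWord.Documents.Open("http://team/`%20rapporter.docx", false, true)

; Read the whole document body into a variable.
text := oDoc.Range.Text

; 0 = wdDoNotSaveChanges; then shut down the hidden instance.
oDoc.Close(0)
oWord.Quit()

MsgBox % text
```

Because Open returns only after the document has loaded, the fixed Sleep should not be needed in this variant. If the script stops on an error before Quit is reached, a hidden WINWORD.EXE process may stay behind and has to be closed manually.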

Re: get text from .docx files

Posted: 19 May 2017, 09:30
by Guest
COM FREE

NON AHK

Not AutoHotkey, but if you have Perl you can use this script to get the text from DOCX files: https://github.com/remonk/linuxsleuthin ... pen_xml.pl (source here is not the author)

If you have DOC files (the older Office format), you can get the text with a free utility called antiword: http://www.winfield.demon.nl/

AHK

There is a script in the Scripts section that uses a technique similar to the Perl script, but I can't find it atm.
A .docx file is just a bunch of zipped files, and you can use AutoHotkey to unzip it. You need "wordfile.docx\word\document.xml"; once you have document.xml, just use a regex to get the text by stripping all the "tags".
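
The tag-stripping step described above could look roughly like this in AutoHotkey v1.1 (Unicode build). This is only a sketch: it assumes document.xml has already been extracted from the .docx, the function name DocxXmlToText and the file path are made up for illustration, and it ignores footnotes, headers, text boxes, and anything else stored outside the main document body.

```autohotkey
; Hypothetical helper: reduce the XML of word\document.xml to plain text.
DocxXmlToText(xml) {
    ; A closing paragraph tag becomes a line break.
    text := RegExReplace(xml, "</w:p>", "`n")
    ; An attribute-less tab element becomes a tab character.
    text := RegExReplace(text, "<w:tab\s*/>", "`t")
    ; Strip every remaining tag, including the XML declaration.
    text := RegExReplace(text, "<[^>]+>")
    ; Decode the five predefined XML entities; &amp; must come last.
    text := StrReplace(text, "&lt;", "<")
    text := StrReplace(text, "&gt;", ">")
    text := StrReplace(text, "&quot;", """")
    text := StrReplace(text, "&apos;", "'")
    text := StrReplace(text, "&amp;", "&")
    return text
}

; Example path only: point this at your own extracted document.xml.
; *P65001 tells FileRead to decode the file as UTF-8.
FileRead, xml, *P65001 C:\Temp\unzipped\word\document.xml
MsgBox % DocxXmlToText(xml)
```

A regex is good enough for a quick text dump, but a real XML parser is the safer choice when the document structure matters.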

you're welcome :D

Re: get text from .docx files

Posted: 19 May 2017, 09:44
by IMEime
About the "COM FREE" method for the .docx format:
The format is called "Office Open XML".

Introducing the Office (2007) Open XML File Formats
https://msdn.microsoft.com/en-us/librar ... e.12).aspx

It looks very easy to use, because it is plain text.
But I'd recommend that you never use it.
It is simply a waste of precious time.

For the .xlsx format, Open XML could be somewhat useful.
...and the Perl script is too simple to be of use. Open XML for Word is a very complex format.

Re: get text from .docx files

Posted: 19 May 2017, 10:07
by Guest
@IMEime I've been using that Perl script for years and it works like a charm for me, as does antiword. Especially in batch files they are very, very fast. But if you don't like it, that is fine of course :D

Re: get text from .docx files

Posted: 19 May 2017, 11:38
by IMEime
perl ?
Good for you.

If you want to talk about it any further, say it with AHK.

Regards

Re: get text from .docx files

Posted: 19 May 2017, 13:53
by Guest
@IMEime If you insist, I knew it was you who made that AHK script, read the XML, strip the tags using regex, https://autohotkey.com/boards/viewtopic.php?f=6&t=29423 :thumbdown: :salute: :wave: :bravo: :dance: :rainbow: :shh: :D