Kindly move this to the appropriate section if this is not the right one.
Request: as the title suggests, I have some documents in MS Word 2010. Is it possible to get an index-style list of all the words used in a Word file?
I asked the same on IRC and got awesome help from fluffums, but that approach searches for one word at a time, and you have to change the code manually for each new word, which would take a hell of a lot of time, maybe some days :-p
Anyway, the result should look like this:
a -> 1-10, 2-20, 3-100 ... and so on
are -> 1-7, 3-10 ... and so on
That is, the word "a" is used 10 times on page 1, 20 times on page 2, 100 times on page 3, and so on.
Both words, "a" and "are", are taken from the document itself.
The resulting index should be in alphabetical order.
Many, many thanks in advance for your time and insight, as this would be difficult for most people.
[request] indexing all words in a word file
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
John ... you working ?
Re: [request] indexing all words in a word file
I'm not sure how to search each page individually; perhaps that is something that could be done with COM.
To get a count of each occurrence of a word, I would parse the text into an array and use the words as keys and the number of occurrences as values:
Code: Select all
Test := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. "
    . "Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec "
    . "consectetur ante hendrerit. Donec et mollis dolor. Praesent et diam eget "
    . "libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut "
    . "porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a "
    . "non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut "
    . "gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. "
    . "Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim "
    . "sit amet elit pharetra tincidunt feugiat nisl imperdiet. Ut convallis libero "
    . "in urna ultrices accumsan. Donec sed odio eros. Donec viverra mi quis quam "
    . "pulvinar at malesuada arcu rhoncus. Cum sociis natoque penatibus et magnis "
    . "dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. "
    . "Mauris vitae nisi at sem facilisis semper ac in est."
Words := {}
; Split on whitespace and strip punctuation from both ends of each field.
Loop, Parse, Test, %A_Space%`t`r`n, `,.;:'`"!?/<>[]{}\|()*&^`%$#@!
{
    if (A_LoopField = "")   ; consecutive delimiters produce empty fields
        continue
    Words[A_LoopField] := Words[A_LoopField] ? Words[A_LoopField] + 1 : 1
}
; Object keys enumerate in alphabetical order, so the list comes out sorted.
for key, val in Words
    Result .= key " -> " val "`n"
MsgBox, % Result
return
I made a word count function a while ago, but didn't release it because it can still be fooled depending on how the text is formatted. But maybe you will find this useful:
Code: Select all
/*
WordCount example:
Highlight text and hit Alt+C to display the WordCount values
*/
!c::
    ClipSave := ClipboardAll
    Clipboard := ""
    Send, ^c
    ClipWait, 0.2
    Text := Clipboard
    Clipboard := ClipSave
    if (!ErrorLevel)
    {
        r := WordCount(Text)
        MsgBox, % "Word count = " r.WordCount
            . "`nCharacter count not including spaces = " r.CharsNSP
            . "`nCharacter count including spaces = " r.CharCount
            . "`nSentence count = " r.Sentences
            . "`nParagraph count = " r.Paragraphs
            . "`nNon-blank line count = " r.NonBlankLines
            . "`nTotal line count = " r.TotalLines
            . "`nAverage word length = " r.AvgWordLength
            . "`nAverage words per sentence = " r.AvgSentWords
            . "`nAverage characters per sentence not including spaces = " r.AvgSentCharsNSP
            . "`nAverage characters per sentence including spaces = " r.AvgSentChars
            . "`nAverage words per paragraph = " r.AvgWordsPerPar
            . "`nAverage characters per paragraph not including spaces = " r.AvgCharsPerParNSP
            . "`nAverage characters per paragraph including spaces = " r.AvgCharsPerPar
    }
return
/*
Function: WordCount
Returns an object with the following properties:
Result.WordCount Word count
Result.CharsNSP Character count not including spaces
Result.CharCount Character count including spaces
Result.Sentences Sentence count
Result.Paragraphs Paragraph count
Result.NonBlankLines Non blank lines count
Result.TotalLines Total lines count
Result.AvgWordLength Average word length
Result.AvgSentWords Average words per sentence
Result.AvgSentCharsNSP Average characters per sentence not including spaces
Result.AvgSentChars Average characters per sentence including spaces
Result.AvgWordsPerPar Average words per paragraph
Result.AvgCharsPerParNSP Average characters per paragraph not including spaces
Result.AvgCharsPerPar Average characters per paragraph including spaces
*/
WordCount(Text)
{
    Result := {}
    ; RegExReplace's fourth parameter receives the number of replacements,
    ; which is used here purely as a match counter.
    RegExReplace(Text, "\b\w+\b", "", x)
    Result.WordCount := x
    RegExReplace(Text, "[^\s]", "", y)
    Result.CharsNSP := y
    Result.CharCount := z := StrLen(Text)
    RegExReplace(Text, "U).+(?=[!\.\?]\s|.$)", "", s)
    Result.Sentences := s
    RegExReplace(Text, "U).+\R|.$", "", p)
    Result.Paragraphs := p
    RegExReplace(Text, "Um)^.+$", "", n)
    Result.NonBlankLines := n
    RegExReplace(Text, "Um)^.*$", "", t)
    Result.TotalLines := t
    ; "//" is integer division, so the averages are whole numbers.
    Result.AvgWordLength := y // x
    Result.AvgSentWords := x // s
    Result.AvgSentCharsNSP := y // s
    Result.AvgSentChars := z // s
    Result.AvgWordsPerPar := x // p
    Result.AvgCharsPerParNSP := y // p
    Result.AvgCharsPerPar := z // p
    return, Result
}
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
@k0n
Thanks, but I need the script to work on hundreds of pages.
Hopefully someone can help me out.
- Blackholyman
- Posts: 1293
- Joined: 29 Sep 2013, 22:57
- Location: Denmark
- Contact:
Re: [request] indexing all words in a word file
It is possible, but if an example like the one by kon isn't what you are after, then it seems you need a fully functioning script, and that can take more time than anyone is willing to spend...
Courses on AutoHotkey
My Autohotkey Blog
- Blackholyman
- Posts: 1293
- Joined: 29 Sep 2013, 22:57
- Location: Denmark
- Contact:
Re: [request] indexing all words in a word file
Okay, here is a stab at it from me anyway.
I hope it is more in line with what you need.
Code: Select all
MyDocument := {}
Words := {}
FileSelectFile, path
if (path = "")   ; dialog was cancelled
    ExitApp
oWord := ComObjCreate("Word.Application")
;~ oWord.Visible := true
oWord.Documents.Open(path)
Source := oWord.ActiveDocument
Pages := Source.ActiveWindow.Panes(1).Pages.Count
Counter := 0
Clipboard := ""
; Cut the current page over and over: after each cut the following page
; moves up and becomes the new "\Page" (the page holding the selection).
While (Counter < Pages)
{
    Counter += 1
    Source.Bookmarks("\Page").Range.Cut()
    ClipWait, 1
    MyDocument[Counter] := Clipboard   ; integer keys keep the pages in numeric order
    Clipboard := ""
}
Source.Saved := true   ; mark the document as unchanged so the cuts are discarded on close
Source.Close()
oWord.Quit()
for page, text in MyDocument
{
    Words[page] := {}
    pos := 1
    match := ""
    While (pos := RegExMatch(text, "\b\w+\b", match, pos + StrLen(match)))
    {
        ; ObjHasKey() still works if the text itself contains the word "HasKey"
        if ObjHasKey(Words[page], match)
            Words[page, match] := Words[page, match] + 1
        else
            Words[page, match] := 1
    }
}
for page, val in Words
    for c, times in val
        list .= "the word '" c "' was found " times " times on page " page "`n"
MsgBox % list
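The per-page counts above can be turned into the alphabetical "word -> page-count" layout from the original request. What follows is only a minimal sketch for AutoHotkey v1.1: `BuildWordIndex` and the `Sample` data are hypothetical names made up for illustration, and the function assumes its input has the same shape as the `words` object built by the script (page key, then word, then count).

```autohotkey
; Hypothetical helper: turns {page: {word: count}} into "word -> page-count, ..." lines.
BuildWordIndex(PageWords)
{
    Index := {}
    for Page, Counts in PageWords
    {
        ; Accept keys such as "Page12" or 12: keep only the digits and force an
        ; integer, so that the pages later enumerate in numeric order.
        PageNum := RegExReplace(Page, "\D") + 0
        for Word, Times in Counts
            Index[Word, PageNum] := Times
    }
    Out := ""
    ; String keys enumerate alphabetically, which gives the sorted index.
    for Word, PageCounts in Index
    {
        Line := ""
        for PageNum, Times in PageCounts
            Line .= (Line = "" ? "" : ", ") . PageNum . "-" . Times
        Out .= Word . " -> " . Line . "`n"
    }
    return Out
}

; Small hand-made sample standing in for the words object built by the script.
Sample := {Page1: {a: 10, are: 7}, Page2: {a: 20}, Page3: {a: 100, are: 10}}
MsgBox, % BuildWordIndex(Sample)
```

With the sample data this should list "a" with pages 1, 2 and 3 and "are" with pages 1 and 3. In the real script you would pass the script's own words object instead of `Sample`; for documents with hundreds of pages, writing the result to a file is likely more practical than a message box.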
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
@Blackholyman
WOW! Sheer bliss it is to see the result.
I verified the result for two or three words, and it seems right.
Thanks again, man, for your time and intelligence.
If I find any loopholes, I will let you know.
That thing cannot be done so easily, man!
RESPECT!
Last edited by smorgasbord on 29 Nov 2013, 23:50, edited 1 time in total.
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
@Blackholyman
Wait, wait, wait!
It is working for all the pages. I need to check again.
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
One page count gets shifted somewhere in the middle.
Is it because my Word file has tables in it?
I forgot to mention that part, my mistake.
Sorry.
- Blackholyman
- Posts: 1293
- Joined: 29 Sep 2013, 22:57
- Location: Denmark
- Contact:
Re: [request] indexing all words in a word file
Word is not page layout software. It's a word processor. It sees text as a scroll. Each document is one long scroll of text.
Word barely knows what a page is.
Word paginates a document by constantly talking to the current printer driver. It uses information from the printer driver to know where to chop up its precious scroll if it were required to force it on to individual bits of paper.
If you change the printer driver, so that the new one can fit just a tiny bit more or less text on the page than the previous driver, then all the pagination will change.
Where a page starts and ends is constantly changing as the user adds or deletes content and as the user changes how the document is viewed.
As one demonstration of how fluid is the concept of a 'page', try doing Alt-F9. It toggles between displaying fields and displaying field results. Try it in a document with a substantial table of contents, or several linked spreadsheet tables from Excel, or a couple of large linked images, or some other fields that generate content that takes up a lot of space. The number of pages in the document, and where each starts and stops, can change dramatically.
'But Word can count the number of pages. It must be able to identify an individual page!'
Yes, Word can count the number of pages in a document. Use something like:
Code: Select all
ActiveDocument.Range.ComputeStatistics(wdStatisticPages)
There's no way to get from the ComputeStatistics method to an individual page.
'But Word has a Pages collection. It must be able to identify an individual page!'
You can do something like the following:
Code: Select all
ActiveWindow.Panes(1).Pages(1).Rectangles(1).Range.Select
"That works!", you say.
Yes, it appears to work the first time you try it. But it works in trivial circumstances only. It gets flummoxed by a table or a field that crosses a page boundary.
Here are some examples of problems.
If you have a table that starts on page 16 but a row in the table flows over onto page 17, then
Code: Select all
ActiveWindow.Panes(1).Pages(16).Rectangles(1).Range.Select
will select page 16 and that part of the row that appears on page 17.
If a table is very big, starts on page 20 and ends on page 44, and there are rows that break across pages, then
Code: Select all
ActiveWindow.Panes(1).Pages(36).Rectangles(1).Range.Select
will select all the way from page 20 to page 44!
If you have, say, a three-page table of contents starting on page 1, then
Code: Select all
ActiveWindow.Panes(1).Pages(2).Rectangles(1).Range.Select
will select pages 1, 2 and 3.
If your aim was to cycle through each page and perform some kind of processing on each page, then your code would have processed some pages many times; in the large-table example, up to 24 times.
Cycling through the Pages collection to process each page is not a usable pattern in most professional work.
So I'm not sure how to help you out any further if you really need to use the page count.
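One alternative that avoids the Pages collection is to ask Word for the page number of each word through `Range.Information`. The following is only a sketch for AutoHotkey v1.1 under stated assumptions: it has not been checked against documents with large tables or fields, the page numbers still depend on Word's current pagination, and walking every word over COM is likely to be slow on hundreds of pages. The variable names are made up for illustration; the two constants hold the numeric values of Word's `wdActiveEndPageNumber` and `wdDoNotSaveChanges`.

```autohotkey
wdActiveEndPageNumber := 3   ; numeric value of the WdInformation constant
wdDoNotSaveChanges := 0      ; numeric value of the WdSaveOptions constant

FileSelectFile, Path
if (Path = "")               ; dialog was cancelled
    ExitApp
oWord := ComObjCreate("Word.Application")
Doc := oWord.Documents.Open(Path)
Index := {}
for W in Doc.Words
{
    ; Items in the Words collection can carry trailing spaces, and some are
    ; punctuation or cell markers, so trim and keep plain words only.
    Token := Trim(W.Text, " `t`r`n")
    if (!RegExMatch(Token, "^\w+$"))
        continue
    PageNum := W.Information(wdActiveEndPageNumber)
    Seen := Index[Token, PageNum]
    Index[Token, PageNum] := (Seen = "" ? 0 : Seen) + 1
}
Doc.Close(wdDoNotSaveChanges)
oWord.Quit()

; Build "word -> page-count, page-count, ..." lines in alphabetical order.
Out := ""
for Token, PageCounts in Index
{
    Line := ""
    for PageNum, Times in PageCounts
        Line .= (Line = "" ? "" : ", ") . PageNum . "-" . Times
    Out .= Token . " -> " . Line . "`n"
}
MsgBox, % Out
```

Because nothing is cut from the document, a table that crosses a page boundary should not shift the later counts, although that is an expectation rather than something confirmed in this thread.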
Re: [request] indexing all words in a word file
If you convert it to a PDF, you can then get the text from the PDF using PDFTK, which will include a form feed character between pages. That allows you to actually split, find, and use each page. So you have the text per page, and you can count the words from there.
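The form feed idea can be sketched as follows, assuming the extracted text is already in a variable and really does contain a form feed (`` `f ``, character code 12) between pages. `IndexPages` is a hypothetical helper name and the sample string is made up; this is an illustration for AutoHotkey v1.1, not a finished tool.

```autohotkey
; Hypothetical helper: Text holds the extracted text with a form feed between pages.
IndexPages(Text)
{
    Index := {}
    ; StrSplit numbers the pieces 1, 2, 3, ..., which doubles as the page number.
    for PageNum, PageText in StrSplit(Text, "`f")
    {
        Pos := 1
        while (Pos := RegExMatch(PageText, "\w+", Word, Pos))
        {
            Seen := Index[Word, PageNum]
            Index[Word, PageNum] := (Seen = "" ? 0 : Seen) + 1
            Pos += StrLen(Word)   ; continue right after the word just found
        }
    }
    Out := ""
    ; String keys enumerate alphabetically, so the index comes out sorted.
    for Word, PageCounts in Index
    {
        Line := ""
        for PageNum, Times in PageCounts
            Line .= (Line = "" ? "" : ", ") . PageNum . "-" . Times
        Out .= Word . " -> " . Line . "`n"
    }
    return Out
}

; Two tiny "pages" separated by a form feed, standing in for real extracted text.
MsgBox, % IndexPages("a cat and a dog`fa bird")
```

For a real file you would read the text with `FileRead` first and pass it to the function. Word matching here is case-insensitive because AutoHotkey v1 object keys are, so "The" and "the" are counted together.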
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
Thanks a ton, Blackholyman, and ahk7 too.
I shall try to do it, Blackholyman.
Might I PM the file to you, sir?
It seems possible.
- smorgasbord
- Posts: 493
- Joined: 30 Sep 2013, 09:34
Re: [request] indexing all words in a word file
@Blackholyman
Thanks again, you took so much trouble for me.
@ahk7
Thanks for the trick.
Here is what I did:
I added headers/footers and page numbers to the Word file, then created a PDF of it using CutePDF, then copied the whole text of the PDF into a text file. Then I wrote a script that separates the pages based on the page numbers and the header/footer.
It kind of worked every time, though the tables did cause some problems (as Blackholyman already pointed out). I let the table issue go, as there was other stuff to do.
Anyway, thanks everyone: @blackholyman, @k0n, @ahk7 et al.
All hail AHKSCRIPT!!!