Script to Find Repeating Words

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
jballi
Posts: 724
Joined: 29 Sep 2013, 17:34


19 Jun 2018, 15:46

Not the best forum for this request, but the other forums didn't seem right either, so here goes...

I'm looking for a script that will find repeating words in text. I've seen word processors that that do do it, but I did a quick search and I didn't find any scripts out there that do it. I may be using the wrong search terms.

Does anyone know of or have a script to do this? If not, I will be forced to write my own.

Thanks.
TheDewd
Posts: 1510
Joined: 19 Dec 2013, 11:16
Location: USA

Re: Script to Find Repeating Words

19 Jun 2018, 16:13

You mean like this?

Code: Select all

#SingleInstance, Force

SampleText := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit."

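; Replace "Lorem" with itself: the text is unchanged, but Occurrence receives the number of matches
StrReplace(SampleText, "Lorem", "Lorem", Occurrence)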

MsgBox, % """Lorem"" occurs exactly " Occurrence " times in this sample text."
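StrReplace counts substrings, so "Lorem" inside a longer word such as "Loremipsum" would be counted too. If whole-word counts are wanted, a minimal sketch (assuming case-insensitive matching is fine) can use RegExReplace's OutputVarCount parameter with \b word boundaries. CountWord is a hypothetical helper name, not something from the thread.

```autohotkey
#NoEnv

; Hypothetical helper: counts whole-word, case-insensitive occurrences of Word in Text.
CountWord(Text, Word)
{
    ; \Q...\E treats Word literally; \b restricts matches to whole words.
    ; "$0" puts each match back unchanged, so only the count in Count matters.
    RegExReplace(Text, "i)\b\Q" Word "\E\b", "$0", Count)
    return Count
}

SampleText := "Lorem ipsum dolor. Loremipsum is not counted, but lorem is."
MsgBox, % """Lorem"" occurs " CountWord(SampleText, "Lorem") " times as a whole word."
```

For the sample above this reports 2: "Lorem" and "lorem" match, while "Loremipsum" does not.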
jballi
Posts: 724
Joined: 29 Sep 2013, 17:34

Re: Script to Find Repeating Words

19 Jun 2018, 17:04

Thank you for your interest.

No. I'm sorry, I realize that I wasn't very specific. When I say repeating words, I mean words that are repeated next to each other. "The the" is a common problem that occurs in many of the documents that I write. Connectives like "with" are also common. Although these are common words, any word can be inadvertently (and incorrectly) repeated, and I need to find them.
Xtra
Posts: 2750
Joined: 02 Oct 2015, 12:15

Re: Script to Find Repeating Words

19 Jun 2018, 17:30

Code: Select all

#NoEnv

sampletext := "The the quick brown fox jumps with with excitement over the lazy dog."

RepeatedTextArray := {}

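; Parse on spaces and record each word that equals the word before it
; ("=" is case-insensitive, so "The the" counts). Skip empty fields from double spaces.
Loop, Parse, sampletext, %A_Space%
{
    if (A_LoopField != "" && A_LoopField = LastFoundWord)
        RepeatedTextArray[LastFoundWord] := LastFoundWord . A_Space . A_LoopField
    LastFoundWord := A_LoopField
}

; Replace each "word word" pair with a single copy. Note: this is a plain substring
; replacement, so "is is" would also match inside "This is".
for word, RepeatedWord in RepeatedTextArray
    sampletext := StrReplace(sampletext, RepeatedWord, word)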

MsgBox % sampletext
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Script to Find Repeating Words

19 Jun 2018, 17:33

- Here's an attempt. It's essentially just a parsing loop using space as the delimiter. You then have to consider how to deal with any non-letters. I have considered commas and full stops in the example below.
- If you want to know the exact position of where the repeated match occurs, you could use my script, and then use RegExMatch on the original text to find the positions.
- RegExReplace could be used to remove the second occurrence of each repeated word, but some repetitions are valid.

Code: Select all

q:: ;list repeated words
vText := "abc abc, def ghi def def. ghi abc abc def, def"
;note: 'def, def' won't be considered as a repeated pair, since there is a punctuation mark in the middle (such behaviour could be changed by replacing commas with spaces via StrReplace and then replacing multiple spaces with single spaces via RegExReplace)
vText := StrReplace(vText, ",", " ")
vText := StrReplace(vText, ".", " ")
vTempLast := " ", vOutput := "" ;vTemp is compared to vTempLast each time, since we're parsing using space as the delimiter, the previous item will never be a space
Loop, Parse, vText, % " "
{
	vTemp := A_LoopField
	if !(vTemp = "") && (vTemp = vTempLast)
		vOutput .= vTempLast " " vTemp "`r`n"
	vTempLast := vTemp
}
Clipboard := vOutput
MsgBox, % vOutput
return
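The second point above (using RegExMatch on the original text to get positions) could look roughly like the sketch below. It assumes a case-insensitive "word, whitespace, same word" pattern and resumes searching after each match; FindRepeatPositions is a hypothetical name. As with the parsing loop, "def, def" is not reported because the comma sits between the two words.

```autohotkey
#NoEnv

; Hypothetical helper: lists every adjacent repeated word in Text with its
; character position, using a RegExMatch loop that resumes after each match.
FindRepeatPositions(Text)
{
    Output := ""
    StartPos := 1
    ; "O)" makes Match an object; "i)" makes the \1 backreference case-insensitive.
    while (FoundPos := RegExMatch(Text, "Oi)\b(\w+)\s+\1\b", Match, StartPos))
    {
        Output .= "Pos " FoundPos ": " Match.Value(0) "`r`n"
        StartPos := FoundPos + Match.Len(0)
    }
    return Output
}

vText := "abc abc, def ghi def def. ghi abc abc def, def"
MsgBox, % FindRepeatPositions(vText)
```

For this sample the message lists positions 1 ("abc abc"), 18 ("def def") and 31 ("abc abc"), the same three pairs the parsing loop finds.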
jballi
Posts: 724
Joined: 29 Sep 2013, 17:34

Re: Script to Find Repeating Words

19 Jun 2018, 17:58

I was hoping that that there was some some ready-to-go script out there but this is a good start. Thanks everybody! :thumbup:
jballi
Posts: 724
Joined: 29 Sep 2013, 17:34

Re: Script to Find Repeating Words

19 Jun 2018, 19:26

From the "just in case you care" department...

I found a good RegEx pattern to help with this requirement from this this web site. It is far (far) from being done but this is what I have so far.

Code: Select all

#NoEnv
#SingleInstance Force

Text=
   (ltrim
    This is just a test        Test this
    This this is is
    just just a test Test of the
    the emergency broadcast center center.
    )

StartPos:=1
Loop
    {
    ; Case-insensitive: a word, whitespace (including line breaks), then the same word again (\1)
    FoundPos:= RegExMatch(Text,"i)\b([\w]+)\s+\1\b",FoundString,StartPos)
    if (FoundPos)
        {
        MsgBox Found: %FoundString%
        StartPos:=FoundPos+StrLen(FoundString)
        }
     else
        {
        MsgBox Nothing (else) found.
        Break
        }
    }

return
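Along the lines of the RegExReplace suggestion earlier in the thread, a similar pattern can collapse each run of repeats into a single word instead of just reporting it. This is a minimal sketch: the (?:\s+\1\b)+ group is an extension so that "the the the" collapses in one pass, and $1 keeps the first copy's capitalization. Because some repetitions are legitimate ("had had"), the result should be reviewed rather than applied blindly. Also, \s+ matches line breaks, so a repeat split across two lines ends up joined on one line.

```autohotkey
#NoEnv

; Sketch: collapse each run of an adjacent repeated word (case-insensitive)
; into its first occurrence. Count receives the number of runs replaced.
Text := "The the quick brown fox jumps with with excitement over the the the lazy dog."
Fixed := RegExReplace(Text, "i)\b(\w+)(?:\s+\1\b)+", "$1", Count)
MsgBox, % "Replaced " Count " run(s):`n`n" Fixed
```

Here three runs are replaced, giving "The quick brown fox jumps with excitement over the lazy dog."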
Guest

Re: Script to Find Repeating Words

20 Jun 2018, 06:07

jballi wrote: this this
we see what you did there... :D
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Script to Find Repeating Words

20 Jun 2018, 07:12

not the whole shebang but might give u a starting point

Code: Select all

#NoEnv
#SingleInstance Force
#Persistent
SetBatchLines -1

SampleText =
	(LTrim
		Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur
		adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
		Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
		incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
		exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
		dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
		Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
		mollit anim id est laborum. Ut enim ad minim veniam, quis nostrud
		exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
		dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
		Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
		mollit anim id est laborum.
	)

Result := findAllRepeatingWords(SampleText)

Gui Display: New, +AlwaysOnTop, Results
Gui Display: Margin, 4, 4
for word, timesRepeated in Result
{
	idx := A_Index
	res .= Format("{}: {} ({})`n", (idx < 10 ? "0" idx : idx), word, timesRepeated)
}

Gui Display: Add, Edit, w200, % res
Gui Display: Show, xCenter yCenter
Return

DisplayGuiClose:
DisplayGuiEscape:
	ExitApp

findAllRepeatingWords(str) {
	str := RegExReplace(str, "\W", A_Space) ; replace non-word chars with spaces (strips punctuation); apostrophes are \W too, so "don't" becomes "don t"

	Words := StrSplit(str, A_Space)

	Reps := countRepetitions(Words)
	return pruneRepetitions(Reps)
}

; return words mapped to their number of occurrences
countRepetitions(Arr) {
	Result := {}
	for each, word in Arr
	{
		if (word != "")
		{
			if (Result.HasKey(word))
				Result[word]++
			else
				Result[word] := 1
		}
	}

	return Result
}

; get rid of words that occur only once
pruneRepetitions(Arr) {
	Result := {}
	for word, timesRepeated in Arr
	{
		if (timesRepeated != 1)
			Result[word] := timesRepeated
	}

	return Result
}
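Since the original request was for adjacent repeats rather than totals, the split-and-count structure above can be adapted to compare each word with the one just before it. A minimal sketch: countAdjacentRepeats is a hypothetical name, and because punctuation is turned into spaces first (as in findAllRepeatingWords), a pair separated by punctuation, such as "with, with", is also counted.

```autohotkey
#NoEnv

; Sketch: like countRepetitions(), but a word is only counted when it
; immediately follows an identical word ("=" compares case-insensitively).
countAdjacentRepeats(str) {
	str := RegExReplace(str, "\W", A_Space)   ; punctuation -> spaces
	Result := {}, prev := ""
	for each, word in StrSplit(str, A_Space)
	{
		if (word = "")          ; skip empty fields left by consecutive spaces/punctuation
			continue
		if (word = prev)
			Result[word] := Result.HasKey(word) ? Result[word] + 1 : 1
		prev := word
	}
	return Result
}

list := ""
for word, n in countAdjacentRepeats("The the quick fox jumps with with excitement.")
	list .= word ": " n "`n"
MsgBox % list
```

For this sample the message shows "the: 1" and "with: 1"; AutoHotkey v1 objects enumerate string keys in alphabetical order.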
jballi
Posts: 724
Joined: 29 Sep 2013, 17:34

Re: Script to Find Repeating Words

20 Jun 2018, 14:41

swagfag: Definitely not what I was looking for but your script provides information about a document that might be useful in the future. Thanks for sharing. :)
