Page 1 of 2

Trie Data Structure

Posted: 14 Aug 2017, 10:42
by Capn Odin
An AHK implementation of a trie. Tries are a type of tree where the values are read from root to leaf, this makes them suited for things like typing suggestions because you are able to search through only the branches that start with a specific prefix.

Source

Code: Select all

Class Trie {
	__Data := {}
	__Count := 0
	__Bool := False
	
	__New(Max := False, params*) {
		this.__Max := Max
		for i, v in params {
			this.__Set(v, "")
		}
	}
	
	__Get(key) {
		if(!__StrInList(key, ["__Data", "__Count", "__Max", "__Bool"])) {
			if(StrLen(key) > 0) {
				obj := this.__Data
				for i, char in StrSplit(key) {
					obj := obj[char]
				}
				this.__Count := 0
				return obj ? StrSplit(SubStr(this.getKeys(obj, key), 1, -1), "`n") : []
			}
		}
	}
	
    __Set(key, val) {
		if(!__StrInList(key, ["__Data", "__Count", "__Max", "__Bool"])) {
			if(StrLen(key) > 0) {
				this.__Bool := True
				obj := this.__Data
				for i, char  in StrSplit(key) {
					; Save reference to the parent of the leaf for use after loop
					preObj := obj
					
					obj := this.addCharToTrie(obj, char)
					
				}
				; Handle if word is ending inside the trie
				if(!this.objIsEmty(obj)) {
					obj["word"] := True
				}
				; If the key ends in a leaf it don't need to be an object
				if(this.objIsEmty(obj)) {
					preObj[char] := True ; (this is to save memory)
				}
			}
		}
	}
	
	remove(key) {
		if(this.__Data.HasKey(SubStr(key, 1, 1))) {
			; Get references to all the characters in the branch
			branch := [this.__Data[SubStr(key, 1, 1)]]
			for i, char in StrSplit(SubStr(key, 2)) {
				if(branch[i].HasKey(char)) {
					branch.Push(branch[i][char])
				} else {
					return
				}
			}
			
			len := branch.Length()
			
			; Handle if the key is not ending in a leaf
			branch[branch.MaxIndex()].Delete("word")
			
			; Deleat chars from the leaf to the root if nothing else will disappear as a result of the deletion
			loop % len {
				index := len + 1 - A_Index
				if(this.objIsEmty(branch[index], "")) {
					branch[index - 1].Delete(SubStr(key, index, 1))
				} else {
					if(this.objIsEmty(branch[index])) {
						branch[index - 1][SubStr(key, index, 1)] := True ; (this is to save memory)
					}
					return
				}
			}
		}
	}
	
	getKeys(obj, key) {
		if(this.objIsEmty(obj)) {
			this.__Count++
			return key "`n"
		}
		if(obj.HasKey("word")) {
			this.__Count++
			res := key "`n"
		}
		for char in obj {
			if(this.__Max && this.__Count >= this.__Max) {
				Break
			}
			if(StrLen(char) == 1) {
				res .= this.getKeys(obj[char], key char)
			}
		}
		return res
	}
	
	addCharToTrie(obj, key) {
		if(!obj.HasKey(key)) {
			; If a new leaf is added from an empty leaf, there must have been a word ending in the late leaf
			if(this.__Bool && this.objIsEmty(obj) && obj != this.__Data) {
				obj["word"] := True
			}
			this.addLeaf(obj, key)
			this.__Bool := False
		}
		
		; If it is a leaf it will not ba an object
		if(!IsObject(obj[key])){
			this.addLeaf(obj, key)
		}
		
		return obj[key]
	}
	
	addLeaf(obj, char) {
		obj[char] := {}
		obj.SetCapacity(0) ; (this is to save memory)
	}
	
	objIsEmty(obj, except := "word") {
		for key in obj {
			if(key != except) {
				return False
			}
		}
		return True
	}
}

__StrInList(str, lst) {
	if str in % __Join(",", lst)
		return True
	return False
}

__Join(sep, params) {
	for index, param in params {
		str .= param sep
	}
	return SubStr(str, 1, -StrLen(sep))
}

MsgObj(obj) {
	MsgBox, % SubStr(ObjectToString(obj), 1, -1)
}

ObjectToString(obj){
	if(!IsObject(obj)){
		return """" obj """"
	}
	res := "{"
	for key, value in obj {
		res .= """" key """ : " ObjectToString(value) ", "
	}
	return SubStr(res, 1, -2) "}"
}
Intended Use

The only function that is meant to be manually used from outside the Class is __Set since trie[key] := "" behaves in an undesirable way for this class.
To get data from the trie use the syntax trie[key]

Test: Typing Suggestions

Image

Code: Select all

#Include Trie.ahk

SetBatchLines, -1
CoordMode, Mouse, Screen

Entries := new Trie(500, "Autohotkey")

BackgroundColor := "C6D43C"
WM_SETREDRAW := 0x0B

Gui, 1:New
Gui, +AlwaysOnTop +ToolWindow -SysMenu -Caption +HwndGuiHwnd +LastFound ; +E0x08000000
Gui, Color, %BackgroundColor%

Gui, Font, s15
Gui, Margin, 0, 0, 0, 0
Gui, Add, Edit, vAddress gSearch w300
Gui, Add, ListBox, vSuggestions w300 h200 +HwndCtrlHwnd

WinSet, TransColor, %BackgroundColor%

AddDict()
return

#s::
	MouseGetPos, x, y
	Gui, Show, % "x" x " y" y " AutoSize"
return

AddDict() {
	Global Entries
	lst := StrSplit(FileOpen("English British.txt", "r").Read(), "`n")
	
	len := Round(lst.Length() / 100)
	
	for index, line in lst {
		if(!mod(index, len)) {
			ToolTip, % Format("{:02.1d}", (index / lst.Length())*100)
		}
		Entries.__Set(Format("{:L}", (pos := InStr(line, "/")) ? SubStr(line, 1, pos - 1) : RegExReplace(line, "`n|`r")), "")
	}
	ToolTip
}

#If WinActive("ahk_id " GuiHwnd)
	Enter::
		Gui, Submit, NoHide
		Entries.__Set(Address, "")
		Goto, Search
	return
	
	Delete::
		Gui, Submit, NoHide
		Entries.remove(Address)
		Goto, Search
	return
#If

Search:
	Gui, Submit, NoHide
	GuiControl, -Redraw, Suggestions
	GuiControl, , Suggestions, % "|" __Join("|", Entries[Address])
	GuiControl, +Redraw, Suggestions
Return

Re: Trie Data Structure

Posted: 14 Aug 2017, 15:39
by Helgef
I just tried the example, nice one :thumbup:. I will look closer at your code later :geek:
Thanks for sharing, cheers.

Re: Trie Data Structure

Posted: 14 Aug 2017, 15:54
by rommmcek
Looks promising even if in this example loads and works rather slow.
When I type in "summe" I miss "summer" in the suggestion list.

Re: Trie Data Structure

Posted: 14 Aug 2017, 16:04
by Helgef
You can remove the tooltip for fast load, or make it conditional, eg, if !mod(index,1000)

Re: Trie Data Structure

Posted: 14 Aug 2017, 16:12
by Capn Odin
rommmcek wrote:Looks promising even if in this example loads and works rather slow.
When I type in "summe" I miss "summer" in the suggestion list.
Yes, the reason is that the trie doesn't know that "summer" is a finished word since the branche continue to "summerhouse", I am trying to find a solution that is elegant and not too memory hungry, since it takes too much memory already.

Re: Trie Data Structure

Posted: 14 Aug 2017, 16:16
by rommmcek
Oh, thanks! It was ToolTip hindering quick load! But still works slow, specially after first letter.
When entering "sobe", "sober" is missing. There must be a bug.

P.s.: Right now I see, you are working on it! Hope there is a good solution!

Re: Trie Data Structure

Posted: 14 Aug 2017, 16:19
by Capn Odin
Helgef wrote:You can remove the tooltip for fast load, or make it conditional, eg, if !mod(index,1000)
Thanks, I assumed the bottleneck was making the trie.

Re: Trie Data Structure

Posted: 14 Aug 2017, 19:26
by Capn Odin
It should support subkeys now, rip simplicity.

Re: Trie Data Structure

Posted: 15 Aug 2017, 02:47
by rommmcek
Yeah, now it shows all the words. I liked it before better (except for the missing word)! Now looks like any other alphabetical suggestion.
P.s.: Block searching after first letter! Takes too much time and is not of great help.

Edit: Sorry, suggestions are essentiality the same, of course now are complete. But the question remains, what's the advantage? Nevertheless I'm interested in any alternatives, since I have a lot of problems with orthography!

Re: Trie Data Structure

Posted: 15 Aug 2017, 05:44
by Helgef
The trie performs well imo, it is mostly the example that has performance issues, which is good news :thumbup:
Most time is spent updating the Listbox, improve by,

Code: Select all

GuiControl, -Redraw, Suggestions 
GuiControl, , Suggestions, % "|" Join("|", Entries[Address]*)
GuiControl, +Redraw, Suggestions
Also, consider limiting the number of hits in join, eg,

Code: Select all

Join(sep, params*) {
	for index, param in params {
		str .= param sep
		if (index>50)	; added
			break
	}
	return SubStr(str, 1, -StrLen(sep))
}
Usually one will narrow the search if presented 20+ suggestions anyways.

Re: Trie Data Structure

Posted: 15 Aug 2017, 06:38
by rommmcek
Thanks, Helgef! You are a real Pro!
I added in the Search: label after Gui, Submit, NoHide:

Code: Select all

if StrLen(Address) = 1
	return
Now it works perfect even with half a million wordlist!

It's remains to see how it performs in everyday usage!

Re: Trie Data Structure

Posted: 15 Aug 2017, 08:16
by Helgef
:thumbup: That is a good addition rommmcek

Re: Trie Data Structure

Posted: 15 Aug 2017, 11:29
by Capn Odin
Helgef wrote:Most time is spent updating the Listbox
Yes, I reached the same conclusion, but had forgotten about Redraw, was trying SendMessage instead, but I decided it wasn't worth the effort.

Edit: Why is Redraw only in ListBox and not in Styles ?
Helgef wrote:Usually one will narrow the search if presented 20+ suggestions anyways.
I am all for narrowing the results in the test, but I want to keep the trie generic.
rommmcek wrote:Sorry, the question remains, what's the advantage?
The test is merely to showcase the trie. The advantage of a try is that you can limit how much you need to search though, by specifying a prefix.

Every thing that starts with a t will be in the left subtrie, meaning that you have removed half of the example trie already, this is why it is efficient.
Image

Edit: I am thinking about adding the ability to remove values.

Re: Trie Data Structure

Posted: 15 Aug 2017, 13:15
by rommmcek
Yeah, I'm dazzled using tool, having wordlist stored in DataBase and so a bit superior then the Trie by itself.
Is there any way to store the Trie in the Database and then access the keys still faster?

Re: Trie Data Structure

Posted: 15 Aug 2017, 14:29
by Capn Odin
I am sorry, rommmcek, I don't understand what you are asking.

Re: Trie Data Structure

Posted: 15 Aug 2017, 15:33
by rommmcek
Forget it! Not knowing procedures, how to store data in DB or Trie, the question obviously has no sense!
Anyway, thanks for your effort!

Re: Trie Data Structure

Posted: 15 Aug 2017, 15:53
by Helgef
Capn Odin wrote:
Helgef wrote:Usually one will narrow the search if presented 20+ suggestions anyways.
I am all for narrowing the results in the test, but I want to keep the trie generic.
Yes :thumbup: , I mean only for the example.

Re: Trie Data Structure

Posted: 16 Aug 2017, 07:12
by Joe Glines
Very cool Capn Odin! :) Thank you for sharing this during the webinar! I look forward to updates. :)

BTW- I tried it on a file with 2,235,964 words and it was not to happy. lol I'm sure this is beyond what you were imagining to use and not a realistic use case but thought you might like to know.
cheers,
Joe

Re: Trie Data Structure

Posted: 16 Aug 2017, 08:07
by Capn Odin
Joe Glines wrote:Very cool Capn Odin! :) Thank you for sharing this during the webinar! I look forward to updates. :)

BTW- I tried it on a file with 2,235,964 words and it was not to happy. lol I'm sure this is beyond what you were imagining to use and not a realistic use case but thought you might like to know.
cheers,
Joe
Would you be willing to share the file ?
I believe the problem is the memory usage, I have tried to optimise it, for a norwegian dictionary with 300,000 words, I reduced the memory usage from 251 MB to 149 MB.

Re: Trie Data Structure

Posted: 16 Aug 2017, 11:45
by rommmcek
I think the only way to takle extra large wordlists (1M+) in Ahk is MCode. Analogues to what did feiyue in FindText - Capture screen image into text and then find it.