Trie Data Structure

Post your working scripts, libraries and tools
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Trie Data Structure

14 Aug 2017, 10:42

An AHK implementation of a trie. Tries are a type of tree where the values are read from root to leaf, this makes them suited for things like typing suggestions, because you are able to search through only the branches that start with a specific prefix.

Source[hr][/hr]

Code: Select all

Class Trie {
	__Data := {}
	__Count := 0
	__Bool := False
	
	__New(Max := False, params*) {
		this.__Max := Max
		for i, v in params {
			this.__Set(v, "")
		}
	}
	
	__Get(key) {
		if(!__StrInList(key, ["__Data", "__Count", "__Max", "__Bool"])) {
			if(StrLen(key) > 0) {
				obj := this.__Data
				for i, char in StrSplit(key) {
					obj := obj[char]
				}
				this.__Count := 0
				return obj ? StrSplit(SubStr(this.getKeys(obj, key), 1, -1), "`n") : []
			}
		}
	}
	
    __Set(key, val) {
		if(!__StrInList(key, ["__Data", "__Count", "__Max", "__Bool"])) {
			if(StrLen(key) > 0) {
				this.__Bool := True
				obj := this.__Data
				for i, char  in StrSplit(key) {
					; Save reference to the parent of the leaf for use after loop
					preObj := obj
					
					obj := this.addCharToTrie(obj, char)
					
				}
				; Handle if word is ending inside the trie
				if(!this.objIsEmty(obj)) {
					obj["word"] := True
				}
				; If the key ends in a leaf it don't need to be an object
				if(this.objIsEmty(obj)) {
					preObj[char] := True ; (this is to save memory)
				}
			}
		}
	}
	
	remove(key) {
		if(this.__Data.HasKey(SubStr(key, 1, 1))) {
			; Get references to all the characters in the branch
			branch := [this.__Data[SubStr(key, 1, 1)]]
			for i, char in StrSplit(SubStr(key, 2)) {
				if(branch[i].HasKey(char)) {
					branch.Push(branch[i][char])
				} else {
					return
				}
			}
			
			len := branch.Length()
			
			; Handle if the key is not ending in a leaf
			branch[branch.MaxIndex()].Delete("word")
			
			; Deleat chars from the leaf to the root if nothing else will disappear as a result of the deletion
			loop % len {
				index := len + 1 - A_Index
				if(this.objIsEmty(branch[index], "")) {
					branch[index - 1].Delete(SubStr(key, index, 1))
				} else {
					if(this.objIsEmty(branch[index])) {
						branch[index - 1][SubStr(key, index, 1)] := True ; (this is to save memory)
					}
					return
				}
			}
		}
	}
	
	getKeys(obj, key) {
		if(this.objIsEmty(obj)) {
			this.__Count++
			return key "`n"
		}
		if(obj.HasKey("word")) {
			this.__Count++
			res := key "`n"
		}
		for char in obj {
			if(this.__Max && this.__Count >= this.__Max) {
				Break
			}
			if(StrLen(char) == 1) {
				res .= this.getKeys(obj[char], key char)
			}
		}
		return res
	}
	
	addCharToTrie(obj, key) {
		if(!obj.HasKey(key)) {
			; If a new leaf is added from an empty leaf, there must have been a word ending in the late leaf
			if(this.__Bool && this.objIsEmty(obj) && obj != this.__Data) {
				obj["word"] := True
			}
			this.addLeaf(obj, key)
			this.__Bool := False
		}
		
		; If it is a leaf it will not ba an object
		if(!IsObject(obj[key])){
			this.addLeaf(obj, key)
		}
		
		return obj[key]
	}
	
	addLeaf(obj, char) {
		obj[char] := {}
		obj.SetCapacity(0) ; (this is to save memory)
	}
	
	objIsEmty(obj, except := "word") {
		for key in obj {
			if(key != except) {
				return False
			}
		}
		return True
	}
}

__StrInList(str, lst) {
	if str in % __Join(",", lst)
		return True
	return False
}

__Join(sep, params) {
	for index, param in params {
		str .= param sep
	}
	return SubStr(str, 1, -StrLen(sep))
}

MsgObj(obj) {
	MsgBox, % SubStr(ObjectToString(obj), 1, -1)
}

ObjectToString(obj){
	if(!IsObject(obj)){
		return """" obj """"
	}
	res := "{"
	for key, value in obj {
		res .= """" key """ : " ObjectToString(value) ", "
	}
	return SubStr(res, 1, -2) "}"
}
Intended Use[hr][/hr]The only function that is meant to be manually used from outside the Class is __Set since trie[key] := "" behaves in an undesirable way for this class.
To get data from the trie use the syntax trie[key]

Test: Typing Suggestions[hr][/hr]Image

Code: Select all

#Include Trie.ahk

SetBatchLines, -1
CoordMode, Mouse, Screen

Entries := new Trie(500, "Autohotkey")

BackgroundColor := "C6D43C"
WM_SETREDRAW := 0x0B

Gui, 1:New
Gui, +AlwaysOnTop +ToolWindow -SysMenu -Caption +HwndGuiHwnd +LastFound ; +E0x08000000
Gui, Color, %BackgroundColor%

Gui, Font, s15
Gui, Margin, 0, 0, 0, 0
Gui, Add, Edit, vAddress gSearch w300
Gui, Add, ListBox, vSuggestions w300 h200 +HwndCtrlHwnd

WinSet, TransColor, %BackgroundColor%

AddDict()
return

#s::
	MouseGetPos, x, y
	Gui, Show, % "x" x " y" y " AutoSize"
return

AddDict() {
	Global Entries
	lst := StrSplit(FileOpen("English British.txt", "r").Read(), "`n")
	
	len := Round(lst.Length() / 100)
	
	for index, line in lst {
		if(!mod(index, len)) {
			ToolTip, % Format("{:02.1d}", (index / lst.Length())*100)
		}
		Entries.__Set(Format("{:L}", (pos := InStr(line, "/")) ? SubStr(line, 1, pos - 1) : RegExReplace(line, "`n|`r")), "")
	}
	ToolTip
}

#If WinActive("ahk_id " GuiHwnd)
	Enter::
		Gui, Submit, NoHide
		Entries.__Set(Address, "")
		Goto, Search
	return
	
	Delete::
		Gui, Submit, NoHide
		Entries.remove(Address)
		Goto, Search
	return
#If

Search:
	Gui, Submit, NoHide
	GuiControl, -Redraw, Suggestions
	GuiControl, , Suggestions, % "|" __Join("|", Entries[Address])
	GuiControl, +Redraw, Suggestions
Return
Attachments
English British.txt
(624.3 KiB) Downloaded 85 times
Last edited by Capn Odin on 21 Aug 2017, 20:57, edited 8 times in total.
Please excuse my spelling I am dyslexic.
Helgef
Posts: 3238
Joined: 17 Jul 2016, 01:02
Contact:

Re: Trie Data Structure

14 Aug 2017, 15:39

I just tried the example, nice one :thumbup:. I will look closer at your code later :geek:
Thanks for sharing, cheers.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

14 Aug 2017, 15:54

Looks promising even if in this example loads and works rather slow.
When I type in "summe" I miss "summer" in the suggestion list.
Attachments
summer.png
summer.png (2.91 KiB) Viewed 1525 times
Helgef
Posts: 3238
Joined: 17 Jul 2016, 01:02
Contact:

Re: Trie Data Structure

14 Aug 2017, 16:04

You can remove the tooltip for fast load, or make it conditional, eg, if !mod(index,1000)
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

14 Aug 2017, 16:12

Looks promising even if in this example loads and works rather slow.
When I type in "summe" I miss "summer" in the suggestion list.
Yes, the reason is that the trie doesn't know that "summer" is a finished word since the branche continue to "summerhouse", I am trying to find a solution that is elegant and not too memory hungry, since it takes too much memory already.
Please excuse my spelling I am dyslexic.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

14 Aug 2017, 16:16

Oh, thanks! It was ToolTip hindering quick load! But still works slow, specially after first letter.
When entering "sobe", "sober" is missing. There must be a bug.

P.s.: Right now I see, you are working on it! Hope there is a good solution!
Last edited by rommmcek on 15 Aug 2017, 02:53, edited 1 time in total.
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

14 Aug 2017, 16:19

You can remove the tooltip for fast load, or make it conditional, eg, if !mod(index,1000)
Thanks, I assumed the bottleneck was making the trie.
Please excuse my spelling I am dyslexic.
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

14 Aug 2017, 19:26

It should support subkeys now, rip simplicity.
Please excuse my spelling I am dyslexic.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

15 Aug 2017, 02:47

Yeah, now it shows all the words. I liked it before better (except for the missing word)! Now looks like any other alphabetical suggestion.
P.s.: Block searching after first letter! Takes too much time and is not of great help.

Edit: Sorry, suggestions are essentiality the same, of course now are complete. But the question remains, what's the advantage? Nevertheless I'm interested in any alternatives, since I have a lot of problems with orthography!
Helgef
Posts: 3238
Joined: 17 Jul 2016, 01:02
Contact:

Re: Trie Data Structure

15 Aug 2017, 05:44

The trie performs well imo, it is mostly the example that has performance issues, which is good news :thumbup:
Most time is spent updating the Listbox, improve by,

Code: Select all

GuiControl, -Redraw, Suggestions 
GuiControl, , Suggestions, % "|" Join("|", Entries[Address]*)
GuiControl, +Redraw, Suggestions
Also, consider limiting the number of hits in join, eg,

Code: Select all

Join(sep, params*) {
	for index, param in params {
		str .= param sep
		if (index>50)	; added
			break
	}
	return SubStr(str, 1, -StrLen(sep))
}
Usually one will narrow the search if presented 20+ suggestions anyways.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

15 Aug 2017, 06:38

Thanks, Helgef! You are a real Pro!
I added in the Search: label after Gui, Submit, NoHide:

Code: Select all

if StrLen(Address) = 1
	return
Now it works perfect even with half a million wordlist!

It's remains to see how it performs in everyday usage!
Helgef
Posts: 3238
Joined: 17 Jul 2016, 01:02
Contact:

Re: Trie Data Structure

15 Aug 2017, 08:16

:thumbup: That is a good addition rommmcek
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

15 Aug 2017, 11:29

Most time is spent updating the Listbox
Yes, I reached the same conclusion, but had forgotten about Redraw, was trying SendMessage instead, but I decided it wasn't worth the effort.

Edit: Why is Redraw only in ListBox and not in Styles ?
Usually one will narrow the search if presented 20+ suggestions anyways.
I am all for narrowing the results in the test, but I want to keep the trie generic.
Sorry, the question remains, what's the advantage?
The test is merely to showcase the trie. The advantage of a try is that you can limit how much you need to search though, by specifying a prefix.

Every thing that starts with a t will be in the left subtrie, meaning that you have removed half of the example trie already, this is why it is efficient.
Image

Edit: I am thinking about adding the ability to remove values.
Please excuse my spelling I am dyslexic.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

15 Aug 2017, 13:15

Yeah, I'm dazzled using tool, having wordlist stored in DataBase and so a bit superior then the Trie by itself.
Is there any way to store the Trie in the Database and then access the keys still faster?
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

15 Aug 2017, 14:29

I am sorry, rommmcek, I don't understand what you are asking.
Please excuse my spelling I am dyslexic.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

15 Aug 2017, 15:33

Forget it! Not knowing procedures, how to store data in DB or Trie, the question obviously has no sense!
Anyway, thanks for your effort!
Helgef
Posts: 3238
Joined: 17 Jul 2016, 01:02
Contact:

Re: Trie Data Structure

15 Aug 2017, 15:53

Usually one will narrow the search if presented 20+ suggestions anyways.
I am all for narrowing the results in the test, but I want to keep the trie generic.
Yes :thumbup: , I mean only for the example.
User avatar
Joe Glines
Posts: 561
Joined: 30 Sep 2013, 20:49
Facebook: https://www.facebook.com/theAutomatorGuru/
Google: https://plus.google.com/105328929654286634910
GitHub: joetazz
Location: Dallas
Contact:

Re: Trie Data Structure

16 Aug 2017, 07:12

Very cool Capn Odin! :) Thank you for sharing this during the webinar! I look forward to updates. :)

BTW- I tried it on a file with 2,235,964 words and it was not to happy. lol I'm sure this is beyond what you were imagining to use and not a realistic use case but thought you might like to know.
cheers,
Joe
User avatar
Capn Odin
Posts: 1290
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Trie Data Structure

16 Aug 2017, 08:07

Very cool Capn Odin! :) Thank you for sharing this during the webinar! I look forward to updates. :)

BTW- I tried it on a file with 2,235,964 words and it was not to happy. lol I'm sure this is beyond what you were imagining to use and not a realistic use case but thought you might like to know.
cheers,
Joe
Would you be willing to share the file ?
I believe the problem is the memory usage, I have tried to optimise it, for a norwegian dictionary with 300,000 words, I reduced the memory usage from 251 MB to 149 MB.
Please excuse my spelling I am dyslexic.
rommmcek
Posts: 375
Joined: 15 Aug 2014, 15:18

Re: Trie Data Structure

16 Aug 2017, 11:45

I think the only way to takle extra large wordlists (1M+) in Ahk is MCode. Analogues to what did feiyue in FindText - Capture screen image into text and then find it.

Return to “Scripts and Functions”

Who is online

Users browsing this forum: leosouza85 and 54 guests