How can I speed up this code that searches 800MB .csv files

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Guest

How can I speed up this code that searches 800MB .csv files

09 Oct 2018, 18:51

Code:

#SingleInstance
;Author Sahil Verma 10/7/2018

x := 1 ; rows of information
y := 1 ; columns of information
z := 0 ; Progress Count
w := 0 ; Temp smoke variable
Array:={} ; saved array information

MotherAge := 30 ; Age of Mother
Previsits := 0 ; Mother Previsits
Weight := 0 ; Birth Weight
Dplural := 1 ; Number of Children
Smoke := "Y" ;N for Not Smoking, Y for Smoking

Filename := "Name of File Goes Here"
	
		Loop, Read, %Filename%
		{
		   total_lines = %A_Index%
		   FileReadLine, line, %Filename%, %x% ;;Reads Line Fine
			Arr := StrSplit(line , ",") ;;Splits Fine, Single Dimension Array @ Arr[1,2,3...,n], TEMP
			
			Loop % Arr.length() - 1 ;; y = 0 at first, x = 1 at first
			{
			Array[x,y] := Arr[y] ; Perm Array
			y += 1
			}
			
			w := Array[x,71]
			StringReplace, w, w, ", , All ; remove the quote characters
			
			if(Array[x,10] > MotherAge){ ; Age of mother
				if(Array[x,54] > Previsits){ ;  Previs
					if(Array[x,181] > Weight){ ; dbweight
						if(Array[x,165] == Dplural){ ; dplural
							if(w = Smoke){
								z += 1
								FileAppend, %line% `n, %MotherAge%-%Previsits%-%Weight%-%Dplural%-%Smoke%-%Filename%
							}
						} 
					}
				}
			}
			
			ToolTip % "Confirmed: " . z . " | Checked: " . x " | Percentage Retained: " . (z/x)*100 "% ", 1900, 1000
			x += 1
			y := 1
		   
		}
				
x::ExitApp


Any Ideas?
just me
Posts: 9458
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: How can I speed up this code that searches 800MB .csv files

10 Oct 2018, 02:45

Well,
  • Loop, Read, %Filename% reads the file line by line, storing the contents of the current line in the built-in variable A_LoopReadLine. You don't need to read the line again with FileReadLine, line, %Filename%, %x% ;;Reads Line Fine (which is very slow on large files).
    Also, the file-reading loop lets you specify an output file which will be kept open for the duration of the loop to increase performance.
  • Looking at your code snippet, I cannot see why each line should be stored in Array. Even if you need to do that, you don't need to check the conditions against the two-dimensional Array. You can simply use the field array Arr for that.
  • Tooltips are time-consuming. You should consider whether you really need to show a tooltip after each loop iteration.
So this might be faster (I didn't test it):

Code:

#NoEnv
#SingleInstance
;Author Sahil Verma 10/7/2018

SetBatchLines, -1 ; runs the script with maximum speed

z := 0 ; Progress Count
w := 0 ; Temp smoke variable

MotherAge := 30 ; Age of Mother
Previsits := 0  ; Mother Previsits
Weight := 0     ; Birth Weight
Dplural := 1    ; Number of Children
Smoke := "Y"    ; N for Not Smoking, Y for Smoking

LineArr := []   ; saved array information

Filename := "Name of File Goes Here"

Loop, Read, %Filename%, %MotherAge%-%Previsits%-%Weight%-%Dplural%-%Smoke%-%Filename%
{
   total_lines := A_Index ; ???
   FieldArr := StrSplit(A_LoopReadLine, ",") ;;Splits Fine, Single Dimension Array @ Arr[1,2,3...,n], TEMP

   LineArr.Push(FieldArr) ; if you really need the LineArr

   w := FieldArr[71]
   StringReplace, w, w, ", , All   ; remove the quote characters

   If (FieldArr[10] > MotherAge) ; Age of mother
   && (FieldArr[54] > Previsits) ; Previs
   && (FieldArr[181] > Weight)   ; dbweight
   && (FieldArr[165] == Dplural) ; dplural
   && (w = Smoke) {
      z += 1
      FileAppend, %A_LoopReadLine% `n   ; no file name needed, the loop's output file is used
   }

   If (Mod(A_Index, 100) = 0) ; or 1000, 10000, ...
      ToolTip % "Confirmed: " . z . " | Checked: " . A_Index " | Percentage Retained: " . (z / A_Index) * 100 "% ", 1900, 1000
   }

x::ExitApp
divanebaba
Posts: 805
Joined: 20 Dec 2016, 03:53
Location: Diaspora

Re: How can I speed up this code that searches 800MB .csv files

10 Oct 2018, 03:41

Hi.

I'm not a big fan of Loop, read, ... or FileReadLine, ... when I have to work with big files.
I prefer to store the whole content in a variable and then parse it with Loop, parse, .....
I also avoid using FileAppend, ... inside a fast loop or timer.
Instead, I collect the results in a variable and write them to the hard disk once after the loop or timer has finished.

The reason is that this way the hard disk is only read once and written once.
I'm not 250% sure whether Loop, read, ... accesses the hard disk for every line; I don't know its internals well enough.
But FileAppend, ... accesses the hard disk every time you use the command.

If your file has, for example, 100,000 lines and you report every line with FileAppend, ..., your hard disk gets accessed 100,000 times.
And if you use FileAppend, ... inside a fast loop, you should set aside some money for a new hard drive, because even an SSD will wear out under that many accesses in so short a time.

So be careful with commands that access the hard drive.
Use RAM for faster results and gentler treatment of the hard disk.

Code:

#SingleInstance
;Author Sahil Verma 10/7/2018

SetBatchLines, -1 ; runs the script with maximum speed
x := 1 ; rows of information
y := 1 ; columns of information
z := 0 ; Progress Count
w := 0 ; Temp smoke variable
Array:={} ; saved array information

MotherAge := 30 ; Age of Mother
Previsits := 0 ; Mother Previsits
Weight := 0 ; Birth Weight
Dplural := 1 ; Number of Children
Smoke := "Y" ; N for Not Smoking, Y for Smoking

Filename := "Name of File Goes Here"
		
		; Loop, Read, %Filename%
		FileRead, Data, %Filename% ; store content into variable called Data
		Loop, parse, Data, `n, `r ; parse Data line for line
		{
			; total_lines = %A_Index% ; as you do not use total_lines in your disclosed code, you can comment it out.
			; FileReadLine, line, %Filename%, %x% ;;Reads Line Fine  ; obsolete because you can use A_LoopField
			Arr := StrSplit(A_LoopField, ",") ;; Splits Fine, Single Dimension Array @ Arr[1,2,3...,n], TEMP
			
			Loop % Arr.length() - 1 ;; y = 0 at first, x = 1 at first
			{
			Array[x,y] := Arr[y] ; Perm Array
			y += 1
			}
			
			w := Array[x,71]
			StringReplace, w, w, ", , All ; remove the quote characters
			
			if(Array[x,10] > MotherAge){ ; Age of mother
				if(Array[x,54] > Previsits){ ;  Previs
					if(Array[x,181] > Weight){ ; dbweight
						if(Array[x,165] == Dplural){ ; dplural
							if(w = Smoke){
								z += 1
								NewData .= "`n" . A_LoopField
								
								; FileAppend, %A_LoopField% `n, %MotherAge%-%Previsits%-%Weight%-%Dplural%-%Smoke%-%Filename%
							}
						} 
					}
				}
			}
			; ToolTip % "Confirmed: " . z . " | Checked: " . x " | Percentage Retained: " . (z/x)*100 "% ", 1900, 1000
			x += 1
			y := 1
		}
FileAppend, % LTrim(NewData, "`n"), %MotherAge%-%Previsits%-%Weight%-%Dplural%-%Smoke%-%Filename%, UTF-8 
; FileAppend, LTrim(NewData, "`n"), %MotherAge%-%Previsits%-%Weight%-%Dplural%-%Smoke%-%Filename%, UTF-8 ; if the line above does not work, try this one (all untested :)
		
return


	
x::ExitApp
Maybe my suggestion combined with just me's code could boost the whole process.
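Such a combination could look roughly like this (a completely untested sketch: FileRead plus Loop, parse, ... from this post, the combined condition, throttled tooltip and dropped per-line array from just me's post). Note that FileRead is limited by #MaxMem, 64 MB by default, so the limit has to be raised for an 800 MB file:

Code:

#NoEnv
#SingleInstance
#MaxMem 4095      ; FileRead is capped at 64 MB by default; raise the cap for an 800 MB file
SetBatchLines, -1 ; runs the script with maximum speed

z := 0          ; Progress Count
MotherAge := 30 ; Age of Mother
Previsits := 0  ; Mother Previsits
Weight := 0     ; Birth Weight
Dplural := 1    ; Number of Children
Smoke := "Y"    ; N for Not Smoking, Y for Smoking

Filename := "Name of File Goes Here"
OutFile := MotherAge "-" Previsits "-" Weight "-" Dplural "-" Smoke "-" Filename

FileRead, Data, %Filename%          ; one read from disk
NewData := ""
Loop, Parse, Data, `n, `r           ; parse line by line in RAM
{
   FieldArr := StrSplit(A_LoopField, ",")
   w := FieldArr[71]
   StringReplace, w, w, ", , All    ; remove the quote characters
   ; same conditions as in the original script
   If (FieldArr[10] > MotherAge)
   && (FieldArr[54] > Previsits)
   && (FieldArr[181] > Weight)
   && (FieldArr[165] == Dplural)
   && (w = Smoke)
   {
      z += 1
      NewData .= A_LoopField . "`n" ; collect matches in RAM
   }
   If (Mod(A_Index, 10000) = 0)     ; update the tooltip only now and then
      ToolTip % "Confirmed: " . z . " | Checked: " . A_Index, 1900, 1000
}
FileAppend, %NewData%, %OutFile%    ; one write to disk

x::ExitApp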
Just a really great guy. :mrgreen:
DRocks
Posts: 565
Joined: 08 May 2018, 10:20

Re: How can I speed up this code that searches 800MB .csv files

11 Oct 2018, 07:17

^^ Following the last post, and if I understood correctly, a good way to parse a file would look like the first option and not the second, commented-out one?

Code:

GuiControlGet, R_content ; search content
R_resul := ""
FileRead, JG_content, Data\JG.ini ; Extract all its content to JG_content variable
;/*
	Loop, Parse, JG_content, `n, `r
	{
		if (A_Index >= 4) and (InStr(A_LoopField, R_content)) { ; >=4 means in TRANSACTIONS section of .ini file content
			R_resul .= A_LoopField "`n" ; if yes, append to the result string
		}
	}
*/
/*
	Loop, % (REF_JG - 1) ; loop ALL JG lines
	{
		IniRead, J_Index, Data\JG.ini, TRANSACTIONS, J%A_Index% ; Read J-1 - J-MaxIndex
		if (InStr(J_Index, R_content)) { ; is the searched content in this transaction?
			R_resul .= J_Index "`n" ; if yes, append to the result string
		}
	}
*/
MsgBox % R_resul	
;Gui, RECHERCHE: show

return
divanebaba
Posts: 805
Joined: 20 Dec 2016, 03:53
Location: Diaspora

Re: How can I speed up this code that searches 800MB .csv files

11 Oct 2018, 08:51

DRocks wrote: ^^ Following the last post, and if I understood correctly, a good way to parse a file would look like the first option and not the second, commented-out one? ...
The message is:
Avoid accessing the hard disk inside loops!
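In practice that means: read the file once, filter in memory, and write the result once after the loop. A rough, untested sketch reusing the names from your post (the output file name is only an example):

Code:

GuiControlGet, R_content                 ; search content, as in your script
FileRead, JG_content, Data\JG.ini        ; one read from disk
R_resul := ""
Loop, Parse, JG_content, `n, `r          ; parse in RAM, no further disk access
{
	if (A_Index >= 4) and (InStr(A_LoopField, R_content)) ; TRANSACTIONS section only
		R_resul .= A_LoopField "`n"      ; collect matches in RAM
}
FileAppend, %R_resul%, SearchResult.txt  ; single write, after the loop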
Just a really great guy. :mrgreen:
