AutoHotkey_L beta version for testing: PCRE 8.30


6 replies to this topic
Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
Testing is needed for the following, with any scripts using RegEx:
  • AutoHotkey_pcre830.zip (Unicode 32-bit)
  • AutoHotkey_pcre830_ansi.zip
  • AutoHotkey_pcre830_x64.zip

These are based on v1.1.07.01, with PCRE 8.30 merged in. PCRE 8.30 supports building a 16-bit library in addition to the usual 8-bit library, superseding my earlier changes to support UTF-16. There have also been numerous bug-fixes and some other improvements. See pcre/changelog.txt.

PCRE 8.20 and later include just-in-time compilation of patterns - that is, patterns can be compiled to machine code for faster re-evaluation. This is enabled in the downloads above, and can be used by specifying the S (study) option. However, it adds about 68 KB to the final executable size (for Unicode 32-bit). I think that very few scripts are likely to benefit from it, so it will probably be disabled. Enabling/disabling it is a simple matter of adding/removing the following line in pcre/config.h before compiling:
#define SUPPORT_JIT
Note that AutoHotkey caches the last 100 patterns. I have also fixed a bug with the S (study) option; the study data wasn't freed when a studied pattern was dropped from the cache. (This was only a problem when hundreds of unique patterns were being used.)
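
For anyone who wants to try the S (study) option described above, here is a minimal sketch. The haystack and pattern are made up for illustration; whether S) also triggers JIT compilation depends on the build having SUPPORT_JIT defined, as in the downloads above.

```autohotkey
; Sketch only: haystack and needle are illustrative, not taken from any real script.
haystack := "alpha 123 beta 4567 gamma 89"
needle := "S)\d+"   ; S) = study the pattern; in a JIT-enabled build this is what turns JIT on

pos := 1, count := 0
while (pos := RegExMatch(haystack, needle, m, pos))
{
    count++
    pos += StrLen(m)   ; resume searching just past the current match
}
MsgBox % "Matches found: " count
```

Studying only pays off when the same pattern is evaluated many times, so a loop like this is the kind of case where a difference might show up.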

For source code, see GitHub.

  • Guests
  • Last active:
  • Joined: --
It appears to be insanely faster (more than 10 times) than the current stable build (v1.1.07.01) in my benchmarks. :shock: I haven't found any bugs yet. Great job.

Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
Which patterns with what input? My tests showed less than double the speed when JIT compilation was used, and little or no improvement otherwise.

nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
I found that the beta was slower than AHK_L 1.1.07.01 for a single run, but faster over subsequent runs (probably due to the compiling).
/* AHK_L 1.1.07.01
Single Run:  1190    = 0.000048s
10,000 runs: 1951143 = 0.078046s

AHK Beta
Single Run:  4177    = 0.000167s
10,000 runs: 1299058 = 0.051962s
*/

#NoEnv
SetBatchLines, -1
Process, Priority,, High

Timer("Init"), clipboard := ""

UrlRegEx =
(Join
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9`%])|www\d{0,3}[.]
|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+
|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+
|(\([^\s()<>]+\)))*\)|[^\s``!()\[\]{};:'".,<>?«»“”‘’]))
)

Timer("Start")
RegExMatch("blah www.autohotkey.net/ blah", URLRegEx, O)
Clipboard .= Timer("Stop", "Single Run: {1} = {2}s`r`n", 1)


Timer("Start")
Loop 10000
   RegExMatch("blah `n v.gd/gdip_ `n", URLRegEx, O)
Clipboard .= Timer("Stop", "10,000 runs: {1} = {2}s", 1)

Timer(sMode, sContext="", bDisplay=0){
	static startCounter, QPFrequency
	if !QPFrequency
		DllCall("QueryPerformanceFrequency", "Int64*", QPFrequency)
	if ( sMode = "Start" ){
		DllCall("QueryPerformanceCounter", "Int64*", StartCounter)
		return StartCounter
	}
	else if ( sMode = "Stop" ){
		DllCall("QueryPerformanceCounter", "Int64*", EndCounter)
		Time1 := EndCounter - StartCounter
		Time2 := Time1 / QPFrequency
		if ( bDisplay ){
			if !sContext
				sContext := "{1}"
			StringReplace, sContext, sContext, {1}, %Time1%, All
			StringReplace, sContext, sContext, {2}, %Time2%, All
			MsgBox % sContext
		}
		return sContext
	}
}


Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
The first run is always slower as the pattern has to be "compiled" to byte code. AutoHotkey caches the last 100 compiled patterns, so subsequent runs are typically faster. JIT compilation is not used by default as it takes longer if a compiled pattern isn't already in the cache. See my first post.
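
A rough way to see the effect of the pattern cache, as a sketch under assumptions: the haystack, the pattern, the iteration count and the variable names are all invented for illustration. The second loop rotates through 200 pattern strings that differ only in a PCRE comment, which is more than the 100-entry cache can hold, so each call is expected to compile again.

```autohotkey
#NoEnv
SetBatchLines, -1

haystack := "blah www.autohotkey.net/ blah"
iterations := 20000   ; arbitrary; raise it if both timings come out near zero

; Case 1: one pattern re-used; after the first call it should come from the cache.
start := A_TickCount
Loop % iterations
    RegExMatch(haystack, "www\.\w+")
sameElapsed := A_TickCount - start

; Case 2: 200 distinct pattern strings in rotation. (?#...) is a PCRE comment,
; so the match itself is unchanged, but each string is a separate cache entry.
start := A_TickCount
Loop % iterations
    RegExMatch(haystack, "www\.\w+(?#" . Mod(A_Index, 200) . ")")
variedElapsed := A_TickCount - start

MsgBox % "Same pattern: " sameElapsed " ms`nRotating patterns: " variedElapsed " ms"
```

If the second figure is much larger, that is the compile cost the cache normally hides. A_TickCount is coarse, so treat small differences as noise.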

  • Guests
  • Last active:
  • Joined: --

Which patterns with what input?

This is just a simple benchmark script. I haven't fully tested with various different patterns and options.
; Sample Text
file := A_ScriptDir "\text.txt"
if !FileExist(file) {
	tooltip, Creating a file...
	FileDelete, % file
	HayStack := "fully armed, we determined that a human form from the cross." 
		. " The regret landscapes route out. Spiros hears that means." 
		. " It flows into a moment, ultimate map of the influence of "
		. "the protector, born in, for a little adobe house outside "
		. "of the ancient magic. Ω He pours himself in swooning in the "	;embedded a unicode character, Ω
		. "mirror, I got a random line: —See, my clodes, and throws heaps"
		. " of elfin beauty, radiant, literally ælfscínu, they have set up?"
		. " —**** me…the final evidence. How does the world with the flame "
		. "of warning.  ā —But there and cracks another 19 Audio and we do "	;embedded a unicode character, ā
		. "you will they" ;generated at http://johno.jsmf.net/knowhow/ngrams/
	loop 10000
		FileAppend, % HayStack "`n", % file, UTF-8
}

; Variables and Objects
FileRead, HayStack, % file
oNeedles := ["\QΩ\E", "S)\QΩ\E", "\Qā\E", "S)\Qā\E"]
repeat := 1000
result .= A_AHKVersion (A_IsUnicode ? " Unicode" : " Ansi") (A_PtrSize = 8 ? " x64" : " x32") "`n`n"

; test RegexMatch
result .= "RegexMatch`n"
Loop 2	; one is with S) and one is without it
{
	Index := A_Index
	tooltip, % "Benchmarking... needle: " oNeedles[Index]
	StartTime := A_TickCount
	occurrence := 0
	h := HayStack , n := oNeedles[Index]
	loop % repeat
	{
		pos := 1
		While (pos := RegexMatch(h, n, m, pos + strlen(m)))
			++occurrence
	}
	result .= "Needle:" A_Tab oNeedles[Index] "`n"
		. "Count:" A_Tab occurrence "`n"
		. "Elapsed:" A_Tab (A_TickCount - StartTime) / 1000 "`n`n"
}

; test RegexReplace
result .= "RegexReplace`n"
Loop 2	; one is with S) and one is without it
{
	Index := A_Index + 2
	tooltip, % "Benchmarking... needle: " oNeedles[Index]
	StartTime := A_TickCount
	occurrence := 0
	h := HayStack, n := oNeedles[Index]
	loop % repeat
		RegexReplace(h, n, "", count),	occurrence += count
	result .= "Needle:" A_Tab oNeedles[Index] "`n"
		. "Count:" A_Tab occurrence "`n"
		. "Elapsed:" A_Tab (A_TickCount - StartTime) / 1000 "`n`n"
}
tooltip
clipboard := result
msgbox % result
The results (on Windows 7 64-bit):

1.1.07.01 Unicode x32

RegexMatch
Needle: \QΩ\E
Count: 10000000
Elapsed: 556.705000

Needle: S)\QΩ\E
Count: 10000000
Elapsed: 78.313000

RegexReplace
Needle: \Qā\E
Count: 10000000
Elapsed: 193.909000

Needle: S)\Qā\E
Count: 10000000
Elapsed: 34.913000


1.1.07.01 Unicode x32 pcre830

RegexMatch
Needle: \QΩ\E
Count: 10000000
Elapsed: 31.574000

Needle: S)\QΩ\E
Count: 10000000
Elapsed: 31.824000

RegexReplace
Needle: \Qā\E
Count: 10000000
Elapsed: 17.004000

Needle: S)\Qā\E
Count: 10000000
Elapsed: 19.625000


1.1.07.01 Unicode x64 pcre830

RegexMatch
Needle: \QΩ\E
Count: 10000000
Elapsed: 31.731000

Needle: S)\QΩ\E
Count: 10000000
Elapsed: 30.326000

RegexReplace
Needle: \Qā\E
Count: 10000000
Elapsed: 16.973000

Needle: S)\Qā\E
Count: 10000000
Elapsed: 17.597000

1.1.07.01 Ansi x32 pcre830

RegexMatch
Needle: \QΩ\E
Count: 10000000
Elapsed: 26.864000

Needle: S)\QΩ\E
Count: 10000000
Elapsed: 31.387000

RegexReplace
Needle: \Q?\E
Count: 70000000
Elapsed: 23.026000

Needle: S)\Q?\E
Count: 70000000
Elapsed: 24.414000



nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
Lexikos,
I was not referring to the difference between single and multiple runs, but rather the difference between AutoHotkey versions.
When compiling a pattern for the first time, the Beta was slower.
When using cached patterns, the Beta was faster.
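
To put numbers on that, the tick counts from the comment block in the earlier script can be turned into ratios. This is only arithmetic on those four figures; the variable names are made up here.

```autohotkey
; Tick counts copied from the results posted above.
stableSingle := 1190,    betaSingle := 4177      ; single run
stableLoop   := 1951143, betaLoop   := 1299058   ; 10,000 runs

MsgBox % "Single run: beta took " Round(betaSingle / stableSingle, 2) "x as long`n"
    . "10,000 runs: beta was " Round(stableLoop / betaLoop, 2) "x as fast"
```

That works out to the beta taking about 3.5 times as long on the first (uncached) run, and being about 1.5 times as fast over 10,000 cached runs.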