I didn't have time to play more with compile options and such, so I cleaned up my compile how-to and give it here for immediate consumption...
I also give my little prototype of RegExMatch as I feel it should be implemented in AutoHotkey. Of course, these are only suggestions and ideas. I didn't have time to do RegExReplace and RegExSplit. The former is more or less implemented in my wrapper; the latter has yet to be done.
Will there be a Loop RegExParse (or Loop Parse with regex option)?
How to compile PCRE on Windows
And on other systems as well, but Unices have makefiles.
Download the latest version of PCRE -- v.6.7 at time of writing, a 558KB .tar.bz2 file (also available as .tar.gz, more than 800KB).
http://www.pcre.org/
ftp://ftp.csx.cam.ac.uk/pub/software/pr ... .7.tar.bz2
Unzip the file in a directory.
You will find a NON-UNIX-USE file with some instructions for compiling on Windows.
Since it is slightly outdated (config.in is now named config.h.in), I give here new instructions.
Copy config.h.in to config.h and edit it.
As explained: "change the macros that define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0."
I am not sure about NEWLINE, so I left it as it is.
The other default values seem OK.
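To illustrate why these macros matter, here is a minimal sketch of how a source file can fall back to its own routine when HAVE_MEMMOVE is 0. The names copy_bytes and shift_right are made up for this example; the real PCRE sources handle this in their own way.

```c
#include <stddef.h>
#include <string.h>

/* In config.h this value is set by hand, as described above. */
#define HAVE_MEMMOVE 1

#if HAVE_MEMMOVE
#define copy_bytes(dst, src, n) memmove((dst), (src), (n))
#else
/* Hypothetical fallback: overlap-safe byte copy for systems without memmove. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) { while (n--) *d++ = *s++; }
    else { d += n; s += n; while (n--) *--d = *--s; }
    return dst;
}
#endif

/* Shift the first len bytes of buf one position to the right
   (source and destination overlap, hence memmove and not memcpy). */
void shift_right(char *buf, size_t len)
{
    copy_bytes(buf + 1, buf, len);
}
```

With the macro left at 0 on a compiler that does provide memmove, the code still works, it just uses the slower hand-written loop.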
Next step is to compile dftables.c and run dftables.exe to generate a pcre_chartables.c file, using the system (or user) locale.
On Windows, it seems to use the C locale (the default); perhaps it needs an additional explicit call to set the locale within the program.
The pcre_chartables.c file can then be manually tweaked to meet any requirement.
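The "explicit call to set the locale" mentioned above would probably be setlocale(). As a sketch only (this is not the code of dftables.c, and build_lower_table is a name I made up), this is how a character table depends on the locale in effect when it is built:

```c
#include <ctype.h>
#include <locale.h>

/* Fill table[c] with the lower-case form of each of the 256 byte values.
   If use_user_locale is non-zero, switch from the default "C" locale to
   the user's locale first, so that accented letters are handled
   according to that locale. */
void build_lower_table(unsigned char table[256], int use_user_locale)
{
    int c;
    if (use_user_locale)
        setlocale(LC_ALL, "");
    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)tolower(c);
}
```

Without the setlocale() call, only the ASCII letters A-Z are mapped; bytes above 127 are left unchanged, which matches what I seem to get on Windows.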
The remainder of the steps are (almost) trivial: just add the following files to a project.
pcre_chartables.c -- Generated
pcre_compile.c
pcre_config.c
pcre_exec.c
pcre_fullinfo.c
pcre_get.c
pcre_globals.c
pcre_info.c
pcre_maketables.c
pcre_refcount.c
pcre_study.c
pcre_tables.c
pcre_try_flipped.c
pcre_version.c
pcre_xclass.c
pcre_dfa_exec.c -- Can be omitted if this method isn't used (for specialists!)
pcre_ord2utf8.c -- Probably not needed if not using UTF-8
pcre_ucp_searchfuncs.c -- Idem
pcre_valid_utf8.c -- Idem
These are just all the pcre_xxx.c files.
To make a small project, you can omit the pcre_dfa_exec.c file, and perhaps remove the pcre_dfa_exec function declaration from pcre.h.
And if UTF-8 support isn't needed, also skip the last three files.
If you need UTF-8 support, add the SUPPORT_UTF8 preprocessor definition to the C compile options.
If you need UCP support (Unicode character properties: escape sequences \p{..}, \P{..}, and \X), add the SUPPORT_UCP preprocessor definition (and SUPPORT_UTF8 too, of course).
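The dependency between the two definitions can be checked at compile time. The following is only a sketch: build_flavour is a hypothetical helper, not a PCRE function, and the definitions would normally come from the project settings or the compiler command line rather than from the source.

```c
/* Normally passed as compile options, e.g. -DSUPPORT_UTF8 with gcc or
   /DSUPPORT_UTF8 with Visual C++; uncomment here to try the sketch. */
/* #define SUPPORT_UTF8 */
/* #define SUPPORT_UCP */

#if defined(SUPPORT_UCP) && !defined(SUPPORT_UTF8)
#error "SUPPORT_UCP requires SUPPORT_UTF8"
#endif

/* Hypothetical helper: report which flavour was compiled in. */
const char *build_flavour(void)
{
#if defined(SUPPORT_UCP)
    return "UTF-8 + UCP";
#elif defined(SUPPORT_UTF8)
    return "UTF-8";
#else
    return "8-bit only";
#endif
}
```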
If you want the DFA function without UTF-8/UCP, you need to edit pcre_dfa_exec.c, as I had a link error: unresolved external symbol __pcre_ucp_findprop
It seems the author forgot to protect some parts with the #ifdef SUPPORT_UCP test.
I added it between
case OP_PROP_EXTRA + OP_TYPEPLUS:
and the break of
case OP_EXTUNI_EXTRA + OP_TYPEEXACT:
hoping these opcodes are generated only in UCP mode...
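The kind of protection I mean looks like the sketch below. The opcode names and the function are invented for the illustration; the real switch in pcre_dfa_exec.c is much bigger.

```c
/* Hypothetical opcode numbers, for illustration only. */
enum { OP_PLAIN_CHAR = 1, OP_UNICODE_PROP = 2 };

/* Returns 1 if the opcode was handled, 0 if it is not supported
   in this build. */
int handle_opcode(int op)
{
    switch (op)
    {
    case OP_PLAIN_CHAR:
        return 1;
#ifdef SUPPORT_UCP
    case OP_UNICODE_PROP:
        /* Only this branch would call the Unicode property lookup, so
           without SUPPORT_UCP the linker never needs that symbol. */
        return 1;
#endif
    default:
        return 0;
    }
}
```

If such opcodes could be generated without UCP support, they would fall into the default branch, which is why I only hope the assumption holds.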
If you need to make a DLL from PCRE, either copy libpcre.def to pcredll.def, edit it to remove or rename the library name (which can conflict with the project name), and add this file to the project;
or define (for Visual C++ only?) the preprocessor macros PCRE_DEFINITION and DLL_EXPORT, so that the right __declspec is put in front of the exported functions.
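For those not familiar with the second method, here is a rough sketch of how such a macro usually works. MYLIB_API and mylib_version_major are names I made up; pcre.h uses its own macro names and conditions.

```c
/* Hypothetical export macro: expands to __declspec(dllexport) only when
   building a DLL on Windows, and to nothing elsewhere. */
#if defined(_WIN32) && defined(DLL_EXPORT)
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API
#endif

/* An exported function: with DLL_EXPORT defined, no .def file is needed
   for it to show up in the DLL's export table. */
MYLIB_API int mylib_version_major(void)
{
    return 1;
}
```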
/*
RegEx.ahk
Wrapper routines to ease the use of the functions in PCRE3.dll.
The functions here should be easier to use than those in PCRE_DLL.ahk,
at the cost of some performance loss.
To compensate a bit for that, I cache the latest compiled string,
so repetitive uses of the same expression are a bit optimized.
I cache only one string, because searching the cache for several
strings in pure AHK would be slower than compiling the expression...
And for difficult cases, there is still the first wrapper, with explicit compilation.
If (when) REs are integrated into AutoHotkey, it will be able to manage
a bigger cache. I doubt caching more than 5 REs is necessary:
the problem arises mostly when using different regexes in a loop.
// by Philippe Lhoste <PhiLho(a)GMX.net> http://Phi.Lho.free.fr
// File/Project history:
1.00.000 -- 2006/09/25 (PL) -- First release.
0.01.000 -- 2006/06/23 (PL) -- Creation from PCRE_DLL.ahk.
*/
/* Copyright notice: For details, see the following file:
http://Phi.Lho.free.fr/softwares/PhiLhoSoft/PhiLhoSoftLicence.txt
This program is distributed under the zlib/libpng license.
Copyright (c) 2006 Philippe Lhoste / PhiLhoSoft
*/
#hPCREModule = 0
; Provide full path or put it in the path (or the working dir).
#PCRE_DLL = PCRE3.dll
#RegExCache_CompRERef = 0
;/* Options */
#PCRE_CASELESS := 0x00000001
#PCRE_MULTILINE := 0x00000002
#PCRE_DOTALL := 0x00000004
#PCRE_EXTENDED := 0x00000008
; Non-PCRE options
#PCRE_HIDENONSTDOPT := 0x00FFFFFF
#PCRE_GLOBAL := 0x01000000
;/* Request types for pcre_fullinfo() */
#PCRE_INFO_CAPTURECOUNT := 2
;/* Exec-time and get/set-time error codes */
#PCRE_ERROR_NOMATCH := (-1)
#PCRE_ERROR_NULL := (-2)
#PCRE_ERROR_BADOPTION := (-3)
#PCRE_ERROR_BADMAGIC := (-4)
#PCRE_ERROR_UNKNOWN_NODE := (-5)
#PCRE_ERROR_NOMEMORY := (-6)
#PCRE_ERROR_NOSUBSTRING := (-7)
#PCRE_ERROR_MATCHLIMIT := (-8)
#PCRE_ERROR_BADUTF8 := (-10)
#PCRE_ERROR_BADUTF8_OFFSET := (-11)
#PCRE_ERROR_PARTIAL := (-12)
#PCRE_ERROR_BADPARTIAL := (-13)
#PCRE_ERROR_INTERNAL := (-14)
#PCRE_ERROR_BADCOUNT := (-15)
#PCRE_ERROR_DFA_UITEM := (-16)
#PCRE_ERROR_DFA_UCOND := (-17)
#PCRE_ERROR_DFA_UMLIMIT := (-18)
#PCRE_ERROR_DFA_WSSIZE := (-19)
#PCRE_ERROR_DFA_RECURSE := (-20)
OnExit RegEx_CleanUp
; Skip internal code and continue to auto-exec section of including code
Goto PCRE=>ContinueAutoExec
/*
// Like InStr(), returns the position of the first occurrence of the regular expression
// _regEx in the string _stringToSearch.
// Returns 0 if not found, or found position starting at 1.
// If not found, ErrorLevel can be checked to see what is the problem.
// It can contain an error code from DllCall (followed by a pipe and the name of the called function)
// or an error code from PCRE followed by a pipe and the offset of the error in the regex.
//
// Unlike InStr(), there is no reverse search.
//
// This function sets global variables:
// A_RegExPos, A_RegExLength, A_RegExString, (global match)
// A_RegExPos1, A_RegExLength1, A_RegExString1, (capture 1, etc.)
// A_RegExNextPos, A_RegExCaptureCount, A_RegExError
Note on the above global variables:
Somehow, they follow the same logic as Loop FilePattern which also creates
a lot of built-in variables to avoid using extra commands to get results.
And actually, the same logic is used in Perl REs...
If that's too much, we can skip the capture variables (numbered) and add
a function to fetch the capture #n.
*/
RegExMatch(_stringToSearch, _regEx, _options="", _startingPos=1)
{
local options
local errorCode, errorOffset, p_errorMsg, errorMsg
local hPCRE, captureCount
local offsetTableSize, compRegExp, resCode, pos, len
If (#hPCREModule = 0)
{
#hPCREModule := DllCall("LoadLibrary", "Str", #PCRE_DLL)
If (#hPCREModule = 0)
{
MsgBox 16, RegEx, You need the %#PCRE_DLL% in your path!
ExitApp
}
}
OutputDebug RegExMatch: %#hPCREModule% for %_regEx%
options := RegEx_ParseOptions(_options)
OutputDebug Options: %_options% -> %options%
;--- Compilation phase
If (#RegExCache_RE == _regEx) ; == is case-sensitive: "a" and "A" are different regexes
; We just compiled it, skip this step
Goto RegExMatch_MatchStep
; Compile the RE
hPCRE := DllCall(#PCRE_DLL "\pcre_compile2"
, "Str", _regEx
, "Int", options & #PCRE_HIDENONSTDOPT ; Hide the non-PCRE options (like 'g') from PCRE
, "Int *", errorCode
, "UInt *", p_errorMsg
, "Int *", errorOffset
, "UInt", 0
, CDecl)
If (ErrorLevel != 0)
{
ErrorLevel = %ErrorLevel%|pcre_compile2
Return 0
}
OutputDebug Handle: %hPCRE%
if (hPCRE = 0)
{
ErrorLevel = %errorCode%|%errorOffset%
VarSetCapacity(errorMsg, 100)
DllCall("lstrcpy", "Str", errorMsg, "UInt", p_errorMsg)
A_RegExError = Error compiling pattern /%_regEx%/%_options%:`n(%errorCode%) %errorMsg%
Return 0
}
#RegExCache_CompRERef := hPCRE
#RegExCache_RE := _regEx
DllCall(#PCRE_DLL "\pcre_fullinfo"
, "UInt", hPCRE
, "UInt", 0
, "UInt", #PCRE_INFO_CAPTURECOUNT
, "UInt *", captureCount
, CDecl)
If (ErrorLevel != 0)
{
ErrorLevel = %ErrorLevel%|pcre_fullinfo
Return 0
}
; This is the number of capturing parentheses!
; It can be different from the number of real captures when matching
; but it is used as maximum size of capture buffer
#RegExCache_captureCount := captureCount
OutputDebug Capture Count: %captureCount%
;--- Matching phase
RegExMatch_MatchStep:
offsetTableSize := 3 * (#RegExCache_captureCount + 1)
VarSetCapacity(#PCRECache_offsetTable, offsetTableSize * 4)
resCode := DllCall(#PCRE_DLL "\pcre_exec"
, "UInt", #RegExCache_CompRERef
, "UInt", 0
, "Str", _stringToSearch
, "Int", StrLen(_stringToSearch)
, "Int", _startingPos - 1
, "Int", 0 ; Can be ANCHORED, NOTBOL, NOTEOL, NOTEMPTY, PARTIAL
, "UInt", &#PCRECache_offsetTable ; Address of the offset table
, "Int", offsetTableSize
, CDecl)
If (ErrorLevel != 0)
{
ErrorLevel = %ErrorLevel%|pcre_exec
Return 0
}
If (resCode < 0)
{
ErrorLevel = %resCode%|pcre_exec
; resCode is one of the #PCRE_ERROR_xxx codes (-1 means no match)
A_RegExError = Error matching pattern /%_regEx%/%_options%: %resCode%
Return 0
}
OutputDebug Exec: %resCode%
resCode-- ; It counts the global capture (whole match)
A_RegExCaptureCount := resCode
; Whole match
pos := RegEx_GetOffset(#PCRECache_offsetTable, 0)
; Given positions start at 1
A_RegExPos := pos + 1
A_RegExLength := RegEx_GetOffset(#PCRECache_offsetTable, 1) - pos
StringMid A_RegExString, _stringToSearch, A_RegExPos, A_RegExLength
A_RegExNextPos := A_RegExPos + A_RegExLength
; Captures
Loop %resCode%
{
pos := RegEx_GetOffset(#PCRECache_offsetTable, A_Index * 2)
A_RegExPos%A_Index% := pos + 1
len := RegEx_GetOffset(#PCRECache_offsetTable, A_Index * 2 + 1) - pos
A_RegExLength%A_Index% := len
pos++
StringMid A_RegExString%A_Index%, _stringToSearch, pos, len
}
Return A_RegExPos
}
;===== Private section =====
RegEx_ParseOptions(_options)
{
local options
options := 0
Loop Parse, _options
{
If (A_LoopField = "i")
options := options | #PCRE_CASELESS
Else If (A_LoopField = "m")
options := options | #PCRE_MULTILINE
Else If (A_LoopField = "d")
options := options | #PCRE_DOTALL
Else If (A_LoopField = "x")
options := options | #PCRE_EXTENDED
Else If (A_LoopField = "g")
options := options | #PCRE_GLOBAL
}
Return options
}
RegEx_GetOffset(ByRef @offsetTable, _index)
{
local addr
addr := &@offsetTable + _index * 4
Return *addr + (*(addr + 1) << 8) + (*(addr + 2) << 16) + (*(addr + 3) << 24)
}
RegEx_CleanUp:
; Remove cached compiled RE
If (#RegExCache_CompRERef != 0)
{
DllCall(#PCRE_DLL "\pcre_free"
, "UInt", #RegExCache_CompRERef)
}
DllCall("FreeLibrary", "UInt", #hPCREModule)
#hPCREModule := 0
ExitApp
PCRE=>ContinueAutoExec:
#Include RegEx.ahk
Test:
variable = 9a89F87x21Beef This is a Test string containing today's Date : 14-03-2006 and Day : Tuesday, and more: 25-09-2006 or 06-10-1961 is good too.
pos := RegExMatch(variable, "([A-F\d]+)", "i")
Gosub GetResult
res = %result%
pos := RegExMatch(variable, "([A-F\d]+)", "i", A_RegExNextPos)
Gosub GetResult
res = %res%`n`n%result%
MsgBox %res%
nextPos := 1 ; Start at beginning of string
Loop
{
pos := RegExMatch(variable, "(\d+)-(\d+)-(\d+)", "", nextPos)
If (pos = 0)
Break
nextPos := A_RegExNextPos
Gosub GetResult
res = Main match:`n%result%`n`nSub-captures:
Loop %A_RegExCaptureCount%
{
pos := A_RegExPos%A_Index%
len := A_RegExLength%A_Index%
str := A_RegExString%A_Index%
res = %res%`n`n
( LTrim
A_RegExPos: %pos%
A_RegExLength: %len%
A_RegExString: %str%
)
}
MsgBox %res%
}
Return
GetResult:
result =
(
A_RegExPos: %A_RegExPos%
A_RegExLength: %A_RegExLength%
A_RegExString: %A_RegExString%
A_RegExNextPos: %A_RegExNextPos%
A_RegExCaptureCount: %A_RegExCaptureCount%
A_RegExError: %A_RegExError%
)
Return
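For those wondering what the wrapper does with the offset table: pcre_exec fills an array of C ints with (start, end) pairs, and RegEx_GetOffset rebuilds each int from four raw bytes. Here is a sketch in C of the same arithmetic, under the assumption of 32-bit little-endian ints as on Windows/x86. The function names are mine, they are not part of PCRE.

```c
/* Number of ints pcre_exec needs for a pattern with capture_count
   capturing groups: two per (start, end) pair, for each group plus the
   whole match, and one third more that PCRE uses as workspace. */
int offset_table_size(int capture_count)
{
    return 3 * (capture_count + 1);
}

/* Rebuild a 32-bit little-endian integer from four raw bytes, as
   RegEx_GetOffset does with the * operator in AutoHotkey. */
int read_le32(const unsigned char *p)
{
    return (int)((unsigned)p[0] | ((unsigned)p[1] << 8)
               | ((unsigned)p[2] << 16) | ((unsigned)p[3] << 24));
}

/* Convert pair n of the table (0 = whole match, 1 = first capture...)
   into a position starting at 1 and a length. */
void capture_span(const int *ovector, int n, int *pos, int *len)
{
    *pos = ovector[2 * n] + 1;
    *len = ovector[2 * n + 1] - ovector[2 * n];
}
```

For example, a table starting with 4, 14, 4, 6 means a whole match at position 5 of length 10, and a first capture at position 5 of length 2.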