
Regular expressions: a wrapper around the PCRE DLL


20 replies to this topic
not-logged-in-daonlyfreez
  • Guests
Why does this not give the expected results? It seems the PCRE_GetMatchedCaptureNumber function returns the wrong number of found occurrences.

#Include PCRE_DLL.ahk

PCRE_Init()

testInput =
(
Hello I have a Lot of Words starting Capitalised.

Now I'm trying to find All of them. Will I succeed?

There should be 8 of them...
)

testRE = ([A-Z][a-z])

; Gui
Gui, Add, GroupBox, x6 y15 w460 h120 , Input
Gui, Add, Edit, x16 y35 w440 h90 vInputTxt, %testInput%

Gui, Add, GroupBox, x6 y145 w460 h120 , Regular expression
Gui, Add, Edit, x16 y165 w440 h20 Default vSearchRE, %testRE%
Gui, Add, Edit, x16 y195 w440 h20 vReplaceRE, 
Gui, Add, Button, x16 y225 w90 h30 gSearch, Search
Gui, Add, Button, x116 y225 w90 h30 gReplace, Replace

Gui, Add, GroupBox, x6 y275 w460 h120 , Output
Gui, Add, Edit, x16 y295 w440 h90 vOutputTxt, 
; Generated using SmartGUI Creator 4.0
Gui, Show, x238 y168 h407 w477, New GUI Window
Return


Search:
GuiControlGet, InputTxt, , InputTxt
If InputTxt =
  GuiControl, , InputTxt, %testInput%
  
GuiControlGet, SearchRE, , SearchRE
If SearchRE =
  GuiControl, , SearchRE, %testRE%

; Register Regular expression
hRE := PCRE_RegisterRegExp(SearchRE, #PCRE_MULTILINE) 
if (hRE = 0) 
   PCRE_ShowLastError()

hMatch := PCRE_Match(hRE, InputTxt)
n := PCRE_GetMatchedCaptureNumber(hMatch)

i = 0
toShow =

Loop
{
  s%i% := PCRE_GetMatchStr(hMatch, 0)
  If ErrorLevel = -1
    Break
  PCRE_GetMatchVals(hMatch, i, pos%i%, len%i%)
  toShow := toShow "`n" s%i% " - " pos%i% " - " len%i%
  
  PCRE_MatchNext(hRE, hMatch)
  i++
}

GuiControl, , OutputTxt, % "Found according to PCRE_GetMatchedCaptureNumber: " n "`nFound by looping: " i toShow
Return

Replace:
Return

GuiEscape:
GuiClose:
PCRE_End()
ExitApp


PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
PCRE_Match is not a "MatchAll". PCRE_GetMatchedCaptureNumber returns the number of sub-captures, i.e. the number of parentheses that have captured something in the given expression, plus the global capture (the whole expression). That's 2 in your example, and both captures should be identical. Replace your loop with this one:
Loop
{
	i := A_Index
	s%i% := PCRE_GetMatchStr(hMatch, 0) ; The two letters
	If ErrorLevel = -1
		Break
	PCRE_GetMatchVals(hMatch, 0, pos%i%, len%i%)
	toShow := toShow "`n" s%i% " - " pos%i% " - " len%i%

	PCRE_MatchNext(hRE, hMatch)
}

GuiControl, , OutputTxt, % "Found by looping: " .  (i - 1) . toShow
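To make the "2" concrete, here is a small sketch using Python's `re` module as an analogue (it is not the PCRE wrapper itself). In `re`, group 0 is the whole match and group 1 is the parenthesised part, so a pattern with one pair of parentheses reports two captures per match. The helper `capture_count` is a hypothetical name meant to mirror what PCRE_GetMatchedCaptureNumber reports for this pattern.

```python
import re

# Same pattern as testRE in the thread: one pair of parentheses.
pattern = re.compile(r"([A-Z][a-z])")

def capture_count(match):
    """Hypothetical analogue of PCRE_GetMatchedCaptureNumber.

    Counts the whole match (group 0) plus every group in the pattern.
    That equals PCRE's count here because the single group always
    takes part in the match.
    """
    return 1 + len(match.groups())

m = pattern.search("Hello I have a Lot of Words")
print(capture_count(m))          # 2
print(m.group(0), m.group(1))    # He He  (both captures are identical)
```

Note that the count describes one match, not how many times the pattern occurs in the text.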

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

not-logged-in-daonlyfreez
OK, but it still does not show all found instances: it seems the function stops as soon as my loop index goes higher than the value returned by PCRE_GetMatchedCaptureNumber...

From PCRE_DLL.ahk:

/*
// Return the number of sub-captures in the given match.
*/
PCRE_GetMatchedCaptureNumber(_matchRef)
{
	Return #PCREMatch#%_matchRef%_captureCount
}

/*
// Update @pos and @len with the starting position and the length
// of the _num th capture (up to PCRE_GetMatchedCaptureNumber)
// for the given match.
*/
PCRE_GetMatchVals(_matchRef, _num, ByRef @pos, ByRef @len)
{
	local pos

	If (_num > #PCREMatch#%_matchRef%_captureCount - 1)
	{
		@pos := 0
		@len := 0
		Return 0
	}

	pos := PCRE_GetOffset(#PCREMatch#%_matchRef%_offsetTable, _num * 2)
	@len := PCRE_GetOffset(#PCREMatch#%_matchRef%_offsetTable, _num * 2 + 1) - pos
	@pos := pos + 1
}
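The arithmetic in PCRE_GetMatchVals follows the PCRE offset table layout: for capture n, entry 2n is the 0-based start offset and entry 2n+1 is the end offset, so the length is end minus start, and adding 1 gives AutoHotkey's 1-based position. The bounds check also means that, for this pattern, any capture number of 2 or more yields pos and len of 0. Below is a Python sketch of the same arithmetic, assuming a table built from `re`'s `Match.span()` as a stand-in for the DLL's offset table; `offset_table` and `get_match_vals` are hypothetical names.

```python
import re

def offset_table(match):
    """Flatten (start, end) spans into a PCRE-style offset table:
    [start0, end0, start1, end1, ...], 0-based, end exclusive."""
    table = []
    for n in range(match.re.groups + 1):
        table.extend(match.span(n))
    return table

def get_match_vals(table, capture_count, num):
    """Mirror of PCRE_GetMatchVals: return (pos, len) for capture num,
    with pos 1-based like AutoHotkey strings, or (0, 0) if num is
    beyond the last capture of this match."""
    if num > capture_count - 1:
        return 0, 0
    start = table[num * 2]
    length = table[num * 2 + 1] - start
    return start + 1, length

m = re.search(r"([A-Z][a-z])", "Hello I have a Lot of Words")
table = offset_table(m)
count = m.re.groups + 1              # whole match + one sub-capture
print(get_match_vals(table, count, 0))  # (1, 2)
print(get_match_vals(table, count, 2))  # (0, 0): only captures 0 and 1 exist
```

The capture number selects a parenthesised group inside one match; it is not an occurrence index.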

Note the "up to PCRE_GetMatchedCaptureNumber" in that comment.

In this case I have 8 occurrences, yet it stops after the 2nd, because that is the number of sub-captures, not the number of matches, as you described.


What am I doing wrong? How do I get all positions, lengths and contents of found strings?

PhiLho
Uh, did you read my previous message? And did you try my snippet (replacing your own loop)?
It seems you are confused about this number of captures.
Again, only PCRE_Replace has a concept of global matches.
PCRE_Match will match only one occurrence, and you need to loop on PCRE_MatchNext to get the next ones.
PCRE_GetMatchedCaptureNumber only reports about the current match (occurrence).
Same for PCRE_GetMatchVals and PCRE_GetMatchStr.
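The match-one-then-match-next pattern described above can be sketched in Python, again only as an analogue of the wrapper: search once, then resume searching from the end of the previous match until nothing more is found. `find_all_occurrences` is a hypothetical name, and `re.MULTILINE` is assumed to play the role of `#PCRE_MULTILINE`. The capture count stays at 2 for every occurrence; the number of occurrences is only known once the loop ends.

```python
import re

# The test text from the first post.
TEST_INPUT = (
    "Hello I have a Lot of Words starting Capitalised.\n\n"
    "Now I'm trying to find All of them. Will I succeed?\n\n"
    "There should be 8 of them..."
)

def find_all_occurrences(pattern, text):
    """Collect (string, 1-based pos, len) for every occurrence,
    like looping on PCRE_MatchNext until no match is left."""
    regex = re.compile(pattern, re.MULTILINE)
    results = []
    match = regex.search(text)           # first occurrence only
    while match is not None:
        start, end = match.span(0)       # capture 0 = whole match
        results.append((match.group(0), start + 1, end - start))
        # Resume after this match; step one char past an empty match
        # so the loop cannot get stuck at the same position.
        next_pos = end if end > start else end + 1
        if next_pos > len(text):
            break
        match = regex.search(text, next_pos)
    return results

found = find_all_occurrences(r"([A-Z][a-z])", TEST_INPUT)
print(len(found))   # 8
print(found[0])     # ('He', 1, 2)
```

The loop always asks for capture 0 of the current match, which corresponds to passing 0 (not the loop index) to PCRE_GetMatchVals.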

not-logged-in-daonlyfreez
:oops: Sorry, I missed that you changed the i to 0 in PCRE_GetMatchVals; I thought you had only changed the index start from 0 to 1.

Thanks :)

PhiLho
OK, you are welcome; I like it when users test my code... :-)
Interesting GUI, BTW.