RegExMatch error with 3 optional (words)?


5 replies to this topic
Crash&Burn
  • Members
  • 228 posts
  • Last active: Jul 16 2014 10:10 PM
  • Joined: 02 Aug 2009
Given the following regex:

RegExMatch( iTmp1, "([-]?\d+)?\.?(\d+)?(#)?", iTmp )

When the string is: -1 or -1.1 or -1.1# or -1#
Then all the vars are set correctly,
String     iTmp1   iTmp2   iTmp3
1          1
1.2        1       2
1.2#       1       2       #
1#         1               #

String     iTmp1   iTmp2   iTmp3
-1         -1
-1.2       -1      2
-1.2#      -1      2       #
-1#        -1              #
Yet, if the String is #, then iTmp3 does not get assigned. Instead I have to manually do this:

iTmp3 := ( iTmp == "#" ) ? iTmp : iTmp3



sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008
I cannot duplicate this problem, iTmp3 assigns fine here:

str=#
RegExMatch(str,"(-?\d+)?\.?(\d+)?(#)?",iTmp)
MsgBox %	"iTmp1: " iTmp1 "`niTmp2: " iTmp2 "`niTmp3: " iTmp3


Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
I see no bug/error.
RegExMatch( "#", "([-]?\d+)?\.?(\d+)?(#)?", iTmp )
MsgBox % iTmp3
Shows "#" for me on v1.0.48.05 and v1.0.48.05.L51.
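
For readers who want to check the pattern on its own, outside AutoHotkey, here is a minimal sketch. It is an assumption-laden stand-in: it uses C++ `std::regex` (ECMAScript grammar) rather than the PCRE engine AutoHotkey uses, and the helper name `capture_groups` is made up for illustration. For a pattern this simple the two engines should agree, and the haystack is a plain string that is never also an output variable.

```cpp
#include <array>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns capture groups 1..3 of the thread's pattern.
// Optional groups that did not participate in the match come back empty.
std::array<std::string, 3> capture_groups(const std::string& haystack) {
    static const std::regex pattern(R"(([-]?\d+)?\.?(\d+)?(#)?)");
    std::smatch m;
    std::array<std::string, 3> groups;
    if (std::regex_search(haystack, m, pattern)) {
        for (std::size_t i = 0; i < 3; ++i)
            groups[i] = m[i + 1].str();
    }
    return groups;
}
```

With this sketch, `capture_groups("#")` leaves the first two groups empty and puts `#` in the third, which suggests the pattern itself is not the problem.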

Crash&Burn
  • Members
  • 228 posts
  • Last active: Jul 16 2014 10:10 PM
  • Joined: 02 Aug 2009
You've all changed the test case from RegExMatch(iTmp1,REGEX,iTmp).

iTmp3 is assigned correctly if I first copy the haystack into a separate variable:

iStr := iTmp1
RegExMatch(iStr,"([-]?\d+)?\.?(\d+)?(#)?",iTmp)

So perhaps the error comes from using what is technically the same variable for the haystack and the first match, i.e. with iTmp as the output array, the first subpattern is stored in iTmp1, which is also the haystack.
Yet when the only "word" present in the string is the third one, #, iTmp3 is not assigned; only iTmp is.

sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008

You've all changed the test case from RegExMatch(iTmp1,REGEX,iTmp).


Why should we have assumed that you would search an array element (or variable) before it appears to exist? You'll have to explain in better detail what variables exist ahead of time and what values/strings they contain before we can better answer this question.

Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
Here's the issue:
if (subpat_not_matched)
	array_item->Assign(); // Omit all parameters to make the var empty without freeing its memory (for performance, in case this RegEx is being used many times in a loop).
else
{
	[color=red]if (p < pattern_count-1 // i.e. there's at least one more subpattern after this one (if there weren't, making a copy of haystack wouldn't be necessary because overlap can't harm this final assignment).
		&& haystack == array_item->Contents(FALSE)) // For more comments, see similar section higher above.
		if (mem_to_free = _tcsdup(haystack))
			haystack = mem_to_free;[/color]
	array_item->Assign(haystack + UTF8PosToTPos(utf8Haystack, this_offset[0])
		, UTF8LenToTLen(utf8Haystack, this_offset[0], this_offset[1] - this_offset[0]));
}
This and another block of code like it are based on the code below. The comment in green explains why the code in red is necessary. Note that for the output var itself (below), the var is emptied only if the entire match failed, in which case haystack won't be used. On the other hand, if an array item (above) fails to match, haystack may still be needed if a later array item matches. If the array item which failed to match is also haystack, haystack is deleted prematurely. The fix is to move the code in red to a point above the if..else.
if (captured_pattern_count < 0) // Failed or no match.
	output_var.Assign(); // Make the full-pattern substring blank as a further indicator, and for convenience consistency in the script.
else // Greater than 0 (it can't be equal to zero because offset[] was definitely large enough).
{
	[color=green]// Fix for v1.0.45.07: The following check allows haystack to be the same script-variable as the
	// output-var/array.  Unless a copy of haystack is made, any subpatterns to be populated after the
	// entire-pattern output-var below would be corrupted.  In other words, anything that refers to the
	// contents of haystack after the output-var has been assigned would otherwise refer to the wrong
	// string.[/color]  Note that the following isn't done for the get_positions_not_substrings mode higher above
	// because that mode never refers to haystack when populating its subpatterns.
	if (pattern_count > 1 && haystack == output_var.Contents(FALSE)) // i.e. there are subpatterns to be output afterward, and haystack is the same variable as the output-var that's about to be overwritten below.
		if (mem_to_free = _tcsdup(haystack)) // _strdup() is very tiny and basically just calls strlen+malloc+strcpy.
			haystack = mem_to_free;
		//else due to the extreme rarity of running out of memory AND SIMULTANEOUSLY having output-var match
		// haystack, continue on so that at least partial success is achieved (the only thing that will
		// be wrong in this case is the subpatterns, if any).
	output_var.Assign(haystack + UTF8PosToTPos(utf8Haystack, offset[0])
		, UTF8LenToTLen(utf8Haystack, offset[0], offset[1] - offset[0])); // It shouldn't be possible for the full-pattern match's offset to be -1, since if we're here, a match on the full pattern was always found.
}
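
The failure mode and the proposed fix can be modelled in a few lines. The following is only a simplified sketch, not the AutoHotkey source: `Var`, `Span`, and `populate` are hypothetical stand-ins, the buffer is fixed-size so that emptying a variable overwrites its contents in place (in the spirit of the no-argument `Assign()` above), and `std::string` replaces `_tcsdup`. The haystack `#` lives in the first array item, the first two subpatterns fail to match, and the third matches at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a script variable. Emptying it keeps the buffer
// and only overwrites the contents, so pointers into it remain valid memory.
struct Var {
    char buf[32] = "";
    void Assign() { buf[0] = '\0'; }
    void Assign(const char* src, std::size_t len) {
        assert(len < sizeof buf);
        std::memmove(buf, src, len);
        buf[len] = '\0';
    }
    const char* Contents() const { return buf; }
};

struct Span { int start; int end; };  // start < 0 means "subpattern not matched"

// Populates three array items for the haystack "#" stored in item 0 and
// returns the contents of the last item (the one that should receive "#").
std::string populate(bool copy_before_branch) {
    Var items[3];
    items[0].Assign("#", 1);                     // first item doubles as haystack
    const char* haystack = items[0].Contents();
    const Span spans[3] = {{-1, -1}, {-1, -1}, {0, 1}};
    const int pattern_count = 3;
    std::string copy;                            // owns the private haystack copy

    for (int p = 0; p < pattern_count; ++p) {
        Var& item = items[p];
        const bool aliased = p < pattern_count - 1 && haystack == item.Contents();
        if (copy_before_branch && aliased) {     // proposed placement of the check
            copy = haystack;
            haystack = copy.c_str();
        }
        if (spans[p].start < 0) {
            item.Assign();                       // clobbers haystack if still aliased
        } else {
            if (!copy_before_branch && aliased) {  // placement shown in the excerpt
                copy = haystack;
                haystack = copy.c_str();
            }
            item.Assign(haystack + spans[p].start,
                        static_cast<std::size_t>(spans[p].end - spans[p].start));
        }
    }
    return items[2].Contents();
}
```

In this model `populate(false)` returns an empty string, because the unmatched first item wipes the haystack before the third subpattern is copied out, while `populate(true)` returns `#`.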
One minor point: iTmp3 is assigned, but the source of the assignment has been either corrupted or freed.
iTmp3 := "foobar"
iTmp1 := "#"
RegExMatch( iTmp1, "([-]?\d+)?\.?(\d+)?(#)?", iTmp )
MsgBox %iTmp3% Do you see this text? I don't.