RegExReplace backreferences cannot be manipulated

Share your ideas as to how the documentation can be improved.
mikeyww
Posts: 27191
Joined: 09 Sep 2014, 18:38

RegExReplace backreferences cannot be manipulated

23 Jan 2023, 11:01

Documentation for RegExReplace should indicate, for the replacement parameter, that backreferences cannot be used directly in inner functions or as numbers in expressions.

viewtopic.php?f=76&t=113022
just me
Posts: 9528
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: RegExReplace backreferences cannot be manipulated

23 Jan 2023, 11:06

Replacement:
Type: String
The string to be substituted for each match, which is plain text (not a regular expression).
?
mikeyww

Re: RegExReplace backreferences cannot be manipulated

23 Jan 2023, 11:12

"Plain text" is misleading & incomplete, as a variable can be used, and some expressions, but the backreference itself cannot be included in a function that may return plain text.

Code:

#Requires AutoHotkey v2.0
str   := "abcd"
regex := "ab(c)d"
f     := "123"
MsgBox RegExReplace(str, regex, 'J' f "$1q9")
To me, this is not a description of plain text, though it does evaluate to plain text. A function can also evaluate to plain text.

We need a clearer way to describe why some of the following parameter values work, while others do not.

Code:

#Requires AutoHotkey v2.0
str   := "abcd"
regex := "ab(c)d"
f     := "123$1q9"
MsgBox RegExReplace(str, regex, 'J' f "$1q9")           ; A backreference
MsgBox RegExReplace(str, regex, 'J' InStr(f, "$1q9"))   ; Not a backreference
MsgBox RegExReplace(str, regex, 'J' SubStr("$1", 1, 1)) ; Not a backreference
MsgBox RegExReplace(str, regex, 3 + "$1")               ; Error
Hence, the idea:
Backreferences cannot be used directly in inner functions or as numbers in expressions.
autocart
Posts: 214
Joined: 12 May 2014, 07:42

Re: RegExReplace backreferences cannot be manipulated

23 Jan 2023, 13:43

mikeyww wrote:
23 Jan 2023, 11:12
"Plain text" is misleading & incomplete, as a variable can be used, and some expressions, but the backreference itself cannot be included in a function that may return plain text.
[...]
To me, this is not a description of plain text, though it does evaluate to plain text. A function can also evaluate to plain text.
+1 :thumbup:
safetycar
Posts: 435
Joined: 12 Aug 2017, 04:27

Re: RegExReplace backreferences cannot be manipulated

24 Jan 2023, 12:19

I find it quite understandable as it is but maybe a different possibility:
The string that represents the substitutions of each match (before it is actually made).
lmstearn
Posts: 698
Joined: 11 Aug 2016, 02:32

Re: RegExReplace backreferences cannot be manipulated

24 Jan 2023, 22:22

What could be plainer than plain text, but a string literal?
Edit: link inserted
Last edited by lmstearn on 26 Jan 2023, 21:06, edited 1 time in total.
:arrow: itros "ylbbub eht tuO kaerB" a ni kcuts m'I pleH
autocart

Re: RegExReplace backreferences cannot be manipulated

26 Jan 2023, 09:22

lmstearn wrote:
24 Jan 2023, 22:22
What could be plainer than plain text, but a string literal?
Since the "Type" definition for the Replacement parameter already says "String", the phrase "plain text" is simply confusing. The normal reader wonders whether it means something special that goes beyond "String". The brackets seem to say what is really meant: "(not a regular expression)". But then, why not drop "plain text" and the brackets and write only "The string to be substituted for each match, which is not a regular expression."? On top of that, the following sentence contradicts the impression one gets from "plain text", because the "plain" text string may after all contain sequences with special meaning, namely backreferences. So it is not just "plain text" after all. It is a string that may have special meaning but is not a regular expression. The phrase "plain text" is simply confusing.
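To make "sequences with special meaning" concrete, here is a short sketch, assuming AutoHotkey v2.0, of the forms that the RegExReplace documentation says are treated specially in Replacement: $0, ${n}, ${name} and $$. The variable names and the named group x are illustrative only. Everything else in the string is copied as-is.

```autohotkey
#Requires AutoHotkey v2.0
str := "abcd"
r0 := RegExReplace(str, "ab(c)d", "[$0]")        ; $0 is the whole match    -> [abcd]
r1 := RegExReplace(str, "ab(c)d", "[${1}]")      ; braces around the number -> [c]
r2 := RegExReplace(str, "ab(?<x>c)d", "[${x}]")  ; named subpattern x       -> [c]
r3 := RegExReplace(str, "ab(c)d", "[$$1]")       ; $$ is a literal $        -> [$1]
MsgBox r0 "`n" r1 "`n" r2 "`n" r3
```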
autocart

Re: RegExReplace backreferences cannot be manipulated

26 Jan 2023, 09:23

safetycar wrote:
24 Jan 2023, 12:19
I find it quite understandable as it is but maybe a different possibility:
The string that represents the substitutions of each match (before it is actually made).
:thumbup: +1
JBensimon
Posts: 118
Joined: 19 Nov 2017, 11:19

Re: RegExReplace backreferences cannot be manipulated

19 Feb 2023, 14:03

The suggested documentation addition is based on a false expectation: a backreference $n only means something to a specific RegExReplace invocation within its completed replacement string, i.e. within the final calculated result of whatever expression constitutes its third parameter. It doesn't magically acquire meaning (a value) visible to operators and functions within the expression merely because that expression happens to be the third parameter of a RegExReplace invocation.
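A small sketch of that two-step order, assuming AutoHotkey v2.0 (the variable names are illustrative): the Replacement expression is evaluated first, like any other argument, and at that point "$1" is just the two characters $ and 1. Only when RegExReplace scans the finished string does $1 take on the captured value.

```autohotkey
#Requires AutoHotkey v2.0
repl := "J" . "$1" . "q"                      ; step 1: ordinary expression -> the text J$1q
len  := StrLen("$1")                          ; 2: StrLen sees the characters $ and 1, not a capture
out  := RegExReplace("abcd", "ab(c)d", repl)  ; step 2: RegExReplace turns $1 into "c" -> Jcq
MsgBox repl "`n" len "`n" out
```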

Documentation could get pretty dense if every possible misreading of an otherwise well-defined item had to be accounted for.

JB
mikeyww

Re: RegExReplace backreferences cannot be manipulated

19 Feb 2023, 19:39

Great. So put that explanation in the documentation, for the reader to understand it. I view the documentation as doing exactly that: helping the reader not only with a catalog of functions and parameters, but with tips, guidance, and clarifications about common misunderstandings.
JBensimon

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 08:24

I guess "common" is the key word here: it never occurred to me that $n or ${n} could possibly be thought to mean anything in an expression.

JB
mikeyww

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 12:04

That's good. You are more advanced than some others who are trying to figure it out. Some of the AHK users have no experience with regex or expressions.
JBensimon

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 12:13

I was agreeing with you! If it occurred to you of all people, then maybe it does qualify as a common misunderstanding.

How would you phrase this usage note?

JB
mikeyww

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 12:38

Thank you. My suggestion:

Backreferences cannot be used directly in inner functions or as numbers in expressions.

Perhaps someone else has a clearer way to put it, but this is how I think of it.

The beauty of AutoHotkey is that it is designed for the user. For example, why can one add a GUI control using "AddText" or "Add('Text')"? Well, I think it's just because it's convenient, so the program was designed to accommodate both approaches. I think it's conceivable that RegExReplace could be designed to handle the following, but it simply isn't. I'm not suggesting that it should be, but am suggesting that some users believe that such a thing could make sense.

Code:

#Requires AutoHotkey v2.0
MsgBox RegExReplace("a1c", "a(\d)c", "$1" + 3)
This clarification about backreferences would be helpful especially due to how AHK handles strings and numbers.

Code:

#Requires AutoHotkey v2.0
f := "3"
MsgBox f + 2
g := "b"
MsgBox g + 2
Thus, some users may be accustomed to adding a number to what appears to be a string-- call it "plain text", too, if you wish-- when the "string" is really a number.
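The string/number behaviour described here can be checked directly. A sketch assuming AutoHotkey v2.0 (variable names are illustrative; IsNumber is a built-in v2 function): numeric strings are accepted by +, while "b" and "$1" are not numeric, which is why "$1" + 3 fails wherever it appears in an expression.

```autohotkey
#Requires AutoHotkey v2.0
a := 3 + "23"       ; 26: "23" is a numeric string, so + accepts it
b := "3" + "2"      ; 5: both operands are numeric strings
c := "3" . 2        ; "32": the . operator concatenates instead of adding
d := IsNumber("b")  ; 0: "b" is not numeric, so g + 2 in the example above throws
e := IsNumber("$1") ; 0: inside an expression, "$1" is just text
MsgBox a "`n" b "`n" c "`n" d "`n" e
```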

Good discussion here. I conclude that some readers here would like a clarification, while others view it as unnecessary or redundant. I think that one more sentence won't hurt! Redundancy has its place, even in AutoHotkey!
JBensimon

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 14:29

mikeyww wrote:
20 Feb 2023, 12:38
Backreferences cannot be used directly in inner functions or as numbers in expressions.
Hmm, that may be a little too brief, and the meaning of "inner functions" won't be obvious. Maybe something like this, added to the description of the Replacement parameter:

When the Replacement parameter is an expression, backreferences like $1 and ${12} are only meaningful in the result of the expression passed to RegExReplace() -- they cannot be used as arguments and parameters to the operators and functions that make up the expression.
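One way to see what this wording means in practice, as a sketch assuming AutoHotkey v2.0: a function in the Replacement expression receives the literal text "$1", not the captured text, so StrLen and StrUpper below operate on those two characters. When the goal is to transform the capture itself, RegExReplace's own case-conversion syntax ($U1) works, because it is processed at substitution time.

```autohotkey
#Requires AutoHotkey v2.0
r1 := RegExReplace("abcd", "ab(c)d", StrLen("$1"))    ; StrLen("$1") is 2, so Replacement is "2" -> 2
r2 := RegExReplace("abcd", "ab(c)d", StrUpper("$1"))  ; StrUpper("$1") is still "$1" -> c (not C)
r3 := RegExReplace("abcd", "ab(c)d", "$U1")           ; case conversion at substitution time -> C
MsgBox r1 "`n" r2 "`n" r3
```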

JB
mikeyww

Re: RegExReplace backreferences cannot be manipulated

20 Feb 2023, 14:51

I appreciate the revision. I have no objections. Using your text, another option is below.

Backreferences cannot be used as operands or function parameters.
Saiapatsu
Posts: 17
Joined: 11 Jul 2019, 15:02

Re: RegExReplace backreferences cannot be manipulated

25 Feb 2023, 06:46

The documentation is already 100% perfectly clear what Replacement is: it is a string.
You seem to think that RegExReplace takes some special and magical argument, but no, it is nothing more than a string.
The catch here is that RegExReplace also interprets Replacement and replaces any $number, ${number}, ${capturename} etc. in Replacement before replacing the found Needle in Haystack with the munged Replacement.
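That interpretation is repeated for every match, which is another reason a backreference cannot have a single value while the expression is being evaluated. A sketch assuming AutoHotkey v2.0 (&count is RegExReplace's optional OutputVarCount parameter; the other names are illustrative):

```autohotkey
#Requires AutoHotkey v2.0
out := RegExReplace("a1 b2 c3", "(\w)(\d)", "$2$1", &count)
; The same Replacement text "$2$1" is used for all three matches,
; but $1 and $2 stand for different captured text each time.
MsgBox out "`n" count   ; 1a 2b 3c, then 3
```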
mikeyww wrote:
23 Jan 2023, 11:12

Code:

#Requires AutoHotkey v2.0
str   := "abcd"
regex := "ab(c)d"
f     := "123$1q9"
MsgBox RegExReplace(str, regex, 'J' f "$1q9")           ; A backreference
MsgBox RegExReplace(str, regex, 'J' InStr(f, "$1q9"))   ; Not a backreference
MsgBox RegExReplace(str, regex, 'J' SubStr("$1", 1, 1)) ; Not a backreference
MsgBox RegExReplace(str, regex, 3 + "$1")               ; Error
Consider this:

Code:

str   := "abcd"
regex := "ab(c)d"
f     := "123$1q9"
MsgBox 'J' f "$1q9"           ; A backreference
MsgBox 'J' InStr(f, "$1q9")   ; Not a backreference
MsgBox 'J' SubStr("$1", 1, 1) ; Not a backreference
MsgBox 3 + "$1"               ; Error
The first MsgBox will show J123$1q9$1q9
The second MsgBox will show J4
The third MsgBox will show J$
The fourth MsgBox will throw an error: Expected a Number but got a String

Therefore, the code at the top behaves identically to the following:

Code:

MsgBox RegExReplace("abcd", "ab(c)d", "J123$1q9$1q9")
MsgBox RegExReplace("abcd", "ab(c)d", "J4")
MsgBox RegExReplace("abcd", "ab(c)d", "J$")
MsgBox RegExReplace("abcd", "ab(c)d", 3 + "$1")
Notice how there are no "function calls" "inside" Replacement or any "variables". That's because there is no magical behavior in Replacement. Replacement will be the result of whatever expression is in there, exactly as with every single other function call in AHK 2.0.

In the first RegExReplace, every $1 in Replacement was replaced with "c" because the first capture captured the "c". Then, the entire input string (because the Needle matches all of it) is replaced with the processed Replacement string, which is now "J123cq9cq9".

In the second RegExReplace, the entire string is replaced with "J4". That's because InStr(f, "$1q9") evaluates to the number 4, and "J" concatenated with 4 is "J4". There is no backreference because Replacement contains no $ followed by a number or a {, so nothing in Replacement is changed.

The same happens in the third RegExReplace because the $ at the end of the string by itself is not a backreference, either.

The fourth one throws an error not because of any RegExReplace behavior, but because there is no possible way AutoHotkey could determine what you meant by 3 + "$1". What do you think this should result in? It's adding apples and oranges. In JavaScript, perhaps you could get a fruit salad, because there + concatenates whenever either operand is a string, converting the other operand to a string first (3 + "$1" gives "3$1"). But in AutoHotkey, + adds two numbers, and "$1" cannot be cleanly interpreted as a number, so it tells you that it expected a number on the other side of that plus sign.
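The claim that the error has nothing to do with RegExReplace can be checked by recording whether RegExReplace is ever called. A sketch assuming AutoHotkey v2.0; Traced is a hypothetical wrapper written only for this demonstration. Because arguments are evaluated before a function is called, 3 + "$1" throws first and the wrapper body never runs.

```autohotkey
#Requires AutoHotkey v2.0
reached := false
msg := ""

; Hypothetical wrapper, used only to record whether RegExReplace is ever reached.
Traced(hay, needle, repl) {
    global reached
    reached := true
    return RegExReplace(hay, needle, repl)
}

try
    Traced("a1c", "a(\d)c", 3 + "$1")  ; the argument 3 + "$1" is evaluated before the call
catch Error as e
    msg := e.Message
MsgBox "Reached RegExReplace: " reached "`n" msg
```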

Now look at this:

Code:

str   := "abcd"
regex := "(a)(b)(c)(d)"
f     := "123$1q9"
MsgBox RegExReplace(str, regex, 'J$' InStr(f, "$1q9"))
This will output "Jd". Replacement is "J$4", because InStr(f, "$1q9") evaluates to 4, and $4 refers to (d), the fourth capture.

I do agree that the documentation for Replacement should assume less about what the reader knows and introduce the backreferences a bit more cleanly and gently.

All in all, you should have said what you actually expected each of these lines of code to do.
mikeyww

Re: RegExReplace backreferences cannot be manipulated

25 Feb 2023, 08:25

In my view, the following sum is 26, and so this is the meaning of adding the orange to the apple-- in some situations.

Code:

#Requires AutoHotkey v2.0
num := RegExReplace("number23", ".*?(\d+)", "$1")
MsgBox 3 + num
MsgBox 3 + "23"
num := RegExReplace("number23", ".*?(\d+)", 3 + "$1")
This would replace the entire match with three plus the backreference's value.
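Since RegExReplace cannot do that arithmetic itself, one workaround (a sketch, assuming AutoHotkey v2.0 and a single match) is to capture with RegExMatch first, do the arithmetic on the match object, where the captured text is an ordinary numeric string, and then pass the finished value as Replacement. Handling several matches with different values would need a loop that calls RegExMatch again with an advancing StartingPos; that is omitted here.

```autohotkey
#Requires AutoHotkey v2.0
str := "number23"
num := ""
if RegExMatch(str, ".*?(\d+)", &m) {
    ; m[1] is the captured text "23": an ordinary numeric string, so + works.
    sum := 3 + m[1]
    ; Replacement is now the plain value 26, with no backreference in it.
    num := RegExReplace(str, ".*?(\d+)", sum, , 1)
}
MsgBox num   ; 26
```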

Many users are not knowledgeable about how AHK adds numbers, or how AHK differs from JavaScript. They may not have a deep understanding of how functions actually work.

At least for me, a regex tutorial is not what I am suggesting or seeking here. My observation is merely that the issue confuses some readers and users of AHK. If you agree that the documentation should say more, then you can recommend an alternative to my suggested text, unless you agree with it.
lmstearn

Re: RegExReplace backreferences cannot be manipulated

26 Feb 2023, 06:51

Looks like the docs confine the description of the plus and minus operators to (at least) one pair of variables and, of course, "raw" numbers. The effects of adding/subtracting strings to/from variables or numbers, or strings to/from strings, would merit a mention or a link at some stage.
mikeyww

Re: RegExReplace backreferences cannot be manipulated

26 Feb 2023, 07:24

Already noted.

Backreferences cannot be used as operands or function parameters.
