
(Fixed) AutoHotkey_L regexp bug in the last version


rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006
str=abc
MsgBox % RegExReplace(str, "bc", "b_")

str=abc
MsgBox % RegExReplace(str, "(b)c", "$1_")

; as expected, results are the same

str=àbc
MsgBox % RegExReplace(str, "bc", "b_")

str=àbc
MsgBox % RegExReplace(str, "(b)c", "$1_")

; results are NOT the same



Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
Please save your script as UTF-8, not ANSI. MsgBox %str% shows why RegExReplace is not giving the result you expect.

(Moved topic from Bug Reports to Ask for Help.)

Edit 2010-12-08: Moved back.
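One way to follow Lexikos's advice is to check what the script actually contains before blaming RegExReplace. The sketch below assumes AutoHotkey_L (where the built-in variable A_IsUnicode exists). If the file was decoded as intended, "à" is a single character, so the string has length 3 and its first character code is 224 (U+00E0). A different length or code points to an encoding mismatch rather than a regex problem.

```autohotkey
; Diagnostic sketch, assuming AutoHotkey_L (A_IsUnicode is built in there).
; Save and run it with the same encoding as the failing script.
str := "àbc"
info := "Unicode build: " . (A_IsUnicode ? "yes" : "no")
; Expect 3 when "à" was read as one character.
info .= "`nStrLen(str): " . StrLen(str)
; Asc() returns the code of the first character; expect 224 (U+00E0).
info .= "`nAsc(str): " . Asc(str)
MsgBox %info%
```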

rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006
MY SCRIPT IS SAVED AS UTF-8

and when

str=àbc

MsgBox %str% shows àbc

HotKeyIt
  • Moderators
  • 7439 posts
  • Last active: Jun 22 2016 09:14 PM
  • Joined: 18 Jun 2008
Works for me as long as the file is saved in Unicode or UTF-8 format!

rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006

HotKeyIt wrote: "Works for me as soon as I save the file in Unicode or UTF-8 format"


How interesting! I am using the 32-bit Unicode version of AutoHotkey_L on a freshly installed Windows 7, and the result of

MsgBox % RegExReplace("àbc", "(b)c", "$1_")

is àbc_

instead of àb_

HotKeyIt
  • Moderators
  • 7439 posts
  • Last active: Jun 22 2016 09:14 PM
  • Joined: 18 Jun 2008
I'm on XP SP3. Is your OS 32-bit? I have no idea whether that matters, though.

rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006

HotKeyIt wrote: "I'm on XP SP3. Is your OS 32-bit? I have no idea whether that matters, though."


I don't think that matters. What matters is the result of:

MsgBox % RegExReplace("àbc", "(b)c", "$1_")

when using the AutoHotkey_L Unicode 32-bit version.
For me the result is incorrect: àbc_ instead of àb_.
Could someone please confirm or rule out this bug?
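To make confirmation easier, here is a small self-checking repro sketch. It runs both forms from the first post on "àbc", compares each result with the expected àb_, and reports the build (the A_IsUnicode and A_PtrSize checks assume an AutoHotkey_L build). It must be saved as UTF-8 or Unicode, as Lexikos advised, otherwise it tests the file encoding rather than RegExReplace.

```autohotkey
; Repro sketch: compare both RegExReplace forms against the expected result.
; Save this file as UTF-8 or Unicode (UTF-16) so "à" is read as one character.
str := "àbc"
expected := "àb_"
plain := RegExReplace(str, "bc", "b_")
backref := RegExReplace(str, "(b)c", "$1_")
; == is a case-sensitive comparison.
report := "Without backreference: " . plain . (plain == expected ? "  (OK)" : "  (WRONG)")
report .= "`nWith $1 backreference: " . backref . (backref == expected ? "  (OK)" : "  (WRONG)")
report .= "`nAutoHotkey " . A_AhkVersion . (A_IsUnicode ? " Unicode" : " ANSI") . (A_PtrSize = 8 ? " 64-bit" : " 32-bit")
MsgBox %report%
```

Posting the full message box text along with the OS makes reports from different machines directly comparable.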

HotKeyIt
  • Moderators
  • 7439 posts
  • Last active: Jun 22 2016 09:14 PM
  • Joined: 18 Jun 2008
Make sure your file encoding is UTF-8 or Unicode!

rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006
Yes, the encoding used is Unicode.
(By the way, with the script saved as ANSI I get �bbc.)

Guest++
  • Guests
  • Last active:
  • Joined: --
It produces the wrong output for me too.
Win7 x64
AHK_LW
Script saved as utf-8

sbc
  • Members
  • 321 posts
  • Last active: Jun 07 2011 10:24 AM
  • Joined: 25 Aug 2009
It happens to me too.
Platform: Windows 7 64bit
AHK Version : 1.0.90.00 Unicode 64bit

Also, regex callouts return wrong matches.
String =
(Ltrim
	<img src="/images/sample01.gif" />  This is a test.
	<img src="/images/sample02.gif" />  This is a test.
	<img src="/images/sample03.gif" />  This is a test.
	<img src="/images/sample04.gif" />  This is a test.
	<img src="/images/sample05.gif" />  This is a test.
)

needle = i)\Qsrc=\E"(.+?)"(?CRegexDebug)
pos := 1
While pos := RegexMatch(String, needle, match, pos+strlen(match))
	msgbox,, RegexMatch(), %match% `n%match1%

RegexDebug(match) {
	MsgBox,, RegexCallOut, %match% `n%match1%
	return
}
This does not happen in the previous version, R61.

rousni
  • Members
  • 133 posts
  • Last active: Jul 17 2018 01:36 PM
  • Joined: 23 Mar 2006
This is now a confirmed bug - please fix it.
http://www.autohotke...ic.php?p=405149
Thank you in advance!

Crash&Burn
  • Members
  • 228 posts
  • Last active: Jul 16 2014 10:10 PM
  • Joined: 02 Aug 2009
All 3 files contain the following:
MsgBox % RegExReplace("èbc", "(b)c", "$1_")
MsgBox % RegExReplace("ebc", "(b)c", "$1x")
MsgBox % RegExReplace("ebc", "(b)c", "$1_")
MsgBox % RegExReplace("ÿbc", "(b)c", "$1_")
AutoHotkey, Ansi file:

èb_
ebx
ebc
ÿb_

AHK_L(60), Ansi file:

b_
ebx
ebc
b_

AHK_L(60), UTF8 file:

èb_
ebx
ebc
ÿb_

AHK_L(60), Unicode file:

èb_
ebx
ebc
ÿb_


The issue is only with AHK_L and the ANSI file. And the issue is this: ANSI includes characters up to 0xFF in its character set, so the file shouldn't need to be Unicode or UTF-8 if all the characters are ANSI, should it?


NOTE: I tested sbc's code; it works fine saved as either ANSI or Unicode. He has some other issue...
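On the ANSI question: a byte such as 0xE8 is "è" only when it is decoded with an ANSI code page such as Windows-1252 (assumed here as the ANSI code page). In UTF-8, 0xE8 is a lead byte that must be followed by continuation bytes, so "E8 62 63" is not valid UTF-8 and the "è" cannot survive that decode. The sketch below decodes the same three bytes both ways with StrGet (assumes AutoHotkey_L, where StrGet accepts an encoding name). Crash&Burn's ANSI-file results, where "è" simply disappears, are consistent with the file having been read as UTF-8, but that is an inference, not something confirmed in this thread.

```autohotkey
; Sketch: decode the same bytes as Windows-1252 and as UTF-8.
; E8 62 63 is "èbc" in Windows-1252 (assumed ANSI code page).
VarSetCapacity(buf, 4, 0)          ; 3 bytes plus a terminating zero
NumPut(0xE8, buf, 0, "UChar")      ; "è" in Windows-1252
NumPut(0x62, buf, 1, "UChar")      ; "b"
NumPut(0x63, buf, 2, "UChar")      ; "c"
asAnsi := StrGet(&buf, "CP1252")   ; expected: èbc
asUtf8 := StrGet(&buf, "UTF-8")    ; 0xE8 is invalid UTF-8 here, so "è" is lost or replaced
MsgBox % "As CP1252: " . asAnsi . "`nAs UTF-8: " . asUtf8
```

What the UTF-8 decode shows in place of the invalid byte (a replacement character or nothing at all) may depend on the Windows version.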

sbc
  • Members
  • 321 posts
  • Last active: Jun 07 2011 10:24 AM
  • Joined: 25 Aug 2009

Crash&Burn wrote: "NOTE: I tested sbc's code; it works fine saved as either ANSI or Unicode. He has some other issue..."

Please provide your AHK version and platform.

Crash&Burn
  • Members
  • 228 posts
  • Last active: Jul 16 2014 10:10 PM
  • Joined: 02 Aug 2009
Win2K, the most recent AHK from last year, and AHK_L rev 60, as indicated above.

I certainly don't see any Asian-language characters in the output; the regex results display cleanly, as expected.