How to remove duplicates from RegExMatch?

ddougal
Posts: 52
Joined: 09 Nov 2015, 14:26

How to remove duplicates from RegExMatch?

23 Mar 2016, 08:44

I am trying to pull IP addresses from a PuTTY log file. That part works, but I am getting duplicate IP addresses in the matches. How can I remove the duplicates?

Code: Select all

;FileRead, test, putty.log
;some test data:
test := "ten02.cinco.tx#Show ip route 36.183.356.166 Routing entry for 52.832.563.161/29  Known via rip, distance 120, metric 1 36.183.356.166"

pos = 1
While pos := RegExMatch(test, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", match, pos+StrLen(match))
    data := data . "`n " . match

MsgBox,% data
;produced the following output:  36.183.356.166    52.832.563.161   36.183.356.166   
DoOrDontSort

Re: How to remove duplicates from RegExMatch?

23 Mar 2016, 08:54

If the order doesn't matter you can use the Sort command with the Unique option.
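
A minimal sketch of that suggestion, assuming the matches have already been collected into a newline-separated variable. The contents of `data` here are just the test addresses from the first post, typed in by hand:

```autohotkey
; Sample list, one address per line (taken from the test data above)
data := "36.183.356.166`n52.832.563.161`n36.183.356.166"

; U removes duplicate items; the default delimiter is the linefeed
Sort, data, U

MsgBox, % data  ; the repeated address now appears only once
```

Sort reorders the list, so this only fits when the original order of appearance does not matter.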
ddougal
Posts: 52
Joined: 09 Nov 2015, 14:26

Re: How to remove duplicates from RegExMatch?

23 Mar 2016, 09:31

Sorting worked using Sort data,U. If the data is in an array, how do I remove the duplicates? Sort doesn't seem to work with an array.

Code: Select all

MyArray := []
pos = 1
While pos := RegExMatch(test, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", match, pos+StrLen(match)) 
{
	data := data . "`n " . match
	MyArray[A_index-1] := match
}

Sort MyArray,U
DoOrDontSort

Re: How to remove duplicates from RegExMatch?

23 Mar 2016, 09:44

You can't sort arrays with Sort unless you write a function. There are several methods; the second one below is probably the easiest (and the fastest if you have a large list), as it doesn't have to loop over the array each time it checks an IP before adding it. StrSplit returns an array.

Code: Select all

MyArray := []
pos = 1
While pos := RegExMatch(test, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", match, pos+StrLen(match))
{
	data .= "`n" match
	isNew := true ; "new" is also an operator name in v1.1, so a distinct variable name is used
	for k, v in MyArray
	{
		if (v = match)
		{
			isNew := false ; it is already in the array, so don't add it - see below
			break
		}
	}
	if isNew
		MyArray.Push(match)
}

This is probably easier:

Code: Select all

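pos = 1
While pos := RegExMatch(test, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", match, pos+StrLen(match))
    data .= (data = "" ? "" : "`n") match   ; newline-separated, no leading delimiter or stray space

Sort data,U                    ; sort the list and drop duplicates
MyArray:=StrSplit(data,"`n")   ; one array element per unique address
data:=""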
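
A third option, not taken from the posts in this thread, is to keep a lookup object next to the array so each address is checked without looping and the order of first appearance is preserved. This is only a sketch: the names `seen` and `MyArray` and the shortened test string are assumptions for illustration.

```autohotkey
; Shortened test input based on the first post
test := "Show ip route 36.183.356.166 Routing entry for 52.832.563.161/29 metric 1 36.183.356.166"

seen := {}       ; associative array used as a set of addresses already stored
MyArray := []    ; unique addresses, in order of first appearance
match := ""
pos := 1
While pos := RegExMatch(test, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", match, pos + StrLen(match))
{
    if !seen.HasKey(match)   ; key lookup instead of scanning MyArray
    {
        seen[match] := true
        MyArray.Push(match)
    }
}

; Show the result, one address per line
for index, ip in MyArray
    list .= ip "`n"
MsgBox % list
```

Unlike the Sort approach, nothing is reordered here, at the cost of holding a second object in memory.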
sinkfaze
Posts: 616
Joined: 01 Oct 2013, 08:01

Re: How to remove duplicates from RegExMatch?

23 Mar 2016, 13:56

Code: Select all

pos = 1
While	pos := RegExMatch(test, "[\d\.]{12,}", match, pos+StrLen(match))
	if	!(data~=match)
		data .=	match "`n"
MsgBox %	data
lexikos
Posts: 9592
Joined: 30 Sep 2013, 04:07

Re: How to remove duplicates from RegExMatch?

23 Mar 2016, 22:41

sinkfaze's code fails for multiple reasons (though not with the specific test input given in the OP):
  • [\d\.]{12,} will not match all valid IP addresses. For example, 8.8.8.8 is valid (it's a Google DNS server).
  • data~=match treats dots in the IP address as a wildcard for a single character, so will mismatch IP addresses in the list and drop some that it shouldn't. (On second thought, if it always deals with valid IP addresses, this might only happen because of the issue below.)
  • data~=match will drop IP addresses which are a substring of an IP address in the list.
I haven't tested it very thoroughly, but I suppose this will work:

Code: Select all

test := "bla 8.8.8.8 bla 12.34.43.21 bla 1.234.43.21 bla 1.234.43.210 bla 8.8.8.8 bla"

pos = 1
data := "`n"
While	pos := RegExMatch(test, "\d{1,3}(?:\.\d{1,3}){3}", match, pos+StrLen(match))
	if	!InStr(data, "`n" match "`n")
		data .=	match "`n"
MsgBox %	SubStr(data, 2)
Masonjar13
Posts: 1555
Joined: 20 Jul 2014, 10:16
Location: Not Russia

Re: How to remove duplicates from RegExMatch?

24 Mar 2016, 00:34

Borrowing heavily from lexikos' code, I made a variant that will only match completely valid IPs (each group 0-255). However, try as I might, I can't shorten it like he did. Every time I try, it breaks one way or another. But I'm not all that good at RegEx, so maybe someone else can.

Code: Select all

test:="bla 8.8.8.8 bla 12.34.43.21 bla 1.234.43.21 bla 1.234.43.210 bla 8.8.8.8 bla"

pos:=1
while(pos:=regExMatch(test,"\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b",match,pos + strLen(match)))
    if(!inStr(str,match))
        str.=match "`n"
msgbox % str
lexikos
Posts: 9592
Joined: 30 Sep 2013, 04:07

Re: How to remove duplicates from RegExMatch?

24 Mar 2016, 01:24

Masonjar13's variant fails with the input "11.1.1.1 -- 1.1.1.1". This was the reason for the extra "`n"s in my script...
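
To illustrate the point about the extra delimiters, here is a small sketch that assumes the list already holds 11.1.1.1 when 1.1.1.1 is found. The variable name `list` is only for illustration:

```autohotkey
; The list after the first address has been stored
list := "11.1.1.1`n"

; Plain substring search: 1.1.1.1 is found inside 11.1.1.1,
; so it would wrongly be treated as a duplicate.
MsgBox % InStr(list, "1.1.1.1")               ; non-zero (found)

; Delimited search: only a whole line can match.
MsgBox % InStr("`n" list, "`n1.1.1.1`n")      ; 0 (not found)
```

Wrapping both the list and the search term in linefeeds means a match can only begin and end at item boundaries.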

Btw, 0x7f000001 is a valid IPv4 address (accepted by ping etc.). There's more to IP addresses (v4 and v6) than just four groups of decimal numbers between 0 and 255.

Also, 010.0.0.1 is the same as 8.0.0.1. I'd just use a pattern that matches only decimal addresses with no leading zero.

I suppose that \b([12]?\d\d?)(?:\.(?1)){3}\b is adequate. I think you will probably find countless regexes and discussions of validating IP addresses if you search outside of this forum.
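
A quick way to try that pattern is sketched below. The candidate strings are made up for illustration, and `(?1)` simply re-runs the first capture group for each of the remaining three parts:

```autohotkey
pattern := "\b([12]?\d\d?)(?:\.(?1)){3}\b"
result := ""

; Hypothetical candidates: three that should match, two that should not
for index, candidate in ["8.8.8.8", "1.234.43.210", "299.1.1.1", "1.2.3", "1.2.3.400"]
    result .= candidate " -> " (RegExMatch(candidate, pattern) ? "match" : "no match") "`n"

MsgBox % result
```

The first three candidates match and the last two do not. The pattern is not a strict 0-255 range check, since a group such as 299 is still accepted, which fits the description of it as merely adequate.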
Masonjar13
Posts: 1555
Joined: 20 Jul 2014, 10:16
Location: Not Russia

Re: How to remove duplicates from RegExMatch?

24 Mar 2016, 02:12

lexikos wrote: Masonjar13's variant fails with the input "11.1.1.1 -- 1.1.1.1". This was the reason for the extra "`n"s in my script...
Ah, I was wondering why; I hadn't considered that. And while I know that's not the only form of IP address, nor the only version, I was just keeping it to his question.
lexikos wrote: I think you will probably find countless regexes and discussions of validating IP addresses if you search outside of this forum.
Sure did. It's quite the topic, apparently.
lexikos wrote: I suppose that \b([12]?\d\d?)(?:\.(?1)){3}\b is adequate.
Using that, I made a new test script, using this IP database. It correctly retrieves all of them without duplicates.

Code: Select all

pos:=1,str:="`n",cnt:=0

fileRead,test,us.csv
oTest:="bla 8.8.8.8 bla 12.34.43.21 bla 1.234.43.21 bla 1.234.43.210 bla 8.8.8.8 bla 11.1.1.1 -- 1.1.1.1`n"
test.=oTest test
while(pos:=regExMatch(test,"\b([12]?\d\d?)(?:\.(?1)){3}\b",match,pos + strLen(match)))
    if(!inStr(str,"`n" match "`n"))
        str.=match "`n",cnt++
gui,add,edit,r30 w700,% lTrim(str,"`n") "`nIteration count: " cnt
gui,show
return

guiClose:
exitApp
The list could use more variety for better testing, but it was the first one I found.
