REGEX riddle Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Olegreddo

REGEX riddle

02 Feb 2017, 09:48

Hello, everybody who is way smarter than me. Is it possible to create a query which does the following:

Takes in a string of words (small one)

Deletes every word which does not conform to following:

1. Does not start with capital letter E
2. Contains anything other than letters and integers
3. Is shorter or longer than 8 characters

I know it sounds crazy, but I need it for work.

Thanks guys.
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: REGEX riddle

02 Feb 2017, 10:12

Olegreddo wrote:3. Is shorter or longer than 8 characters
Condition #3 is Crazy!!
Capn Odin
Posts: 1352
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: REGEX riddle

02 Feb 2017, 10:31

Code: Select all

f := FileOpen("wordlist.txt", "r")
MsgBox, % RegExReplace(f.Read(), "((\b[^E`n]*?\b)|(\bE[A-Za-z0-9]*?\b(?CCallout)))")
Callout(Match) {
	return StrLen(Match) = 8
}

Code: Select all

Eagers
Eagerest
Ealdorman
eanlings
earaches
eardrops
Eardrums
earflaps
earful
earliest
Ear!obes
earlocks
earlship
earmarked
earmarks
Earmuffs
earnests
earnings
earphone
Earpiece
earplugs
earrings
Please excuse my spelling I am dyslexic.
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: REGEX riddle

02 Feb 2017, 12:43

A little confused by what appears to be double negatives in your requirements.

I assume that a valid match is a word that:
1) Does not begin with Capital E
2) Is letters and numbers only (No decimal points, spaces or symbols)
3) Is exactly 8 characters long

Here is my regex: (\b[^E\s\W_][a-zA-Z0-9]{7}\b)

And here is the sample data - valid matches marked with a *

Code: Select all

  Eagers
  Eagerest
  Ealdorman
* eanlings
  earache_
  e rdrops
  Eardrums
* earflaps
  earful
* earliest
  Ear!obes
* earlocks
* earlship
  earmarked
* earmarks
  Earmuffs
* earnests
* earnings
* earphone
  Earpiece
* earplugs
* earrings
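
A minimal sketch of how the marking above could be reproduced, assuming the interpretation stated in this post (exactly 8 alphanumeric characters, not starting with capital E). The names `list` and `matches` are placeholders that are not part of the thread, and the shortened word list is only for illustration. Because each line is tested on its own, the pattern is anchored with ^ and $ instead of \b.

```autohotkey
; Sketch only: tests each line against the interpretation in this post.
list := "Eagerest`neanlings`nearache_`ne rdrops`nearful`nearflaps"
matches := ""
Loop, Parse, list, `n, `r
{
	; [^E\W_] = one letter or digit other than capital E,
	; followed by exactly 7 more letters or digits
	if (RegExMatch(A_LoopField, "^[^E\W_][a-zA-Z0-9]{7}$"))
		matches .= A_LoopField "`n"
}
MsgBox % matches  ; with this list: eanlings and earflaps
```

If the opposite rule is wanted (the word must start with capital E), replacing the first character class with a literal E should be the only change needed.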
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: REGEX riddle

02 Feb 2017, 15:13

I believe the criteria is:
Keep every word that conforms to the following:
1. Does not start with capital E
2. Contains at least one non-alphanumeric character
3. Is not length 8
e.g. 'e_34567' and 'e_3456789' would match the criteria.

Please confirm or contradict this.
Perhaps provide examples of strings that do/don't match the criteria.

In any case, a script which can be easily modified,
if I have misunderstood:

Code: Select all

q::
vText := "e234567 e2345678 e23456789 E234567 E2345678 E23456789"
vText .= " e_34567 e_345678 e_3456789 E_34567 E_345678 E_3456789"
vOutput := ""
VarSetCapacity(vOutput, StrLen(vText)*2)
Loop, Parse, vText, %A_Space%
{
	vTemp := A_LoopField
	if (SubStr(vTemp, 1, 1) == "E") ;check first letter case sensitive
		continue
	if (StrLen(vTemp) = 8) ;check length
		continue
	if !RegExMatch(vTemp, "[^A-Za-z0-9]") ;check for any non-alphanumeric characters
		continue

	vOutput .= vTemp " "
}

vOutput := SubStr(vOutput, 1, -1)
Clipboard := vOutput
MsgBox % "done"
Return
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: REGEX riddle

02 Feb 2017, 15:21

I think you have it wrong - only one entry in his sample data set has a non-alphanumeric character.
Most of them are length 8

If those were his reqs, it would be a very odd sample data set to provide. But then you get all sorts on here ;)


Oh, that was Odin's data set. OK, yeah, I officially have no friggin clue what the reqs are.
Capn Odin
Posts: 1352
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: REGEX riddle

02 Feb 2017, 21:02

evilC wrote:Oh, that was Odin's data set. OK, yeah, I officially have no friggin clue what the reqs are.
The data set was taken from a previous answer that has apparently been deleted, although I don't think it was wrong; at least the approach in the deleted post was valid.
bitx0r
Posts: 21
Joined: 05 Oct 2014, 12:30
Location: NorCal

Re: REGEX riddle

02 Feb 2017, 22:38

Deciphering the OP's Match requirements:

1) String must start with a capital "E"
2) Can only be 8 Characters long
3) No special characters can be in the string, only Alpha and Digits

A Non-RegEx solution:

Code: Select all

StringCaseSense, On
For e, v in StrSplit(list, "`n", "`r")
    if v is alnum
        NewList .= (StrLen(v)==8?(SubStr(v,1,1)=="E"? v "`n":_):_)
Using Capn's list we get these results:

Code: Select all

Eagerest
Eardrums
Earmuffs
Earpiece
Olegreddo

Re: REGEX riddle

03 Feb 2017, 10:20

You guys are amazing. I have, like 10% of your brain capacity.
Olegreddo

Re: REGEX riddle

03 Feb 2017, 10:32

jeeswg wrote:I believe the criteria is:
Keep every word that conforms to the following:
1. Does not start with capital E
2. Contains at least one non-alphanumeric character
3. Is not length 8
e.g. 'e_34567' and 'e_3456789' would match the criteria.

You got all my requirements in reverse. Thanks for your code. Can you reverse it? Thanks
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: REGEX riddle

03 Feb 2017, 11:01

Olegreddo wrote:You got all my requirements in reverse. Thanks for your code. Can you reverse it? Thanks
We have been saying that there are so many negatives in there that we are not sure which cancel which out.

Why not just repeat your reqs and make it less ambiguous? A number of people have gone to a bunch of effort to write POC code based off ambiguous reqs - the least you can do is repeat the reqs in an unambiguous format.
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: REGEX riddle

03 Feb 2017, 11:17

Olegreddo wrote:Deletes every word which does not conform to [...] Does not start with capital letter E.
Do you see why people are confused?! What does "does not does not" mean!?
Provide some sample text!!!!!
Before and after!
Olegreddo

Re: REGEX riddle

03 Feb 2017, 11:23

My apologies to everyone for not being 100% clear:

Here are the conditions:

Incoming string is small number of words.

The outcome must be this:

1) String must start with a capital "E"
2) Can only be 8 Characters long
3) No special characters can be in the string, only Alpha and Digits


Example:

Incoming "RE: Insured Some Company, new GLPD water damage claim in NJ, E2D84587"
Outcome: "E2D84587"
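
A minimal sketch of these clarified requirements, assuming the goal is to collect every standalone 8-character code that starts with capital E and drop everything else. The variable names `text`, `codes`, `found` and `pos` are placeholders, not anything posted in the thread.

```autohotkey
; Sketch only: scans the text and keeps just the matching codes.
text := "RE: Insured Some Company, new GLPD water damage claim in NJ, E2D84587"
codes := ""
pos := 1
; RegExMatch returns the position of the match (0 if none),
; so the loop ends when no further code is found.
while (pos := RegExMatch(text, "\bE[a-zA-Z0-9]{7}\b", found, pos))
{
	codes .= (codes = "" ? "" : " ") . found
	pos += StrLen(found)  ; continue searching after this match
}
MsgBox % codes  ; E2D84587
```

The E in "RE:" is not picked up because \b requires a word boundary directly before the E, and AutoHotkey regular expressions are case-sensitive unless the i) option is given.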
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: REGEX riddle

03 Feb 2017, 11:40

Deletes every word which does not conform to following:

1. Does not start with capital letter E
2. Contains anything other than letters and integers
3. Is not 8 characters long

Code: Select all

H = e234567 e2345678 e23456789 E234567 E2345678 E23456789
MsgBox % RegExReplace(H, "\b(E(?![a-zA-Z0-9]{7}\b)[a-zA-Z0-9]+\s?)|.", "$1")
Olegred
Posts: 8
Joined: 01 Aug 2016, 15:23

Re: REGEX riddle

03 Feb 2017, 15:31

evilC wrote:Here is your regex: \bE[a-zA-Z0-9]{7}\b
Wow, great compact statement. The only problem is that it removes the part which I would like to keep. In other words it cuts out my desired string, instead of cutting out the unneeded part.
Olegred
Posts: 8
Joined: 01 Aug 2016, 15:23

Re: REGEX riddle

03 Feb 2017, 15:34

AlphaBravo wrote:Deletes every word which does not conform to following: [...]

Thanks for the help. However, even in your example it leaves in "E23456789" which is 9 characters, not 8.

Also, when I try my string "RE: Insured Some Company, new GLPD water damage claim in NJ, E2D84587" it fails.
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: REGEX riddle  Topic is solved

03 Feb 2017, 15:40

Oh so you want to strip them out of the string, not just match them?
What about the space before/after? Leave it? Remove one of them? Which one?

RegexReplace(haystack, "\bE[a-zA-Z0-9]{7}\b", "") will just strip out the codes.
RegexReplace(haystack, " ?\bE[a-zA-Z0-9]{7}\b", "") will strip out the code, plus an optional space before
Last edited by evilC on 03 Feb 2017, 15:49, edited 2 times in total.
Olegred
Posts: 8
Joined: 01 Aug 2016, 15:23

Re: REGEX riddle

03 Feb 2017, 15:44

evilC wrote:Oh so you want to strip them out of the string, not just match them?
What about the space before/after ? leave it ? remove one of them? which one?

RegexReplace(haystack, "\bE[a-zA-Z0-9]{7}\b", "") will just strip out the codes.
Thanks a lot. So my goal is that when I run it on this string "RE: Insured Some Company, new GLPD water damage claim in NJ, E2D84587" the only word left in new string is "E2D84587"
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: REGEX riddle

03 Feb 2017, 15:47

Code: Select all

H := "RE: Insured Some e2D84587 Company, new E2D84589 GLPD water damage E23456789 claim in NJ, E2D84587"
RegExMatch(H, "\bE[a-zA-Z0-9]{7}\b", var)
MsgBox, % var  ; E2D84589

MsgBox % RTrim(RegExReplace(H, "(\bE[a-zA-Z0-9]{7}\b\s?)|.+?", "$1"))  ; E2D84589 E2D84587
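
The RegExReplace line above relies on a keep-or-discard alternation: the first alternative captures a code into $1, the second consumes anything else, and replacing every match with $1 leaves only the codes. A minimal sketch of the same idea wrapped in a function follows; `KeepCodes` is a hypothetical name, and the s) option is an addition that is assumed to be wanted only when the input may span several lines.

```autohotkey
; Sketch only: KeepCodes is a hypothetical wrapper, not part of the thread.
KeepCodes(text)
{
	; Alternative 1 captures a code (plus one trailing whitespace character) as $1.
	; Alternative 2 consumes any other single character, for which $1 is empty.
	; The s) option lets the dot match newline characters as well.
	return RTrim(RegExReplace(text, "s)(\bE[a-zA-Z0-9]{7}\b\s?)|.", "$1"))
}

MsgBox % KeepCodes("new E2D84589 GLPD`nclaim in NJ, E2D84587")  ; E2D84589 E2D84587
```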
