filter non English lines Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
DataLife
Posts: 447
Joined: 29 Sep 2013, 19:52

filter non English lines

14 Jun 2018, 10:24

Attachment: NetworkList.PNG
I am trying to filter out all lines that contain non-English words. My code below still lets some of them through. How can I get rid of the remaining non-English lines?

Code:

Loop 50
{
	FileReadLine, var, textfile.txt, %A_Index% ; dynamic contents
	FoundPos := RegExMatch(var, "[a-zA-Z0-9,.!?]") ; from https://autohotkey.com/board/topic/149454-how-do-i-identify-non-english-letters-in-a-string/#entry732502
	if (FoundPos <> 0)
		List = %List%`n%var%
}
MsgBox %List%
Check out my scripts. (MyIpChanger) (ClipBoard Manager) (SavePictureAs)
All my scripts are tested on Windows 10, AutoHotkey 32 bit Ansi unless otherwise stated.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: filter non English lines  Topic is solved

14 Jun 2018, 12:46

Hello, maybe,

Code:

filterNonEnglish(str){
	return regexreplace(trim(regexreplace(str, "`nm)^.*[^[:ascii:]].*$"),  "`n"), "\R{2,}",  "`n")
}
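
For readers who find the one-liner hard to follow, here is a rough loop-based sketch of the same idea. It is not Helgef's code: the function name keepAsciiLines is made up for illustration, and it assumes AutoHotkey v1.1. It drops every line that contains a character above code 127, and it also drops blank lines, which is what the trim and "\R{2,}" steps above take care of.

```autohotkey
; Hypothetical helper (not from the thread): explicit-loop version of the filter.
keepAsciiLines(str) {
	out := ""
	Loop, Parse, str, `n, `r                           ; split on LF, strip any CR
	{
		if (A_LoopField = "")                          ; skip blank lines
			continue
		if (RegExMatch(A_LoopField, "[^[:ascii:]]"))   ; any character above code 127?
			continue
		out .= (out = "" ? "" : "`n") . A_LoopField    ; join kept lines with LF
	}
	return out
}

; Sample input built with Chr(233) so the script does not depend on file encoding.
sample := "Home WiFi`nCaf" . Chr(233) . " 5G`n`nOffice"
MsgBox % keepAsciiLines(sample)   ; the second line and the blank line should be dropped
```

The loop is longer than the regex version, but each decision (blank line, non-ASCII character) is visible on its own line.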
Last edited by Helgef on 15 Jun 2018, 16:01, edited 4 times in total.
DataLife

Re: filter non English lines

14 Jun 2018, 13:55

Helgef wrote: Hello, maybe,

Code:

filterNonEnglish(str){
	return strreplace(regexreplace(str, "`nm)^.*[^[:ascii:]].*$"), "`n`n")
}
That appears to work perfectly. I will know for sure when my user in Sweden is able to run it on his computer.

Regex looks like magic to me.

Thank you very much
DataLife
Attachment: English only variables.PNG
Helgef

Re: filter non English lines

14 Jun 2018, 14:25

:thumbup:
I edited it and added trim.

Cheers.

Edit: It doesn't work :thumbdown:
Edit2: I think it works now :thumbup: .
DataLife

Re: filter non English lines

15 Jun 2018, 14:10

Yes, it works. Thanks very much.
Helgef

Re: filter non English lines

16 Jun 2018, 03:38

DataLife wrote: my user in Sweden
pcre.txt wrote: ascii character codes 0 - 127
Wikipedia wrote: The Swedish alphabet is the writing system used for the Swedish language. The 29 letters of this alphabet are the modern 26-letter basic Latin alphabet ('A' through 'Z') plus 'Å', 'Ä', and 'Ö'
Wikipedia - Å wrote: Unicode 197
Maybe this is more appropriate if the user uses any of 'Å', 'Ä', or 'Ö':

Code:

removeNonSwedishLines(str){
	return regexreplace(trim(regexreplace(str, "`nm)^.*[^\x{0}-\x{ff}].*$"),  "`n"), "\R{2,}",  "`n")
}
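
To see why the wider range matters, the following sketch tests a few characters against both patterns. It assumes a Unicode build of AutoHotkey v1.1 (for Chr() with codes above 255, and Format()); the sample code points are my own choice, not from the thread. 'Å', 'Ä' and 'Ö' have codes 197, 196 and 214, so they should fail the ASCII test but pass the 0-255 test, while a character such as U+4E2D should fail both.

```autohotkey
; Sketch only: compare the [:ascii:] test with the \x{0}-\x{ff} test.
report := ""
for index, code in [0x41, 0xC5, 0xC4, 0xD6, 0x4E2D]   ; A, Å, Ä, Ö, and one CJK character
{
	ch := Chr(code)
	isAscii  := !RegExMatch(ch, "[^[:ascii:]]")        ; 1 if code is 0-127
	isLatin1 := !RegExMatch(ch, "[^\x{0}-\x{ff}]")     ; 1 if code is 0-255
	report .= Format("U+{:04X}  ascii={}  latin1={}`n", code, isAscii, isLatin1)
}
MsgBox % report
```

Note that the 0-255 range admits every character in that block, not only the three Swedish letters, so the function name is narrower than what the pattern actually allows.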
I do not know, just guessing.

Cheers.
DataLife

Re: filter non English lines

16 Jun 2018, 18:04

Yes, I was concerned about that, but I am not able to test until he comes back from vacation. Your changes appear to fix this issue before it even occurs.

A_Language returns 0409 on his system. I don't know how all that works, but if his language code is 0409, does that mean that he would only be using English characters?
Thanks for your help.
DataLife
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: filter non English lines

17 Jun 2018, 05:50

For something like this, I would create a list of all the strings. I would then build a list of all the unique characters in that list and decide whether to allow each one. I would assess each character manually, and might then write a RegEx line based on my conclusions.
E.g. see 'LIST EVERY CHARACTER THAT APPEARS IN A STRING' in:
jeeswg's characters tutorial - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=26486
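
A minimal sketch of that approach, in case it helps. This is not the code from the tutorial: the function name listUniqueChars is made up, and it assumes AutoHotkey v1.1.21 or later for Ord(). It walks the string one character at a time and records each distinct character once, together with its code, so the list can be reviewed by hand.

```autohotkey
; Hypothetical helper: list each distinct character in a string with its hex code.
listUniqueChars(str) {
	seen := {}
	out := ""
	Loop, Parse, str                     ; no delimiter: one character per iteration
	{
		code := Ord(A_LoopField)
		if (seen.HasKey(code))           ; already recorded this character
			continue
		seen[code] := true
		out .= Format("{:04X}`t{}`n", code, A_LoopField)
	}
	return out
}

MsgBox % listUniqueChars("Home WiFi")
```

The characters are keyed by their numeric code rather than by the character itself because object string keys in AutoHotkey v1 are case-insensitive, which would otherwise merge 'W' and 'w'.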
Helgef

Re: filter non English lines

17 Jun 2018, 06:09

0x0409 is English (US).
DataLife wrote: does that mean that he would only be using English characters?
Not really. Even if the system language is English, the user may set his keyboard layout to any language, e.g. Swedish, and he may join a network with a non-English name.

Cheers.
