[Func] Split Text keep delimiters | Split Text in lines by punctuation

Post your working scripts, libraries and tools for AHK v1.1 and older
DigiDon
Posts: 178
Joined: 19 May 2014, 04:55

02 Feb 2018, 05:53

Hi, I hope this can benefit others ;)

I needed to display some sentences in a GUI by splitting some text into natural lines.
However, I realised that StringSplit, Loop Parse, etc. omit the delimiters from the result.
In this case, that was a problem.
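
To illustrate the problem, here is a minimal sketch for AHK v1.1 (the sample text is made up for illustration). StrSplit() accepts an array of delimiters, but none of the returned pieces contains the "?", "!" or "." that it split on:

```autohotkey
; AHK v1.1 sketch: StrSplit discards the delimiter characters
text := "Is it here? Yes! Done."
out := ""
for index, part in StrSplit(text, ["?", "!", "."])
	out .= "[" part "]`n"   ; brackets make leading spaces visible
MsgBox % out
```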

I couldn't find a script for this, so here is one.

Edit: Much simpler 2-liner by Helgef/jeeswg

Code: Select all

teststring:="This is a test??? This is a test2... This is a test3[] This is a test4!!?"
msgbox % "Split by punctuation `n" SplitTextByDelim(teststring)
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]")
; msgbox % 
exitapp
return

SplitTextByDelim(P_Text,P_Delim="!?.") {
P_Delim:= RegExReplace(P_Delim, "[\^\-\]\\]", "\$0")
return RTrim(RegExReplace(P_Text, "([" P_Delim "])\s+", "$1`r`n"), "`r`n ")
}
Original overcomplicated function without regex :lol:

Code: Select all

teststring:="Teststring. This is a test? This is a test2! This is a test3?"
msgbox % SplitTextByDelim(teststring)
exitapp


;SplitTextByDelim(P_Text,Delim="!?.")
;SplitText by delimiters but keeps delimiters in returned string
;*****************************
;by DigiDon; Use/modify as you wish
;P_Text - Text to be split
;Delim - List of 1 char delimiters; Default are end-sentence characters (!?.)
SplitTextByDelim(P_Text,Delim="!?.") {
	StartingPos:=1
	loop {
		FoundPositions:={}
		Loop, Parse, Delim
		{
			FoundPos%A_Index%:=InStr(P_Text, A_LoopField,,StartingPos)
			If FoundPos%A_Index%
				FoundPositions.Push(FoundPos%A_Index%)
		}
		if !FoundPositions.MaxIndex()
			break
		FoundPos:=min(FoundPositions*)
		if StartingPos=1
			ReturnedText:=Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		else
			ReturnedText.="`n" Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		StartingPos:=FoundPos+1
	}
	if StartingPos!=1
		ReturnedText.="`n"
	ReturnedText.= Trim(SubStr(P_Text,StartingPos))
	
	return ReturnedText
}

;by infogulch
;https://autohotkey.com/board/topic/71072-arraymax-arraymin/
min(n, x*) {
    if (ObjMaxIndex(x) == 0) && IsObject(n)
        x := n, x._NewEnum().next(k,n)
    for k,v in x
        if (v < n)
            n := v
    return n
}
Cheers
Last edited by DigiDon on 03 Feb 2018, 05:41, edited 2 times in total.
EverFastAccess : Take Notes on anything the Fast way: Attach notes, Set reminders & Speed up research in 1 gesture - AHK topic
AHK Dynamic Obfuscator L - Protect your AHK code by Obfuscation - AHK topic
QuickModules for Outlook : Sort Outlook emails very quickly to multiple folders - AHK topic
Coding takes lots of time and efforts. If I have helped you or if you enjoy one of my free projects, please consider a small donation :thumbup:
Sorry I am working hard at the moment at a new job and can't commit on delays of answers & updates
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

02 Feb 2018, 14:09

Hello, thanks for sharing :wave: . There is a problem with sentences which contain more than one delimiter in a row, e.g. "to be continued..." or "what!?".

I tried it with regex: rtrim(regexreplace(teststring,"[!.?](\s+)", "$0`n"), "`n "). You can remove linebreaks before the regex if needed.

Cheers.
DigiDon
Posts: 178
Joined: 19 May 2014, 04:55

02 Feb 2018, 16:10

Hmm... right, I hadn't considered this case.

Here is one version. It loops a bit... but gets the job done ;)

Code: Select all

teststring:="Teststring... This is a test??? This is a test2?! This is a test3! This is a test4!!?"
msgbox % SplitTextByDelim(teststring)
exitapp
return


;SplitTextByDelim(P_Text,Delim="!?.")
;SplitText by delimiters but keeps delimiters in returned string
;If multiple delimiters are next to each other it takes the ending one
;*****************************
;by DigiDon; Use/modify as you wish
;P_Text - Text to be split
;Delim - List of 1 char delimiters; Default are end-sentence characters (!?.)
SplitTextByDelim(P_Text,Delim="!?.") {
	StartingPos:=1
	loop {
		FoundPositions:={}
		Loop, Parse, Delim
			{
			FoundPos%A_Index%:=InStr(P_Text, A_LoopField,,StartingPos)
			If FoundPos%A_Index%
				FoundPositions.Push(FoundPos%A_Index%)
			}
		if !FoundPositions.MaxIndex()
			break
		FoundPos:=min(FoundPositions*)
		
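		; Skip past any run of consecutive delimiters
		loop {
			FoundDelim=0
			Loop, Parse, Delim
				if (SubStr(P_Text,FoundPos,1)=A_Loopfield) {
					FoundDelim=1
					FoundPos++
					break
				}
			if !FoundDelim
				break
		}
		; FoundPos is now one past the run; step back onto the last delimiter
		; so the next piece does not lose its first character
		FoundPos--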
		
		if StartingPos=1
			ReturnedText:=Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		else
			ReturnedText.="`n" Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		StartingPos:=FoundPos+1
		}
	if StartingPos!=1
		ReturnedText.="`n"
	ReturnedText.= Trim(SubStr(P_Text,StartingPos))
	
	return ReturnedText
}

;by infogulch
;https://autohotkey.com/board/topic/71072-arraymax-arraymin/
min(n, x*) {
    if (ObjMaxIndex(x) == 0) && IsObject(n)
        x := n, x._NewEnum().next(k,n)
    for k,v in x
        if (v < n)
            n := v
    return n
}
DigiDon
Posts: 178
Joined: 19 May 2014, 04:55

02 Feb 2018, 16:19

Sorry, I hadn't realised you got it working in one line! Ahah :D

Yep, it is way more concise with regex. I know regex as well, but I don't know why it didn't come to mind.

So can you come up with the complete function, or should I? :P
DigiDon
Posts: 178
Joined: 19 May 2014, 04:55

02 Feb 2018, 16:25

So it seems it works. You've nailed it ;)

Code: Select all

teststring:="Teststring... This is a test??? This is a test2?! This is a test3! This is a test4!!?"
msgbox % SplitTextByDelim(teststring)
; msgbox % 
exitapp
return

SplitTextByDelim(P_Text,Delim="!?.") {
return rtrim(regexreplace(P_Text,"[" Delim "](\s+)", "$0`n"), "`n ")
}
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

02 Feb 2018, 17:19

- Here's a little variant. It's basically the same except:
- It can handle any of the 4 characters ^ - ] \ being used as delimiters.
- It removes whitespace from the end of each line.

Code: Select all

q:: ;text to one line per sentence
vDelim := "!?."
vDelim .= "^-]\"
;4 characters that need escaping in a RegEx character class: ^-]\
vText := "Sentence 1!!! Sentence 2??? Sentence 3... Sentence 4!?. Sentence 5^-]\ Sentence 6."
vDelim := RegExReplace(vDelim, "[\^\-\]\\]", "\$0")
MsgBox, % Clipboard := RTrim(RegExReplace(vText, "([" vDelim "])\s+", "$1`r`n"), "`r`n ")
return
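
To illustrate why the escaping step matters, here is a small sketch for AHK v1.1 (the sample text and the vEsc variable are made up for illustration). Without escaping, a "-" between two characters in a RegEx character class is read as a range, so more characters than intended act as delimiters:

```autohotkey
; AHK v1.1 sketch: why "-" needs escaping inside a character class
vText := "Save 50% today - really! Then stop."
vDelim := "!-."

; Unescaped: [!-.] is a range from "!" to ".", which also covers "%",
; so the text is additionally split after "50%"
MsgBox % RegExReplace(vText, "([" vDelim "])\s+", "$1`n")

; Escaped: the class becomes [!\-.], so only "!", "-" and "." split the text
vEsc := RegExReplace(vDelim, "[\^\-\]\\]", "\$0")
MsgBox % RegExReplace(vText, "([" vEsc "])\s+", "$1`n")
```

The first message box should show an extra line break after "50%"; the second keeps "Save 50% today -" on one line.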
DigiDon
Posts: 178
Joined: 19 May 2014, 04:55

03 Feb 2018, 05:35

Excellent, I'll update the first post to reflect it :)
carno
Posts: 265
Joined: 20 Jun 2014, 16:48

06 Feb 2018, 08:19

Thanks! More extensive test:

Code: Select all

teststring:="This is a test??? This is a test2... This is a test3[] This is a test4!!? This is a test!!! This is a test> This is a test/ This is a text\ This is a text| This is a test^ This is a test- This is a test~ This is a test~~ This is a test' This is a test: This is a text"" This is a test :) This is a test) This is a text :( This is a test;) This is a test `;) This is a test ... This is a test ? This is a test {} This is a test [] This is a test`` This is a test+"
msgbox % "Split by punctuation `n" SplitTextByDelim(teststring)
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]")
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]{}""/\|>^-~':();)``")
exitapp
return

SplitTextByDelim(P_Text,P_Delim="!?.") {
P_Delim:= RegExReplace(P_Delim, "[\^\-\]\\]", "\$0")
return RTrim(RegExReplace(P_Text, "([" P_Delim "])\s+", "$1`r`n"), "`r`n ")
}
lolhehehe
Posts: 4
Joined: 04 May 2020, 17:54

11 May 2020, 03:55

Hello friends, is it possible to make this function return an array? Just like StrSplit(), but keeping the delimiters and with the powerful Regex provided by this function.

Thanks in advance!
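
One possible answer to the question above, as a rough sketch for AHK v1.1 rather than a tested library function: the regex version from this thread can be reused to insert a line feed after each run of delimiters, and the result passed to StrSplit(). The name SplitTextToArray is hypothetical (it does not appear in the thread), and the sketch assumes the input contains no line feeds of its own, since those would become extra split points.

```autohotkey
; AHK v1.1 sketch; SplitTextToArray is a hypothetical name, not from the thread
SplitTextToArray(P_Text, P_Delim="!?.") {
	; escape the 4 characters that are special inside a character class
	P_Delim := RegExReplace(P_Delim, "[\^\-\]\\]", "\$0")
	; put a line feed after each delimiter that is followed by whitespace
	Lines := RegExReplace(P_Text, "([" P_Delim "])\s+", "$1`n")
	; drop trailing whitespace, then split on the inserted line feeds
	return StrSplit(RTrim(Lines, "`n "), "`n")
}

; usage: one message box per sentence, delimiters kept
for index, sentence in SplitTextToArray("One... Two!? Three.")
	MsgBox % index ": " sentence
```

If the text may already contain line breaks, a separator character that cannot occur in the input could be used in place of the line feed.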
