[Func] Split Text keep delimiters | Split Text in lines by punctuation

DigiDon
Posts: 176
Joined: 19 May 2014, 04:55

02 Feb 2018, 05:53

Hi, I hope this can benefit others ;)

I needed to display some sentences in a GUI by splitting text into natural lines.
However, I realised that StringSplit, Loop Parse, etc. omit the delimiters from the result.
That was a problem here, because the punctuation has to stay attached to each sentence.
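
To illustrate the point above, here is a minimal sketch (AutoHotkey v1.1 assumed). `ShowSplitParts` is a hypothetical helper, not part of the original post; it only wraps each piece returned by `StrSplit` in brackets so the missing punctuation is easy to spot.

```autohotkey
; Hypothetical helper (not from the thread): shows what StrSplit returns.
ShowSplitParts(text, delims := ".?!") {
    ; StrSplit(delims) with no delimiter yields one array element per character,
    ; which is then used as the list of delimiters for the outer call.
    parts := StrSplit(text, StrSplit(delims))
    out := ""
    for index, part in parts
        out .= "[" part "]"   ; brackets make leading spaces and empty pieces visible
    return out
}
```

Calling `MsgBox % ShowSplitParts("One. Two? Three!")` should show the words without any `.`, `?` or `!`, which is why a different approach is needed to keep them.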

I couldn't find a script for this, so here is one.

Edit: Much simpler 2-liner by Helgef/jeeswg

Code: Select all

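teststring:="This is a test??? This is a test2... This is a test3[] This is a test4!!?"
msgbox % "Split by punctuation `n" SplitTextByDelim(teststring)
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]")
exitapp

; Splits P_Text into lines after each run of delimiters and keeps the delimiters.
; P_Delim - List of 1 char delimiters; Default are end-sentence characters (!?.)
SplitTextByDelim(P_Text,P_Delim="!?.") {
	; Escape the 4 characters that are special inside a RegEx character class: ^ - ] \
	P_Delim:= RegExReplace(P_Delim, "[\^\-\]\\]", "\$0")
	; Replace the whitespace that follows a delimiter with a line break, then trim the end
	return RTrim(RegExReplace(P_Text, "([" P_Delim "])\s+", "$1`r`n"), "`r`n ")
}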
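
Since the goal was to display sentences in a GUI, an array of lines may be handier than one string. The following is only a sketch under that assumption: `SplitTextToArray` is a hypothetical name, and it reuses the same two RegEx steps as the function above but with a plain `n separator.

```autohotkey
; Hypothetical wrapper (name and array return value are assumptions, not from the thread).
SplitTextToArray(text, delims := "!?.") {
    ; Escape ^ - ] \ so they are taken literally inside the character class
    delims := RegExReplace(delims, "[\^\-\]\\]", "\$0")
    ; Keep the delimiter ($1), replace the whitespace after it with a line feed
    lines := RTrim(RegExReplace(text, "([" delims "])\s+", "$1`n"), "`n ")
    ; One array element per resulting line
    return StrSplit(lines, "`n")
}
```

A `for index, line in SplitTextToArray(teststring)` loop could then add one GUI control per sentence. Note that line feeds already present in the input would also produce separate elements.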
Original overcomplicated function without regex :lol:

Code: Select all

teststring:="Teststring. This is a test? This is a test2! This is a test3?"
msgbox % SplitTextByDelim(teststring)
exitapp


;SplitTextByDelim(P_Text,Delim="!?.")
;SplitText by delimiters but keeps delimiters in returned string
;*****************************
;by DigiDon; Use/modify as you wish
;P_Text - Text to be split
;Delim - List of 1 char delimiters; Default are end-sentence characters (!?.)
SplitTextByDelim(P_Text,Delim="!?.") {
	StartingPos:=1
	loop {
		FoundPositions:={}
		Loop, Parse, Delim
		{
			FoundPos%A_Index%:=InStr(P_Text, A_LoopField,,StartingPos)
			If FoundPos%A_Index%
				FoundPositions.Push(FoundPos%A_Index%)
		}
		if !FoundPositions.MaxIndex()
			break
		FoundPos:=min(FoundPositions*)
		if StartingPos=1
			ReturnedText:=Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		else
			ReturnedText.="`n" Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		StartingPos:=FoundPos+1
	}
	if StartingPos!=1
		ReturnedText.="`n"
	ReturnedText.= Trim(SubStr(P_Text,StartingPos))
	
	return ReturnedText
}

;by infogulch
;https://autohotkey.com/board/topic/71072-arraymax-arraymin/
min(n, x*) {
    if (ObjMaxIndex(x) == 0) && IsObject(n)
        x := n, x._NewEnum().next(k,n)
    for k,v in x
        if (v < n)
            n := v
    return n
}
Cheers
Last edited by DigiDon on 03 Feb 2018, 05:41, edited 2 times in total.
EverFastAccess : Take Notes on anything the Fast way: Attach notes, Set reminders & Speed up research in 1 gesture - AHK topic
AHK Dynamic Obfuscator L - Protect your AHK code by Obfuscation - AHK topic
QuickModules for Outlook : Sort Outlook emails very quickly to multiple folders - AHK topic

Coding takes lots of time and efforts. If I have helped you or if you enjoy one of my free projects, please consider a small donation :thumbup:
Helgef
Posts: 3221
Joined: 17 Jul 2016, 01:02

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

02 Feb 2018, 14:09

Hello, thanks for sharing :wave: . There is a problem with sentences that contain more than one delimiter, e.g. "to be continued..." or "what!?"

I tried it with regex, rtrim(regexreplace(teststring,"[!.?](\s+)", "$0`n"), "`n "). You can remove linebreaks before the regex if needed.
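
One detail of this expression that is easy to miss: `$0` re-inserts the whole match, so the whitespace after the delimiter stays at the end of each line. A small comparison sketch follows (AutoHotkey v1.1 assumed; both function names are hypothetical). The second variant captures only the delimiter and re-inserts it with `$1`, which is the change made later in this thread.

```autohotkey
; Two hypothetical helpers for comparison (names are not from the thread).
SplitKeepSpace(text) {
    ; "$0" is the whole match: delimiter plus the whitespace after it
    return RTrim(RegExReplace(text, "[!.?](\s+)", "$0`n"), "`n ")
}

SplitDropSpace(text) {
    ; "$1" is only the captured delimiter, so the whitespace is dropped
    return RTrim(RegExReplace(text, "([!.?])\s+", "$1`n"), "`n ")
}
```

In both variants only the last delimiter of a run is followed by whitespace, so "..." and "!?" stay together on one line.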

Cheers.
DigiDon

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

02 Feb 2018, 16:10

Hmm... right, I hadn't considered this case.

Here is one version. It loops a bit ... but gets the job done ;)

Code: Select all

teststring:="Teststring... This is a test??? This is a test2?! This is a test3! This is a test4!!?"
msgbox % SplitTextByDelim(teststring)
exitapp
return


;SplitTextByDelim(P_Text,Delim="!?.")
;SplitText by delimiters but keeps delimiters in returned string
;If multiple delimiters are next to each other it takes the ending one
;*****************************
;by DigiDon; Use/modify as you wish
;P_Text - Text to be split
;Delim - List of 1 char delimiters; Default are end-sentence characters (!?.)
SplitTextByDelim(P_Text,Delim="!?.") {
	StartingPos:=1
	loop {
		FoundPositions:={}
		Loop, Parse, Delim
			{
			FoundPos%A_Index%:=InStr(P_Text, A_LoopField,,StartingPos)
			If FoundPos%A_Index%
				FoundPositions.Push(FoundPos%A_Index%)
			}
		if !FoundPositions.MaxIndex()
			break
		FoundPos:=min(FoundPositions*)
		
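		; Advance FoundPos past any run of consecutive delimiters
		loop {
			FoundDelim=0
			Loop, Parse, Delim
			if (SubStr(P_Text,FoundPos,1)=A_Loopfield) {
				FoundDelim=1
				FoundPos++
				break
				}
			if !FoundDelim
				break
			}
		FoundPos-- ; step back onto the last delimiter so the next character is not swallowed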
		
		if StartingPos=1
			ReturnedText:=Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		else
			ReturnedText.="`n" Trim(SubStr(P_Text,StartingPos,FoundPos-StartingPos+1))
		StartingPos:=FoundPos+1
		}
	if StartingPos!=1
		ReturnedText.="`n"
	ReturnedText.= Trim(SubStr(P_Text,StartingPos))
	
	return ReturnedText
}

;by infogulch
;https://autohotkey.com/board/topic/71072-arraymax-arraymin/
min(n, x*) {
    if (ObjMaxIndex(x) == 0) && IsObject(n)
        x := n, x._NewEnum().next(k,n)
    for k,v in x
        if (v < n)
            n := v
    return n
}
DigiDon

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

02 Feb 2018, 16:19

Sorry, I hadn't realised you got it working in one line! Haha :D

Yep, it is far more concise with regex. I know regex as well, but for some reason it didn't come to mind.

So can you come up with the complete function, or should I? :P
DigiDon

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

02 Feb 2018, 16:25

So it seems it works. You've nailed it ;)

Code: Select all

teststring:="Teststring... This is a test??? This is a test2?! This is a test3! This is a test4!!?"
msgbox % SplitTextByDelim(teststring)
; msgbox % 
exitapp
return

SplitTextByDelim(P_Text,Delim="!?.") {
return rtrim(regexreplace(P_Text,"[" Delim "](\s+)", "$0`n"), "`n ")
}
jeeswg
Posts: 5271
Joined: 19 Dec 2016, 01:58
Location: UK

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

02 Feb 2018, 17:19

- Here's a slight variant of the code. It's basically the same except:
- It still works if any of the 4 characters ^ - ] \ are used as delimiters.
- It removes whitespace from the end of each line.

Code: Select all

q:: ;text to one line per sentence
vDelim := "!?."
vDelim .= "^-]\"
;4 characters that need escaping in a RegEx character class: ^-]\
vText := "Sentence 1!!! Sentence 2??? Sentence 3... Sentence 4!?. Sentence 5^-]\ Sentence 6."
vDelim := RegExReplace(vDelim, "[\^\-\]\\]", "\$0")
MsgBox, % Clipboard := RTrim(RegExReplace(vText, "([" vDelim "])\s+", "$1`r`n"), "`r`n ")
return
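
The escaping step in the code above can be looked at in isolation. This is a minimal sketch, with `EscapeForCharClass` as a hypothetical name: it prefixes each of the four characters with a backslash so that, for example, a leading `^` is not read as "negate the class" and `-` is not read as a range.

```autohotkey
; Hypothetical helper (name is an assumption): escapes ^ - ] \ for use inside [...]
EscapeForCharClass(delims) {
    ; In the replacement string the backslash is literal and $0 is the matched character
    return RegExReplace(delims, "[\^\-\]\\]", "\$0")
}
```

For instance, without escaping, the delimiter list `^.` would build the class `[^.]`, which matches every character except a dot.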
DigiDon

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

03 Feb 2018, 05:35

Excellent, I'll update the first post to reflect it :)
carno
Posts: 129
Joined: 20 Jun 2014, 16:48

Re: [Func] Split Text keep delimiters | Split Text in lines by punctuation

06 Feb 2018, 08:19

Thanks! More extensive test:

Code: Select all

teststring:="This is a test??? This is a test2... This is a test3[] This is a test4!!? This is a test!!! This is a "
	. "test> This is a test/ This is a text\ This is a text| This is a test^ This is a test- This is a test~ "
	. "This is a test~~ This is a test' This is a test: This is a text"" This is a test :) This is a test) This is a text :( "
	. "This is a test;) This is a test `;) This is a test ... This is a test ? This is a test {} This is a test [] This is a "
	. "test`` This is a test+"
msgbox % "Split by punctuation `n" SplitTextByDelim(teststring)
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]")
msgbox % "Split by custom delimiters `n" SplitTextByDelim(teststring,"!?.[]{}""/\|>^-~':();)``")
exitapp
return

SplitTextByDelim(P_Text,P_Delim="!?.") {
P_Delim:= RegExReplace(P_Delim, "[\^\-\]\\]", "\$0")
return RTrim(RegExReplace(P_Text, "([" P_Delim "])\s+", "$1`r`n"), "`r`n ")
}
