StrSplit() returns an array of unexpected capacity

Report problems with documented functionality
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

18 Aug 2017, 20:55

For some reason the StrSplit method returns a larger array "capacity" than if the elements are manually added. I'm guessing there is some technical reason for this, but it doesn't appear to be documented.

Code: Select all

String := "1,2,3,4,5,6"
Array := StrSplit( String, "," )
MsgBox % "Array.GetCapacity(): " Array.GetCapacity() "`nArray.Length(): " Array.Length()

Array := ["1","2","3","4","5","6"]
MsgBox % "Array.GetCapacity(): " Array.GetCapacity() "`nArray.Length(): " Array.Length()
just me
Posts: 9425
Joined: 02 Oct 2013, 08:51
Location: Germany

19 Aug 2017, 02:00

pneumatic wrote:I'm guessing there is some technical reason for this, but it doesn't appear to be documented.
Bug Reports wrote:Report problems with documented functionality.
So why did you post this in "Bug Reports"?
obeeb
Posts: 140
Joined: 20 Feb 2014, 19:15

19 Aug 2017, 09:41

Expandable arrays in other languages are usually implemented in such a way that when an array needs to grow, its capacity is doubled. This provides a reasonable trade-off between simplicity, speed and size.
This is what StrSplit() does: it starts with 4, which is the minimal initial capacity, and when it needs to insert "5" it doubles the capacity. This is definitely a bug or at the very least an undocumented "feature", it should shrink the array before returning it.
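The doubling policy described above can be sketched in a few lines. This is only a model of the description in this post (start at 4, double until the elements fit), not code taken from the AutoHotkey source; the function name DoubledCapacity is hypothetical.

```autohotkey
; Hypothetical helper: capacity reached by starting at 4 and doubling
; until n elements fit. Models the description above, not AHK internals.
DoubledCapacity(n) {
    if (n <= 0)
        return 0
    cap := 4
    while (cap < n)
        cap *= 2
    return cap
}

MsgBox % DoubledCapacity(6)       ; 8
MsgBox % DoubledCapacity(100000)  ; 131072
```

Under this model, six elements end up with a capacity of 8, so the array never holds more than about twice the space it needs.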
just me wrote:
Bug Reports wrote:Report problems with documented functionality.
So why did you post this in "Bug Reports"?
If the goal of reporting bugs is to improve AutoHotkey and its documentation, then you definitely did the right thing by reporting it. I don't know what @just me wants from you.
Unless we take "Report problems with documented functionality" completely literally, and then I don't know what the purpose of this forum is.
just me

19 Aug 2017, 10:10

obeeb wrote:Unless we take "Report problems with documented functionality" completely literally ...
That's the reason why this subtitle has been added.
obeeb wrote:If the goal of reporting bugs ...
The reported 'issue' cannot be called a 'bug' in my opinion.

If you want to know why the actual behaviour differs from what you are expecting, ask for the reason in "Ask For Help".

To get the number of elements in a simple array use Array.Length(). What do you want to do with the value returned by Array.GetCapacity()?
obeeb wrote:... it should shrink the array before returning it.
Why? It's the same for variables. VarSetCapacity(String) and StrLen(String) aren't equivalent in many cases.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

19 Aug 2017, 10:39

This is definitely not a bug, it should not shrink the array before returning it, that would be an unreasonable trade-off between simplicity, speed and size.

@ Op, if you want to shrink it, do it,
Spoiler
Cheers.
obeeb

19 Aug 2017, 11:25

just me wrote:
obeeb wrote:Unless we take "Report problems with documented functionality" completely literally ...
That's the reason why this subtitle has been added.
That's interesting to know, but how is it relevant? Why are you only interested in "problems with documented functionality"?
If, for example, the following code: var := "This will crash the computer but just me doesn't think it should be reported as bug" crashed the computer, you don't think it would belong here?
just me wrote:The reported 'issue' cannot be called a 'bug' in my opinion.
If you want to know why the actual behaviour differs from what you are expecting, ask for the reason in "Ask For Help".
To get the number of elements in a simple array use Array.Length(). What do you want to do with the value returned by Array.GetCapacity()?
I want to know what my code does, if StrSplit can return an array almost twice as big as the number of elements in it this should be documented. I can't argue with your opinion that this is not a bug; this is something lexikos needs to see and decide. Adding elements to an array also used to double its capacity, and this is something he did decide to fix.
just me wrote:
obeeb wrote:... it should shrink the array before returning it.
Why? It's the same for variables. VarSetCapacity(String) and StrLen(String) aren't equivalent in many cases.
Nothing here is the same. VarSetCapacity is called explicitly. It's not about them being different, it's about unexpected and, in my opinion, incorrect behavior. I don't see any reason for StrSplit to return a larger array than needed.
Helgef wrote:This is definitely not a bug, it should not shrink the array before returning it, that would be an unreasonable trade-off between simplicity, speed and size.
Please don't use my arguments against me if you don't understand them.
This only applies to general arrays, in a core language function in which you have full information about the input and required output it's completely unreasonable to do anything but return an array with the exact capacity.
Helgef

19 Aug 2017, 15:27

obeeb, neither one of us made an argument, I twisted your statement to make my own.
in a core language function in which you have full information about the input and required output it's completely unreasonable to do anything but return an array with the exact capacity.
Had it been true, I'd agree.
Cheers.
obeeb

19 Aug 2017, 15:35

Helgef wrote: Had it been true, I'd agree.
What's not true about it?
Helgef

19 Aug 2017, 16:16

StrSplit doesn't know the required capacity of the output, because it doesn't know the number of delimiters in the input. So it parses the string, searches for delimiters, and adds what's between them to the array, which expands as you described (0->4->8->16...), then returns the array without shrinking it, as it should. The cost for expanding has been paid, it shouldn't be assumed that the user does not want to add more values after the split, causing it to pay for yet another expansion. When you do not want to add to your array anymore, shrink it using setCapacity(0), if desired.

Which one is better?

Code: Select all

String := "1,2,3,4,5,6"
Array := StrSplit( String, "," )
Array["a"]:=1
MsgBox % "StrSplit do not shrink the array, capacity: " Array.GetCapacity()

String := "1,2,3,4,5,6"
Array := strSplitCap( String, "," )
Array["a"]:=1
MsgBox % "StrSplit do shrink the array, capacity: " Array.GetCapacity()

strSplitCap(String, Delimiters:="", OmitChars:=""){
	local s:=strSplit(String, Delimiters, OmitChars)
	s.setCapacity(0)
	return s
}
Cheers.
just me

19 Aug 2017, 16:36

Code: Select all

#NoEnv
SetBatchLines, -1
S := A_TickCount
A1 := []
S := A_TickCount
Loop, 100000
   A1.Push(A_Index)
T := A_TickCount - S
MsgBox % "A1.GetCapacity(): " A1.GetCapacity() "`nA1.Length(): " A1.Length() "`nTime: " T ; Time: 562

S := A_TickCount
A2 := []
A2.SetCapacity(100000)
Loop, 100000
   A2.Push(A_Index)
T := A_TickCount - S
MsgBox % "A2.GetCapacity(): " A2.GetCapacity() "`nA2.Length(): " A2.Length() "`nTime: " T ; Time: 31

V := ""
Loop, 100000
   V .= "," . A_Index
V := LTrim(V, ",")
S := A_TickCount
A3 := StrSplit(V, ",")
T := A_TickCount - S
MsgBox % "A3.GetCapacity(): " A3.GetCapacity() "`nA3.Length(): " A3.Length() "`nTime: " T ; Time: 16

S := ""
Loop, 132
   S .= "A"
MsgBox, % "VarSetCapacity(S): " VarSetCapacity(S) . "`n(StrLen(S) << !!A_IsUnicode): " . (StrLen(S) << !!A_IsUnicode)
ExitApp
MsgBox wrote:A1.GetCapacity(): 100000
A1.Length(): 100000
Time: 562

A2.GetCapacity(): 100000
A2.Length(): 100000
Time: 47

A3.GetCapacity(): 131072
A3.Length(): 100000
Time: 15

VarSetCapacity(S): 518
(StrLen(S) << !!A_IsUnicode): 264
obeeb

19 Aug 2017, 19:30

Finally! A useful post, I applaud you for that.
Helgef wrote:StrSplit doesn't know the required capacity of the output, because it doesn't know the number of delimiters in the input
If you would also stop assuming I'm a moron, explaining obvious things to me and start responding to what I'm actually saying we might actually get somewhere. It knows the required capacity of the output before returning this is the only thing that's relevant.
Helgef wrote: returns the array without shrinking it, as it should. The cost for expanding has been paid, it shouldn't be assumed that the user does not want to add more values after the split, causing it to pay for yet another expansion. When you do not want to add to your array anymore, shrink it using setCapacity(0), if desired.
No, it should shrink it. The cost of one invocation of std::copy is completely negligible, it should be assumed that the user doesn't want to add more values after the split, returning an array without shrinking will cause an unnecessary memory usage, it can be expanded with setCapacity if desired.
Helgef wrote: Which one is better?
For all practical intents and purposes they are the same. I can make my own contrived example that will use 1GB of memory in one case and 2GB of memory in the other, which one is better?

Now you did make a valid argument, there can be some completely negligible advantages to using the existing array. I disagree but can't say that you are objectively wrong. If you wouldn't automatically assume that people who don't post here much are idiots we could've gotten there much faster. Still waiting for a response to the actual problem I have with this:
obeeb wrote:I want to know what my code does, if StrSplit can return an array almost twice as big as the number of elements in it this should be documented.
just me wrote:A1.Length(): 100000
Time: 562

A2.GetCapacity(): 100000
A2.Length(): 100000
Time: 47

A3.GetCapacity(): 131072
A3.Length(): 100000
Time: 15
I don't know what you wanted to show with this, but in any case, if you want to measure time in a meaningful way you should use QueryPerformanceCounter.
The first example is exactly what I talked about. This is caused by the change Lexikos made (I don't know when): inserting into the array doesn't double its capacity anymore but only increases it by one. Copying the array 100000 times is costly; this is not comparable in any way to doing it one time at the end of StrSplit.
Running your code (with QueryPerformanceCounter) on version 1.1.14, which does double the array on insert, I get:
A1 Time:~18
A2 Time:~17
A3 Time:~10

The same on the latest version, 1.1.26:
A1 Time:~500
A2 Time:~25
A3 Time:~8
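
The gap between the two growth policies can be illustrated by counting element copies under a simplified model. The sketch below assumes that every expansion moves all existing elements, which may overstate the real cost if the allocator can extend the block in place; the function name CountCopies and its parameters are hypothetical and not part of AutoHotkey.

```autohotkey
; Simplified model, not AHK internals: every expansion is assumed to
; copy all elements currently stored in the array.
CountCopies(n, doubling) {
    cap := 0, copies := 0
    Loop, %n%
    {
        if (A_Index > cap) {
            copies += cap   ; existing elements are moved to the new block
            ; doubling: 0 -> 4 -> 8 -> 16 ...; otherwise grow by one slot
            cap := doubling ? (cap ? cap * 2 : 4) : cap + 1
        }
    }
    return copies
}

MsgBox % "Grow by one: " CountCopies(100000, false) "`nDoubling: " CountCopies(100000, true)
```

In this model, growing by one slot costs on the order of n*n/2 copies in total, while doubling stays on the order of n, which is consistent with the direction of the timings above.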

The full code: I changed push to insert to make it work on the earlier version, used QueryPerformanceCounter, and also didn't add your last example (I understand even less what you wanted to show with it).

Code: Select all

SetBatchLines, -1
DllCall("QueryPerformanceFrequency", "Int64*", Frequency)
A1 := []
DllCall("QueryPerformanceCounter", "Int64*", S)
Loop, 100000
   A1.insert(A_Index)
DllCall("QueryPerformanceCounter", "Int64*", T)
T := (T - S) * 1000 / frequency
MsgBox % "A1.GetCapacity(): " A1.GetCapacity() "`nA1.Length(): " A1.Length() "`nTime: " T ; Time: 562

A2 := []
A2.SetCapacity(100000)
DllCall("QueryPerformanceCounter", "Int64*", S)
Loop, 100000
   A2.insert(A_Index)
DllCall("QueryPerformanceCounter", "Int64*", T)
T := (T - S) * 1000 / frequency
MsgBox % "A2.GetCapacity(): " A2.GetCapacity() "`nA2.Length(): " A2.Length() "`nTime: " T ; Time: 31

V := ""
Loop, 100000
   V .= "," . A_Index
V := LTrim(V, ",")

DllCall("QueryPerformanceCounter", "Int64*", S)
A3 := StrSplit(V, ",")
DllCall("QueryPerformanceCounter", "Int64*", T)
T := (T - S) * 1000 / frequency
MsgBox % "A3.GetCapacity(): " A3.GetCapacity() "`nA3.Length(): " A3.Length() "`nTime: " T ; Time: 16
just me

20 Aug 2017, 02:43

obeeb wrote:I can make my own contrived example that will use 1GB of memory in one case and 2GB of memory in the other, which one is better?
Array.GetCapacity() doesn't retrieve the total size of memory used for the array.
Returns the current capacity of an object or one of its fields.
which is
The maximum number of key-value pairs the object should be able to contain before it must be automatically expanded.
obeeb wrote:I don't know what you wanted to show with this, ..
I wanted to show you the costs of shrinking an array's capacity to fit the current number of key-value pairs after each change.
obeeb wrote:... if you want to measure time in a meaningful way you should use QueryPerformanceCounter.
I think A_TickCount is meaningful enough in this case (562 <> 47).
obeeb wrote:... and also didn't add your last example(I understand even less what you wanted to show with it).
I wanted to show you that the size of the string buffer used for 132 Unicode characters (266 bytes) is actually 512 bytes. Do you call this a bug, too?
Helgef

20 Aug 2017, 04:15

obeeb. I do not assume you are a moron, I'm sorry you feel that way, I do not know what you know and don't, you asked me a direct question and I spent my time trying to answer it.
It knows the required capacity of the output before returning this is the only thing that's relevant.
[...]
it should be assumed that the user doesn't want to add more values after the split, returning an array without shrinking will cause an unnecessary memory usage, it can be expanded with setCapacity if desired.
Our opinions differ. In my opinion, it doesn't know the required capacity before returning it, it only knows the current capacity, because it doesn't know what the user wants to do with it. Now, you think it is valid to assume that the user doesn't want to add to the array, and with that assumption, I agree with you, it should shrink the array. But to me, it is more flexible to not make that assumption, which was the point of my contrived example. In the end, the developer is to decide which assumptions to make. There will be drawbacks and benefits with each one.
I want to know what my code does, if StrSplit can return an array almost twice as big as the number of elements in it this should be documented.
It seems you already know what the code does. I agree a more detailed documentation on the implementation would be great, but documenting is a tedious task, and the interested user can look in the source code, or ask in the help forum.
just me wrote:I wanted to show you that the size of the string buffer used for 132 Unicode characters (266 bytes) is actually 512 bytes.
It is a great example, showing speed being favored over memory usage, where the user has the option to conserve memory when wanted. Had AHK kept on shrinking the string buffers, there would be no option for the user to improve speed. @obeeb, to be clear, I know this is not what you are talking about.

Cheers.
obeeb

20 Aug 2017, 11:14

Helgef wrote:obeeb. I do not assume you are a moron, I'm sorry you feel that way, I do not know what you know and don't, you asked me a direct question and I spent my time trying to answer it.
You started with the assumption that I don't know what I'm talking about; this is not a respectful way to treat a fellow human. I'm not going to be coy, I understand where you're coming from: most people who post here are not programmers and in fact don't know what they are talking about. However, I happen to be one, and I'm not used to being treated in such a way. I have done the same thing when I tried to help someone, and it was completely wrong of me.

I came here with the goal of joining this community and helping to make it and AutoHotkey better. This was the third (out of 3) frustrating interaction I had with a prominent member, @just me in this same thread and masonjar being the other two (with masonjar it wasn't even related to AutoHotkey; he just immediately assumed I'm generally an idiot who can't understand a simple sentence). The one with you was marginally better than the other two, but because it was the third I responded in the most aggressive way. I don't feel it was aggressive enough to warrant an apology, but you're not responsible for other people's actions and it was unfair to you.
Helgef wrote:
I want to know what my code does, if StrSplit can return an array almost twice as big as the number of elements in it this should be documented.
It seems you already know what the code does. I agree a more detailed documentation on the implementation would be great, but documenting is a tedious task, and the interested user can look in the source code, or ask in the help forum.
I only know because I saw this thread; it would never have occurred to me that StrSplit returns an array with a larger capacity than it needs to. This is not about me personally but about the quality of AutoHotkey and its documentation.
If you look at the original post:
pneumatic wrote:I'm guessing there is some technical reason for this, but it doesn't appear to be documented.
OP is not looking for a solution; he's confused by the behavior. I find that terrible for AutoHotkey, and I don't think it's reasonable to expect people to look at the source code or come here (you can of course have a different opinion).
This can be fixed with one line of text in documentation or changed with one line of code, I will gladly submit a pull request with that line.

My biggest problem with all this, and what made me jump in, is @just me telling OP that he shouldn't have posted it. Even if the decision is to do nothing, discouraging people from submitting possible bugs is very bad for AutoHotkey.

As to the rest of our disagreement, I think we reached an understanding, and for AutoHotkey scripts I really think it doesn't make any difference what you do (as long as it's documented).

@just me
just me wrote:
obeeb wrote:I don't know what you wanted to show with this, ..
I wanted to show you the costs of shrinking an array's capacity to fit the current number of key-value pairs after each change.
I will repeat myself one last time: doing something 100000 times is not the same as doing it one time. This is not a good way to calculate the actual cost of the operation, but for illustrative purposes let's do it anyway. It took 475 more milliseconds to create a new array 100000 times, which means that 1 time took 475 / 100000 = 0.00475 milliseconds. This means StrSplit will take a whopping 0.004 milliseconds more to complete.
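Rather than dividing a loop total, the cost of a single shrink can also be measured directly. The sketch below reuses the QueryPerformanceCounter approach from the benchmark earlier in the thread and times one SetCapacity(0) call on the array returned by StrSplit; the variable names are arbitrary, and the numbers will vary by machine and AutoHotkey version.

```autohotkey
; Sketch: time one shrink after StrSplit (results vary by machine).
SetBatchLines, -1
DllCall("QueryPerformanceFrequency", "Int64*", Frequency)

; Build "1,2,...,100000" as in the earlier benchmark.
V := ""
Loop, 100000
   V .= "," . A_Index
V := LTrim(V, ",")

A := StrSplit(V, ",")
Before := A.GetCapacity()

DllCall("QueryPerformanceCounter", "Int64*", S)
A.SetCapacity(0)   ; a value below the item count shrinks to the item count
DllCall("QueryPerformanceCounter", "Int64*", T)

MsgBox % "Capacity before: " Before "`nCapacity after: " A.GetCapacity() "`nShrink time (ms): " (T - S) * 1000 / Frequency
```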
The following was in the first line I wrote in this thread:
obeeb wrote:This provides a reasonable trade-off between simplicity, speed and size.
Using a little bit of deductive reasoning you can understand that I'm perfectly aware of the fact that there is a cost associated with it, why you wanted to show me something that I demonstrated to already know is beyond me.

You keep ignoring what I write and just comment on whatever you want so I will not bother to respond to the rest of your post, feel free to respond with an explanation about regular expressions or maybe ImageSearch.
just me

21 Aug 2017, 02:44

obeeb wrote:... I will not bother to respond to the rest of your post, feel free to respond with an explanation about regular expressions or maybe ImageSearch.
I'm quite willing to give you some explanation about expressions or ImageSearch as soon as you open a new topic in "Ask For Help".
obeeb

21 Aug 2017, 06:57

just me wrote:
obeeb wrote:regular expressions
I'm quite willing to give you some explanation about expressions or ImageSearch as soon as you open a new topic in "Ask For Help".
WOW!!! You even managed to talk about something else responding to my jest.

At this point I will have to assume you're trolling but just in case you are serious, I DON'T need any help, definitely not from you.
just me

21 Aug 2017, 11:15

@obeeb:

I am serious, so back to the roots again:
pneumatic wrote:For some reason the StrSplit method returns a larger array "capacity" than if the elements are manually added. I'm guessing there is some technical reason for this, but it doesn't appear to be documented.

Code: Select all

String := "1,2,3,4,5,6"
Array := StrSplit( String, "," )
MsgBox % "Array.GetCapacity(): " Array.GetCapacity() "`nArray.Length(): " Array.Length()

Array := ["1","2","3","4","5","6"]
MsgBox % "Array.GetCapacity(): " Array.GetCapacity() "`nArray.Length(): " Array.Length()
  • There's no section in the documentation telling you that Array.GetCapacity() will always return the same value as Array.Length() for simple arrays, so it does not contradict 'documented behaviour'.
  • The whole post is a kind of question ("For some reason...", "I'm guessing...") and thus belongs to "Ask For Help".
  • If pneumatic would suggest to document this behaviour it should have been posted in "Wish List -> Suggestions on documentation improvements".
  • If pneumatic would suggest to change this behaviour it should have been posted in "Wish List", too.
My conclusion: Legitimate question posted in the wrong section!

I partially agree with your arguments related to really large arrays, but this doesn't change my mind in respect of the OP.
obeeb

21 Aug 2017, 14:10

just me wrote:There's no section in the documentation telling you that Array.GetCapacity() will always return the same value as Array.Length() for simple arrays, so it does not contradict 'documented behaviour'.
No argument there.
just me wrote:The whole post is a kind of question ("For some reason...", "I'm guessing...") and thus belongs to "Ask For Help".
Also correct but I have an issue with it I will expand upon later.
just me wrote:If pneumatic would suggest to document this behaviour it should have been posted in "Wish List -> Suggestions on documentation improvements".
He didn't understand it enough to do that when he posted here. I also don't think the location of documentation improvements is intuitive (I had no idea it was there, and I have lurked here since the very early days of ahkscript.org), but it might be just me and it's a separate discussion.
just me wrote:If pneumatic would suggest to change this behaviour it should have been posted in "Wish List", too.
Same as the previous point.
just me wrote:My conclusion:Legitimate question posted in the wrong section!
From looking at the last 12 threads in this sub, only 2 are actually bugs; the others are completely documented and expected behavior, or less questionable than this one. Your conclusion is correct, but I don't think your one-line response to him, "So why did you post this in "Bug Reports"?", is justified. All the other posters who shouldn't have posted here received much better treatment.

I think that people should be encouraged to post possible bugs, and as long as it's not something completely ridiculous they should be thanked for posting, followed by an explanation that if their question starts with "I'm guessing..." it belongs in the "Ask For Help" forum. There were 103 threads opened in the "Bug Reports" forum during the last year (unless there were many more that were deleted or moved, in which case I'm way off with my calculation); that's less than 1 bug report every 3 days! It's really not like you are so swamped with bug reports that drastic measures are required to limit the number of them.

I hope that your next response will be on topic (as was the previous) and that if I stay here we will have more pleasant exchanges in the future.
just me

21 Aug 2017, 16:01

I don't know whether you think it's on topic, but it's the last thing I have to say in this context: Rename 'Bug Reports' to 'Issues'
pneumatic

21 Aug 2017, 21:15

just me wrote:There's no section in the documentation telling you that Array.GetCapacity() will always return the same value as Array.Length() for simple arrays, so it does not contradict 'documented behaviour'.
The issue is with StrSplit not GetCapacity
documentation wrote: StrSplit()
For example: "," would divide the string based on every occurrence of a comma. Similarly, [A_Tab, A_Space] would create a new array element every time a space or tab is encountered in the input string.
The part in bold isn't strictly correct because it creates more array elements than just "every time a space or tab is encountered".

Predicted response: nothing wrong because GetCapacity doesn't return the number of "elements" but some other [undocumented] thing :D
just me wrote:"Wish List -> Suggestions on documentation improvements".
It's a fine line between "suggestion on documentation improvements" and "problems with documented functionality" :think:
