Loading scripts with very long strings(10mb+) will be very slow

thqby
Posts: 560
Joined: 16 Apr 2021, 11:18

Loading scripts with very long strings(10mb+) will be very slow

10 Oct 2021, 11:02

I tested the loading speed of AHK v2. It took 10.8 seconds to load a 27,800-line script containing a long string.
After I used std::wstring as the memory allocator for Script::LineBuffer, the loading time dropped to 0.4 seconds.
Last edited by thqby on 17 Oct 2021, 00:36, edited 1 time in total.
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Using the std::string library can greatly speed up script loading

10 Oct 2021, 11:06

but did u check how much the code size ballooned afterwards???😱
thqby
Posts: 560
Joined: 16 Apr 2021, 11:18

Re: Using the std::string library can greatly speed up script loading

10 Oct 2021, 11:28

The 32-bit version is 3-4 KB larger. :D
iseahound
Posts: 1582
Joined: 13 Aug 2016, 21:04

Re: Using the std::string library can greatly speed up script loading

10 Oct 2021, 18:08

Hey, I happen to like how small the executable is, and despite it being the perfect size for payloads, it's definitely one of the most underappreciated aspects of AHK. Super well designed, no bloat at all.
kczx3
Posts: 1677
Joined: 06 Oct 2015, 21:39

Re: Using the std::string library can greatly speed up script loading

11 Oct 2021, 12:43

I have never had a script even close to 27k lines long with a long string. You didn't state what the code size was for AHK 64-bit (which I use) but 3-4 KB larger for 32-bit seems pretty minimal to me...
lexikos
Posts: 9780
Joined: 30 Sep 2013, 04:07

Re: Using the std::string library can greatly speed up script loading

13 Oct 2021, 07:02

4 KB for a basic buffer which expands according to unknown rules, that happen to perform better for what sounds like an extreme edge case?

What does it even mean to use "std::wstring as the memory allocator for Script::LineBuffer"? wstring isn't a memory allocator, and isn't intended to be used as a generic buffer.

The two buffers are reused for the entire file, and are expanded in increments of 4096 characters only when the capacity is reached. If the longest line is n characters, I suppose the worst case is something like 2*Ceil((n+1)/4096) allocations, where all but the first also require copying the previous content to the new memory block. It does not matter how many lines there are, because once a buffer is expanded to n characters, it can fit all subsequent lines of up to n characters and does not require reallocation.
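To make the allocation estimate above concrete, here is a minimal sketch that counts fixed-increment expansions for a single buffer. It is not the actual Script::LineBuffer code; the function names and the assumption that each buffer starts empty and needs room for a terminator are mine, chosen to match the 2*Ceil((n+1)/4096) figure.

```cpp
#include <cstddef>

// Increment taken from the post: buffers grow 4096 characters at a time.
constexpr std::size_t kExpansionInterval = 4096;

// Hypothetical helper: number of expansions one buffer needs before it can
// hold a line of line_length characters plus a terminator, assuming the
// buffer starts with zero capacity.
constexpr std::size_t CountFixedIncrementAllocations(std::size_t line_length)
{
    std::size_t capacity = 0;
    std::size_t allocations = 0;
    while (capacity < line_length + 1) {
        capacity += kExpansionInterval;
        ++allocations;
    }
    return allocations;
}

// Hypothetical helper: worst case for the two buffers described in the post.
constexpr std::size_t WorstCaseAllocationsForBothBuffers(std::size_t line_length)
{
    return 2 * CountFixedIncrementAllocations(line_length);
}

// Compile-time check for a 1,000,000-character line: 245 expansions per buffer.
static_assert(CountFixedIncrementAllocations(1000000) == 245, "per-buffer count");
```

Once a buffer has reached that capacity, later lines of the same or shorter length add nothing to the count, which is why the number of lines in the file does not enter into it.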

I have done a few tests with lines as long as 1,000,000 characters and continuation sections of around 550,000 characters of HTML including escape sequences where line breaks would be. The long lines have virtually no impact on load time, while the very large continuation sections with escape sequences have a big impact which scales with the number of sections (i.e. the cost is unrelated to allocating the buffer, which is already sufficiently sized after the first two sections).

I did another test with a line of 550,000 characters (without escape sequences) followed by as much legitimate v2 code as I could bother collecting into one file. I gave up at around 5000 lines, 175,000 characters (not including the long line), having not seen any increase in load time (measured externally, by SciTE, so including the process launch overhead). I see 0.12-0.16 seconds regardless of whether it contains this code or nearly nothing.

Incremental expansion would theoretically perform poorly for cases where expansion far greater than the increment is needed. However, reuse of the buffer should make the difference negligible in the context of launching the script. If not for that, one conventional solution would be to expand exponentially; implementing that would be just a case of replacing return Realloc(size + EXPANSION_INTERVAL); with something like return Realloc(size ? size * 2 : 0x1000);, which would not increase code size.
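As a rough illustration of the difference between the two growth policies, the sketch below simulates both and tallies the worst-case number of characters copied. The struct and function names are hypothetical, and it assumes every expansion copies the whole previous buffer, which a real allocator can sometimes avoid by extending the block in place.

```cpp
#include <cstddef>

// Increment taken from the post.
constexpr std::size_t kExpansionInterval = 4096;

// Hypothetical result type for the simulation.
struct GrowthCost {
    std::size_t allocations;   // number of expansions performed
    std::size_t chars_copied;  // worst case: old content copied on each expansion
};

// Simulates growing an empty buffer until it holds `required` characters.
// doubling == false: size + kExpansionInterval (fixed increment).
// doubling == true:  size ? size * 2 : 0x1000 (the exponential rule suggested above).
constexpr GrowthCost SimulateGrowth(std::size_t required, bool doubling)
{
    GrowthCost cost{0, 0};
    std::size_t capacity = 0;
    while (capacity < required) {
        cost.chars_copied += capacity;  // assume the previous block is copied in full
        capacity = doubling ? (capacity ? capacity * 2 : 0x1000)
                            : capacity + kExpansionInterval;
        ++cost.allocations;
    }
    return cost;
}
```

For a buffer that must hold 1,000,001 characters, the fixed increment gives 245 expansions and about 122 million characters copied in the worst case, while doubling gives 9 expansions and about 1 million. The gap widens quadratically with line length, but it is paid only while the buffer is still growing.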

Please provide a file that demonstrates this significant difference in load time.
thqby
Posts: 560
Joined: 16 Apr 2021, 11:18

Re: Using the std::string library can greatly speed up script loading

16 Oct 2021, 10:47

Without limiting the string length, Realloc executes slower and slower as size increases.
It can also be significantly faster with return Realloc(size ? size * 2 : 0x1000); instead.
lexikos
Posts: 9780
Joined: 30 Sep 2013, 04:07

Re: Using the std::string library can greatly speed up script loading

17 Oct 2021, 00:09

Putting aside my testing and explanations, I asked for two things in my previous post:
  • Clarification about how std::wstring was used.
  • A file that can be used to reproduce the issue.
I would like to understand how such a difference in performance could occur, as it doesn't fit with my understanding of how the code works, and is inconsistent with my own testing.

However, since it seems that this may be futile, I will just change Realloc to use exponential growth.
safetycar
Posts: 435
Joined: 12 Aug 2017, 04:27

Re: Loading scripts with very long strings(10mb+) will be very slow

23 Oct 2021, 10:40

@lexikos would you mind reviewing the commit related to this again: https://github.com/Lexikos/AutoHotkey_L/commit/5470f96b7f8a1c5ff03cfe66eb5a465d279d1514
I'm seeing newsize calculated and not used, and I can't look away. I don't really know C++, so I hope I'm not being annoying over nothing. :roll:
lexikos
Posts: 9780
Joined: 30 Sep 2013, 04:07

Re: Loading scripts with very long strings(10mb+) will be very slow

23 Oct 2021, 18:04

Thanks. :facepalm:

Reading a large (>4096 chars) continuation section or two would be slower, although it would probably have to be pushed to extremes to be noticeable.
