I tested the loading speed of AHK v2: it took 10.8 seconds to load a 27,800-line script containing a long string.
After I used std::wstring as the memory allocator for Script::LineBuffer, the loading time dropped to 0.4 seconds.
Loading scripts with very long strings(10mb+) will be very slow
Last edited by thqby on 17 Oct 2021, 00:36, edited 1 time in total.
Re: Using the std::string library can greatly speed up script loading
But did you check how much the code size ballooned afterwards?
Re: Using the std::string library can greatly speed up script loading
The 32-bit version is 3-4 KB larger.
Re: Using the std::string library can greatly speed up script loading
Hey, I happen to like how small the executable is, and despite it being the perfect size for payloads, it's definitely one of the most underappreciated aspects of AHK. Super well designed, no bloat at all.
Re: Using the std::string library can greatly speed up script loading
I have never had a script even close to 27k lines long with a long string. You didn't state what the code size was for AHK 64-bit (which I use) but 3-4 KB larger for 32-bit seems pretty minimal to me...
Re: Using the std::string library can greatly speed up script loading
4 KB larger for 64-bit
Re: Using the std::string library can greatly speed up script loading
4 KB for a basic buffer which expands according to unknown rules, that happen to perform better for what sounds like an extreme edge case?
What does it even mean to use "std::wstring as the memory allocator for Script::LineBuffer"? wstring isn't a memory allocator, and isn't intended to be used as a generic buffer.
The two buffers are reused for the entire file, and are expanded in increments of 4096 characters only when the capacity is reached. If the longest line is n characters, I suppose the worst case is something like 2*Ceil((n+1)/4096) allocations, where all but the first also require copying the previous content to the new memory block. It does not matter how many lines there are, because once a buffer is expanded to n characters, it can fit all subsequent lines of up to n characters and does not require reallocation.
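To make the worst-case estimate above concrete, here is a minimal sketch of the arithmetic. It is only a model of fixed-increment growth, not the actual Script::LineBuffer code; the names kExpansionInterval, AllocationsForLine and WorstCaseAllocations are hypothetical, and it assumes each buffer starts with zero capacity.

```cpp
#include <cstddef>

// Hypothetical model, not AutoHotkey source: a buffer that grows in fixed
// steps of 4096 characters, as described in the post.
constexpr std::size_t kExpansionInterval = 4096;

// Allocations one buffer needs to reach a capacity of n characters plus a
// terminator, assuming it starts empty. This is Ceil((n+1)/4096).
std::size_t AllocationsForLine(std::size_t n)
{
    const std::size_t needed = n + 1;
    return (needed + kExpansionInterval - 1) / kExpansionInterval;
}

// Worst case across the two buffers mentioned in the post.
std::size_t WorstCaseAllocations(std::size_t n)
{
    return 2 * AllocationsForLine(n);
}
```

Under these assumptions, WorstCaseAllocations(1000000) evaluates to 490, and that cost is paid once per file rather than once per line, because the expanded buffers are reused.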
I have done a few tests with lines as long as 1,000,000 characters and continuation sections of around 550,000 characters of HTML including escape sequences where line breaks would be. The long lines have virtually no impact on load time, while the very large continuation sections with escape sequences have a big impact which scales with the number of sections (i.e. the cost is unrelated to allocating the buffer, which is already sufficiently sized after the first two sections).
I did another test with a line of 550,000 characters (without escape sequences) followed by as much legitimate v2 code as I could bother collecting into one file. I gave up at around 5000 lines, 175,000 characters (not including the long line), having not seen any increase in load time (measured externally, by SciTE, so including the process launch overhead). I see 0.12-0.16 seconds regardless of whether it contains this code or nearly nothing.
Incremental expansion would theoretically perform poorly for cases where expansion far greater than the increment is needed. However, reuse of the buffer should make the difference negligible in the context of launching the script. If not for that, one conventional solution would be to expand exponentially; implementing that would be just a case of replacing return Realloc(size + EXPANSION_INTERVAL); with something like return Realloc(size ? size * 2 : 0x1000);, which would not increase code size.
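The difference between the two growth rules can be illustrated with a small simulation. This is a sketch under stated assumptions, not the real Realloc: GrowthCost and SimulateGrowth are hypothetical names, the buffer is assumed to start at size 0, and every reallocation is pessimistically assumed to copy the whole previous content.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for the simulation below.
struct GrowthCost
{
    std::size_t reallocations; // how many times the buffer was resized
    std::size_t chars_copied;  // characters copied, assuming every resize moves the block
};

// Grows a simulated buffer from 0 until it can hold `target` characters.
// exponential == false models Realloc(size + EXPANSION_INTERVAL);
// exponential == true  models Realloc(size ? size * 2 : 0x1000).
GrowthCost SimulateGrowth(std::size_t target, bool exponential)
{
    const std::size_t interval = 0x1000; // 4096 characters
    std::size_t size = 0;
    GrowthCost cost{0, 0};
    while (size < target)
    {
        const std::size_t new_size =
            exponential ? (size ? size * 2 : interval) : size + interval;
        cost.chars_copied += size; // previous content is copied to the new block
        ++cost.reallocations;
        size = new_size;
    }
    return cost;
}
```

For a target of 1,000,001 characters, this model gives 245 reallocations and 122,429,440 characters copied with fixed steps, versus 9 reallocations and 1,044,480 characters copied with doubling. Whether that gap is noticeable in practice depends on how often the buffer actually has to grow, which is the point made about buffer reuse.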
Please provide a file that demonstrates this significant difference in load time.
Re: Using the std::string library can greatly speed up script loading
Without limiting the string length, Realloc executes slower and slower as size increases.
It can also be significantly faster with return Realloc(size ? size * 2 : 0x1000); instead.
Re: Using the std::string library can greatly speed up script loading
Putting aside my testing and explanations, I asked for two things in my previous post:
- Clarification about how std::wstring was used.
- A file that can be used to reproduce the issue.
However, since it seems that this may be futile, I will just change Realloc to use exponential growth.
Re: Loading scripts with very long strings(10mb+) will be very slow
@lexikos would you mind reviewing the commit related to this again: https://github.com/Lexikos/AutoHotkey_L/commit/5470f96b7f8a1c5ff03cfe66eb5a465d279d1514
I'm seeing newsize calculated and not used, and I can't look away. I don't really know c++ so I hope I'm not being annoying over nothing.
Re: Loading scripts with very long strings(10mb+) will be very slow
Thanks.
Reading a large (>4096 chars) continuation section or two would be slower, although it would probably have to be pushed to extremes to be noticeable.