My frustration with the design of AutoHotkey has reached a peak recently. The interface of the new BoundFunc type is inconsistent with the Func type. The addition of this type requires me to change Plaster. The inconsistency makes it impossible for me to add support for the BoundFunc type to Plaster's
Func-like types having
I hope that this post can encourage positive change in the development of AutoHotkey.
When considering changes to a programming language, it is important to keep its design goals in mind, to make sure that the changes help achieve, or at least do not harm achieving, the design goals.
As far as I know there is no official statement of the design goals, but Chris' and Lexikos' posts on the forums cause me to believe this is close:
- make it easy to automate Windows and Windows programs that were not designed with automation in mind
- make it easy for novice programmers to learn
AutoHotkey does achieve the goal of making it easy to automate Windows and Windows programs that were not designed with automation in mind. I have never had a problem with this aspect of AutoHotkey, nor have I encountered another programming language that does it as well, much less better.
AutoHotkey does not achieve the goal of being easy for novice programmers to learn.
Traits that make a programming language easy for a novice to learn are:
- good error handling
- familiarity
- consistency
- distinguishing unrelated concepts
- elegance
Novices make more mistakes than experts. Bad error handling makes it harder for them to learn from their mistakes. Good error handling limits the damage mistakes can cause. AutoHotkey fails dramatically in this regard by not reporting errors and continuing execution after encountering them. If a novice attempts to process files, silent corruption may upset them so much they never try programming again.
Familiarity means less to learn. Most novice programmers know some English and high school algebra. Programming languages like Python are easy for novices to learn, because they avoid requiring much beyond this. AutoHotkey's heavy use of string interpolation (%s) for code causes difficulty for new programmers, because it is not something they are already familiar with (among other reasons).
Consistency means less to learn, remember, and write code to abstract over. Each inconsistency is one more thing to remember, that is not relevant to the problem you are trying to solve, while trying to solve your problem. Inconsistency requires writing more code to abstract over the differences that should not exist. AutoHotkey has many inconsistencies. For example, there appears to be no rhyme or reason as to where %s are required.
Distinguishing unrelated concepts prevents undesirable behavior. If inconsistency is creating differences where there should be none, conflation is creating similarities where there should be none. AutoHotkey conflates many concepts. For example, storing a key named “HasKey” will break the method with the same name, due to conflating interface and contents.
It may be surprising to associate elegance with novices' needs. It is more often associated with experts' desires. However, elegance means not needing to learn a lot of different things, to do a lot of different things. AutoHotkey is not very elegant. For example, it has more control flow constructs than most programming languages, but they are less generally useful.
Experts benefit from the same traits that benefit novices. They do have a higher tolerance for unfamiliarity, which allows them to make use of unfamiliar but useful syntax and semantics.
Is the design of AutoHotkey so bad that fixing it is worth destroying all code written in it?
Breaking backwards compatibility in a programming language is rarely done, and when it is, it rarely ends well.
The transition from Perl 5 to 6 has resulted in most Perl programmers abandoning Perl, and most of what is left remains on 5.
The transition from Python 2 to 3 has resulted in most Python programmers remaining on 2.
While I agree that, in AutoHotkey's case, it really is that bad, I am of the opinion that v2 should not be released until it fixes most of the problems in this post. In its current state I do not find it ‘better enough’ to be worth the loss.
The Type System:
The changes to the type system do not fit neatly into categories for areas of improvement. They affect multiple categories. Discussing them also provides an overview of the programming language. Therefore it seems to be a natural place to start.
The type hierarchy should look similar to this:
This diagram is necessarily incomplete. COM objects exist outside AutoHotkey. The exception types depend on what can go wrong. That will be discussed with improving error handling. User-defined types should have Object as their supertype by default. Obviously I cannot list types that will be created in the future.
All types must obey the Liskov Substitution Principle. In short, each subtype must have the interface (members; i.e. properties and methods) of the supertype (but may have more members), must not require more than the supertype (but may require less), and must guarantee everything the supertype does (but may guarantee more). Without this, object-oriented programming does not work, because polymorphism does not work. That matters, because not following these rules is error-prone, and requires more code to abstract over differences that should not exist.
The development team seems to have a love-hate relationship with object-oriented programming, but supporting it is the only way to allow user-defined types to be consistent with built-in types. Even Haskell, which is normally decidedly non-object-oriented, uses typeclasses for this purpose, which are almost identical to Java interfaces.
AutoHotkey does not exist in a vacuum. Since its purpose is automation, it will often be used with Automation (a.k.a. OLE Automation; informally COM) APIs. AutoHotkey is already dependent on some Automation interfaces (e.g. _NewEnum() from the collection interface and
Collection Interface
Member                 │ Description
Add(IndexOrKey, Value) │ a method used to insert an element
Count                  │ a property containing the number of elements
Item[IndexOrKey]       │ a parameterized property used to look up an element
Remove(IndexOrKey)     │ a method used to remove an element
_NewEnum()             │ a method that returns an enumerator over the elements
IEnumVARIANT Interface
Member      │ Description
Clone()     │ a method that returns a copy of the enumerator
Next(Count) │ a method that returns the next Count items
Reset()     │ a method that resets the enumeration sequence to the beginning
Skip(Count) │ a method that attempts to skip the next Count items in the enumeration sequence
Of course we should not adopt these interfaces without question.
Insert as an alternate name for Add in Guidelines for Creating Collection Classes. This is what AutoHotkey uses, and I approve. Add is a bad name for a method that performs anything other than addition.
AutoHotkey has no need for an Item parameterized property, but the terminology might be adopted for consistency with Automation.
Then there is the matter of what gets implemented versus what the standards require. Of the collection interface, the only members you can rely on are Count and _NewEnum(). Of the IEnumVARIANT interface, the only members you can rely on are Next(Count), and perhaps Skip(Count). With the exception of Skip(Count), Visual Basic will fail upon trying to use the collection if any of these members are absent. Lack of standards compliance is not just a third party issue. Excel does not support the Reset() method on its enumerators, for example.
It would be best if AutoHotkey did not rely on any additional members of these interfaces, except for Count, which is guaranteed to be present on Automation collections. It should, however, support their use.
Count should be present on anything with a notion of size in AutoHotkey (Str). It should contain 0 when a collection is empty, not "". It should be a read-only property, not a method (as was being considered for v2). It should not be parameterized. This assures no additional code is needed to abstract over a difference between AutoHotkey and Automation objects that should not exist.
Clone() should be present on most mutable compound types (Enum). It would be confusing on certain types (e.g. File), because it would be unclear what is getting copied.
There is also the question of where integer indices should begin (for Str, and miscellaneous uses like
It has been known for a long time (Why Numbering Should Start at Zero from 1982) that 0-based indexing is best. Half-closed intervals compose more easily than closed intervals. Imagine that you want to use an array as a circular buffer. With 0-based indexing this is easy to achieve, by initializing your index to 0, incrementing it, and using modulo to limit it to the length of the array. With any other base this requires adjustment.
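The circular buffer argument can be made concrete. The following is a minimal sketch in Python (chosen because its lists are 0-based, not because AutoHotkey works this way); the RingBuffer class and its member names are invented for illustration.

```python
class RingBuffer:
    """Fixed-capacity circular buffer built on a 0-based list (illustrative)."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.index = 0  # next slot to write, always in range(capacity)

    def push(self, value):
        self.items[self.index] = value
        # With 0-based indexing the wrap-around is a bare modulo.
        self.index = (self.index + 1) % len(self.items)


buf = RingBuffer(3)
for n in range(5):
    buf.push(n)
print(buf.items)  # [3, 4, 2]
```

With 1-based indexing the same step would have to be written as index % capacity + 1, which is the kind of adjustment the paragraph above refers to.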
There is no standard for the indexing of Automation collections. It appears that, originally, Microsoft intended 1-based indexing. The 54 Commandments of COM Object Model Design might lead one to believe this is still the case. However, Microsoft's own IE, WMI, Access, ADO, and DAO use 0-based indexing, while the rest of Microsoft Office and Visual Studio uses 1-based indexing. Third party Automation APIs supposedly usually use 0-based indexing. That means whatever AutoHotkey chooses, it will not match Automation APIs universally.
AutoHotkey also supports DLL calls, and C and C++ use 0-based indexing.
It would be preferable if AutoHotkey switched to 0-based indexing in v2, for ease of use and interoperability. There should be a warning in the documentation that the indexing of Automation collections varies.
Interfaces should be provided for operator overloading and customizable hashing. It should be an error (detected before the program starts) to override hashing but not overload equality. That is necessary for proper dictionary lookup. Python and Lua are good sources of inspiration. They use a mechanism similar to AutoHotkey's meta-functions for this purpose. Like the other standard interfaces, this allows user-defined types to be consistent with built-in types.
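As a rough illustration of what such interfaces look like in Python, which this post names as a source of inspiration, here is a small sketch. The Point type is hypothetical; __eq__, __hash__, and __add__ are Python's real hook names, not anything AutoHotkey provides.

```python
class Point:
    """A user-defined value type that can serve as a dictionary key."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Overloaded equality: compare by value, not identity.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Must agree with __eq__: equal points produce equal hashes.
        return hash((self.x, self.y))

    def __add__(self, other):
        # Overloaded + operator.
        return Point(self.x + other.x, self.y + other.y)


lookup = {Point(1, 2): "found"}
print(lookup[Point(0, 1) + Point(1, 1)])  # found
```

Python enforces a related rule in the opposite direction: a class that defines __eq__ without __hash__ becomes unhashable, so a mismatched pair cannot silently break dictionary lookup.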
Good Error Handling:
The best way to handle errors is to change the programming language to make them impossible, without limiting the programming language's power.
By “power” I refer to what the programming language can express. Turing completeness is an example. In practice, most programming languages are limited by their I/O and linking facilities, not their computational facilities.
Unfortunately, making an error impossible without limiting power is also the hardest way to handle errors. Still, there are some obvious opportunities in AutoHotkey's case…
Allocation and initialization should always be combined, and initialization and mutation should always be separated. This makes it impossible to read uninitialized data at run time, and prevents typos from creating new variables. It is still possible to typo a variable name, but with this change that can (and should) be detected as an error before the program starts, by looking for variables that are read or written but not defined within the scope.
def x 0 ; allocation and initialization
x := 1 ; mutation
The “expression length limit” should be removed. I could understand a depth limitation, since parsing nested expressions recursively is easy, but risks stack overflow in programming languages like C and C++. However, parsing long expressions should be easy to perform with iteration, which carries no such risk. I know of no other programming language implementation with this problem.
AutoHotkey should switch to tracing garbage collection. I expect integrating the Boehm garbage collector to be the lowest-effort way to achieve this. That would eliminate memory leaks due to cyclical references.
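To show the problem a tracing collector solves, here is a small sketch using CPython, which pairs reference counting with a cycle-detecting collector. It is not the Boehm collector and says nothing about how AutoHotkey would integrate one; the Node class is invented for the demonstration.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                  # rule out an automatic collection mid-demo
a, b = Node(), Node()
a.other, b.other = b, a       # cyclical reference
probe = weakref.ref(a)        # observes whether the first node is still alive

del a, b                      # the cycle keeps both reference counts above zero
alive_before = probe() is not None
gc.collect()                  # the cycle collector finds the unreachable pair
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)  # True False
```

Under pure reference counting the two nodes would never be freed, which is the leak described above.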
Most other errors will need to be detected and reported at run time…
AutoHotkey's current policy of ignoring errors and continuing execution is obviously wrong. When an error occurs at run time, data has become inaccessible or corrupt. Continuing execution causes corruption to spread through the program's state like a plague, maximizing damage (potentially to files). When the program crashes due to operating system enforced error handling, or exits on its own, the programmer is left with no clues to help them understand why their program misbehaved.
Of course, sometimes there are clues. The programmer can check A_LastError about every other line and hope they get lucky. Not every error sets A_LastError. They can also obsessively check for empty strings where they would not be expected. Since this requires an unbearable amount of effort, and obscures the intent of the code, it is rarely if ever done in practice.
Run time error handling should:
- be consistent
- require no effort to detect errors
- halt execution, unless the program has been written to handle the error
- if the program does not handle the error, display a message that reports:
  - where the error was encountered (file and line)
  - the relevant values
  - what expectation was violated
AutoHotkey already supports exceptions, which could satisfy all these requirements, if only they were used for all run time errors. Instead, they seem to only be used to produce incomprehensible reports when COM errors are detected.
While it is the programming language's responsibility to detect most errors, and report them in a helpful way, programmers occasionally need to report their own types of errors. To assure these reports are helpful, the programming language needs good introspection support. AutoHotkey falls flat here as well.
There is the problem of defining and catching a new type of error. While you can use classes to define new types, and you can throw anything, this is useless because objects have no string representation (more on that in a moment). Further, even if you did throw something other than a string, you could not
Most programming languages have a type hierarchy of exceptions, and a catch statement that matches one or more types. Ideally there would be a supertype of all exceptions named something like Exception, with two subtypes named something like Defect and System. Defect exceptions should not normally be caught, because they are caused by the programmer, such as division by zero. System exceptions should normally be caught, and usually involve I/O, like trying to open a locked file. All other exceptions, both built-in and user-defined, would be defined under System. In the rare case where it is not bad design to catch all exceptions, like when implementing an interpreter, the common supertype Exception makes this possible.
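For readers who have not used typed handlers, here is a sketch of the idea in Python rather than in any proposed AutoHotkey syntax. Defect and System follow the names suggested above; FileLocked and TimedOut are hypothetical error types invented for the illustration.

```python
class Defect(Exception):
    """Programmer mistakes; normally left uncaught."""


class System(Exception):
    """Environmental failures; normally caught."""


class FileLocked(System):   # hypothetical user-defined error
    pass


class TimedOut(System):     # hypothetical user-defined error
    pass


def handle_together(error):
    """One handler shared by two error types."""
    try:
        raise error
    except (FileLocked, TimedOut) as e:
        return f"retry after {type(e).__name__}"


def handle_separately(error):
    """Separate handlers; the first clause that matches is the one that runs."""
    try:
        raise error
    except FileLocked:
        return "wait for the lock"
    except TimedOut:
        return "try again later"
    except System:
        return "other system failure"


print(handle_together(TimedOut()))      # retry after TimedOut
print(handle_separately(FileLocked()))  # wait for the lock
print(handle_separately(System()))      # other system failure
```

A Defect passed to either function is not matched by any clause and propagates to the caller, which is the behavior wanted for programmer errors.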
The “,” and additional type, and the “as” and variable, clauses are optional. At least one type must be specified, but most of the time only one type of error will be handled by the same code. The first example shows how to write a single handler for two different types of errors. The second example shows how to write separate handlers for the same types. The first handler that matches should be the one that is executed. This design eliminates repetitive filtering code that AutoHotkey's current
There is also the problem of determining if an error has occurred. AutoHotkey, like most programming languages, has no problem checking values or relationships between values. What AutoHotkey does have problems with is checking types or interfaces. AutoHotkey v1 is incapable of type checking, and v2's support is broken. Both v1 and v2 have broken support for checking interfaces.
The “is” operator seems to work acceptably. This can be used to determine if the type of a value is what you expect.
It is sometimes possible to detect the presence of a property or method in AutoHotkey (v1 or v2) by using ObjHasKey, due to the conflation of dictionary and user-defined types. This does not work for built-in types or Automation objects. When it does ‘work’, it is unreliable, due to the conflation of interface and contents. Dedicated HasMethod functions should be provided that work on both AutoHotkey (built-in and user-defined types) and Automation objects. They should be functions, not methods, to avoid potentially conflicting with methods with the same name. You can use this to determine if an interface is supported by a value, which is often preferable to checking the type, because multiple types may implement the same interface without having any supertypes in common.
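A sketch of what free-standing interface checks might look like, written in Python. The names has_method and has_property are invented here and are not AutoHotkey functions; getattr and callable are Python built-ins.

```python
def has_method(obj, name):
    """True when obj exposes a callable member called name."""
    return callable(getattr(obj, name, None))


def has_property(obj, name):
    """True when obj exposes a non-callable member called name."""
    missing = object()  # private marker, so a stored None still counts
    attr = getattr(obj, name, missing)
    return attr is not missing and not callable(attr)


print(has_method([], "append"))             # True
print(has_method({"append": 1}, "append"))  # False: contents are not interface
print(has_property(3 + 4j, "real"))         # True
```

Because these are functions rather than methods, no object can shadow them by storing a member with the same name.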
Once you have determined that an error has occurred, there is the problem of reporting it in a helpful way. The runtime can be expected to handle reporting the file and line number. Reporting the relevant values and what expectation was violated must be left to the programmer.
Only numbers (including Booleans) and strings convert to strings in AutoHotkey. This makes it impossible to implement good error reporting. A meta-function similar to Python's __repr__() should be adopted. It should not be named something like “Str” or “String”, because the representation of a string may contain escape sequences and always begins and ends with double quotes. Converting a string to a string returns the same value. Converting a string to its representation does not return the same value. All built-in types must implement this meta-function. This will be required if AutoHotkey is ever to have a REPL anyway. The documentation should explain that this meta-function ideally returns source code that would produce the same value, but where that is impossible (e.g. for Automation objects) it should return a string in angle brackets that contains as much helpful information as possible (e.g. Plaster returns things like <ComObj IDictionary at 0x0000000001234567>). The angle bracket notation for unrepresentable values seems to be universal.
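The distinction between a value's string conversion and its representation is easiest to see in Python itself. In this sketch, Vec and Handle are hypothetical types; the angle-bracket format for Handle only imitates the style described above.

```python
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Ideally valid source code that rebuilds the value.
        return f"Vec({self.x!r}, {self.y!r})"


class Handle:
    """Stand-in for a value that has no source-code form."""

    def __repr__(self):
        return f"<Handle at 0x{id(self):016X}>"


text = 'say "hi"\n'
print(str(text) == text)   # True: converting a string to a string is a no-op
print(repr(text) == text)  # False: the representation adds quotes and escapes
print(repr(Vec(1, "a")))   # Vec(1, 'a')
```

An error report built from repr shows Vec(1, 'a') rather than an empty string, which is the practical payoff.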
true and false being glorified integer constants makes error reports harder to read than necessary. It is important to be able to convert between Boolean and integer values, for working with integers containing flags, but it would be better if they were a subtype of integers, with true and false as their representation. That would make it considerably easier to read error reports that contain both integer and Boolean values.
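Python happens to model Booleans this way, so it can serve as a quick illustration of Booleans as an integer subtype with their own representation. The flag constants below are made up for the example.

```python
FLAG_READ, FLAG_WRITE = 0x1, 0x2   # hypothetical flag values

flags = FLAG_READ | FLAG_WRITE
can_write = bool(flags & FLAG_WRITE)

print(isinstance(can_write, int))  # True: bool is a subtype of int
print(int(can_write))              # 1
print(repr(can_write))             # True
print(f"expected flags 1, got {flags!r} (writable: {can_write!r})")
```

The last line prints a report in which the integer and the Boolean are immediately distinguishable.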
Error reporting would benefit from the ability to query the path and file name of a
The Type function seems to be missing in the latest alpha I tested (a063), but it is still in the documentation. When it ‘worked’, it reported different types for AutoHotkey and Automation enumerations (one of which had double-semicolons in the name), even though they have the same interface, requirements, and guarantees, and reported Object for user-defined types, instead of their actual type.
Type would normally be used to report what the type is when it was not what you expected. The type names should be based on the class (for user-defined types) or the factory function (for most built-in types) that produces them. The type names I listed in my suggested type hierarchy follow these rules. I also tried to keep them short, but familiar.
Familiarity:
Other than the pervasive peppering of %, someone that can read English, and is familiar with high school algebra, should not find AutoHotkey much less familiar than most programming languages.
The use of string interpolation for code should be eliminated for this and other reasons.
Consistency:
Changing the type names to be related to the class or function that constructs them as already mentioned will improve the consistency of AutoHotkey to ease learning and remembering.
Several changes already mentioned will improve the consistency of AutoHotkey to eliminate the need to write code to abstract over differences that should not exist:
- requiring types to follow the Liskov Substitution Principle (e.g.
- the standard collection interface (e.g.
- the cloneable interface
- operator overloading
- customizable hashing
- exceptions for all error handling (e.g. removing A_LastError, and actually handling errors)
- the representation interface
There should be no observable difference between built-in and user-defined types, functions, and methods.
Examples where this principle is violated:
- user-defined types cannot “extend” built-in types
- built-in types cannot be monkey patched
- built-in methods do not support the Func interface
- built-in methods cannot be stored in variables or data structures
There should be no syntactical difference between calling a function (or method) and calling it through a function reference. To be specific, a function's identifier should be a function reference, as it is in every other programming language I have ever encountered.
MyFunc(x) ; call a function directly, through its default function reference
MyFuncRef(x) ; call a function through a function reference
Aside from harming consistency, these problems (built-in versus user-defined types/functions/methods inconsistencies, and function calling inconsistencies) hamper the programmer's ability to debug and profile their code, and work around bad design decisions in AutoHotkey. I feel it important to mention that AutoHotkey's treatment of methods as just functions with a leading this parameter is a very good thing. Being able to wrap an existing function in another function, without breaking code, is very useful.
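As a sketch of why uniform function references matter for debugging and profiling, here is a call counter written in Python, where a function's identifier already is a function reference. The names counted, area, and area_ref are invented for the example.

```python
import functools


def counted(func):
    """Wrap func so that every call is tallied, without changing callers."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def area(width, height):
    return width * height


area_ref = area        # the identifier is itself a function reference
area = counted(area)   # rebinding the name wraps the original

print(area(2, 3), area_ref(2, 3))  # 6 6
print(area.calls)                  # 1
```

Existing call sites keep writing area(...) and never learn that a wrapper was inserted, which only works when direct calls and calls through a reference share one syntax.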
There should be no observable difference between iterating over an AutoHotkey
SAFEARRAY, or an AutoHotkey
SAFEARRAY values end up in the Key variable, and the VARIANT type constant value ends up in the Value variable. Key should contain the index, and Value should contain the value.
Scripting.Dictionary keys are handled correctly, but the VARIANT type constant value ends up in the Value variable. Key should contain the key, and Value should contain the value. If the VARIANT type constant is of concern, AutoHotkey v2's Type function should be extended to handle it, and it should return the VARIANT type constant name, not value, since that is much more readable.
A common complaint is being unable to remember where
% is particularly problematic, because it is used both as brackets, and alone. The bracketed form is especially hard to read, because unlike ‘normal’ brackets, % is not directional. This requires nested uses to add parentheses. Still, these are all merely symptoms of the real problem, which is that where expressions are allowed seems to be random. Expressions should be allowed anywhere they might be useful. If the programmer wants to use a literal, they can write it the way they usually would. Strings would always be enclosed in "s, and the need for % would be eliminated.
Bitwise-not's behavior changes based on the range of values passed to it, making it difficult to use reliably. ~N should always be equivalent to -1 - N, as it is in Mathematica, and most other programming languages with bignum support (e.g. Python). Programming languages with bignum support are relevant, because, like AutoHotkey's integers which could be 32-bit or 64-bit, the bit-width of bignums varies.
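The identity is easy to check in Python, whose integers have no fixed width. This is only a demonstration of the rule, not of AutoHotkey's current behavior.

```python
samples = (0, 1, -1, 255, 2**31, 2**64, -(2**70))

# The identity holds regardless of how many bits the value needs.
results = [~n == -1 - n for n in samples]

print(all(results))        # True
print(~0, ~255, ~(2**64))  # -1 -256 -18446744073709551617
```

Because the result never depends on an assumed width, the operator behaves the same for small and large values.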
Hotkey notation is inconsistent with Send notation. Send notation is longer, but significantly more readable. Send notation should be used everywhere.
It should be possible to add and remove hotstrings dynamically, and have them call function objects, like hotkeys.
Distinguishing Unrelated Concepts:
The conflation of arrays, dictionaries, exceptions, and user-defined types causes several problems:
- it is impossible to distinguish between arrays, dictionaries, and exceptions for type or interface checking
- it is impossible to distinguish between user-defined types that are collections and ones that are not for interface checking
- dictionary keys are case folded
- arrays are unnecessarily space and time inefficient
- dictionaries are unnecessarily time inefficient
Good error handling requires type and interface checking. Type and interface checking is also sometimes necessary to abstract over differences that should not exist (using it to choose a control flow path), which can be a problem with third party code, not just AutoHotkey.
Sometimes the case of dictionary keys matters. For example, I have written a keyboard layout optimization program (in another programming language) that made heavy use of dictionaries. Each character of a large corpus was stored along with its frequency in a dictionary, which was used for various purposes. Uppercase letters indicate Shift being pressed along with the letter, which is part of determining how frequently Shift is used. If I could not tell the difference between uppercase and lowercase letters, I could not accurately detect how often Shift is used. Many other text processing programs would need this functionality (e.g. implementing an interpreter for a case-sensitive programming language, and natural language processing).
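A reduced sketch of the character-frequency idea, in Python. The corpus string is made up, and counting only uppercase letters is a simplification of how often Shift would really be needed.

```python
from collections import Counter

corpus = "Hello World, Hello AutoHotkey"   # stand-in for a large corpus
freq = Counter(corpus)                     # keys are case-sensitive

shifted = sum(count for ch, count in freq.items() if ch.isupper())
folded = Counter(corpus.lower())           # what case folding would produce

print(freq["H"], freq["h"])  # 3 0
print(shifted)               # 5
print(folded["h"])           # 3
```

Once the keys are folded, the three uses of H can no longer be attributed to Shift, which is exactly the information the optimizer needs.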
Although efficiency should be the least important concern for a programming language like AutoHotkey, there is no reason to spend space and time if it does not make the programming language easier to use.
C++, which is what AutoHotkey is implemented in, comes with a generic dynamic array type in the Standard Template Library. If AutoHotkey's arrays were implemented as dynamic arrays of tagged unions, or references to objects, the space wasted by the singly linked lists used to implement a dictionary, and the time wasted by hashing indices and chasing pointers in the linked lists, could be saved.
C++ also comes with a generic dictionary type in the Standard Template Library. C++'s unordered map (a hash table) will not return items in an easily predictable order, hence the name. The contents of hash tables have to be sorted if they are to be abused as arrays, which wastes time. Stopping the abuse would save time. Data structures which maintain order (like red-black trees), to avoid the need for sorting, are less time efficient than hash tables.
Arrays, dictionaries, exceptions, and user-defined types should be different types. The dictionaries AutoHotkey programmers use should not case fold their keys. If dictionaries are going to be used to implement call stack frames and user-defined types (and they almost certainly are), this should not be revealed in the interface.
Missing elements are another nuisance caused by conflating arrays and dictionaries. Tolerating missing elements makes it impossible to predict the length of the array that will be produced by many transformations (e.g. take the first “n” elements, take every “n”th element, reverse the order, etc.). AutoHotkey uses missing elements to indicate default values should be used in variadic calls, so they cannot be removed without changing that. Using dynamic arrays for AutoHotkey arrays would eliminate the possibility of having missing elements. I suggest only allowing trailing unspecified arguments in variadic calls, like most programming languages (e.g. Lisp and Python). Another alternative is to use null to indicate the corresponding default value should be used. Sparse arrays are almost never useful, but if you want them you could always use a dictionary and sort the keys, or implement a red-black tree.
Even if null is not used to allow leading and internal unspecified arguments in variadic calls, it could be a useful addition to AutoHotkey. It can be used to tell the difference between nothingness and an empty string.
null should:
- be a unique type (not
- be impossible to subtype
- only have a single instance
- only support being assigned to variables and data structures, and checking for equality and inequality
- only be equal to itself
Python's None has these properties, but null is the term for the concept that is familiar to most programmers. Lua's nil is almost identical, but cannot be stored in tables, which would prevent it from replacing the use of missing elements.
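To make the listed properties concrete, here is a sketch of a user-defined null in Python. NullType is a hypothetical name, and Python cannot forbid every other operation on it (it can still be hashed, for example), so this only approximates the proposal.

```python
class NullType:
    """Sketch of the proposed null: one instance, no subtypes."""
    _instance = None

    def __new__(cls):
        # Always hand back the same object, so only one instance exists.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init_subclass__(cls, **kwargs):
        # Runs when someone tries to derive from NullType.
        raise TypeError("NullType cannot be subtyped")

    def __repr__(self):
        return "null"


null = NullType()

print(NullType() is null)        # True: only a single instance exists
print(null == null, null == "")  # True False
print([1, null, 3])              # [1, null, 3]
```

The last line shows the property Lua's nil lacks: the value can be stored in a collection, so it could mark an unspecified argument without leaving a missing element.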
The conflation of interface and contents causes some problems:
- storing a key with the same name as a property or method in a dictionary will break that property or method
- the interface changes based on the contents, which is nonsensical
Any attempt to write a library or tool in AutoHotkey that works with AutoHotkey types or source code will quickly encounter these problems.
These problems are not specific to dictionaries, or dictionaries masquerading as other types. For example, the RegExMatch type has these problems.
The . operator should refer to the interface of an object, while the [] operator should refer to its contents. A change in an object's contents should never change its interface. This still allows monkey patching by using the
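Python's dictionaries keep interface and contents apart in this way, so they make a convenient illustration. Nothing below is AutoHotkey syntax.

```python
table = {}
table["keys"] = "stored value"   # [] addresses the contents
table["get"] = 42

print(sorted(table.keys()))      # ['get', 'keys']: . still reaches the method
print(table.get("missing", 0))   # 0
print(table["keys"])             # stored value
```

Storing entries named after methods leaves the methods intact, because the two kinds of lookup never share a namespace.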
The rest of the problems in this section are arguably problems with consistency not distinguishing unrelated concepts, but they can only be corrected after correcting the conflations mentioned so far…
AutoHotkey currently uses two different notions of object-hood. One is based on what IsObject returns true for. The other is based on instances of Object. IsObject returns true for more than instances of Object. There cannot be two different types with identical names, so this should be corrected. With the improvements I suggest, IsObject will no longer be needed, since everything will be an object except for null, and null can be tested for with equality. The functionality of Object will be broken out into several appropriate types.
Floating point numbers used as dictionary keys are indexed by their string representation, not their value, unlike integers. AutoHotkey does not have a dictionary type as such, but when that is corrected, this should be too. It should be easy and efficient to reinterpret_cast the floating point number to an integer for use as its hash code. Some canonicalization will need to be performed (e.g. to assure -0 and 0 have the same hash code), but that should not be too difficult. Correcting this will keep dictionary lookup from breaking if the floating point string representation is changed.
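A sketch of the bit-reinterpretation idea in Python, using struct in place of C++'s reinterpret_cast. The function name float_hash is invented, and only the -0 case mentioned above is canonicalized; NaN handling is left out.

```python
import struct


def float_hash(value):
    """Hash a float by its IEEE 754 bits instead of its string form."""
    if value == 0.0:
        value = 0.0  # fold -0.0 into +0.0 so both hash alike
    # Pack as a 64-bit double, then reread the same bytes as a 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", value))
    return bits


print(float_hash(-0.0) == float_hash(0.0))  # True
print(hex(float_hash(1.0)))                 # 0x3ff0000000000000
```

The hash depends only on the stored value, so changing how floats are formatted as text could no longer change which dictionary entry a key finds.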
Elegance:
Elegance in programming language design is a result of the following characteristics:
- simplicity – few primitive constructs
- generality – the primitive constructs can be used for many different purposes
- composability – constructs can be combined to produce more complex constructs
- brevity – achieving the programmer's goal requires little code
I believe these characteristics to be both necessary and sufficient.
It may seem like brevity would naturally result from simplicity, generality, and composability. While this is usually true, pathological counterexamples can be constructed. One instruction set computers are examples of such Turing tarpits.
It may seem like pursuing elegance alone would be sufficient for good programming language design. While this is also usually true, pathological counterexamples can also be constructed for it. APL is an example of an elegant programming language that is markedly unfamiliar.
Elegant programming language designs exhibit certain tendencies that may act as signposts to indicate you are on the right path, but they are not always present:
- relevance – code requires few constructs irrelevant to the programmer's goal
- symmetry – constructs often have symmetrical relationships (e.g. inverse functions)
“Relevance” is often synonymous with “high-level”; freedom from hardware- and resource-related concerns. Some ‘irrelevant’ code will exist in most realistic programs in any programming language, no matter how elegant. For example, files will need to be read and written, even in very high-level, domain specific, programming languages. Sometimes programming languages (e.g. assembly) are designed to control hardware and resources, however.
Sometimes there is no known efficient way to implement symmetrical constructs.
Elegance benefits those that implement the programming language, not just those that use it. Elegance usually results in less code to write, test, document, and maintain.
Occasionally elegance does create work for the implementers (e.g. garbage collection). The needs of the users should be put before those of the implementers, because there are more of them. Besides, the implementers are likely to be users as well.
AutoHotkey can be transformed into an elegant programming language by:
- eliminating redundant constructs
- minimizing non-composable constructs
- generalizing constructs
Some types have redundant members:
Seek((Distance [, Origin = 0])),
In the case of File, it is probably best to retain the Position property, rename the
Count, and remove the rest. Position is easier to understand than Pos. If you know the Count it is trivial to achieve the same things you could with the Seek method's relative distance form (use +=), and the
RegExMatch's case, it is probably best to retain [N], and remove the rest. [N] refers to its contents, as is proper.
Call meta-function is redundant and should be removed. The Func interface (which has __Call) is a superset of its functionality.
AutoHotkey v2 has function versions of most ‘commands’ except for control flow statements. The function version can be composed (nested), while the command version cannot. The function version can be used via function references, while the command version cannot. The command versions are inferior and should be eliminated. That will have the pleasant side effects of drastically reducing global namespace pollution by eliminating all the constants and keywords used by those commands, and eliminating most uses of
AutoHotkey makes heavy use of global mutable state, and until recently, required the use of unstructured control flow and hard-coded event handlers, which makes it very hard to write code that can be composed. This is an example of the fractal of bad design that can result from elegance being neglected, where bad design decisions at the programming language level force, or at least strongly encourage, bad design decisions in code written in it.
Experienced programmers may recoil upon first reading the screenfuls of global variables used in AutoHotkey. However, most of these are constants, or read-only, making them relatively innocuous.
The real culprit is the extreme configurability of AutoHotkey. Code that works under one combination of settings may malfunction under another. These settings are not scoped, so combining code that requires different combinations of settings is difficult or impossible. Most of these settings should have been function or method parameters (e.g. SetFormat). Programming languages should not be configurable.
StringCaseSense, is worth singling out. String comparisons should always be case-sensitive. If case-insensitive comparison is desired, it is trivial to lower- or upper-case the string beforehand. AutoHotkey comes with functions for that (StringUpper). This will eliminate the inconsistency of == obeying different rules than the rest of the comparison operators. It will also allow = to be removed.
= is a bad choice for a comparison operator. In mathematics it defines a permanent equality relationship. In most (C-syntaxed) programming languages it performs assignment. In AutoHotkey it is neither of those. Most programmers, upon seeing it in an expression, will assume it is an error and
AutoHotkey would be better off without ++ and --. Consciously or not, most programmers expect expressions to be free of side effects (i.e. they expect expressions to compose). ++ and -- are normally used in expressions, and they perform assignment. Most mistakes occur when ++ and -- appear more than once in an expression. Few programmers can correctly predict the order the side effects will occur in. This kind of unnecessary and unhelpful complexity has little place in a programming language designed to appeal to novices. Lua and Python get by just fine without these operators. Python forbids any form of assignment in expressions, due to its confusing nature, which is a stance AutoHotkey should adopt.
Now that unstructured control flow is no longer required to handle events, goto should be removed. Functions make
gosub is similar to a function call, only it cannot accept arguments, and labeled code blocks can overlap.
A limited number of non-composable constructs will have to be tolerated. At least one global mutable reference or variable must exist to pass state between event handlers. Various forms of goto that cannot bypass initialization should also be allowed. This includes
return, and exception handling. Labels should be retained due to
continue making good use of them. Even in Haskell, programmers end up reinventing these with monads, primarily for error handling. On rare occasions they also greatly improve time efficiency. I/O is also not composable, but without I/O a computer is just a bad space heater.
The interfaces of the
File types should be generalized.
Str should be similar to an Array of characters. Specifically, it should be possible to index them with
, and iterate over them with for. The reason Str is not a subtype of
Str should be immutable, so it is safe to use strings as
Array supports mutation, its subtypes are required to. These changes would make it possible to write procedures that operate on both
Str without requiring code to abstract over differences that should not exist. Lexers frequently need to iterate over strings, character by character.
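To make the payoff concrete, here is a minimal sketch in Python, where strings already behave as immutable sequences of characters. It is not AutoHotkey code, and the names count_matching and is_digit are hypothetical, chosen only for illustration. The same procedure works on an array and on a string with no adapter code.

```python
def count_matching(sequence, predicate):
    """Count the elements of any iterable sequence that satisfy predicate."""
    total = 0
    for element in sequence:          # identical loop for lists and strings
        if predicate(element):
            total += 1
    return total


def is_digit(character):
    """Return True for the ten ASCII decimal digits."""
    return character in "0123456789"


digits_in_list = count_matching(["a", "1", "b", "2"], is_digit)
digits_in_str = count_matching("a1b2", is_digit)

first_of_list = ["a", "1"][0]         # indexing uses the same syntax...
first_of_str = "a1"[0]                # ...for both types
```

Because the string is immutable, attempting "a1"[0] = "x" raises an error in Python, which is what makes strings safe to share.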
File's vast number of (
Line|Num) and (
Line|Num) methods should be reduced to one of each, and use parameters to dictate the desired behavior. It should be possible to iterate over lines of text in a file using for. This is briefer than using a while loop to achieve the same effect.
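The difference in brevity is easy to see in a language that already supports it. The following is a sketch in Python, not AutoHotkey. The two function names are hypothetical, and the temporary file exists only so the example is self-contained.

```python
import os
import tempfile


def read_lines_with_while(path):
    """Collect lines using an explicit while loop and an end-of-file check."""
    lines = []
    with open(path, encoding="utf-8") as handle:
        line = handle.readline()
        while line != "":             # readline() returns "" at end of file
            lines.append(line.rstrip("\n"))
            line = handle.readline()
    return lines


def read_lines_with_for(path):
    """Collect lines by iterating over the file object directly."""
    lines = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
    return lines


with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\nthird\n")
    while_result = read_lines_with_while(demo_path)
    for_result = read_lines_with_for(demo_path)
```

Both functions return the same list; the for version simply has no bookkeeping to get wrong.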
for should be generalized, and
Str changes just mentioned provide most “parse a string” loop functionality, and the StrSplit function covers the rest. The File changes just mentioned eliminate the need for the “read file contents” loop. Objects with enumerators should provide for with “files & folders” and “registry”
os.walk can serve as examples of handling the file system this way. The registry would be handled similarly, due to its hierarchical nature.
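For readers who have not used it, this is roughly what the os.walk style looks like in Python: an ordinary for loop over an object that yields one folder at a time. The helper name list_files and the temporary folder layout are hypothetical, for illustration only.

```python
import os
import tempfile


def list_files(root):
    """Return the paths of all files under root, relative to root, sorted."""
    found = []
    # os.walk yields (folder, subfolder names, file names) for every folder.
    for folder, subfolders, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(folder, filename)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    layout = (
        "top.txt",
        os.path.join("a", "mid.txt"),
        os.path.join("a", "b", "deep.txt"),
    )
    for relative in layout:
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("x")
    walked = list_files(root)
```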
loop should be omitted from the beginning of until loops, as in most programming languages.
Sort should be ‘generalized’. Generalized is in scare quotes because what I really propose is changing the type that it works on from Str to Array. However, that is more generally useful, and it is easy to get the previous behavior with this design.
Str is a bad choice for input to Sort, because putting data into a string loses all its structure and type information. It is also difficult to ensure that the character combination used to split the string is not unexpectedly contained within the data somewhere. Further, internally, the existing Sort implementation must convert the string into an array. Converting data into a string, only to have it converted into an array, then converted back into a string, which will probably have to be converted back into usefully structured data, introduces a lot of unnecessary complexity and is very time inefficient.
Sort should be a function with these signatures:
Functions with multiple signatures can be implemented by making them variadic and throwing exceptions if incorrect numbers of arguments, or arguments with incorrect types or interfaces, are passed to them. This parameter order was chosen because it is the most useful with BoundFunc or currying. You are more likely to want to use the same comparison function with multiple arrays, than use the same array with multiple comparison functions.
Sort should work on anything with the same interface as
SAFEARRAYs). We want it to be generalized.
The sorting algorithm should be stable; probably merge sort or some variation (like Timsort). Stable sorts can be composed to sort ‘within’ each other.
Sort should default to using the < operator for comparison, but it should be possible to pass a reference to a function or function object for custom comparison. While other comparisons can be made to work, < is the one that is conventionally used for higher-order sort functions. Having the ability to customize sorting is important. It allows new types to be sorted, and existing types to be sorted new ways.
Sort should be referentially transparent (i.e. it should return a new array, instead of changing an existing one). My experience with both forms existing in Python is that the destructive version surprises novice programmers and, as with most side effects, tends to cause even experienced programmers to make occasional mistakes. If you want to use a referentially transparent sort as a destructive sort, it is as simple as assigning the return value to the original variable (e.g. MyArray := Sort(MyArray)).
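The design described above can be sketched in Python. This is an illustration under stated assumptions, not an implementation for AutoHotkey. The function name sort, the two call forms sort(array) and sort(less_than, array), and the choice of a plain merge sort are my assumptions, inferred from the comparator-first parameter order and the stability requirement discussed above.

```python
from functools import partial


def sort(*args):
    """Hypothetical Sort: sort(array) or sort(less_than, array).

    Returns a new list and never modifies its input.
    """
    if len(args) == 1:
        less_than, array = (lambda a, b: a < b), args[0]
    elif len(args) == 2:
        less_than, array = args
        if not callable(less_than):
            raise TypeError("the comparison argument must be callable")
    else:
        raise TypeError("sort expects 1 or 2 arguments, got %d" % len(args))

    items = list(array)               # copy, so the caller's data is untouched
    if len(items) <= 1:
        return items
    middle = len(items) // 2
    left = sort(less_than, items[:middle])
    right = sort(less_than, items[middle:])

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right only when it is strictly less; ties keep the
        # left element first, which is what makes the sort stable.
        if less_than(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


original = [3, 1, 2]
ascending = sort(original)            # original is left unchanged

# Binding the comparison first gives a reusable sorter for many arrays.
by_length = partial(sort, lambda a, b: len(a) < len(b))
words = by_length(["ccc", "a", "bb", "d"])
```

Here functools.partial stands in for BoundFunc. Because the comparison function comes first, one bound sorter can be reused for any number of arrays.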
As promised, in the unlikely event that you want to destructively sort the contents of a string, it is easy to do with this design.
A new function, StrJoin, which concatenates an array of strings, optionally inserting a string after each element, could be introduced to improve time efficiency by allocating the destination string once (since it knows the total size needed). This function would be useful for more than handling the output of
MyString := StrJoin(Sort(StrSplit(MyString, "`r`n")), "`r`n")
Because Sort does not need to perform string splitting (which StrSplit already does), most of the existing options to Sort can be eliminated.
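The same split, sort, join pipeline can be tried out in Python, whose built-ins happen to have this shape already. The wrapper names str_join and sort_lines are hypothetical. Note that this sketch places the separator between elements rather than after each one, which is an assumption that differs slightly from the description above.

```python
def str_join(strings, separator=""):
    """Concatenate strings, placing separator between neighbouring elements."""
    return separator.join(strings)


def sort_lines(text, separator="\r\n"):
    """Split text on separator, sort the pieces, and join them back together."""
    return str_join(sorted(text.split(separator)), separator)


sorted_text = sort_lines("pear\r\napple\r\nfig")
```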
Additional (referentially transparent) functions could be added to provide the remaining functionality that has been conflated with Sort:
- Reverse – reverse the order of array elements
- Shuffle – randomly rearrange an array
- Uniq – remove duplicates from a sorted array
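A minimal sketch of these three functions, written in Python for illustration. The signatures are my assumptions, including the optional rng parameter on shuffle. Each function returns a new list and leaves its argument untouched.

```python
import random


def reverse(array):
    """Return a new list with the elements in the opposite order."""
    return list(array)[::-1]


def shuffle(array, rng=random):
    """Return a new list with the elements randomly rearranged.

    rng is any object with a shuffle(list) method, e.g. random.Random(seed).
    """
    copy = list(array)
    rng.shuffle(copy)
    return copy


def uniq(sorted_array):
    """Return a new list with adjacent duplicates removed.

    All duplicates are removed only if the input is sorted.
    """
    result = []
    for element in sorted_array:
        if not result or result[-1] != element:
            result.append(element)
    return result
```

Keeping Uniq restricted to sorted input means it only has to compare neighbours, so it needs a single pass.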
Reverse is more useful in this form, since you might want to reverse an array without sorting it. Sorting in descending order with the proposed design is as simple as passing a function equivalent to >, or wrapping an existing custom comparison function in a function that applies
The Standard Library:
The standard library is so large that I have probably overlooked some problems. Someone that is more familiar with it (i.e. the development team) should go through it carefully, looking for naming inconsistencies in functions, methods, and parameters, and parameter order inconsistencies. These problems are not just aesthetically offensive, they cause difficulties remembering function and method names, and defects resulting from passing arguments in the wrong order.
When choosing between different parameter orders, keep optional parameters and BoundFunc (or currying) in mind. The more likely a parameter is to be omitted, the later it should appear in the parameter list. The more likely a parameter is to be reused, the earlier it should appear in the parameter list.
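The effect of parameter order on binding can be shown with Python's functools.partial, which plays a role similar to BoundFunc here. The function in_range is hypothetical. Its bounds come first because they are the arguments most likely to be reused.

```python
from functools import partial


def in_range(low, high, value):
    """Return True if value lies between low and high, inclusive."""
    return low <= value <= high


# The reused parameters come first, so binding them is a one-liner.
is_byte = partial(in_range, 0, 255)
is_percent = partial(in_range, 0, 100)

checks = [is_byte(200), is_byte(256), is_percent(100), is_percent(101)]
```

Had value been the first parameter, neither specialised function could have been built by simple left-to-right binding.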
If a ‘real’ module system is not going to be introduced, the standard library should be broken down into ‘fake’ namespaces by abusing classes, similar to how Lua's standard library is organized by using tables. This would reduce global namespace pollution, and give more visual structure to the programming language. Problems with inconsistent naming and parameter order may become more apparent in the process.
The GUI API is difficult to understand and use because it uses the wrong paradigm. Windows and controls are objects. They are long-lived bundles of mutable state that you can perform certain limited, well-defined operations on. One might also reasonably argue they form push dataflow networks, but I do not suggest representing them that way, since that paradigm is unfamiliar to most novice programmers (though they are likely to have used spreadsheets), and it would be inconsistent with the evaluation of the rest of the programming language. They are definitely not procedures, however you look at them. Well-respected GUI APIs, like QtGUI, should be used for inspiration when fixing this. It should not be necessary to write this, but the improved API should, like everything else, use references to functions and function objects for event handling.
I would prefer AutoHotkey v2 to primarily be about changing and removing constructs, without limiting the programming language's power, rather than adding them.
There are some additions that I believe would make AutoHotkey much more pleasant to use. They are presented in order from most to least important to me.
A grid layout manager would prevent AutoHotkey programs from having to micromanage controls. Other types of layout managers are often provided by GUI toolkits, but most needs can be met with only a grid layout manager. Inspiration should be taken from existing good designs like QGridLayout.
Eval can be useful for deserialization.
A REPL would make AutoHotkey much easier to use. This would appear second on my list if it did not require Eval to implement. It also requires the representation interface mentioned in the “Good Error Handling” section.
If % is no longer used for string interpolation of code, having it return to its conventional use for modulo would be very nice.
It should be possible to write and represent integers in binary notation (e.g. 0b101). This is nice for working with binary files, which often pack several values into one or more bytes.
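As an illustration of why this matters, here is how binary notation reads in Python, which supports both writing and displaying integers this way. The byte value and its split into two 4-bit fields are made up for the example.

```python
# One byte packing two hypothetical 4-bit fields: high nibble and low nibble.
flags = 0b10110101

high_nibble = (flags >> 4) & 0b1111   # the upper four bits, 0b1011
low_nibble = flags & 0b1111           # the lower four bits, 0b0101

as_text = format(flags, "#010b")      # zero-padded binary representation
low_text = bin(low_nibble)            # shortest binary representation
```

With binary literals the masks line up visually with the bits they select, which hexadecimal or decimal constants do not offer.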
I encountered this distressing line in the manual:
AutoHotkey Help: Variables and Expressions wrote: In v1.0.48+, the comma operator is usually faster than writing separate expressions, especially when assigning one variable to another (e.g. x:=y, a:=b). Performance continues to improve as more and more expressions are combined into a single expression; for example, it may be 35% faster to combine five or ten simple expressions into a single expression.
There is no excuse for encouraging programmers to write all their code on one line. This could be interpreted as saying the code can be spread out over multiple lines, and commas can be used for continuation sections, but that still makes the code uglier. The implementation flaw that penalizes the time efficiency of properly written code should be fixed. It is an implementation flaw. I know of no other programming language with this problem.
I considered not posting this. I fully expect it to primarily, if not exclusively, receive dismissive responses and flames.
I decided to post it anyway, because not trying guarantees failure.
I spent several hours a day, for about two weeks, summarizing the problems I have encountered in the year I have been heavily using AutoHotkey. Hopefully some good will come of the effort.