Static initializers and auto-execution

lexikos · 06 Apr 2018, 19:19

Current implementation and problems

It has been pointed out that nested functions can't be referenced by name (with CallbackCreate, Func with a variable, etc.) in a static initializer. This is due to a more fundamental issue.

Conceptually, local variables only exist while the function is executing. Each time the function is called, it creates a new set of local variables. So how can a static initializer refer to a local variable, given that it executes when the script starts, without the function having been called?

This is due to the current implementation:

I wrote: In AutoHotkey, all non-dynamic variable references are resolved at the moment the script launches, including local variables. Each function has one set of local variables which exist from the moment the script launches until it exits. The code within the function contains pointers to these variables. Recursion is handled by backing up the local variables when a second layer of the function begins and restoring them after it returns, so at any one time, the local variables belong to the top layer of the function.

Closures may be associated with any layer of a function still running, or a set of variables from a function which has since returned. So obviously, they can't refer to the "top-layer" local variables described above. What they need are "free variables", which are not tied to the top layer of the function, are not freed when the function returns, and are able to exist for as long as a closure refers to them.

Closures are implemented by detecting which variables need to be "free variables", allocating those when the outer function begins, and turning the top-layer variables in the outer and inner function into aliases of the free variables.
Source: nested-functions ByRef parameters - AutoHotkey Community
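
The single-set-of-locals scheme described in the quote can be modelled in a few lines. This is only a toy sketch in Python, not AutoHotkey's actual source: ScriptFunction, slots and countdown are hypothetical names, blank variables are modelled as empty strings, and clearing the slots when the last layer returns is an assumption based on the behaviour described in this post.

```python
class ScriptFunction:
    """Toy model: one permanent set of local variable slots per function."""

    def __init__(self, body, local_names):
        self.body = body
        # The slots exist from "script launch"; a blank variable is "".
        self.slots = {name: "" for name in local_names}
        self.depth = 0  # number of layers currently running

    def call(self, *args):
        backup = None
        if self.depth > 0:
            # A second layer begins: back up the running layer's values
            # and hand the new layer blank variables in the same slots.
            backup = dict(self.slots)
            for name in self.slots:
                self.slots[name] = ""
        self.depth += 1
        try:
            return self.body(self, *args)
        finally:
            self.depth -= 1
            if backup is not None:
                self.slots.update(backup)   # restore the layer below
            else:
                for name in self.slots:     # last layer returned (assumed)
                    self.slots[name] = ""


def countdown_body(fn, n):
    fn.slots["n"] = n
    if n > 0:
        fn.call(n - 1)        # the inner layer overwrites the shared slot
    return fn.slots["n"]      # the restore step brings this layer's value back


countdown = ScriptFunction(countdown_body, ["n"])
result = countdown.call(3)
```

Because every layer writes to the same slot, the outermost call only gets its own value back thanks to the backup and restore steps. Nothing in this model lets a value survive independently of the top layer, which is what closures need.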

So currently, a static initializer can use a local variable, and it will retain its value until the first call to the function returns:

Code: Select all

fn()  ; #2: 1
fn()  ; #3: blank
fn() {
    static x := MsgBox(y := 1)  ; #1: 1
    MsgBox y
}

Similarly, non-dynamic function references are resolved according to their placement in the script, and there is no exception for static initializers. In some cases this is perfectly legitimate; i.e. when the nested function does not refer to the outer function's local variables:

Code: Select all

outer() {
    static x := inner()
    inner() {
        MsgBox "inner"
    }
}

In the next case, the static initializer can refer to the (blank) local variable, but the nested function lacks the context needed for it to execute correctly, so an exception is thrown:

Code: Select all

outer() {
    static x := (MsgBox(outervar), inner())
    outervar := 1
    inner() {
        MsgBox "inner"
        (outervar)
    }
}

The problem with CallbackCreate and similar is that the function isn't executing, so any dynamic references cannot resolve to nested functions/local variables. Even if it could get a reference, it would not be associated with any particular call to the function. So if the nested function refers to the outer function's local variables, which set of local variables should that be?

Func("nested_function") (partly) works due to an optimization, which was originally only intended to improve performance, not affect behaviour. However, since the outer function isn't executing when Func() is called, what you get is a plain Func reference, never a Closure. If the outer function has free variables (that is, if nested_function should be a closure), the reference you get can only be called while the outer function is executing. If it is called, it is associated with the "top layer" of the outer function. For example:

Code: Select all

outer1()  ; OK
outer2()  ; OK
outer3().call()  ; FAIL

outer1() {
    static x := Func("inner1")
    x.call()
    inner1() {
        MsgBox "inner1"
    }
}

outer2() {
    static x := Func("inner2")
    local outervar
    x.call()
    inner2() {
        MsgBox "inner2"
        (outervar)
    }
}

outer3() {
    static x := Func("inner3")
    local outervar
    return x
    inner3() {
        MsgBox "inner3"
        (outervar)
    }
}
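
The difference between outer2 and outer3 can be modelled outside AutoHotkey. The following is a rough Python sketch under stated assumptions, not the real implementation: OuterFunction, NestedFunctionRef and layers are hypothetical names, and the error type is arbitrary. A plain reference binds to whichever layer of the outer function is on top at call time, and fails when there is none.

```python
class OuterFunction:
    """Toy outer function: one dict of free variables per running layer."""

    def __init__(self):
        self.layers = []

    def run(self, body):
        self.layers.append({"outervar": ""})   # context for this call
        try:
            return body(self.layers[-1])
        finally:
            self.layers.pop()                  # context ends with the call


class NestedFunctionRef:
    """Toy plain (non-closure) reference to a nested function."""

    def __init__(self, outer, body):
        self.outer = outer
        self.body = body

    def call(self):
        if not self.outer.layers:
            # No layer is running, so there is nothing to bind to.
            raise RuntimeError("outer function is not running")
        # Treated like a direct call: use the top layer's variables.
        return self.body(self.outer.layers[-1])


outer = OuterFunction()
ref = NestedFunctionRef(outer, lambda free: "inner sees " + repr(free["outervar"]))


def outer2_body(free):
    free["outervar"] = 1
    return ref.call()          # called while the outer function is running


def outer3_body(free):
    return ref                 # the reference escapes the call


ok = outer.run(outer2_body)
escaped = outer.run(outer3_body)
try:
    escaped.call()
    failed = False
except RuntimeError:
    failed = True
```

A real closure would instead carry its own reference to one specific set of free variables, so it would remain callable after the outer function returns.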

Although a nested function cannot be called if it lacks the needed context, it can contain static initializers. Since static initializers can contain references to local variables, they can also contain references to "upvars" (the local aliases for outer variables) or "downvars" (the local aliases for variables which a nested function needs to capture). This presents two problems:

The static initializer could assign a string or object to the upvar/downvar before it becomes an alias. This would never be freed.
If a function is running, its upvars/downvars are aliases for whichever set of variables is associated with the "top layer" of the function. At some point after the last layer returns, the aliases become dangling pointers. So given static initializers X and Y (defined in that order), if X calls the function inside which Y is defined, Y may try to dereference a dangling pointer.

Possible solutions

For just the last two problems, the "upvars/downvars" could be freed before becoming aliases and reverted to non-aliases after the function returns. However, this imposes a performance penalty on all legitimate calls just to allow virtually illegitimate ones. That is why "externally called subroutines" (as described here) are not permitted in v2.

1. References to nested functions or local variables in static initializers could be prohibited.

Care must be taken to not lose flexibility; for instance, a static initializer should be able to call a self-contained nested function. A function would need certain limitations to be considered self-contained, such as not having any "upvars", nor calling a non-self-contained nested function or referring to one by name, nor referring to any function by name if that name cannot be determined at load-time (e.g. Func(var) vs Func("name")). This seems complex, which is a problem for both implementation and usability. This is why the "Func vs Closure" rule is very simple: if any of the outer function's local variables are referenced by any nested function, they all must be closures.

2. The minimum context could be established before evaluating the static initializers. That is, effectively "call" the function with blank parameters, and supply a set of empty free variables if the function is a closure.

3. Static initializers could be evaluated the first time the function is called. This would legitimize local variables and closures within static initializers, eliminating any weird behaviour. There are at least two ways it could be implemented:

a. Evaluate all static initializers before execution "begins" at the top of the function.
b. Evaluate each static initializer only if and when that line is reached. That is, each static line would act like a normal assignment (or multi-statement) the first time it is reached during execution, then it would become a no-op. This might add greater flexibility.

Either method would reduce the occurrence of problems when a static initializer calls a function which has its own static initializers. (With the current implementation, the target function's static variables might not have been initialized yet.) The second method would probably be better for that, since different sets of static initializers might be reached depending on how the function was called.
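
The second method (evaluate each initializer when its line is first reached) can be approximated in Python to show the effect. This is a sketch only, not proposed AutoHotkey syntax: LazyStatic, make_table, make_cache and lookup are hypothetical names, and each get() call stands in for a static line being reached.

```python
class LazyStatic:
    """Runs its initializer the first time it is reached, then is a no-op."""

    _UNSET = object()   # sentinel, so blank or false values still count as set

    def __init__(self, initializer):
        self._initializer = initializer
        self._value = LazyStatic._UNSET

    def get(self):
        if self._value is LazyStatic._UNSET:
            self._value = self._initializer()
        return self._value


log = []   # records which initializers actually ran, and in what order


def make_table():
    log.append("table built")
    return {"a": 1}


def make_cache():
    log.append("cache built")
    return {}


_table = LazyStatic(make_table)
_cache = LazyStatic(make_cache)


def lookup(key, use_cache=False):
    table = _table.get()          # stands in for: static table := make_table()
    if use_cache:
        cache = _cache.get()      # only initialized if this branch is reached
        return cache.setdefault(key, table.get(key))
    return table.get(key)


first = lookup("a")
second = lookup("a")
log_without_cache = list(log)
third = lookup("a", use_cache=True)
```

After the first two calls only the table has been built; the cache initializer runs on the third call, when its line is reached for the first time. Under the first method both initializers would run before the first call's body begins.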

However, static initializers currently also serve as a means of auto-executing code on startup. A replacement for that could be cleaner and more flexible than static initializers. For instance:

Code: Select all

Initialize() {
    ;...
}
auto Initialize()
; ---- or ----
auto Initialize() {
    ;...
}
; ---- or ----
auto {
    ;... (global)
}
; ==== vs ====
Initialize() {
    ;...
    static _ := Initialize()
}

If an auto-execute statement could be used inside a function (with the v1 static behaviour), it would come back to the earlier problem of how to handle local/nested references. Perhaps allowing it doesn't make sense: calling the function is what establishes the context for code inside that function to be evaluated. If for some reason the function's static variables need to be initialized at script startup, it wouldn't be difficult to achieve by calling the function.

nnnik · 07 Apr 2018, 02:24

1 is a massive loss of flexibility, and it seems to me that nested functions would lose one of their main selling points if you can't define a static callback before calling the function (I'd rather refer to another function than use if !callback).

The way you describe 2 seems to ask for massive script crashes, e.g. a Critical DllCall inside the function that relies upon some static variable initialisation.

3 also doesn't seem like an option; at the least, we would have to rewrite all functions and classes from AHK v1 when porting to AHK v2.
However, having a tag which marks specific code parts of classes and functions for auto-execution seems like a good idea regardless of the solution for this problem.

I think it would be better to reevaluate closures/nested functions once the context is established.
A statically created closure should be reevaluated once the outer function is run and the closure is called afterwards.
If a nested function is run before any context is established, it could throw an error if any of its variable names also appears inside the outer function; otherwise, just treat it like a normal function call.
But I don't know if that behaviour is high in cost or feasible with the current framework.

lexikos · 07 Apr 2018, 04:24

@nnnik, I don't think you understood my explanation of #2, because I certainly don't understand how you think it could cause that. The static initializers would be evaluated much like they are now, except that the context would be set up beforehand as if the function was called (with the only exception being that parameters would be blank). The function's body would not execute.

Helgef · 07 Apr 2018, 05:09

Hello, thank you for the explanation.

To me, solution 3.b seems preferable, even if we didn't have nested functions/closures, because it reduces the significance of the order in which functions are defined, we can avoid executing code which is never used, and we can avoid manual checks of static variables, e.g. if !myStatic {...}. It is very straightforward to document and understand such behaviour.

@nnnik, ease of conversion shouldn't limit the (new) language.

Cheers.

iseahound · 07 Apr 2018, 05:46

If I understand correctly, the current implementation of static does two things: execute this code at startup, and save the value for future use. I think solution 3.b's greater flexibility could be used to separate the two distinct use cases, reserving static for "save this value after the first evaluation" and "auto" for auto-execute code. A major side benefit, as Helgef mentioned, would be a speed-up of scripts on startup, with static values not being evaluated until they are reached.

The second method would probably be better for that, since different sets of static initializers might be reached depending on how the function was called.

I think retaining non-commutativity is a good idea. Method 3.a feels as if the compiler knows best, and might encourage users to focus on how the language might interpret their code, instead of letting users dictate the order their code should be interpreted in. After encountering too much unexpected behavior, the user might stop using the static keyword altogether.

nnnik · 07 Apr 2018, 11:42

@lexikos
Yeah, I really didn't understand what you said about option 2. To me, calling means executing the function's body. You described it

@helgef Well, lexikos does seem to care about compatibility, at least from what I have seen.

Reading through it again, I must have missed the point where you said that 3.b also involves lines only getting executed once.

If that's an option, I am for it. Thinking about static, it currently does 3 things:

A: Making the line execute only once.
B: Executing it automatically at the start of the script.
C: Adding all variables that are attached to the corresponding static scope of the outer syntax element (either classes or functions).

We can't have static do 2 completely different things in classes and functions, so I think exploding the static keyword into separate subparts might really be a good idea.
Depending on how these subparts look, it might become a decisive advantage over other languages.

just me · 08 Apr 2018, 03:27

Closures:
I don't think I understand the concept of closures completely. The samples in the OP describing the problems are very simple. Related to solution 2, would anybody show me a real-life example of the advantage of creating a closure based on empty 'free variables'?

Local variables used in static initializers:

lexikos wrote: So currently, a static initializer can use a local variable, and it will retain its value until the first call to the function returns ...

I think that the vast majority of users don't expect this behaviour. They expect that all variables not explicitly initialized are empty whenever the function is called. If it is not possible to clear the local variables used in static initializers, their usage should be prohibited.

Auto-execution flexibility:

lexikos wrote: Either method would reduce the occurrence of problems when a static initializer calls a function which has its own static initializers. (With the current implementation, the target function's static variables might not have been initialized yet.)

I think that 3.a should be added as an option expanding the current behaviour. Whenever a function is called during the statics initialization process, initialize its static variables, if needed.

Nested functions:

v2 docs wrote: A nested function is not accessible outside of the function which immediately encloses it, but is accessible anywhere inside that function, including inside other nested functions.
...
Func can be used to retrieve a reference to a nested function, which can be called even after the outer function returns.

Is the latter restricted to closures?

lexikos · 10 Apr 2018, 04:42

just me wrote: Related to solution 2, would anybody show me a real-life example of the advantage of creating a closure based on empty 'free variables'?

The advantages are that dynamic references to functions will work, and the function will not crash the program or leak memory. It is purely about making nested functions safe to use in static initializers, not about providing something uniquely useful.

However, 'free variables' can be used to share state between closures, or across function calls to the same closure (but different 'instances' of the same nested function may have different values, unlike static variables). A closure in this context would be similar to a method within a class (the outer function), except that the 'free variables' are private and really are variables (so can be passed ByRef), unlike properties/array elements. So the 'free variables' may start out empty, but be assigned values by a closure (or multiple closures).
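
As a rough illustration of that last point, here is the same idea in Python, whose closures behave analogously for this purpose. It is not AutoHotkey code, and make_counter, increment and peek are hypothetical names. The free variable is private to one call of the outer function, is shared by the closures created in that call, and differs between 'instances'.

```python
def make_counter():
    count = 0                  # free variable: private to this call

    def increment():
        nonlocal count         # bind to the outer function's variable
        count += 1
        return count

    def peek():
        return count           # shares the same variable as increment()

    return increment, peek


# Two calls to the outer function give two independent sets of free variables.
inc_a, peek_a = make_counter()
inc_b, peek_b = make_counter()

inc_a()
inc_a()
inc_b()

a_value = peek_a()
b_value = peek_b()
```

Here a_value and b_value differ because each pair of closures has its own count, whereas a static variable would be shared by every call.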

Is the latter restricted to closures?

No. If it was, it would say "closures", not "nested functions".

Like I said, Func("nested_function") works within a static initializer only by accident, and it returns a Func, not a Closure. This should not be possible if the function refers to 'free variables'. If it does, but doesn't have any bound to it, it is incomplete and cannot be called. However, if you call it while the outer function is running, it is assumed to be a direct call (like nested_function()) and is automatically associated with the current set of 'free variables'. (It was designed this way for simplicity and performance: direct calls can be handled much like they always were and do not need to actually allocate a Closure object.)

I think that 3.a should be added as an option expanding the current behaviour.

The current behaviour is to evaluate static initializers in global context, immediately before the auto-execute section. If 3.a is added to the current behaviour, it will not solve any of the problems this topic was started for. It would add inconsistency; those problems being avoided only if the function was called from a static initializer before its own static initializers were reached.

just me · 10 Apr 2018, 11:06

lexikos wrote: ... those problems being avoided only if the function was called from a static initializer before its own static initializers were reached.

That's exactly what I intended. It would solve the only problem I already had with static initializers calling user-defined functions and it might even work in v1.1.

...it will not solve any of the problems this topic was started for.

AFAICS, this topic's title is "Static initializers and auto-execution", so how would it not solve any problem?

Problems related to nested functions and closures should be discussed by those seeing any real usage for accessing them in static initializers; I don't.

lexikos · 11 Apr 2018, 01:49

The topic title does not describe any of the problems this topic was started for. It is just a title, not a definition of the scope of this discussion. I did not refer to the title, only the problems this topic was started for.
