Node.js event loop
AWS Lambda can freeze and thaw its execution context, which can impact Node.js event loop behavior.
One of the more surprising things I learned recently while working with AWS Lambda is how it interacts with the Node.js event loop.
Lambda is powered by a virtualization technology, and to optimize performance it can “freeze” and “thaw” the execution context of your code so it can be reused.
This makes code run faster, but it can affect the event loop behavior you would expect. We’ll explore this in detail, but first let’s quickly refresh the Node.js concurrency model.
Already familiar with the event loop?
Go straight to the AWS Lambda section.
Concurrency model
Node.js is single threaded and the event loop is the concurrency model that allows non-blocking I/O operations to be performed [1].
How? Well, we’ll have to discuss the call stack and the task queue first.
Call stack
Function calls form a stack of frames, where each frame represents a single function call.
Every time a function is called, it’s pushed onto the stack. When the function is done executing, it’s popped off the stack.
The frames in a stack are popped off in LIFO (last in, first out) order.
Each frame stores information about the invoked function, such as the arguments it was called with and any variables defined inside its body.
When we execute the following code:
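A minimal sketch of such a script, assuming two plain functions named main and work that only log synchronously (the names and messages match the steps that follow; the exact layout is an assumption):

```javascript
// `work` is called from `main`, so its frame sits on top of `main`'s frame.
function work() {
  console.log("do work");
}

function main() {
  console.log("main start");
  work();
  console.log("main end");
}

// Entry point: pushes the first frame onto the (empty) call stack.
main();
```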
We can visualize the call stack over time like this.
- When the script starts executing, the call stack is empty.
- main() is called and pushed onto the call stack.
- While executing main, console.log("main start") is called and pushed onto the call stack.
- console.log executes, prints main start, and is popped off the call stack.
- main continues executing and calls work(), which is pushed onto the call stack.
- While executing work, console.log("do work") is called and pushed onto the call stack.
- console.log executes, prints do work, and is popped off the call stack.
- work finishes executing and is popped off the call stack.
- main continues executing and calls console.log("main end"), which is pushed onto the call stack.
- console.log executes, prints main end, and is popped off the call stack.
- main finishes executing and is popped off the call stack. The call stack is empty again and the script finishes executing.
This code didn’t interact with any asynchronous (internal) APIs. When code does, for example by calling setTimeout(callback), it makes use of the task queue.
Task queue
Any asynchronous work in the runtime is represented as a task in a queue, or in other words, a message queue.
Each message can be thought of as a function that will be called in FIFO order to handle said work, for example the callback passed to setTimeout or to a Promise’s then method.
Additionally, each message is processed completely before any other message is processed. This means that whenever a function runs it can’t be interrupted. This behavior is called run-to-completion and makes it easier to reason about our JavaScript programs.
Messages get enqueued (i.e. added to the queue) and at some point they are dequeued (i.e. removed from the queue).
When? How? This is handled by the event loop.
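To make run-to-completion and FIFO ordering concrete, here is a small sketch that uses only setTimeout. The order array and the labels are illustrative names, not part of any API.

```javascript
const order = [];

// Both callbacks are enqueued as tasks. Neither can run until the
// currently executing script has finished.
setTimeout(() => order.push("timer A"), 0);
setTimeout(() => order.push("timer B"), 0);

// This synchronous loop is never interrupted by the pending timers.
for (let i = 0; i < 3; i++) {
  order.push(`sync ${i}`);
}

// Only synchronous entries so far: the timer callbacks are still waiting.
console.log(order.join(", ")); // sync 0, sync 1, sync 2

// Scheduled last with the same delay, so it runs after A and B (FIFO).
setTimeout(() => console.log(order.join(", ")), 0);
// sync 0, sync 1, sync 2, timer A, timer B
```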
Event loop
The event loop can be literally thought of as a loop that runs forever, and where every cycle is referred to as a tick.
On every tick the event loop will check if there’s any work in the task queue. If there is, it will execute the task (i.e. call a function), but only if the call stack is empty.
The event loop can be described with the following pseudocode [2]:
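As a rough, runnable illustration of that idea (a toy model, not how Node.js actually implements it), the loop below drains a plain array that stands in for the task queue. The names taskQueue, enqueue, and runEventLoop are made up for this sketch. Unlike the real event loop, which waits for new work to arrive, it simply stops once the queue is empty.

```javascript
// Toy model: a plain array stands in for the task queue.
const taskQueue = [];
const output = [];

function enqueue(task) {
  taskQueue.push(task);
}

// Each iteration is one "tick": dequeue the oldest task and run it to
// completion before looking at the next one.
function runEventLoop() {
  let ticks = 0;
  while (taskQueue.length > 0) {
    const task = taskQueue.shift();
    task();
    ticks += 1;
  }
  return ticks;
}

enqueue(() => output.push("task 1"));
enqueue(() => {
  output.push("task 2");
  // Tasks may enqueue more tasks; they run on a later tick.
  enqueue(() => output.push("task 3"));
});

const ticks = runEventLoop();
console.log(`${ticks} ticks: ${output.join(", ")}`); // 3 ticks: task 1, task 2, task 3
```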
To summarize:
- When code executes, function calls are added to the call stack.
- Whenever calls are made via asynchronous (internal) APIs (like setTimeout or Promise), the corresponding callbacks are eventually added to the task queue.
- When the call stack is empty and the task queue contains one or more tasks, the event loop removes a task on every tick and pushes it onto the call stack. The function executes, and this process continues until all work is done.
With that covered, we can explore how the AWS Lambda execution environment interacts with the Node.js event loop.
AWS Lambda
AWS Lambda invokes a Lambda function via an exported handler function, e.g. exports.handler. When Lambda invokes this handler, it calls it with 3 arguments: event, context, and callback.
The callback argument may be used to return information to the caller and to signal that the handler function has completed, so Lambda may end it. Calling it explicitly is optional: if you don’t, Lambda will call it for you [3].
Baseline
From here on we’ll use a simple script as a “baseline” to reason about the event loop behavior. Create a file called timeout.js with the following contents:
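A minimal sketch of timeout.js, consistent with the messages and the walkthrough in this post. The exact layout is an assumption, as is making main an async function.

```javascript
// Returns a Promise that resolves once the timer has fired.
function timeout(ms) {
  console.log("timeout start");

  return new Promise((resolve) => {
    setTimeout(() => {
      console.log(`timeout cb fired after ${ms} ms`);
      resolve();
    }, ms);
  });
}

async function main() {
  console.log("main start");
  timeout(5e3); // deliberately not awaited
  console.log("main end");
}

main();
```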
When we execute this script locally (not via Lambda) with node timeout.js, it prints main start, timeout start, and main end right away, followed by timeout cb fired after 5000 ms.
The last message takes 5 seconds to print, but the script does not stop executing before it does.
What happens in Lambda, stays in Lambda
Now let’s modify the code from timeout.js so it’s compatible with Lambda:
You can create a new function in the AWS Lambda console and paste in the code from above. Run it, sit back and enjoy.
Wait, what? Lambda just ended the handler function without printing the last message, timeout cb fired after 5000 ms. Let’s run it again.
It now prints timeout cb fired after 5000 ms first and then the other ones! So what’s going on here?
AWS Lambda execution model
AWS Lambda takes care of provisioning and managing resources needed to run your functions. When a Lambda function is invoked, an execution context is created for you based on the configuration you provide. The execution context is a temporary runtime environment that initializes any external dependencies of your Lambda function.
After a Lambda function is called, Lambda maintains the execution context for some time in anticipation of another invocation of the Lambda function (for performance benefits). It freezes the execution context after a Lambda function completes and may choose to reuse (thaw) the same execution context when the Lambda function is called again (but it doesn’t have to).
In the AWS docs we can find the following regarding this subject:
Background processes or callbacks initiated by your Lambda function that did not complete when the function ended resume if AWS Lambda chooses to reuse the Execution Context.
As well as this somewhat hidden message:
When the callback is called (explicitly or implicitly), AWS Lambda continues the Lambda function invocation until the event loop is empty.
Looking further, there’s some documentation about the context object, specifically about a property called callbackWaitsForEmptyEventLoop. This is what it does:
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the event loop is empty before freezing the process and returning the results to the caller.
Okay, so with this information we can make sense of what happened when we executed the code in timeout.js before. Let’s break it down and go over it step by step.
- Lambda starts executing the code in timeout.js. The call stack is empty.
- main is called and pushed onto the call stack.
- While executing main, console.log("main start") is called and pushed onto the call stack.
- console.log executes, prints main start, and is popped off the call stack.
- main continues executing and calls timeout(5e3), which is pushed onto the call stack.
- While executing timeout, console.log("timeout start") is called and pushed onto the call stack.
- console.log executes, prints timeout start, and is popped off the call stack.
- timeout continues executing and calls new Promise(callback), which is pushed onto the call stack.
- The Promise constructor runs the provided callback synchronously, so the callback is pushed onto the call stack right away instead of being sent to the task queue.
- The Promise callback executes and calls setTimeout(callback, timeout), which is pushed onto the call stack.
- While setTimeout(callback, timeout) executes, it interacts with the setTimeout API and passes the corresponding callback and timeout to it.
- setTimeout(callback, timeout) finishes executing and is popped off the call stack. At the same time the setTimeout API starts counting down the timeout, so it can schedule the callback function in the future.
- The Promise callback finishes executing and is popped off the call stack.
- new Promise finishes executing and is popped off the call stack.
- timeout finishes executing and is popped off the call stack.
- main continues executing and calls console.log("main end"), which is pushed onto the call stack.
- console.log executes, prints main end, and is popped off the call stack.
- main finishes executing and is popped off the call stack. The call stack is empty again.
At this point the call stack and task queue are both empty. At the same time a timeout is counting down (5 seconds), but the corresponding timeout callback has not been scheduled yet. As far as Lambda is concerned, the event loop is empty. So it will freeze the process and return results to the caller!
The interesting part here is that Lambda doesn’t immediately destroy its execution context. If we wait for more than 5 seconds and run the Lambda again (like in the second run), we see the console message from the setTimeout callback printed first.
This happens because the execution context was still around after the Lambda stopped executing. Once more than 5 seconds had passed, the setTimeout API sent the corresponding callback to the task queue.
When we execute the Lambda again (second run), the call stack is empty with a message in the task queue, which can immediately be scheduled by the event loop.
This results in timeout cb fired after 5000 ms being printed first, because it executed before any of the code in our Lambda function.
Doing it right
Obviously this is undesired behavior, and you should not write your code the way we wrote timeout.js.
As stated in the AWS docs, we need to make sure all callbacks have completed before our handler exits:
You should make sure any background processes or callbacks (in case of Node.js) in your code are complete before the code exits.
Therefore we’ll make the following change to the code in timeout.js:
This change makes sure the handler function does not stop executing until the timeout function finishes.
When we run our code with this change, all is well.
Macrotasks and microtasks
I intentionally left out some details about the task queue. There are actually two task queues: one for macrotasks (e.g. setTimeout) and one for microtasks (e.g. Promise).
According to the spec, one macrotask should get processed per tick. And after it finishes, all microtasks will be processed within the same tick. While these microtasks are processed they can enqueue more microtasks, which will all be executed in the same tick.
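The ordering can be observed with a short sketch. The order array and the labels are illustrative names only.

```javascript
const order = [];

// Macrotask: runs on a later tick.
setTimeout(() => order.push("macrotask"), 0);

// Microtask: runs as soon as the current synchronous code has finished.
Promise.resolve().then(() => {
  order.push("microtask 1");
  // A microtask enqueued by a microtask still runs before the next macrotask.
  Promise.resolve().then(() => order.push("microtask 2"));
});

order.push("sync");

// Print the result after everything above has had a chance to run.
setTimeout(() => console.log(order.join(" -> ")), 10);
// sync -> microtask 1 -> microtask 2 -> macrotask
```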
For more information, see this article from RisingStack, which goes into more detail.
Note
This page was originally published on Medium.
Footnotes
[1] The event loop is what allows Node.js to perform non-blocking I/O operations (despite the fact that JavaScript is single-threaded) by offloading operations to the system kernel whenever possible.
[3] When using Node.js version 8.10 or above, you may also return a Promise instead of using the callback function. In that case you can also make your handler async, because async functions return a Promise.