Daniel Khan About the Author

Daniel has over 15 years of experience as a full-stack developer, architect, and technical lead in the field of web engineering, proving his strong problem-solving skills in hundreds of projects. He is passionate about constant learning, using new technologies, and sharing his findings with others. As a technology strategist, Daniel focuses on driving support for emerging technologies like Node.js and MongoDB at Dynatrace.

Understanding Garbage Collection and hunting Memory Leaks in Node.js

Whenever there is bad press coverage of Node.js, it is (typically) related to performance problems. This does not mean that Node.js is more prone to problems than other technologies; the user must simply be aware of certain things about how Node.js works. While this technology has a rather flat learning curve, the machinery that keeps Node.js ticking is quite complex, and you must understand it to preemptively avoid performance pitfalls. And if things go wrong, you need to know how to fix them fast.

In this post I’ll cover how Node.js manages memory and how to track down memory-related problems. Unlike platforms like PHP, Node.js applications are long-running processes. While this has lots of positive implications, such as allowing database connections to be set up once and then reused for all requests, it may also cause problems. But first, let’s cover some Node.js basics.

Figure 1: A real Austrian garbage collection vehicle

Node.js is a C++ program controlled via V8 JavaScript

Google V8 is a JavaScript engine initially created for Google Chrome, but it can also be used as a standalone engine. This makes it the perfect fit for Node.js, and it is the only part of the platform that actually ‘understands’ JavaScript. V8 compiles JavaScript down to native code and executes it. During execution it manages the allocation and freeing of memory as needed. This means that when we talk about memory management in Node.js, we are actually always talking about V8.

Please read on here for a simple example on how to use V8 from a C++ perspective.

V8’s Memory Scheme

A running program is always represented through some space allocated in memory. This space is called the Resident Set. V8 uses a scheme similar to that of the Java Virtual Machine and divides the memory into segments:

  • Code: the actual code being executed
  • Stack: contains all value types (primitives like integer or Boolean) with pointers referencing objects on the heap and pointers defining the control flow of the program
  • Heap: a memory segment dedicated to storing reference types like objects, strings and closures.
Figure 2: V8 Memory Scheme

Within Node.js the current memory usage can easily be queried by calling process.memoryUsage().

This function will return an object containing:

  • Resident Set Size
  • Total Size of the Heap
  • Heap actually Used

We can use this function to record the memory usage over time to create a graph that perfectly shows how V8’s memory handling actually works.
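A minimal sketch of such a recorder, assuming we only care about the three values listed above (`rss`, `heapTotal` and `heapUsed` are the property names Node.js uses for them). The helper names `sampleMemory` and `startSampling`, and the conversion to megabytes, are illustrative choices and not part of any API.

```javascript
// Take one snapshot of the values reported by process.memoryUsage().
function sampleMemory() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  // The raw values are bytes; convert to megabytes with two decimals.
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;
  return {
    time: Date.now(),
    rssMb: toMb(rss),             // Resident Set Size
    heapTotalMb: toMb(heapTotal), // total size of the heap
    heapUsedMb: toMb(heapUsed),   // heap actually used
  };
}

// Call onSample with a fresh snapshot every intervalMs milliseconds.
function startSampling(intervalMs, onSample) {
  const timer = setInterval(() => onSample(sampleMemory()), intervalMs);
  timer.unref(); // sampling alone should not keep the process alive
  return timer;
}
```

Calling `startSampling(1000, console.log)` inside a running application would print one sample per second, which can then be fed into any charting tool.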

Figure 3: Node.js memory usage over time

We see that the used heap graph is highly volatile but always stays within certain boundaries to keep the median consumption constant. The mechanism that allocates and frees heap memory is called garbage collection.

Enter Garbage Collection

Every program that consumes memory requires a mechanism for reserving and freeing space. In C and C++ this is accomplished by malloc() and free() as the example below shows.

We see that the programmer is responsible for freeing heap memory that is no longer required. If a program allocates memory that is never freed the heap will constantly grow until the usable memory is exhausted, causing the program to crash. We call this a memory leak.

As we already learned, in Node.js JavaScript is compiled to native code by V8. The resulting native data structures don’t have much to do with their original representation and are solely managed by V8. This means that we cannot actively allocate or deallocate memory in JavaScript. V8 uses a well-known mechanism called garbage collection to address this problem.

The theory behind garbage collection is quite simple: If a memory segment is not referenced from anywhere, we can assume that it is not used and, therefore, can be freed. However, retrieving and maintaining this information is quite complex as there may be chained references and indirections that form a complex graph structure.
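The idea can be illustrated with a toy model. This is only a sketch of reachability over a hand-written object graph, not how V8 implements it; the `heap` layout, the object ids and the function name `findGarbage` are all made up for illustration.

```javascript
// Toy heap: each key is an object id, each value lists the ids it references.
const heap = {
  root: ['a', 'b'],
  a: ['c'],
  b: [],
  c: ['a'], // a and c reference each other but are reachable from root
  d: ['e'], // d and e reference each other, yet nothing reachable points to them
  e: ['d'],
};

// Follow references from the roots; whatever is never visited is garbage.
function findGarbage(heapGraph, roots) {
  const reachable = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const id = pending.pop();
    if (reachable.has(id)) continue; // already visited, also guards against cycles
    reachable.add(id);
    for (const ref of heapGraph[id] || []) pending.push(ref);
  }
  return Object.keys(heapGraph).filter((id) => !reachable.has(id));
}

console.log(findGarbage(heap, ['root'])); // [ 'd', 'e' ]
```

Note that `d` and `e` still reference each other, so counting references alone would never free them; what matters is whether an object can be reached from a root.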

Figure 4: A heap graph. Only if there are no more references to the red object can it be discarded

Garbage collection is a rather costly process because it interrupts the execution of an application, which naturally impacts its performance. To remedy this situation V8 uses two types of garbage collection:

  • Scavenge, which is fast but incomplete
  • Mark-Sweep, which is relatively slow but frees all non-referenced memory

For an excellent blog post containing in-depth information about garbage collection in V8 please click here.

Revisiting the data we collected from process.memoryUsage() we can now easily identify the different garbage collection types: The saw-tooth pattern is created by Scavenge runs and the downward jumps indicate Mark-Sweep operations.

By using the native module node-gc-profiler we can gather even more information about garbage collection runs. The module subscribes to garbage collection events fired by V8, and exposes them to JavaScript.
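As a rough sketch of the same idea without a native module, the following uses the `perf_hooks` module that ships with newer Node.js versions instead of node-gc-profiler. The names `describeGc` and `watchGc` and the human-readable labels are assumptions made for this example; only `PerformanceObserver`, the `'gc'` entry type and the `NODE_PERFORMANCE_GC_*` constants come from Node.js itself.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Map the numeric GC kind reported by Node.js to a readable label.
const GC_KINDS = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'Scavenge',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'Mark-Sweep',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'Incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'Weak callbacks',
};

// Reduce a performance entry to the two facts we want to chart.
function describeGc(entry) {
  // Newer Node.js versions expose the kind under entry.detail, older ones on the entry.
  const kind = entry.detail ? entry.detail.kind : entry.kind;
  return { type: GC_KINDS[kind] || 'Unknown', durationMs: entry.duration };
}

// Subscribe to garbage collection events and pass each one to onEvent.
function watchGc(onEvent) {
  const observer = new PerformanceObserver((list) => {
    list.getEntries().forEach((entry) => onEvent(describeGc(entry)));
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer;
}
```

Calling `watchGc(console.log)` at application start would log one line per garbage collection run.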

The object returned indicates the type of garbage collection and the duration. Again, we can easily graph this to gain a better understanding of how things work.

Figure 5: Duration and frequency of GC runs

We can see that Scavenge runs at a much higher frequency than Mark-Sweep. Depending on the complexity of an application, the durations will vary. Interestingly, the above chart also shows frequent, very short Mark-Sweep runs, the function of which I have yet to determine.

When things go wrong

So if garbage collection cleans up the memory, why do you have to care at all? In fact, it is still possible, and easy, to introduce memory leaks that suddenly appear in your logs.

Figure 6: Exception caused by memory leak

Employing our previously introduced charting we can even watch the memory piling up!

Figure 7: Memory leak in progress

Garbage collection tries its best to free memory, but we see that the consumption left after each garbage collection run keeps climbing, which is a clear indication of a leak. While these metrics are a great starting point for anomaly detection, let’s review how to build a leak first before discussing how to track it down.

Building a Leak

Some leaks are obvious — like storing data in process-global variables, an example of which would be storing the IP of every visiting user in an array. Others are more subtle like the famous Walmart memory leak that was caused by a tiny missing statement within Node.js core code, and which took weeks to track down.
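The obvious kind can be sketched in a few lines. The `visitorIps` array, the `handleRequest` function and the shape of the `request` object are hypothetical and only stand in for whatever web framework is in use.

```javascript
// Process-global array: it lives exactly as long as the Node.js process does.
const visitorIps = [];

// Hypothetical request handler; `request.ip` is an assumed property.
function handleRequest(request) {
  visitorIps.push(request.ip); // one more entry per request, never removed
  return 'Hello';
}
```

Because `visitorIps` is always reachable, the garbage collector can never free any of its entries, so memory grows with every request for as long as the process runs.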

I won’t cover core code errors here. Instead, let’s just look at a difficult-to-track leak that you can easily introduce into your own JavaScript code, and which I found on Meteor’s blog.
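A sketch of the kind of code in question, reconstructed from the identifiers discussed in this section (`theThing`, `replaceThing()`, `originalThing`, `unused()`, `someMethod()`, `longStr`). The string size, the logged messages and the one-second interval are assumptions, and the timer is left commented out so the snippet does not run forever.

```javascript
var theThing = null;

var replaceThing = function () {
  // Remember the object created by the previous invocation.
  var originalThing = theThing;

  // Never called, but it references originalThing from the enclosing scope.
  var unused = function () {
    if (originalThing) console.log('hi');
  };

  // Replace the global with a fresh, large object.
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log('some message');
    }
  };
};

// Assumed driver: replace the object once per second.
// setInterval(replaceThing, 1000);
```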

Figure 8: Introducing a leak into your own JavaScript code

This looks OK at first glance. We could think that theThing gets overwritten with every invocation of replaceThing(). The problem is that someMethod() has its enclosing scope as context, and it shares that context with unused(). Because unused() references originalThing, the shared context keeps originalThing alive for as long as someMethod() is reachable, even if unused() is never invoked. Every new theThing therefore holds on to the previous one, and the garbage collector cannot free any of them. There are simply too many indirections to follow. This is not a bug in your code, but it will cause a memory leak that is difficult to track down.

So wouldn’t it be great if we could have a look into our heap to see what’s currently in there? Fortunately, we can! V8 provides a way to dump the current heap, and V8-profiler exposes this functionality to JavaScript.
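A minimal sketch of such a module, assuming Node's built-in `v8.writeHeapSnapshot()` (available in newer Node.js versions) in place of V8-profiler. The name `createLeakWatcher`, the `consecutiveGrowthLimit` option and the rule "heap grew N checks in a row" are illustrative assumptions, not a recommended detection strategy.

```javascript
const v8 = require('v8');

// Returns a check() function that writes a heap dump once the used heap
// has grown on `consecutiveGrowthLimit` consecutive checks.
function createLeakWatcher(options = {}) {
  const consecutiveGrowthLimit = options.consecutiveGrowthLimit || 5;
  // Writes a .heapsnapshot file into the working directory and returns its name.
  const writeSnapshot = options.writeSnapshot || (() => v8.writeHeapSnapshot());

  let lastHeapUsed = 0;
  let growthStreak = 0;

  return function check(heapUsed = process.memoryUsage().heapUsed) {
    // Count how many checks in a row saw a larger heap than the one before.
    growthStreak = heapUsed > lastHeapUsed ? growthStreak + 1 : 0;
    lastHeapUsed = heapUsed;

    if (growthStreak >= consecutiveGrowthLimit) {
      growthStreak = 0; // start over so we do not dump on every check
      return writeSnapshot();
    }
    return null; // nothing suspicious yet
  };
}
```

In an application this could be driven by a timer, for example `setInterval(createLeakWatcher(), 60000).unref()`. Keep in mind that writing a snapshot blocks the process while it runs and needs additional memory.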

This simple module creates heap dump files if memory usage is constantly rising. Yes, there are more sophisticated approaches to detect anomalies, but for our purpose this should be sufficient. If there is a memory leak, you may end up with a significant number of such files, so you should monitor this closely and add some alerting capabilities to that module. The same heap dump functionality is also provided within Chrome and, fortunately, you can use Chrome developer tools to analyze the dumps created by V8-profiler.

Figure 9: Chrome Developer tools

One heap dump may not help you, because it won’t show you how the heap develops over time. That’s why Chrome developer tools allow you to compare different memory profiles. By comparing two dumps we get delta values that indicate which structures grew between two dumps as seen below.

Figure 10: Heap dump comparison showing our leak

And here we have our problem. A variable called longStr contains a string of asterisks, and is referenced by originalThing, which is referenced by some method, which is referenced by…well, you get the point. There is a long path of nested references and closure contexts that prevent longStr from being freed anytime soon.

Although this example leads to obvious results, the process is always the same:

  1. Create heap dumps with some time and a fair amount of memory allocation in between
  2. Compare a few dumps to find out what’s growing

Wrap Up

As we have seen, garbage collection is a complex process, and even valid code can cause memory leaks. By using the out-of-the-box functionality provided by V8 plus Chrome developer tools, it’s possible to obtain insights that help us track down the root cause of the leaks and, if you build such functionality into your application, you have everything necessary to fix a problem when it occurs.

But one question remains: How can we fix this leak? The answer is simple: just add originalThing = null; at the end of the function, and your day is saved.
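A minimal sketch of the fix, assuming the reset is applied to `originalThing`, the closure variable that holds the previous object, as the last statement of `replaceThing()`. The rest of the function mirrors the identifiers used in this post; the string size and logged messages are assumptions.

```javascript
var theThing = null;

var replaceThing = function () {
  var originalThing = theThing;

  // Still never called, and still part of the shared closure context.
  var unused = function () {
    if (originalThing) console.log('hi');
  };

  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log('some message');
    }
  };

  // The shared context now only holds null, so the previous object
  // is no longer referenced and can be collected.
  originalThing = null;
};
```

Resetting `theThing` at this point instead would also stop the growth, but it would throw away the object that was just created.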

Node.js and Dynatrace

To see the big picture, including which transactions are passing through your Node.js application, you may want to use Dynatrace. We provide a Node.js agent which will, in conjunction with the various other technologies supported, help you to understand your application as a whole.

This blog post is part of my ongoing research on instrumenting Node.js, and some of the tools introduced here are already in our product or will be incorporated into it soon.

You can start monitoring your Node.js environment right away with Dynatrace SaaS.

Please feel free to contact me anytime if you have questions about my blog posts or how to instrument Node.js. You can reach me on Twitter at @dkhan.

Comments

  1. It’s worth mentioning that there’s an option in node to expose the GC to the application, so you can manually call GC. It’s expensive and shouldn’t generally be used for a public facing server, but I found it invaluable for keeping the memory usage down (not because of leaks) in processes that handle one-off translations. Such as MQ processing, or import/export processors. I find that node is a *great* fit for scripting these types of workloads, but as the script processes hundreds or thousands of items, the memory will build quite a bit before GC happens on its own, and by running it after each item that is processed, it’s a good time to force it.

    Also, using functional closures over variable references and fp workflows can help to avoid some of the more common reference leaks. Your example is somewhat noisy, but I can see how it can happen in practice.

  2. Daniel Khan says:

    Thank you Michael! Your additions are really valuable. If you have or are planning to do a writeup about best practices to avoid leaks code-wise please contact me anytime. I would be really happy to share that.

  3. Shouldn’t the solution be “just add originalThing = null;” since that’s the variable allocation that’s leaking memory?

  4. Hello.
    I really like this post. So I want to translate to korean.
    And.. then may I post on my blog? (with orin source, link)

  5. Angel Malavar says:

    Hi! Great explanation…
    I just have a doubt. How can we free a memory leak on array.ForEach() or node and v8 made the job.
    In some places I’ve found that a loop consumes a lot of memory but nobody told how can we manage with it.
    Thanks

    • Daniel Khan says:

      A memory leak occurs when allocated memory can not be freed. I doubt that this can happen during a foreach.
      Iterating over a large array allocates a lot of memory. I would recommend using other constructs like maybe streams for it.

  6. Hi..! Its a very informative post.
    I have one doubt .Lets consider a parent process running in nodeJs and now we invoke a child process using the spawn function.
    Now Assume the child process have a loop which leads to memory out of bounds.How is it handled? Does the child and parent process share same memory?Please explain the memory management between the parent and child processes.
    Thanks..

    • Daniel Khan says:

      A child process is an independent process that communicates with its parent using inter process communication (IPC). Parent and child don’t share memory.

  7. Bob Myers says:

    > even if `unused()` is never invocated, it prevents the garbage collector from freeing `originalThing`

    Is this true even in modern engine versions which can detect that in fact `unused()` is not used from within `someMethod`?
