2014/10/28

Practical use of TCMalloc #2

1. Is TCMalloc reliable?


When I first used TCMalloc, I thought it would be a good fit for my application, because many threads allocate and release memory frequently.

However, some problems occurred.

Sometimes the memory usage of the process increased sharply within a short time, and hardly any system memory was left.
Eventually, the Linux OOM killer was triggered and killed my process.

Even though my application has some logic that can cause huge memory usage, this had never happened before I applied TCMalloc.

What went wrong?

Thanks to some googling, I found similar cases suggesting that TCMalloc's memory management is not perfect.

http://stackoverflow.com/questions/15566083/tcmallocs-fragmentation
tcmalloc tries to do some smart things to anticipate your memory use, but it isn't very good about releasing memory back to the system even after you have freed it. in fact, it might be resident in memory and leading to OOM.


It's true.
You cannot trust TCMalloc's memory management.
There are some reports about memory fragmentation issues.

https://groups.google.com/forum/#!searchin/google-perftools/ReleaseFreeMemory/google-perftools/FmeMfZ2CAJM/U15HZzZ15JIJ



For this issue, you can try a couple of solutions.

One is to release 'free memory' back to the system using this API:
MallocExtension::instance()->ReleaseFreeMemory();

Another is to adjust the rate at which free memory is released.
See the section 'Modifying Behavior In Code' on the page below:
http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html
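
One way to apply the first solution is to call the release API only when the allocator is holding a lot of idle memory. The following is a minimal sketch of that idea, not code from my application: HeapSnapshot, release_if_mostly_idle, and the ratio threshold are hypothetical names and values, and the two counters are assumed to come from tcmalloc's own statistics. The release action is passed in as a callback so that the sketch compiles without gperftools.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical snapshot of allocator counters. In a real program the two
// values would be read from tcmalloc itself, not from the OS.
struct HeapSnapshot {
    std::size_t in_use_bytes;     // bytes currently handed out to the application
    std::size_t free_list_bytes;  // bytes the allocator holds in its free lists
};

// Calls `release` when the free lists hold more than `max_free_ratio` of the
// total (in use + free). Returns true if `release` was called.
bool release_if_mostly_idle(const HeapSnapshot& snap,
                            double max_free_ratio,
                            const std::function<void()>& release) {
    assert(max_free_ratio >= 0.0 && max_free_ratio <= 1.0);
    const std::size_t total = snap.in_use_bytes + snap.free_list_bytes;
    if (total == 0) return false;
    const double free_ratio =
        static_cast<double>(snap.free_list_bytes) / static_cast<double>(total);
    if (free_ratio <= max_free_ratio) return false;
    release();  // e.g. a wrapper around ReleaseFreeMemory()
    return true;
}
```

In a program linked with tcmalloc, the callback would typically be a lambda that calls MallocExtension::instance()->ReleaseFreeMemory(). How often to run the check and which threshold to use depend on the application.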


As an example, here is how the memory usage of one of my applications changed.
This graph shows how frequently memory is released when TCMALLOC_RELEASE_RATE is set to 10.
(The exact release interval is not guaranteed.)


I will try another experiment with an open source program in the future.




2. tc_malloc_stats

Remember that when you use TCMalloc, you must not judge or analyze changes in your process's memory usage from the OS report alone.

To get the exact current tcmalloc stats, use the tc_malloc_stats() function.
(See gperftools/tcmalloc.h.)


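Or you can use this instead:
const int buffer_length = 4096;  // arbitrary size chosen for this example
char buffer[buffer_length];
MallocExtension::instance()->GetStats(buffer, buffer_length);
std::cout << buffer;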

Refer to this:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html


This is a tc_malloc_stats() report:

------------------------------------------------
MALLOC:          16832 (    0.0 MiB) Bytes in use by application
MALLOC: +     38682624 (   36.9 MiB) Bytes in page heap freelist
MALLOC: +        97632 (    0.1 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +          224 (    0.0 MiB) Bytes in thread cache freelists
MALLOC: +      1175704 (    1.1 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =     39973016 (   38.1 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =     39973016 (   38.1 MiB) Virtual address space used
MALLOC:
MALLOC:             10              Spans in use
MALLOC:              1              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
At the bottom you can also see the message explaining that calling ReleaseFreeMemory() releases freelist memory to the OS.
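
In the report above, 38,682,624 of the 39,973,016 bytes actually used sit in the page heap freelist, while the application itself uses only 16,832 bytes. If you capture the report as text (for example with GetStats), a small helper can pull out individual numbers. This is only a sketch: stats_bytes is a hypothetical function, and it assumes the line layout shown above, which may differ between gperftools versions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the byte count from the first report line whose text contains
// `label`, or -1 if no such line exists.
long long stats_bytes(const std::string& report, const std::string& label) {
    assert(!label.empty());
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find(label) == std::string::npos) continue;
        // Skip the "MALLOC:" prefix and any '+' or '=' sign; the first digit
        // on the line starts the byte count.
        const std::size_t pos = line.find_first_of("0123456789");
        if (pos == std::string::npos) continue;
        return std::stoll(line.substr(pos));
    }
    return -1;
}
```

For example, stats_bytes(report, "Bytes in page heap freelist") returns the amount of memory that tcmalloc is holding but the application is not using.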

2014/10/22

Practical use of TCMalloc #1

Thread-Caching Malloc (tcmalloc) is a memory allocation approach proposed by Google.

With tcmalloc, Google also provides powerful tools for system resource profiling.

Here, let me explain how to use tcmalloc and share some interesting experiments.


Here is the test code used on this page:
https://github.com/sshtel/practical_gperftools




1. What is tcmalloc?

Thread-caching malloc (tcmalloc) uses a memory pool to manage memory.
In short, even when a program allocates and releases memory frequently, a system call is not actually made for each allocation or release.
Instead, tcmalloc simply finds and returns a pointer to a suitable block of memory that is already allocated to the process.
Only when tcmalloc fails to find memory in the pool does it ask the OS for more.
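
The caching idea can be illustrated with a toy pool. This is a simplified sketch, not how tcmalloc is actually implemented: BlockPool is a hypothetical class that handles a single block size on a single thread, and it uses std::malloc as a stand-in for asking the OS for memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy fixed-size block pool. Released blocks are cached and reused instead
// of being handed back to the system allocator.
class BlockPool {
public:
    explicit BlockPool(std::size_t block_size) : block_size_(block_size) {
        assert(block_size > 0);
    }
    // Frees only the cached blocks; blocks still held by the caller are
    // the caller's responsibility to release first.
    ~BlockPool() {
        for (void* p : free_list_) std::free(p);
    }
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    void* allocate() {
        if (!free_list_.empty()) {   // fast path: reuse a cached block
            void* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        ++system_allocations_;       // slow path: go to the system allocator
        return std::malloc(block_size_);
    }

    void release(void* p) {          // keep the block for later reuse
        if (p != nullptr) free_list_.push_back(p);
    }

    std::size_t system_allocations() const { return system_allocations_; }
    std::size_t cached_blocks() const { return free_list_.size(); }

private:
    std::size_t block_size_;
    std::size_t system_allocations_ = 0;
    std::vector<void*> free_list_;
};
```

After one allocate/release pair, the next allocate() returns the same block without touching the system allocator. The same behavior explains why memory held in the pool still counts against the process even though the program has released it.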


Introduction to the Thread-Caching Malloc:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html




2. How to use?

You can download the source code here.

gperftools project: https://code.google.com/p/gperftools/


If you build gperftools, you will get a static library file named libtcmalloc.a (on Linux).
According to the manual, the static library is recommended over the dynamic library.

Then build your application:
$ g++ your_program.cpp libtcmalloc.a -o your_program

This is my sample Makefile:
https://github.com/sshtel/practical_gperftools/blob/master/sample/test001/Makefile


**NOTE: When compiling with programs with gcc, that you plan to link
with libtcmalloc, it's safest to pass in the flags
 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

http://google-perftools.googlecode.com/svn/trunk/README





3. Process memory usage when you use tcmalloc

When you first use tcmalloc, you will notice some interesting phenomena.
First, you will see a slight increase in the memory usage of your program.
Second, you might think that memory is not actually returned to the system after you release it with delete or free.

Here, let me show you some sample code and the resulting change in memory usage.

Using the sample code below, I recorded the change in process memory for two test cases: with tcmalloc and without it.
https://github.com/sshtel/practical_gperftools/tree/master/sample/test002/




This graph shows the change in actual physical memory usage of a process. (I recorded the VmRSS field of /proc/pid/status on a Linux system.)
It shows how tcmalloc holds on to memory even after the process releases it.
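
The VmRSS value can be sampled from inside the process as well. The following is a minimal sketch, assuming Linux and the usual "VmRSS:   <number> kB" line format; parse_vm_rss_kb and current_vm_rss_kb are hypothetical helper names, not part of my test code.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the VmRSS value in kB from text in /proc/<pid>/status format,
// or -1 if the field is missing or cannot be parsed.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") != 0) continue;
        std::istringstream fields(line.substr(6));
        long kb = -1;
        fields >> kb;  // leading whitespace is skipped automatically
        return fields ? kb : -1;
    }
    return -1;
}

// Convenience wrapper for the current process (Linux only).
long current_vm_rss_kb() {
    std::ifstream status("/proc/self/status");
    return status ? parse_vm_rss_kb(status) : -1;
}
```

Calling current_vm_rss_kb() periodically, for example once per second, and logging the result gives the kind of data a graph like this can be drawn from.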




4. What I learned from practical use of tcmalloc

When you use tcmalloc, you have to consider the total memory usage of the process.

For most of its running time, your process will stay at its maximum memory usage with tcmalloc.
This does not actually last forever, but I will explain that later.
However, if other processes are running and you design your program without considering this, you may run into serious problems such as running out of memory.

In other words, when you use tcmalloc, your process will hold memory for a longer time than before.

In conclusion, you need to pare down memory usage or otherwise optimize it.