Seung-hwa(Steve)'s Lab

2014/10/22

Practical use of TCMalloc #1

Thread-Caching Malloc (tcmalloc) is a memory allocator developed by Google.

Along with tcmalloc, Google also provides powerful tools for profiling system resources.

In this post, I explain how to use tcmalloc and describe some interesting experiments.

The test code used in this post is here:
https://github.com/sshtel/practical_gperftools

1. What is tcmalloc?

Thread-caching malloc (tcmalloc) manages memory with a memory pool.
In short, even if the program allocates and releases memory frequently, most of those requests do not result in a system call.
Instead, tcmalloc simply finds a suitable block in memory that has already been allocated to the process and returns a pointer to it.
Only when tcmalloc cannot find a suitable block in the memory pool does it ask the OS for more memory.
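
To make the memory pool idea concrete, here is a minimal sketch of a fixed-size block pool. It is not tcmalloc's actual implementation (tcmalloc uses per-thread caches and size classes); the class name BlockPool and its methods are hypothetical and exist only for illustration. The sketch shows the one point made above: a released block goes back to the pool, and the next request reuses it without going back to the underlying allocator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed-size block pool. Illustration only, NOT tcmalloc's code.
class BlockPool {
public:
    explicit BlockPool(std::size_t block_size) : block_size_(block_size) {}

    // Hand every block back to the underlying allocator on destruction.
    ~BlockPool() {
        for (void* p : all_blocks_) std::free(p);
    }

    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    // Reuse a pooled block if one exists; otherwise ask the backend (malloc).
    void* allocate() {
        if (!free_list_.empty()) {
            void* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        void* p = std::malloc(block_size_);
        if (p != nullptr) {
            all_blocks_.push_back(p);
            ++backend_calls_;
        }
        return p;
    }

    // "Releasing" only returns the block to the pool; the process keeps it.
    void release(void* p) {
        if (p != nullptr) free_list_.push_back(p);
    }

    // Number of times the pool had to go to the backend allocator.
    std::size_t backend_calls() const { return backend_calls_; }

private:
    std::size_t block_size_;
    std::size_t backend_calls_ = 0;
    std::vector<void*> free_list_;   // blocks currently available for reuse
    std::vector<void*> all_blocks_;  // every block obtained from the backend
};
```

For example, if you allocate one block, release it, and allocate again, the second allocate() returns the same pointer and backend_calls() stays at 1. Note that release() never gives memory back to the system, which is the same kind of behavior this post observes with tcmalloc later on.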

Introduction to the Thread-Caching Malloc:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html

2. How to use it

You can download the source code here.

gperftools project: https://code.google.com/p/gperftools/

If you build gperftools on Linux, you will get a static library file named libtcmalloc.a.

According to the manual, using the static library is recommended over the dynamic library.

Then build your application against it.
$ g++ your_program.cpp libtcmalloc.a -o your_program

This is my sample Makefile:
https://github.com/sshtel/practical_gperftools/blob/master/sample/test001/Makefile

**NOTE: When compiling with programs with gcc, that you plan to link
with libtcmalloc, it's safest to pass in the flags
-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

http://google-perftools.googlecode.com/svn/trunk/README

3. Process memory usage when you use tcmalloc

When you first use tcmalloc, you will notice two interesting things.
First, the memory usage of your program increases slightly.
Second, it may look as if memory is not actually returned to the system after you release it with delete or free.

Let me show this with sample code and the resulting change in memory usage.

Using the sample code below, I recorded the memory usage of a process over time in two test cases: with tcmalloc and without it.
https://github.com/sshtel/practical_gperftools/tree/master/sample/test002/

This graph shows the change in the actual physical memory usage of the process. (I recorded the VmRSS field of /proc/pid/status on a Linux system.)
The graph shows how tcmalloc holds on to memory even after the process has released it.
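
The measurement described above can be sketched as follows. This is a minimal sketch and not the code from the linked repository: the function names parse_vmrss_kb and current_vmrss_kb are hypothetical, and it assumes a Linux system where /proc/self/status contains a line of the form "VmRSS: <number> kB".

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the "VmRSS:" line from text in /proc/<pid>/status format.
// Returns the resident set size in kB, or -1 if the line is missing
// or cannot be parsed. (Hypothetical helper, for illustration only.)
long parse_vmrss_kb(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;  // skips the whitespace before the number
            return fields ? kb : -1;
        }
    }
    return -1;
}

// Reads VmRSS of the calling process. Returns -1 if /proc is unavailable.
long current_vmrss_kb() {
    std::ifstream status("/proc/self/status");
    return parse_vmrss_kb(status);
}
```

Calling current_vmrss_kb() at regular intervals, before and after the allocation and release steps, gives the kind of data points plotted in the graph.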

4. What I learned from practical use of tcmalloc

When you use tcmalloc, you have to consider the total memory usage of the process.

For most of its running time, a process using tcmalloc stays close to its maximum memory usage.
This does not last forever, but I will explain that later.
However, if other processes are running on the same system and you design your program without taking this into account, you can run into serious problems such as Out Of Memory.

In other words, when you use tcmalloc, your process holds on to memory for longer than it did before.

In conclusion, you need to reduce the peak memory usage of your program or optimize it in some other way.

2014/08/31

How to contribute to OpenCV

http://code.opencv.org/projects/opencv/wiki/How_to_contribute

OpenCL Conformant Products

http://www.khronos.org/conformance/adopters/conformant-products/

2014/08/22

Programming languages and open-source image processing for heterogeneous multi-core environments

click -> Slideshare presentation link

It is no exaggeration to say that most of today's multi-core processors are heterogeneous computing platforms.
Parallel computing offers dramatic speedups and lower power consumption, but it is hard to use, and software development becomes even harder in heterogeneous computing environments made up of many kinds of architectures.
This presentation introduces programming languages for parallel processing in heterogeneous computing environments and shows examples of their use in image processing libraries such as OpenCV.

2014/08/21

OpenCV on GPU(Nvidia CUDA)

Video lecture

http://on-demand.gputechconf.com/gtc/2013/webinar/opencv.mp4

presentation file

http://on-demand.gputechconf.com/gtc/2013/webinar/opencv-gtc-express-shalini-gupta.pdf

2014/05/23

Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source Community Activities

Date: 2014.05.23

presentation file link:
http://www.slideshare.net/sshtel/trends-of-sw-platforms-for-heterogeneous-multicore-systems-and-open-source-community-activities-iset2014

Heterogeneous system architecture is no longer an early-stage technology trend. It is already widely adopted across the computer industry, from personal computers to mobile devices.
Single-core architectures ran into limits on performance growth in the past, and the multi-core era arrived decades ago. Early multi-core architectures were homogeneous: they gained performance simply by adding cores. However, even though each core is fast and suitable for general-purpose work, many-core architectures have not been widely adopted outside the server clustering industry because of their price. Furthermore, end-user devices such as PCs and mobile phones need to handle a wide variety of specific tasks rather than a few performance-oriented ones. One of these needs is graphics processing, which drove the development of the GPU. Many heterogeneous system architectures combine a CPU and a GPU, usually on the same silicon chip.
The multi-core era also saw interesting developments as GPUs advanced. Since GPUs have parallel vector processing capabilities that let them compute over large sets of data, people tried to use them for general-purpose computation beyond graphics processing. Parallel processing on GPUs also consumed much less power than similar work on CPUs. Although GPUs have these definite advantages, vector processing is not always the right answer. CPUs are still better for certain problems, and we cannot throw away the abundant existing software libraries and solutions. This is why the trend toward CPU-GPU coupled architectures has emerged.
Heterogeneous systems are very sophisticated. As a result, the software industry faces a truly hard portability problem: programmers cannot support every platform by rewriting their code. To overcome this problem, the HSA Foundation, an open industry standards organization for heterogeneous systems, was formed. The goal of HSA is to help system designers integrate different architectures easily and to provide advanced approaches and standard software infrastructure such as compilers and languages.
This presentation introduces today's trends in heterogeneous systems and their software platform technologies, especially CPU-GPU offloading and OpenCL. Following these trends, there have been many efforts in Korea to improve software platforms for heterogeneous systems. The presentation introduces research driven by the Korean Electronics and Telecommunications Research Institute (ETRI), as well as the activities of an open source community organized to try out and evaluate the software technologies developed by ETRI.

2014/02/21

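Why you should not call functions inside a condition

You cannot guarantee that every function in a condition will actually be called, because the || and && operators stop evaluating as soon as the result is known (short-circuit evaluation).
See the code below.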

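 #include <stdio.h>

 /* Prints its name and returns 0 (false). */
 int funcA(void){
   printf("funcA \n");
   return 0;
 }
 /* Prints its name and returns 1 (true). */
 int funcB(void){
   printf("funcB \n");
   return 1;
 }
 int main(void)
 {
   /* funcA() is false, so || must also evaluate funcB(). */
   if( funcA() || funcB() )
   {
     printf("first condition! funcA() || funcB() \n\n");
   }
   /* funcB() is true, so || skips funcA() entirely. */
   if( funcB() || funcA() ){
     printf("second condition! funcB() || funcA() \n\n");
   }
   /* funcA() is false, so && skips funcB() entirely. */
   if( funcA() && funcB() )
   {
     printf("third condition! funcA() && funcB() \n\n");
   }
   /* funcB() is true, so && must also evaluate funcA(). */
   if( funcB() && funcA() ){
     printf("fourth condition! funcB() && funcA() \n\n");
   }
   return 0;
 }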

Output:

funcA
funcB
first condition! funcA() || funcB()

funcB
second condition! funcB() || funcA()

funcA
funcB
funcA

* If a skipped call has side effects, such as I/O or releasing a resource, that work silently never happens, which can lead to mistakes such as a memory leak.
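
One common way to avoid this problem is to call each function first, store the results, and then combine the stored values in the condition. The sketch below reuses funcA and funcB with the same return values as the example above; the call counters and the two wrapper functions short_circuit_or and eager_or are hypothetical additions for illustration.

```cpp
#include <cstdio>

// Call counters (hypothetical, added only to make the difference visible).
static int calls_a = 0;
static int calls_b = 0;

// Same behavior as in the example above, plus a counter.
int funcA() { ++calls_a; std::printf("funcA\n"); return 0; }
int funcB() { ++calls_b; std::printf("funcB\n"); return 1; }

// Short-circuit version: funcB() is true, so funcA() is never called.
bool short_circuit_or() {
    return funcB() || funcA();
}

// Eager version: both functions are called before the results are combined,
// so neither side effect can be skipped.
bool eager_or() {
    const int b = funcB();
    const int a = funcA();
    return b || a;
}
```

Both versions return true, but short_circuit_or() calls only funcB, while eager_or() always calls both functions. If the calls really must both happen, the eager form makes that explicit to the reader as well.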