2014/11/20

boost::shared_ptr Assertion `px != 0' failed.

a.out: /usr/local/include/boost/smart_ptr/shared_ptr.hpp:653: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = classA, typename boost::detail::sp_member_access<T>::type = classA*]: Assertion `px != 0' failed.


When this assertion fires, it is because the pointer stored in the shared_ptr is NULL.

See sample code below.




#include <boost/shared_ptr.hpp>
#include <iostream>

class classA{
private:
    int number_;
public:
    classA(){ number_ = 1;}
public:
    int number() { return number_; }
};

typedef boost::shared_ptr<classA> classA_Ptr;

void makeNull(){
    std::cout << "@sshtel @@@@@@@@@ start function" << std::endl;

    classA_Ptr ap1(new classA());
    std::cout << "@sshtel @@@@@@@@@ ap1:" << ap1->number() <<  std::endl;

    classA *ca = 0;
    classA_Ptr ap2(ca);
    std::cout << "@sshtel @@@@@@@@@ ap2:" << ap2->number() <<  std::endl;

    std::cout << "@sshtel @@@@@@@@@ end of function" << std::endl;
}

int main(){
    makeNull();
    return 0;
}

-----------------------------------------------------------------------------


sshtel@rnd4svr4:~/test/SharedPtr$ ./a.out
@sshtel @@@@@@@@@ start function
@sshtel @@@@@@@@@ ap1:1
a.out: /usr/local/include/boost/smart_ptr/shared_ptr.hpp:653: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = classA, typename boost::detail::sp_member_access<T>::type = classA*]: Assertion `px != 0' failed.
Aborted
sshtel@rnd4svr4:~/test/SharedPtr$
-----------------------------------------------------------------------------



To avoid this dangerous situation, make sure the pointer you hand to a shared_ptr is valid, and handle allocation failure explicitly.





boost::shared_ptr<CLASS> c_ptr;
CLASS *c;
try{
    c = new CLASS();   // new throws std::bad_alloc on failure; it does not return NULL
    c_ptr.reset(c);
}
catch(const std::bad_alloc&){
    // handle the allocation failure here
}

2014/11/01

Ara project's prototype and new era




The Ara project is on the rise.
Many people are not convinced by this project because of its limitations, such as size, price, and supply chain.
However, it is a story unlike any other in the current smartphone industry.

There are a couple of reasons why I am really interested in this project.


First of all, the Ara project gives users and developers more freedom of choice.
While the typical smartphone ecosystem has opened software freedom to users and developers, this project will open a new era for the ecosystem, extending that freedom from software to hardware as well.

For example, it is very hard for a tiny venture company to enter the smartphone market with new hardware and software, because the company would have to build the entire device in an already harshly competitive market.
If this project succeeds, small ventures or even individual developers will be able to develop modules freely and sell them on the market.

I would go further and say that hardware modules are not limited to the typical well-known ones, such as the AP (Application Processor, which combines CPU, RAM, and storage), cameras, and batteries.

Look into the Apple App Store or the Google Play store and count how many kinds of applications are on sale. Then imagine a new era: a hardware module store with well-optimized software.


Second, this could change the typical industrial structure.
To be more specific, many small and mid-sized smartphone module makers currently supply modules to a few big smartphone vendors such as Samsung, LG, Sony, and HTC.

In contrast, if this change comes true, those module makers will not need to do this anymore.
They can simply make modules and sell their products on the market directly.
Furthermore, it is well known that many big companies have demanded unfair price cuts or rebates, so module makers would no longer need to depend on the major smartphone makers.

2014/10/28

Practical use of TCMalloc #2

1. Is TCMalloc reliable?


When I first used TCMalloc, I thought it would be a good fit for my application, because many threads allocate and release memory frequently.

However, some problems occurred.

Sometimes the memory usage of a process increased sharply within a short time, leaving hardly any system memory available.
Eventually, the Linux OOM killer terminated my process.

Even though my application has some logic that could cause heavy memory usage, this had never happened before I applied TCMalloc.

What's going wrong?

Thanks to googling, I found some similar reports saying that TCMalloc's memory management is not perfect.

http://stackoverflow.com/questions/15566083/tcmallocs-fragmentation
tcmalloc tries to do some smart things to anticipate your memory use, but it isn't very good about releasing memory back to the system even after you have freed it. in fact, it might be resident in memory and leading to OOM.


It's true.
You cannot blindly trust TCMalloc's memory management.
There are several reports about memory fragmentation issues.

https://groups.google.com/forum/#!searchin/google-perftools/ReleaseFreeMemory/google-perftools/FmeMfZ2CAJM/U15HZzZ15JIJ



For this issue, you can try a couple of solutions.

One is to release 'free memory' back to the system using this API:
MallocExtension::instance()->ReleaseFreeMemory();

Another is to adjust the memory release rate. You can set the TCMALLOC_RELEASE_RATE environment variable (for example, TCMALLOC_RELEASE_RATE=10 ./your_program); see the section 'Modifying Behavior In Code' in the page below:
http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html


Here, let me show you the memory change of one of my applications as an example.
This graph shows how frequently memory is released when TCMALLOC_RELEASE_RATE is set to 10.
(The exact period is not guaranteed.)


I will try another experiment with an open source program in the future.




2. tc_malloc_stats

Remember that when you use TCMalloc, you must not judge or analyze your process's memory usage from OS reports alone.

To get the exact current tcmalloc stats, you have to use the tc_malloc_stats() function.
(see gperftools/tcmalloc.h)


Or, you can use this:
char buffer[4096];
MallocExtension::instance()->GetStats(buffer, sizeof(buffer));
std::cout << buffer;

refer to this:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html


This is a tc_malloc_stats() report:

------------------------------------------------
MALLOC:          16832 (    0.0 MiB) Bytes in use by application
MALLOC: +     38682624 (   36.9 MiB) Bytes in page heap freelist
MALLOC: +        97632 (    0.1 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +          224 (    0.0 MiB) Bytes in thread cache freelists
MALLOC: +      1175704 (    1.1 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =     39973016 (   38.1 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =     39973016 (   38.1 MiB) Virtual address space used
MALLOC:
MALLOC:             10              Spans in use
MALLOC:              1              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
You can also see the message above, which explains that calling ReleaseFreeMemory() releases freelist memory to the OS.

2014/10/22

Practical use of TCMalloc #1

Thread-Caching Malloc (tcmalloc) is one of the newer approaches to memory allocation, proposed by Google.

Along with tcmalloc, Google also provides powerful tools for system resource profiling.

Here, let me explain how to use tcmalloc and show some interesting experiments.


Here is the test code used in this page:
https://github.com/sshtel/practical_gperftools




1. What is tcmalloc?

Thread-caching malloc (tcmalloc) uses a memory pool to manage memory.
In short, even when programmers allocate or release memory frequently, the system calls for memory allocation and release are not actually executed.
Instead, tcmalloc merely searches for and returns a pointer to a suitable block of memory that is already allocated to the process.
Only when tcmalloc fails to find memory in the pool does it ask the OS for more.


Introduction to the Thread-Caching Malloc:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html




2. How to use?

You can download the source code here.

gperftools project: https://code.google.com/p/gperftools/


If you build gperftools, you will get a static library file named libtcmalloc.a (on Linux).
According to the manual, you are strongly recommended to use the static library, not the dynamic one.

Then build your application:
$ g++ your_program.cpp libtcmalloc.a -o your_program

This is my sample Makefile:
https://github.com/sshtel/practical_gperftools/blob/master/sample/test001/Makefile


**NOTE: When compiling programs with gcc that you plan to link
with libtcmalloc, it's safest to pass in the flags
 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

http://google-perftools.googlecode.com/svn/trunk/README





3. Process memory usage when you use tcmalloc

When you first use tcmalloc, you will experience an interesting phenomenon.
First of all, you will see a slight increase in the memory usage of your program.
Secondly, you might find that memory is not actually returned to the system after you release it with delete or free.

Here, let me show you a sample code and the change in memory usage.

Using the sample code below, I recorded the change in process memory for two test cases: with tcmalloc and without it.
https://github.com/sshtel/practical_gperftools/tree/master/sample/test002/




This graph shows the change in actual physical memory usage of a process. (I recorded the VmRSS field of /proc/<pid>/status on a Linux system.)
This graph explains how tcmalloc holds on to memory even when you try to release it from the process.




4. What I learned from practical use of tcmalloc

When you use tcmalloc, you have to consider the total memory usage of the process.

For most of its running time, your process will stay near its maximum memory usage with tcmalloc.
This does not actually last forever, but I will explain that later.
However, if other processes are running and you design your program without considering this, you could run into tragic problems like Out Of Memory.

In other words, when you use tcmalloc, your process will hold memory for a longer time than before.

In conclusion, you need to pare down memory usage, or optimization is necessary.

2014/08/22

Programming Languages and Open Source Image Processing for Heterogeneous Multi-core Environments

click -> Slideshare presentation link


Today it is no exaggeration to say that the multi-core processor world is mostly a heterogeneous computing environment.
Parallel computing offers dramatic speed improvements and reduced power consumption, but it is tricky to use, and software development becomes even harder in heterogeneous computing environments made up of diverse architectures.
This presentation introduces programming languages for parallel processing in heterogeneous computing environments and shows examples of their use in image processing libraries such as OpenCV.

2014/05/23

Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source Community Activities

Date: 2014.05.23

presentation file link:
http://www.slideshare.net/sshtel/trends-of-sw-platforms-for-heterogeneous-multicore-systems-and-open-source-community-activities-iset2014



  Heterogeneous system architecture is no longer an early technology trend. It is already widely accepted across computer industries such as personal computers and mobile devices.
  Typical single-core architecture hit the limits of performance growth in the past, and the multi-core era arrived decades ago. Early multi-core architectures were basically homogeneous, gaining performance simply by adding cores. However, even though each core performs well and is usable for general purposes, many-core architecture is not easily adopted in most domains outside the server-clustering industry because of its price. Furthermore, end-user devices like PCs and mobile phones require a variety of specialized tasks rather than a few performance-oriented ones. One of these needs is graphics processing, which drove the development of the GPU. Many heterogeneous system architectures combine a CPU and a GPU, usually on the same silicon chip.
  The multi-core era also saw interesting developments with the advance of GPUs. Since GPUs have parallel vector-processing capabilities that let them compute over large data sets, people tried to use them for general-purpose computation beyond graphics processing. Such parallel processing even consumed much less power than similar work on CPUs. Although GPUs have these definite advantages, vector processing is not always the right answer: CPUs are still better for certain problems, and we cannot throw away the wealth of existing software libraries and solutions. This is why the CPU-GPU coupled architecture trend has risen.
  Heterogeneous systems are very sophisticated, so the software industry faced a truly hard portability problem: programmers cannot support all the different platforms by rewriting code. To overcome this issue, the HSA Foundation, an open industry standards organization for heterogeneous systems, was formed. The goal of HSA is to help system designers integrate different architectures easily and to provide advanced approaches and standard software infrastructure such as compilers and languages.
  This presentation introduces today's trends in heterogeneous systems and their software platform technologies, especially CPU-GPU offloading and OpenCL. Driven by these changes, there have been many efforts in Korea to improve heterogeneous-system software platforms, and research led by the Electronics and Telecommunications Research Institute (ETRI) will be introduced. In addition, an open source community was organized to try and evaluate software technologies developed by ETRI, and its activities will also be introduced in this presentation.

2014/02/21

Why you must not call functions in a condition

You cannot guarantee that every function in a condition will actually be called, because the && and || operators short-circuit.
See the code below...




#include <stdio.h>

int funcA() {
    printf("funcA \n");
    return 0;
}

int funcB() {
    printf("funcB \n");
    return 1;
}

int main()
{
    if (funcA() || funcB()) {
        printf("first condition! funcA() || funcB() \n\n");
    }
    if (funcB() || funcA()) {
        printf("second condition! funcB() || funcA() \n\n");
    }
    if (funcA() && funcB()) {
        printf("third condition! funcA() && funcB() \n\n");
    }
    if (funcB() && funcA()) {
        printf("fourth condition! funcB() && funcA() \n\n");
    }
    return 0;
}




funcA
funcB
first condition! funcA() || funcB()

funcB
second condition! funcB() || funcA()

funcA
funcB
funcA



* If the function calls have side effects such as I/O or resource allocation, a call skipped by short-circuit evaluation can lead to mistakes such as memory leaks.