Garbage collection !!
This is the most critical as well as the most
mysterious issue. Having Nikhil one of the experts on this topic as co-writer
of this blog, I am going to take a look at Garbage collection, its strategy and
its implications.
Before I start with the memory allocation
discussion, I would like to know about the different memory types.
Any program/process has its own memory
structure allocated. The memory structure will be divided in below types:
- Code Segment
- Data Segment
- Stack
- Heap
1. Code Segment :
Code segment contains the
compiled code of the program. Whenever a program executes, the instructions are
loaded in memory and executed. This can be verified by taking a look at the
.exe file you are trying to execute or the .class file created of your java
program. It shows some instructions in binary format. The code segment contains
that data.
2. Data Segment :
This area stores the
static/global variables.
The next two sections that
we are going to discuss are used to store data.
Stack and Heap. The stack is a place in the computer memory where all the
variables that are declared and initialized before
runtime are stored. The heap is the section of computer memory where
all the variables created or initialized at
runtime are stored.
3.
Stack :
The stack is the
section of memory that is allocated for automatic variables within functions.
Data
is stored in stack using the Last In First Out (LIFO) method. This means that
storage in the memory is allocated and de-allocated at only one end of the
memory called the top of the stack. Thus we can remove the data item that is most recently stored.
4. Heap :
The heap is the area used
to allocate memory at the runtime [Dynamic Memory Allocation]. Unlike stack,
the data item is allocated/de-allocated arbitrarily depending on the program’s
need. The pattern of allocation and size of blocks is not known until run time.
Today, almost all the programming languages
use dynamic memory allocation.
This allows object to be allocated/deallocated even if their total size was not
known at the time that the program was compiled and its life time too. A
dynamically allocated object is stored on a heap, rather than on the stack or
statically (data segment). Heap allows to -
a.
Choose dynamically the size of
new objects
b.
Define and use recursive data
structures such as lists, trees and maps
c.
Return newly created objects to
the parent procedure
d.
Return a function as the result
of another function
Heap allocated objects are accessed through
references. Typically, a reference is a pointer to the object.
Why do I
need Automatic Memory Management?
This is the question that comes in mind
when we think about our conventional ways of handling memory allocation and
automatic memory management. I have written this code and I am completely aware
of what is going on into the code, then what is the need to rely on automatic
memory management? Let’s take a look at the possible issues with Explicit
memory allocation.
If I need to handle the memory management,
I can use operators like free or delete to free the memory. Still there are
some issues that can arise like dangling
pointers. Sometimes the memory can be freed prematurely even if it is
being referenced. This causes the program to go unstable. If the program uses
the pointer reference to go further with the execution, the results are
unpredictable. If you are lucky, the program will crash immediately and then
the debugging can begin. In other cases, the program can lead further with the
execution and provide you the incorrect result. This makes the situation worst
as you are not exactly aware of what’s happening behind the scene.
The next issue is that the developer fails
to free the memory- causing Memory
Leak. For the small applications, this loss can be benign but in case
of large applications, memory leak can show its significant appearance and
effect. The Memory leak can further lead to memory management issues for
allocating the memory and further may cause your application to go Out Of Memory.
This issue further becomes more serious
when we talk about multithreaded applications or multiprocessor machines. What
if the particular object is being shared by two different processes? Or even in
worst case, the object is being shared by multiple threads. Now there is the
need for more advanced algorithms to make the objects thread-safe. This further
brings me to the quote-“Liveness is the global property.”
But the way we deal with freeing the memory is local one. Underlines the need
of Automatic Memory Management.
Automatic
Memory Management:
The issues stated above can be addressed by
the Garbage Collection. The garbage collector reclaims memory only if the
object does not hold any references. Thus, Dangling Pointer issue is down now. Garbage
collector traces down the objects and
identifies the unreachable objects (the ones which are not being referenced - Garbage) and then are reclaimed. In case
of Automatic memory management, there will be only one collector running to
reclaim the memory, thus making the memory management thread-safe.
Garbage collection
marks every object currently not being referenced and frees the object. Thus
reducing the possibility of Memory Leak. Garbage Collector does not guarantee that
Memory Leak will not happen but still ensures that it brings down its
possibility. Every data structure that is not being referenced is released.
Also if it has children associated then they are also reclaimed. However it cannot
do anything if a particular data structure is going endlessly. Memory
management cannot be the issue of only Garbage Collector, but it is also responsibility
of Software Engineer.
Garbage
Collection Parameters:
Going forward we will be taking a look at
few collectors, any strategy/algorithm for garbage collections has its own
tradeoffs. We will list down and take a look at each performance parameter for
Garbage Collector.
Often, we are not able to find the perfect
algorithm for Garbage Collection. Every algorithm comes with its pros and cons.
Thus we need to trade-off on some parameters depending on our requirements. It
has been observed that every algorithm has its own standpoints and is 15%
faster than others on its few designated parameters.
1. Safety :
Most important parameter
for garbage collection. Garbage Collection should be safe. The garbage
collector must never reclaim the live objects. However this imposes a
performance cost on the implementation.
2. Throughput :
Every user wants his
program to run faster. Its one of the aspect concerned with garbage collection
is that the garbage collection should take very less time. This is the ratio of
marking/sweeping time. However the user is more interested to spend a very less
time for memory management task. Thus the complete process of memory
allocation/de-allocation should take less time. It is observed that the memory
allocation takes the more time as compared to collection. Thus there are few
approaches to reduce the memory allocation time. The approaches like
mark-sweep-compact, compacts the memory after collecting the objects to reduce
the fragmentation.
3. Completeness and
promptness :
Looking at garbage
collections ideally from the completeness
perspective, the garbage collection should collect all garbage in the memory, though
it is not always possible or desirable. However it is never advised to collect
the complete heap from performance perspective. Thus the heap can be divided in
generations and then collect each generation.
Taking a look further, the object which turns garbage
after a garbage collection is started is collected in next cycle. This object
is called as floating garbage. Thus
completeness cannot be the only property to look.
4. Pause Time :
It is always desired to
have less interference of garbage collector to the program execution. Many
garbage collectors introduce a pause time in the program execution as they stop
the memory operation during collection. It is always expected to have very less
pause time. Thus one of the approaches is to have generational garbage
collector where the heap is divided in generations. The more frequent and quick
collection takes place on the nursery generation and then very few collections
on old generation. This allows reducing the pause time. Still you need to deal
with the sizes of nursery and old generation to make sure that we achieve min
pause time.
Parallel garbage collectors
stop the world to perform collection but reduce the pause time by employing
multiple threads. Concurrent and incremental collectors aim to reduce pause times
still further by occasionally performing a small quantum of collection work
interleaved or in parallel with memory allocations.
We will take a look at other Garbage
Collection algorithms/policies in forthcoming post.