Saturday, July 14, 2018

Concurrency in Python - Multiprocessing

In this chapter, we will focus more on the comparison between multiprocessing and multithreading.

Multiprocessing

Multiprocessing is the use of two or more CPU units within a single computer system. It lets us get the full potential out of our hardware by using all of the CPU cores available in the system.

Multithreading

Multithreading is the ability of a CPU, with support from the operating system, to execute multiple threads concurrently. The main idea of multithreading is to achieve parallelism by dividing a process into multiple threads.
The following table shows some of the important differences between multiprocessing and multiprogramming −

Sr.No. | Multiprocessing | Multiprogramming
1 | Multiprocessing refers to the processing of multiple processes at the same time by multiple CPUs. | Multiprogramming keeps several programs in main memory at the same time and executes them concurrently using a single CPU.
2 | It utilizes multiple CPUs. | It utilizes a single CPU.
3 | It permits parallel processing. | Context switching takes place.
4 | Less time is taken to process the jobs. | More time is taken to process the jobs.
5 | It facilitates much more efficient utilization of the devices of the computer system. | Less efficient than multiprocessing.
6 | Usually more expensive. | Such systems are less expensive.

Eliminating impact of global interpreter lock (GIL)

While working with concurrent applications, we run into a limitation of Python (specifically of CPython, the reference implementation) called the GIL (Global Interpreter Lock). The GIL is a mutex, that is, a mutual exclusion lock, which keeps the interpreter thread safe. The lock can be held by only one thread at a time, and a thread must acquire it before it can execute Python bytecode. In other words, the GIL prevents multiple threads from executing Python code in parallel, so the threads of a single process cannot use multiple CPU cores for Python code at the same time.
With the use of multiprocessing, we can effectively bypass the limitation caused by the GIL −
  • By using multiprocessing, we use multiple processes, and each process has its own interpreter and therefore its own instance of the GIL.
  • Because of this, the restriction that only one thread can execute bytecode at any one time applies within each process, not across our program as a whole.
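
As a minimal sketch of this idea, the script below runs the same CPU-bound function in four separate processes. The names cpu_bound and worker and the loop size are assumptions chosen for illustration; each child reports its own PID to show that the work is done by distinct processes, each with its own GIL.

```python
import multiprocessing
import os


def cpu_bound(n):
    """Sum of squares below n: pure Python bytecode, so a thread running it holds the GIL."""
    return sum(i * i for i in range(n))


def worker(n):
    # Each worker is a separate process with its own interpreter and its own GIL.
    print("pid", os.getpid(), "computed", cpu_bound(n))


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(200000,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

On a multi-core machine the operating system is free to schedule these four processes on different cores, which threads running the same function inside one process could not exploit.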

Starting Processes in Python

The following three methods can be used to start a process in Python within the multiprocessing module −
  • Fork
  • Spawn
  • Forkserver
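
The short sketch below lists the start methods that the current platform supports and starts one process with an explicitly chosen method. It uses multiprocessing.get_all_start_methods() and multiprocessing.get_context(); the function name say and the choice of "spawn" are assumptions made for illustration ("spawn" is used because it is available on both Unix and Windows).

```python
import multiprocessing


def say(method):
    print("child started via", method)


if __name__ == "__main__":
    # For example ['fork', 'spawn', 'forkserver'] on Linux; the list varies by platform.
    print("Available:", multiprocessing.get_all_start_methods())

    # A context object exposes the same API as the module, bound to one start method.
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=say, args=("spawn",))
    p.start()
    p.join()
```

Using a context object instead of a global setting keeps the choice local to the code that needs it.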

Creating a process with Fork

fork is a standard system call found in UNIX. It is used to create new processes, called child processes. A child process runs concurrently with the process that created it, called the parent process. The child is a copy of its parent and inherits the resources available to the parent. The following system calls are used while creating a process with fork −
  • fork() − A system call implemented in the kernel. It is used to create a copy of the calling process.
  • getpid() − This system call returns the process ID (PID) of the calling process.

Example

The following Python script example will help you understand how to create a new child process and get the PIDs of the child and parent processes (os.fork() is available only on Unix-like systems) −
import os

def child():
   # fork() returns the child's PID in the parent and 0 in the child
   n = os.fork()

   if n > 0:
      print("PID of Parent process is :", os.getpid())
   else:
      print("PID of Child process is :", os.getpid())

child()

Output

PID of Parent process is : 25989
PID of Child process is : 25990

Creating a process with Spawn

Spawn means to start something new. Hence, spawning a process means the creation of a new process by a parent process. The parent process continues its execution asynchronously or waits until the child process ends its execution. Follow these steps for spawning a process −
  • Import the multiprocessing module.
  • Create the Process object.
  • Start the process activity by calling the start() method.
  • Wait until the process has finished its work and exited by calling the join() method.

Example

The following Python script spawns three processes −
import multiprocessing

def spawn_process(i):
   print ('This is process: %s' %i)
   return

if __name__ == '__main__':
   Process_jobs = []
   for i in range(3):
      p = multiprocessing.Process(target = spawn_process, args = (i,))
      Process_jobs.append(p)
      # Start each process and wait for it before creating the next one
      p.start()
      p.join()

Output

This is process: 0
This is process: 1
This is process: 2

Creating a process with Forkserver

The forkserver mechanism is available only on those UNIX platforms that support passing file descriptors over Unix pipes. Consider the following points to understand how the forkserver mechanism works −
  • When the forkserver mechanism is used to start a new process, a server process is started.
  • The server then receives the commands and handles all requests for creating new processes.
  • To create a new process, our Python program sends a request to the fork server, which creates the process for us.
  • Finally, we can use this newly created process in our programs.
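
A hedged sketch of the points above: the script asks for the "forkserver" start method when the platform offers it and falls back to "spawn" otherwise. The helper forkserver_available() and the function work() are hypothetical names introduced for this illustration.

```python
import multiprocessing


def forkserver_available():
    """Hypothetical helper: True if this platform supports the forkserver method."""
    return "forkserver" in multiprocessing.get_all_start_methods()


def work(n):
    print("child computed", n * 2)


if __name__ == "__main__":
    # Fall back to "spawn" where forkserver is not supported (for example on Windows).
    method = "forkserver" if forkserver_available() else "spawn"
    ctx = multiprocessing.get_context(method)

    # With forkserver, this request is passed to the server process, which forks the child.
    p = ctx.Process(target=work, args=(21,))
    p.start()
    p.join()
    print("start method:", method, "exit code:", p.exitcode)
```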

Daemon processes in Python

The Python multiprocessing module allows us to have daemon processes through its daemon option. Daemon processes, that is, processes that run in the background, follow a concept similar to that of daemon threads. To run a process in the background, we set its daemon attribute to True before starting it. The daemon process keeps running as long as the main process is executing; it ends when it finishes its own work or when the main program exits or is killed.

Example

Here, we are using the same example as used for daemon threads. The only differences are the change of module from threading to multiprocessing and setting the daemon flag to True. However, there is a change in the output, as shown below −
import multiprocessing
import time

def nondaemonProcess():
   print("starting my Process")
   time.sleep(8)
   print("ending my Process")

def daemonProcess():
   while True:
      print("Hello")
      time.sleep(2)

if __name__ == '__main__':
   # Distinct variable names keep the Process objects from shadowing the target functions
   nondaemon = multiprocessing.Process(target = nondaemonProcess)
   daemon = multiprocessing.Process(target = daemonProcess)
   daemon.daemon = True
   nondaemon.daemon = False
   daemon.start()
   nondaemon.start()

Output

starting my Process
ending my Process
The output is different from the one generated by daemon threads because only the non-daemon process produces output here. When the main program ends, the daemonic process is terminated automatically so that it does not keep running, while the non-daemon process is allowed to finish.

Terminating processes in Python

We can kill or terminate a process immediately by using the terminate() method. We will use this method to terminate the child process, which has been created with the help of a function, before it completes its execution.

Example

import multiprocessing
import time

def Child_process():
   print ('Starting function')
   time.sleep(5)
   print ('Finished function')

if __name__ == '__main__':
   P = multiprocessing.Process(target = Child_process)
   P.start()
   print("My Process has terminated, terminating main thread")
   print("Terminating Child Process")
   P.terminate()
   P.join()   # wait until the terminated child has actually exited
   print("Child Process successfully terminated")

Output

My Process has terminated, terminating main thread
Terminating Child Process
Child Process successfully terminated

The output shows that the program ends before the child process created from the Child_process() function completes its execution: the line 'Finished function' is never printed. This implies that the child process has been terminated successfully.
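
To check the outcome of terminate() rather than infer it from missing output, we can inspect is_alive() and exitcode after joining the process. The sketch below does this; describe_exitcode() and sleepy() are hypothetical helpers written for this illustration and are not part of the multiprocessing module.

```python
import multiprocessing
import time


def describe_exitcode(code):
    """Hypothetical helper that interprets the value of Process.exitcode."""
    if code is None:
        return "still running"
    if code < 0:
        # A negative value -N means the child was terminated by signal N.
        return "terminated by signal %d" % -code
    return "exited with status %d" % code


def sleepy():
    time.sleep(30)


if __name__ == "__main__":
    p = multiprocessing.Process(target=sleepy)
    p.start()
    print("alive after start:", p.is_alive())
    p.terminate()
    p.join()  # wait so that exitcode is updated
    print("alive after terminate:", p.is_alive())
    print(describe_exitcode(p.exitcode))
```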

Identifying the current process in Python

Every process in the operating system has a process identity known as a PID. In Python, we can find out the PID of the current process with the help of the following command −
import multiprocessing
print(multiprocessing.current_process().pid)

Example

The following Python script finds the PID of the main process as well as the PID of the child process −
import multiprocessing

def Child_process():
   print("PID of Child Process is: {}".format(multiprocessing.current_process().pid))

if __name__ == '__main__':
   print("PID of Main process is: {}".format(multiprocessing.current_process().pid))
   P = multiprocessing.Process(target=Child_process)
   P.start()
   P.join()

Output

PID of Main process is: 9401
PID of Child Process is: 9402

Using a process in subclass

We can create threads by subclassing the threading.Thread class. In the same way, we can create processes by subclassing the multiprocessing.Process class. To use a process in a subclass, we need to consider the following points −
  • We need to define a new subclass of the Process class.
  • We can override the __init__(self [,args]) method to add arguments; if we do, it must call the base class constructor first.
  • We need to override the run(self [,args]) method to implement what the process does when it is started.
  • We need to start the process by invoking the start() method.

Example

import multiprocessing

class MyProcess(multiprocessing.Process):
   def run(self):
      # run() is executed in the child process once start() is called
      print ('called run method in process: %s' %self.name)
      return

if __name__ == '__main__':
   jobs = []
   for i in range(5):
      P = MyProcess()
      jobs.append(P)
      P.start()
      P.join()

Output

called run method in process: MyProcess-1
called run method in process: MyProcess-2
called run method in process: MyProcess-3
called run method in process: MyProcess-4
called run method in process: MyProcess-5
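
The example above overrides only run(). As a sketch of the __init__ point from the list, the hypothetical Greeter class below accepts an extra argument and calls the base class constructor before storing it; the class name and its greeting parameter are assumptions made for illustration.

```python
import multiprocessing


class Greeter(multiprocessing.Process):
    def __init__(self, greeting):
        # The base constructor must run before anything else is done to the process.
        super().__init__()
        self.greeting = greeting

    def run(self):
        # Executed in the child process after start() is called.
        print(self.greeting, "from", self.name)


if __name__ == "__main__":
    workers = [Greeter(text) for text in ("hello", "hi")]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```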

Python Multiprocessing Module – Pool Class

For simple parallel processing tasks in our Python applications, the multiprocessing module provides the Pool class. The following methods of the Pool class can be used to spin up a number of child processes within our main program −

apply() method

This method is similar to the .submit() method of ThreadPoolExecutor in that it hands a single call to a worker. Unlike .submit(), it blocks until the result is ready.

apply_async() method

When we need parallel execution of our tasks, we use the apply_async() method to submit them to the pool. It is an asynchronous operation: it returns a result object immediately and does not block the main thread while the child processes execute.
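
The following sketch contrasts the two calls. The function cube() and the pool size of 2 are assumptions chosen for illustration; the timeout passed to get() only keeps the script from waiting forever if something goes wrong.

```python
import multiprocessing


def cube(n):
    return n ** 3


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # apply() runs one call in a worker and blocks until it returns.
        print(pool.apply(cube, (3,)))

        # apply_async() returns an AsyncResult at once; get() waits for the value.
        pending = [pool.apply_async(cube, (n,)) for n in range(4)]
        print([r.get(timeout=10) for r in pending])
```

Because all four apply_async() calls are submitted before any get() is called, the workers can process them in parallel.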

map() method

Just like the apply() method, it blocks until the result is ready. It is the parallel equivalent of the built-in map() function: it splits the iterable into a number of chunks and submits them to the process pool as separate tasks.

map_async() method

It is a variant of the map() method, just as apply_async() is a variant of the apply() method. It returns a result object. If a callback is supplied, it is applied to the result when the result becomes ready. The callback must complete immediately; otherwise, the thread that handles the results will get blocked.
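
A small sketch of map_async() with a callback follows. The names square() and report(), the chunksize of 2 and the timeout are assumptions made for illustration.

```python
import multiprocessing


def square(n):
    return n * n


def report(results):
    # Runs in the parent's result-handling thread, so it should return quickly.
    print("callback received:", results)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Returns immediately; the iterable is sent to the workers in chunks of 2.
        async_result = pool.map_async(square, range(5), chunksize=2, callback=report)
        print("main thread is free to do other work")
        # get() blocks until the whole list of results is available.
        print(async_result.get(timeout=10))
```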

Example

The following example implements a process pool for performing parallel execution. The square() function computes the square of a number and is applied through a multiprocessing.Pool object. Then p.map() is used to submit the five inputs, because the input is a list of the integers from 0 to 4. The result is stored in p_outputs and printed.
import multiprocessing

def square(n):
   result = n*n
   return result

if __name__ == '__main__':
   inputs = list(range(5))
   p = multiprocessing.Pool(processes = 4)
   # map() blocks until every input has been processed
   p_outputs = p.map(square, inputs)
   p.close()
   p.join()
   print ('Pool :', p_outputs)

Output

Pool : [0, 1, 4, 9, 16]
