Python Multithreading vs. Java Multithreading - Important Considerations for High Performance Programming

Multithreading is critical for production applications because it unlocks features and capabilities that are otherwise out of reach. Most programming languages have support for multithreading, but it can be hard to decide which language is the best for writing your application.

In this article, we will look at concurrency using multithreaded programming in two of the most popular languages, Python and Java. By comparing the APIs and capabilities available with each language, and looking at code examples, you will be able to make the best choice based on your end-goal and performance requirements.

 
 

Introduction to Concurrency and Multithreading

Concurrency allows different parts of your application to be executed independently at the same time. This is typically implemented using multithreading, which uses the multi-tasking functionality of the operating system to execute multiple tasks, either on the same CPU core or on separate cores. Each unit of execution is known as a thread (or a Thread of Execution)

It’s important to note, that when we say multitasking, what we mean is that each task is making progress towards its completion. However, there’s no guarantee that at any moment, both of those tasks are being worked on at the same time.

In the diagram shown above, the program’s main thread starts two child threads, which take varying amounts of time to complete. In addition, the program continues being executed in the main thread. But after some time, the program waits for the threads to “join”, meaning that execution of the main thread is halted until the child threads complete execution. 

Multithreading in Python

Multithreaded programs in Python are typically implemented using the built-in threading module. This module provides an easy-to-use API for creating and managing threads. 

For example, here is a Python script implementing a simple multithreaded program, as shown the in the introduction diagram:

Example Multithreading with Python:

import threading
import time

def delayed_print(delay):
    time.sleep(delay)
    print(f"Hello after {delay} seconds!")

thread_one = threading.Thread(target=delayed_print, args=(5,))
thread_two = threading.Thread(target=delayed_print, args=(3,))

thread_one.start()
thread_two.start()

thread_one.join()
thread_two.join()

Output

Hello after 3 seconds!
Hello after 5 seconds!

This program demonstrates how multiple threads are executed simultaneously because even though thread_one (the 3-second delay) is started after thread_two (the 5-second delay), thread_one finishes first.

Python’s Multithreading Limitation - Global Interpreter Lock

For high-performance workloads, the program should process as much data as possible. Unfortunately, in CPython, the standard interpreter of the Python language, a mechanism known as the Global Interpreter Lock (GIL) obstructs Python code from running in multiple threads at the same time.

In other words, even if our computer has 64 CPU cores which allow multiple tasks to be run in parallel, at the same time, a Multithreaded application written in Python, will NOT be able to take any advantage of them and will run as fast (or slow) as if it ran on a single-core computer.

There is still a benefit for using multithreading with Python, but it is limited for workloads that involve simple multitasking, such as waiting for external resources (like network requests) or sleeping.

Multiprocessing in Python for High Performance

For Python programs where high throughput using true parallelism is desired, the best option may be another built-in module, called multiprocessing. This module allows you to spawn additional instances of the entire Python interpreter as separate processes, to sidestep the limitations imposed by the GIL. 

For example:

import multiprocessing
import time

def delayed_print(delay):
    cpu_intensive_work()
    print(f"Hello after {delay} seconds!")

if __name__ == '__main__':
    process_one = multiprocessing.Process(target=delayed_print, args=(5,))
    process_two = multiprocessing.Process(target=delayed_print, args=(3,))

    process_one.start()
    process_two.start()

    process_one.join()
    process_two.join()

In the above example, the work done in the cpu_intensive_work() method within process_one will have a high chance to run in parallel to the work performed by process_two.

Unfortunately, while this solution allows for higher performance through true parallelism, it imposes a critical limitation of its own. Sharing data among separate processes is extremely limited. For example, if you try to access a shared object from different processes, an error such as this may happen:

Sharing an Object Among Multiple Processes in Python - Error

# This does not work!

import multiprocessing
import time

shared_list = []

def access_list():
    print(shared_list[0]) # Print first element in list

if __name__ == '__main__':
    shared_list.append("abc") # Append "abc" to the list

    process = multiprocessing.Process(target=access_list)
    process.start()
    process.join()

Output

Output:
Traceback (most recent call last):
  File "...", line 315, in _bootstrap
     self.run()
  File "...", line 108, in run
     self._target(*self._args, **self._kwargs)
  File "...", line 7, in access_list
     print(shared_list[0]) # Print first element in list
IndexError: list index out of range

While the multiprocessing module includes a few mechanisms for sharing data between processes, as shown it is nonetheless limiting and introduces unnecessary overhead. Additionally, multiprocessing on its own has an inherent overhead in execution time and memory usage. 

Multithreading and High-Performance with Java

Similar to Python, the Java programming language includes an easy-to-use multithreading API

Unlike Python, though, the JVM is not limited by the Global Interpreter Lock (GIL). All mainstream implementations of the Java Virtual Machine, and Oracle’s Hotspot in particular support full parallel execution of multiple threads out-of-the-box, using a simple and easy-to-use API. 

If you want to get started with Java Multithreading and learn how to achieve the highest performance your computer can offer, then follow this link and check out the Java Multithreading, Concurrency & Performance Optimization.

In this course, you will learn how to write real-world, production-grade multithreaded Java applications, algorithms, and libraries. 

Throughout the course, you will follow industry best practices and patterns for high-performance programming in Java, and avoid all the common pitfalls and challenges which come with concurrent programming. 

More Articles

Previous
Previous

Top 5 Books for Software Engineers and Software Architects

Next
Next

Java PriorityBlockingQueue - Thread-Safe and Memory Efficient Concurrent Heap