2. NumPy Arrays vs. Python Lists

Bora Canbula edited this page Nov 25, 2023 · 1 revision

NumPy: Numerical Python

NumPy is a Python package that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

NumPy Arrays vs. Python Lists

| NumPy Array | Python List |
| --- | --- |
| Numerical operations | General purpose |
| Fixed size | Dynamic size |
| Homogeneous | Heterogeneous |
| Fast | Slow |
| Less memory | More memory |
| Slicing returns views | Slicing returns copies |
| Broadcasting | No broadcasting |
| Vectorized operations | Non-vectorized operations |
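
A few rows of this comparison are easier to see in code. The snippet below is a minimal sketch; the variable names are only illustrative and the printed dtype assumes NumPy's default floating-point type.

```python
import numpy as np

# Homogeneous: mixed inputs are coerced to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Slicing: a NumPy slice is a view, a list slice is a copy
arr = np.arange(5)
arr_slice = arr[1:4]
arr_slice[0] = 100  # writes through to arr[1]

lst = list(range(5))
lst_slice = lst[1:4]
lst_slice[0] = 100  # lst itself is unchanged

# Vectorized operation: the scalar 10 is broadcast to every element
scaled = arr * 10

# The list equivalent needs an explicit loop (here, a comprehension)
scaled_list = [x * 10 for x in lst]

# Broadcasting between shapes (2, 3) and (3,): the row is added to each row
matrix = np.ones((2, 3)) + np.array([1, 2, 3])
```

After running it, `arr` holds `[0, 100, 2, 3, 4]` because the slice shares memory with the original array, while `lst` is still `[0, 1, 2, 3, 4]`.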

The main reason to use NumPy arrays instead of Python lists is better performance: operations run faster and consume less memory. Let's look at when NumPy arrays actually deliver this improvement, and when they do not.

Let's write a decorator to measure the time and memory usage of any function:

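import functools
import time
import memory_profiler  # third-party package: pip install memory-profiler

def performance(func):
    @functools.wraps(func)
    def _performance(*args, **kwargs):
        start = time.perf_counter()
        # memory_usage runs func and samples the process memory while it executes
        memory, result = memory_profiler.memory_usage((func, args, kwargs), retval=True)
        end = time.perf_counter()
        print(f"Time taken: {end - start:.2f}s")
        print(f"Memory used: {max(memory) - min(memory):.2f}MiB")
        return result  # pass the wrapped function's return value through
    return _performance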

Now we can compare two functions which are written to create a NumPy Array and Python List, respectively.

import numpy as np

@performance
def create_numpy_array():
    return np.arange(10000000)

@performance
def create_list():
    return list(range(10000000))

create_numpy_array()
create_list()

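We get the following output, which shows that, at least at this size, the NumPy array is slightly faster and consumes significantly less memory. The timings include the overhead of the memory profiler itself, and the 0.00MiB reading should not be taken literally: the profiler samples memory at intervals and can miss short-lived allocations. An array of 10,000,000 64-bit integers still occupies about 76 MiB, which is far less than the list.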

Time taken: 0.89s
Memory used: 0.00MiB
Time taken: 0.93s
Memory used: 301.73MiB
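
The memory difference can also be estimated directly, without a profiler. The following is a rough sketch, assuming 64-bit CPython; `n`, `array_bytes` and `list_bytes` are illustrative names, and the list figure slightly overcounts because CPython shares small integer objects.

```python
import sys
import numpy as np

n = 1_000_000  # smaller than the example above so it runs quickly

arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# The array stores raw 8-byte integers in one contiguous buffer
array_bytes = arr.nbytes

# The list stores a pointer per element, and each int is a separate object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"NumPy array: {array_bytes / 2**20:.2f} MiB")
print(f"Python list: {list_bytes / 2**20:.2f} MiB")
```

The array needs exactly 8 bytes per element, while the list pays for the pointer and for the integer object it points to.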

However, this advantage is not guaranteed. Keep in mind that the size of a NumPy array is fixed once it is created, so np.append has to allocate a new array and copy every existing element on each call. If you grow an array one element at a time, the comparison is completely reversed.

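@performance
def initialize_and_append_numpy_array():
    arr = np.array([])  # empty float64 array
    for i in range(10000000):
        # np.append returns a new, larger array; the old contents are copied each time
        arr = np.append(arr, i)
    return arr

@performance
def initialize_and_append_list():
    arr = []
    for i in range(10000000):
        arr.append(i)  # the list grows in place
    return arr

initialize_and_append_numpy_array()
initialize_and_append_list()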

Output:

Time taken: 220.58s
Memory used: 160.91MiB
Time taken: 0.61s
Memory used: 32.12MiB
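
The reason for the slowdown is easy to check on a tiny example: np.append never modifies its input, it builds a new array. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

arr = np.array([1, 2, 3])
appended = np.append(arr, 4)

print(arr)                              # [1 2 3], the original is unchanged
print(appended)                         # [1 2 3 4]
print(np.shares_memory(arr, appended))  # False, the data was copied

lst = [1, 2, 3]
same_list = lst
lst.append(4)                           # grows the same list object in place
print(same_list is lst)                 # True
```

Copying the whole array on every call is what makes the loop above so expensive, while list.append only adds one element to the existing object.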

In conclusion, it is not a good idea to use a NumPy array in the stages where the size of your data still changes. Build your data first, for example in a list, and convert it to a NumPy array once the size is fixed, just before your calculations.
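
This workflow could look like the sketch below. The filter condition and the variable names are hypothetical and only stand in for whatever builds your data. The second part shows preallocation with np.empty, a related technique that this page does not cover, for the case where the final size is known in advance.

```python
import numpy as np

# Stage 1: collect values while the size is still changing
values = []
for i in range(1_000):
    if i % 3 == 0:  # hypothetical filter, the final length is not known up front
        values.append(i)

# Stage 2: convert once, then do the numerical work on the array
arr = np.array(values)
total = (arr ** 2).sum()

# If the final size is known in advance, preallocate and fill by index
squares = np.empty(10, dtype=np.int64)
for i in range(10):
    squares[i] = i * i
```

Both variants allocate the array a single time, so none of the repeated copying caused by np.append takes place.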