2. NumPy Arrays vs. Python Lists
NumPy is a Python package that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.
NumPy Array | Python List |
---|---|
Optimized for numerical operations | General purpose |
Fixed size | Dynamic size |
Homogeneous (one dtype for all elements) | Heterogeneous (any mix of objects) |
Fast for numerical work | Slow for numerical work |
Less memory | More memory |
Slicing returns views | Slicing returns copies |
Broadcasting | No broadcasting |
Vectorized operations | Non-vectorized operations |
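Several rows of the table can be checked directly in an interpreter. The following is a minimal sketch; the variable names are illustrative and not from the original text.

```python
import numpy as np

# Homogeneous: mixing ints and a float upcasts every element to float64
arr = np.array([1, 2, 3.5])
print(arr.dtype)  # float64

# NumPy slicing returns a view: writing through the slice changes the original
nums = np.arange(5)
view = nums[1:3]
view[0] = 99
print(nums)  # [ 0 99  2  3  4]

# List slicing returns a copy: the original list is untouched
items = [0, 1, 2, 3, 4]
part = items[1:3]
part[0] = 99
print(items)  # [0, 1, 2, 3, 4]

# Broadcasting: a (3, 1) column combined with a (3,) row gives a (3, 3) result,
# computed by one vectorized expression with no explicit Python loop
col = np.array([[1], [2], [3]])
row = np.array([10, 20, 30])
print((col + row).shape)  # (3, 3)

# Lists have no element-wise arithmetic: + concatenates instead
print([1, 2] + [3, 4])  # [1, 2, 3, 4]
```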
The main reason to use NumPy arrays instead of Python lists is performance: operations that run faster and consume less memory. Let's look at when, and in which situations, NumPy arrays actually deliver this improvement.
Let's write a decorator to measure the time and memory usage of any function:
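```python
import functools
import time

import memory_profiler

def performance(func):
    @functools.wraps(func)
    def _performance(*args, **kwargs):
        start = time.perf_counter()
        # Run func while sampling memory; retval=True also returns its result
        memory, result = memory_profiler.memory_usage((func, args, kwargs), retval=True)
        end = time.perf_counter()
        print(f"Time taken: {end - start:.2f}s")
        print(f"Memory used: {max(memory) - min(memory):.2f}MiB")
        return result
    return _performance
```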
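The decorator above depends on the third-party memory_profiler package. If you would rather stay within the standard library, a possible alternative is sketched below using tracemalloc. Note that this is a different technique: it reports the peak number of bytes traced by Python's allocators during the call, not process-level memory as memory_profiler does, so the numbers will not match. The name performance_tracemalloc is hypothetical, and tracing adds overhead to the measured time.

```python
import functools
import time
import tracemalloc

def performance_tracemalloc(func):
    # Hypothetical stdlib-only variant: wall time plus peak traced allocation
    @functools.wraps(func)
    def _performance(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        finally:
            end = time.perf_counter()
            _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
            tracemalloc.stop()
        print(f"Time taken: {end - start:.2f}s")
        print(f"Peak traced memory: {peak / 2**20:.2f}MiB")
        return result
    return _performance

@performance_tracemalloc
def build_list(n):
    return list(range(n))

values = build_list(1_000_000)
```

Stopping tracing in a finally clause keeps the tracer from staying active if the wrapped function raises.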
Now we can compare two functions that create a NumPy array and a Python list, respectively.
```python
import numpy as np

@performance
def create_numpy_array():
    return np.arange(10000000)

@performance
def create_list():
    return list(range(10000000))

create_numpy_array()
create_list()
```
We get the following output (NumPy array first, then the list). It shows that, at least at this size, creating the NumPy array is a little faster and, as measured here, consumes significantly less memory.
```
Time taken: 0.89s
Memory used: 0.00MiB
Time taken: 0.93s
Memory used: 301.73MiB
```
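memory_profiler.memory_usage samples process memory at a fixed interval (0.1 s by default), so a reading such as 0.00MiB does not necessarily mean nothing was allocated. A more direct way to compare storage is to count bytes, as in the sketch below. The list figure is approximate (it counts each int object, including CPython's shared small integers), and the explicit int64 dtype is our assumption to keep the array size platform-independent.

```python
import sys

import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# A NumPy array stores raw 8-byte values in one contiguous buffer
array_bytes = arr.nbytes

# A list stores one pointer per element, and each element is a separate int object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"NumPy array data: {array_bytes / 2**20:.2f}MiB")  # 7.63MiB
print(f"List + int objects: {list_bytes / 2**20:.2f}MiB")
```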
However, this advantage does not hold in every situation. The size of a NumPy array is fixed once it is created, so np.append cannot extend an array in place: it returns a new, larger array and copies every existing element into it. If you add numbers to an existing NumPy array one at a time, the performance picture is completely reversed.
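```python
@performance
def initialize_and_append_numpy_array():
    arr = np.array([])
    for i in range(10000000):
        # np.append allocates a new array and copies arr into it on every call
        arr = np.append(arr, i)
    return arr

@performance
def initialize_and_append_list():
    arr = []
    for i in range(10000000):
        # list.append grows the list in place (amortized constant time)
        arr.append(i)
    return arr

initialize_and_append_numpy_array()
initialize_and_append_list()
```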
Output:
```
Time taken: 220.58s
Memory used: 160.91MiB
Time taken: 0.61s
Memory used: 32.12MiB
```
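The slowdown follows from np.append returning a copy. The small sketch below shows that the original array is left unchanged and shares no memory with the result, then counts how many element copies the loop above performs in total: on iteration i the existing i elements are copied, giving 0 + 1 + ... + (n - 1) = n(n - 1)/2.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)

# np.append leaves `a` untouched and returns a freshly allocated array
print(a)  # [1 2 3]
print(b)  # [1 2 3 4]
print(np.shares_memory(a, b))  # False

# Total existing-element copies when appending n items one at a time
n = 10_000_000
copies = n * (n - 1) // 2
print(f"{copies:,} element copies")  # 49,999,995,000,000 element copies
```

A list, by contrast, over-allocates its storage, so most appends need no copy at all.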
In conclusion, avoid NumPy arrays at stages where the size of your data keeps changing. Build your data in a structure that can grow, such as a list, and once its size is fixed, convert it to a NumPy array just before your calculations.
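A minimal sketch of this advice follows, using squares as placeholder data. Option 1 is the pattern described above. Options 2 and 3 (preallocating with np.empty when the final size is known, and np.fromiter with an explicit count) are additional standard NumPy approaches, not from the original text. When the data can be expressed as an array computation, a vectorized expression avoids the Python loop entirely.

```python
import numpy as np

n = 1_000_000

# Option 1: grow a Python list, then convert to NumPy once at the end
values = []
for i in range(n):
    values.append(i * i)
from_list = np.array(values, dtype=np.int64)

# Option 2: the final size is known, so allocate once and fill in place
prealloc = np.empty(n, dtype=np.int64)
for i in range(n):
    prealloc[i] = i * i

# Option 3: build from an iterator; count lets NumPy allocate a single buffer
from_iter = np.fromiter((i * i for i in range(n)), dtype=np.int64, count=n)

# When possible, skip the Python loop with a vectorized expression
vectorized = np.arange(n, dtype=np.int64) ** 2
```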