Quick start to NumPy
Before we dive into NumPy, let's first understand lists in Python. Knowing how NumPy arrays differ from Python lists is important: once we see how each of them stores and accesses data, it becomes clear why NumPy is more efficient than Python lists.
Lists in Python:
- A list is a built-in Python data structure
- Used to store multiple values in a single variable
- Can store heterogeneous elements (values of different types)
- Internally, a list is stored as a dynamic array of references (pointers) to Python objects
- Lists are mutable: elements can be changed, added, or removed, so a list can grow or shrink in size
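A quick way to see these list properties in action is a short sketch like the one below; the variable name `mixed` is illustrative only. It stores values of different types in one list and then changes, grows, and shrinks that same list in place.

```python
# One list holding an int, a str and a float: lists accept heterogeneous elements
mixed = [1, "two", 3.0]
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']

# Lists are mutable: elements can be replaced, appended and removed in place
mixed[0] = 100
mixed.append([4, 5])  # a nested list is just another element
mixed.remove("two")
print(mixed)          # [100, 3.0, [4, 5]]
print(len(mixed))     # 3
```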
eg:
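import sys

numbers = [1, 2, 3, 4, 5]
print(f"numbers = {numbers}, type = {type(numbers)}")
print(f"type of elements = {type(numbers[0])}")
print(f"memory required by each element to get stored = {sys.getsizeof(numbers[0])} bytes")
# Note: this multiplies the size of one int object by the length, so it ignores
# the memory used by the list's own array of references
print(f"total memory required = {sys.getsizeof(numbers[0]) * len(numbers)} bytes")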
output:
numbers = [1, 2, 3, 4, 5], type = <class 'list'>
type of elements = <class 'int'>
memory required by each element to get stored = 28 bytes
total memory required = 140 bytes
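Note: In Python everything is an object, so each element of a list is a separate int object, and the list stores references to those objects rather than the raw values.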
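Because every element is a separate object, the calculation in the list example (size of one int times the length) counts only the element objects. A list also owns an internal array of references, which `sys.getsizeof(numbers)` reports. The sketch below adds both parts. Exact byte counts depend on the Python version and platform, and CPython caches and shares small integers such as 1 to 5, so treat the total as an approximation.

```python
import sys

numbers = [1, 2, 3, 4, 5]

# Size of the list object itself: header plus its internal array of references
container_bytes = sys.getsizeof(numbers)

# Size of the int objects those references point to
element_bytes = sum(sys.getsizeof(n) for n in numbers)

total_bytes = container_bytes + element_bytes
print(f"list container = {container_bytes} bytes")
print(f"int objects    = {element_bytes} bytes")
print(f"approx. total  = {total_bytes} bytes")
```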
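NumPy:
- A third-party Python package that has to be installed explicitly (for example with pip install numpy)
- Implemented mostly in C (with some C++), and it can use optimized linear algebra libraries such as BLAS and LAPACK, which are often written in Fortran
- A faster alternative to Python lists for numerical data
- Stores the data in an optimized manner: one contiguous block of memory with a single data type (dtype) for all elements
- Free and open source
- Fixed size: elements can be modified in place, but an array cannot grow or shrink the way a list can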
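A minimal sketch (assuming NumPy is installed) that checks a few array properties directly: NumPy converts all elements to one common dtype, individual values can be reassigned, and the size of an existing array does not change, because `np.append` returns a new array instead of growing the old one. The names `a` and `b` are only illustrative.

```python
import numpy as np

# Mixed ints and floats are converted to a single common dtype
a = np.array([1, 2.5, 3])
print(a.dtype)  # float64

# Individual elements can be reassigned in place
a[0] = 10
print(a)

# np.append does not grow `a`; it returns a brand-new, larger array
b = np.append(a, 4)
print(a.size, b.size)  # 3 4
```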
eg:
import numpy as np

a1 = np.array([10, 20, 30, 40, 50])
print(f"a1[0] = {a1[0]} type = {a1.dtype}")
print(f"memory required to store one value = {a1.itemsize} bytes")
print(f"length of numpy array = {a1.size}")
print(f"total memory needed = {a1.size * a1.itemsize} bytes")
output:
a1[0] = 10 type = int64
memory required to store one value = 8 bytes
length of numpy array = 5
total memory needed = 40 bytes
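Comparing the two examples above, NumPy needs far less memory than the Python list: 40 bytes versus 140 bytes for the same five values. In a Python list each value is a separate Python object, and the list only stores references to those objects. A NumPy array stores the raw values next to each other in a single buffer, 8 bytes each for int64. (The default integer dtype is int64 on most 64-bit systems; on Windows, NumPy versions before 2.0 default to int32.) NumPy lets us reduce memory even further by explicitly setting the "dtype" property when creating an array.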
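The gap grows with the amount of data. The sketch below compares the approximate memory of a one-million-element list (list object plus int objects, measured with `sys.getsizeof`) with the `nbytes` attribute of an equivalent NumPy array, where `nbytes` equals `size * itemsize`. The exact list figure depends on the Python build, so no specific numbers are claimed here.

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# List: the list object (array of references) plus every int object.
# Small cached ints are counted individually, so this is an approximation.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# NumPy: one contiguous buffer of raw 8-byte integers
array_bytes = np_array.nbytes  # same as np_array.size * np_array.itemsize

print(f"list : {list_bytes:,} bytes")
print(f"numpy: {array_bytes:,} bytes")
```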
eg:
a1 = np.array([10, 20, 30, 40, 50], dtype='int64')
print(f"a1[0] = {a1[0]} type = {a1.dtype}")
print(f"memory required to store one value = {a1.itemsize} bytes")
print(f"length of numpy array = {a1.size}")
print(f"total memory needed = {a1.size * a1.itemsize} bytes")
print('-' * 50)
a1 = np.array([10, 20, 30, 40, 50], dtype='int32')
print(f"a1[0] = {a1[0]} type = {a1.dtype}")
print(f"memory required to store one value = {a1.itemsize} bytes")
print(f"length of numpy array = {a1.size}")
print(f"total memory needed = {a1.size * a1.itemsize} bytes")
print('-' * 50)
a1 = np.array([10, 20, 30, 40, 50], dtype='int16')
print(f"a1[0] = {a1[0]} type = {a1.dtype}")
print(f"memory required to store one value = {a1.itemsize} bytes")
print(f"length of numpy array = {a1.size}")
print(f"total memory needed = {a1.size * a1.itemsize} bytes")
print('-' * 50)
a1 = np.array([10, 20, 30, 40, 50], dtype='int8')
print(f"a1[0] = {a1[0]} type = {a1.dtype}")
print(f"memory required to store one value = {a1.itemsize} bytes")
print(f"length of numpy array = {a1.size}")
print(f"total memory needed = {a1.size * a1.itemsize} bytes")
output:
a1[0] = 10 type = int64
memory required to store one value = 8 bytes
length of numpy array = 5
total memory needed = 40 bytes
--------------------------------------------------
a1[0] = 10 type = int32
memory required to store one value = 4 bytes
length of numpy array = 5
total memory needed = 20 bytes
--------------------------------------------------
a1[0] = 10 type = int16
memory required to store one value = 2 bytes
length of numpy array = 5
total memory needed = 10 bytes
--------------------------------------------------
a1[0] = 10 type = int8
memory required to store one value = 1 bytes
length of numpy array = 5
total memory needed = 5 bytes
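Smaller dtypes save memory only when the values fit in them. `np.iinfo` reports the valid range of an integer dtype; the sketch below prints the limits for the four dtypes used above and shows that element-wise arithmetic on an `int8` array silently wraps around once it passes the maximum. So a dtype is best chosen with the expected value range in mind.

```python
import numpy as np

# Value range of each integer dtype used above
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(f"{info.dtype}: {info.min} to {info.max}")

# 127 is the largest int8; adding 1 element-wise wraps around to -128
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(wrapped)  # [-128]
```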
Why is NumPy designed to be fast?
- NumPy was designed primarily to process large amounts of numerical data.
- Such data usually comes from sources like databases, CSV files, etc.
- When processing data at this scale, both memory usage and the speed of accessing the data matter. NumPy addresses both: values are stored compactly in one contiguous, typed buffer, and operations on whole arrays run as loops in compiled C code instead of in the Python interpreter.
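To illustrate the compiled-loop point, the sketch below computes the same total with Python's built-in `sum` over a list and with NumPy's `sum` over an array, and times both with the standard `timeit` module. Timings depend heavily on the machine, so they are printed rather than claimed here.

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Same result either way: 0 + 1 + ... + (n - 1)
list_total = sum(py_list)
array_total = int(np_array.sum())
print(list_total == array_total)  # True

# Time 10 runs of each approach (numbers vary between machines)
list_time = timeit.timeit(lambda: sum(py_list), number=10)
array_time = timeit.timeit(lambda: np_array.sum(), number=10)
print(f"python sum: {list_time:.4f} s, numpy sum: {array_time:.4f} s")
```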
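Are NumPy arrays immutable?
- No. The values in a NumPy array can be changed in place (for example a1[0] = 99); what is fixed is the size of the array. Functions such as np.append do not grow an existing array but return a new one.
- NumPy is mainly used for data processing and computing statistics. Typically we read the data from some source and apply methods such as the mean or mode to it without modifying the original values. For this kind of workload a fixed-size, contiguous array is not a limitation, and it is exactly what lets NumPy store the data compactly and process it quickly.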
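If an analysis should never modify its input, NumPy can enforce that explicitly by clearing an array's `writeable` flag. A minimal sketch; the variable name `readings` is only illustrative:

```python
import numpy as np

readings = np.array([10, 20, 30, 40, 50])
print(readings.mean())  # 30.0

# Ordinary arrays are mutable, so this assignment works
readings[0] = 15

# Mark the array read-only; further assignments raise ValueError
readings.flags.writeable = False
try:
    readings[0] = 99
except ValueError as err:
    print(f"blocked: {err}")
```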
I hope this blog helped you understand NumPy better.
Thank you for reading this blog. If you spot any errors in the content above, please mention them in the comments section.
Credits go to: Amit Kulkarini