Quick start to Pandas

Madhukar Vissapragada
Jan 28, 2021


Prerequisite: NumPy

What is Pandas?

  • Pandas is a third-party package, so it has to be installed separately
  • Built on top of the NumPy package
  • Used for data analysis
  • Offers many more data-analysis features than NumPy, such as labeled rows and columns
  • Open source and free
  • Can be installed either from the Python Package Index (pip) or through Conda
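Once pandas is installed (for example with `pip install pandas` or `conda install pandas`), a quick check like the one below confirms that the import works and shows the NumPy relationship from the list above: the data inside a Series can be pulled out as a plain NumPy array. This is a minimal sketch; the sample values are arbitrary.

```python
import numpy as np
import pandas as pd

# Confirm pandas imports and report which version is installed
print(pd.__version__)

# A Series built from a list; to_numpy() returns its data as a NumPy array
s = pd.Series([1, 2, 3])
arr = s.to_numpy()
print(type(arr))                    # <class 'numpy.ndarray'>
print(isinstance(arr, np.ndarray))  # True
```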

Pandas has two main data structures: Series and DataFrame.

Series:

  • A Series is a one-dimensional labeled array
  • Builds an index array to store the labels and a values array to store the data
  • Internally backed by a NumPy array
  • Value-mutable but not size-mutable: values can be changed in place, but the length stays fixed
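To see the index and values arrays side by side, and what "value-mutable but not size-mutable" means in practice, here is a small sketch. The labels "a", "b" and "c" are arbitrary choices for illustration.

```python
import pandas as pd

# Custom labels go into the index; the data goes into the values array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s.index)
print(s.values)  # [10 20 30]

# Values can be changed in place by label...
s["b"] = 25
print(s.tolist())  # [10, 25, 30]

# ...while a length-changing operation such as drop() returns a new Series
shorter = s.drop("c")
print(len(s), len(shorter))  # 3 2
```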

Example:

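import pandas as pd

# Build a Series from a list; pandas creates a default 0..4 index
s1 = pd.Series([10, 20, 30, 40, 50])
print(s1)
print(f"shape = {s1.shape}")
print(f"size = {s1.size}")
print(f"dimensions = {s1.ndim}")
print(f"data type = {s1.dtype}")
# Object ids (memory addresses); these differ from run to run
print(id(s1))
print(id(s1[0]))
# A single element comes back as a NumPy scalar, not a Python int
print(type(s1[0]))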

output:

0    10
1    20
2    30
3    40
4    50
dtype: int64
shape = (5,)
size = 5
dimensions = 1
data type = int64
1751034379184
1751154510320
<class 'numpy.int64'>
[Figure: Series internal structure]

DataFrame:

  • A two-dimensional labeled data structure
  • Stores data in a table-like structure of rows and columns
  • Size-mutable: columns can be added or removed
  • Every column is a Series
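A short sketch of the "every column is a Series" point: selecting one column returns a Series that shares the DataFrame's row index, and assigning to a new column label makes the DataFrame grow. The column names x, y and z are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Selecting a single column gives back a Series with the same row index
col = df["x"]
print(type(col))                   # <class 'pandas.core.series.Series'>
print(col.index.equals(df.index))  # True

# Assigning to a new label adds a column, so the DataFrame changes shape
df["z"] = df["x"] + df["y"]
print(df.shape)          # (3, 3)
print(df["z"].tolist())  # [5.0, 7.0, 9.0]
```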

Example:

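import pandas as pd

# Each dict is one row; its keys become the column names
patients = [
    {"name": "p1", "bp": 80, "temperature": 37, "infected": 1},
    {"name": "p2", "bp": 50, "temperature": 33, "infected": 0},
    {"name": "p3", "bp": 100, "temperature": 34, "infected": 1},
    {"name": "p4", "bp": 75, "temperature": 35, "infected": 0}
]
df = pd.DataFrame(patients)
# A single column is a Series
print(type(df['name']))
# Summary statistics for the numeric columns
print(df.describe())
print('-' * 50)
# info() prints its report itself and returns None, hence the final "None" in the output
print(df.info())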

output:

<class 'pandas.core.series.Series'>
               bp  temperature  infected
count    4.000000     4.000000   4.00000
mean    76.250000    34.750000   0.50000
std     20.564938     1.707825   0.57735
min     50.000000    33.000000   0.00000
25%     68.750000    33.750000   0.00000
50%     77.500000    34.500000   0.50000
75%     85.000000    35.500000   1.00000
max    100.000000    37.000000   1.00000
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   name         4 non-null      object
 1   bp           4 non-null      int64
 2   temperature  4 non-null      int64
 3   infected     4 non-null      int64
dtypes: int64(3), object(1)
memory usage: 256.0+ bytes
None
[Figure: DataFrame internal structure]

A DataFrame has many attributes, such as:

  • size
  • ndim
  • index
  • values
  • shape
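Applying these attributes to a small DataFrame shows what each one reports. A minimal sketch with a made-up two-column table:

```python
import pandas as pd

df = pd.DataFrame({"bp": [80, 50, 100], "temperature": [37, 33, 34]})

print(df.size)         # 6 -> total number of cells (rows * columns)
print(df.ndim)         # 2 -> a DataFrame is always two-dimensional
print(df.shape)        # (3, 2) -> (rows, columns)
print(list(df.index))  # [0, 1, 2] -> default integer row labels
print(df.values)       # the data as a 2-D NumPy array
```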

A DataFrame also provides many methods, such as:

  • head()
  • tail()
  • describe()
  • info()
  • melt() and pivot() for reshaping (unlike a NumPy array, a DataFrame has no reshape() method)
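A quick sketch of head(), tail() and melt() on a small, made-up table. head(n) and tail(n) return the first and last n rows (5 by default), and melt(), one of pandas' reshaping tools, turns wide columns into long (variable, value) rows. The column names "measure" and "reading" are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["p1", "p2", "p3", "p4"],
    "bp": [80, 50, 100, 75],
    "temperature": [37, 33, 34, 35],
})

# First and last two rows
print(df.head(2))
print(df.tail(2))

# melt() keeps "name" as an identifier and stacks bp/temperature
# into one row per (name, measurement) pair
long_df = df.melt(id_vars="name", var_name="measure", value_name="reading")
print(long_df.shape)  # (8, 3)
print(long_df.head(3))
```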

I hope this blog helped you understand the basics of Pandas.

Thank you for reading. If you spot any errors, please post them in the comments section.

Credits go to: Amit Kulkarini
