Efficient way to read large PostgreSQL table with Python

Rizwan Butt
2 min readJun 29, 2019
Photo by Simon-Pierre Gingras on Medium

There are a lot of use cases in which you want to read the PostgreSQL table and perform some processing over that data and move/use it somewhere.

Ohhh! Really is this a new thing?

Don’t worry, I know you all are well aware of these kind of situations. But let’s talk about reading a large PostgreSQL table which have got ~5M rows or more, and you don’t want to use any big data processing system like Apace Spark etc. To overcome this problem… (we will discuss the remaining story with coding)

Let’s dive into coding..

  1. Connect with Postgres database using psycopg2
import psycopg2connection = psycopg2.connect(
dbname='database',
user='postgres',
password='postgres',
host='localhsot',
port=5432
)

2. Create cursor using above connection object and assign an chunk size to this cursor. For example assigning 1000 to cursor size will load 1000 records in memory instead of loading full table into memory. So in auto iteration of cursor each iteration will have 1000 records in memory.

with con.cursor(name='custom_cursor') as cursor:     cursor.itersize = 1000 # chunk size

3. Now execute a query and loop over its records

query = 'SELECT * FROM mytbale;'
cursor.execute(query)
for row in cursor:
print(row)

Now let’s see the above the code chunks as one block to have a better idea of above code.

import psycopg2connection = psycopg2.connect(
dbname='database',
user='postgres',
password='postgres',
host='localhsot',
port=5432
)
with con.cursor(name='custom_cursor') as cursor:
cursor.itersize = 1000 # chunk size query = 'SELECT * FROM mytbale;'
cursor.execute(query)
for row in cursor:
print(row)

So using above code you can loop through the large PostgreSQL table with python script instead of using any big data processing system like Apache Spark, PrestoDB etc.

If you guys think this is the optimized solution then go for the clap or otherwise please share your suggestions that how we can further improve this thing.

If you have any questions, please feel free to leave a comment below.

--

--

Rizwan Butt

A Python developer with 5 years of broad expertise in Django, Flask, JavaScript, Back-end Development, and web scraping/automation areas.