Database Engineering

How Database tables & Indexes are stored in Disk

brian | Published: March 6, 2024, 4:02 p.m. | Updated: May 25, 2025, 6:25 p.m.

Profile Picture

 

In PostgreSQL and other databases, they contain a "row id" and it's used to uniquely identify the entire row. 

Note: In mysql and other databases the row id is the same as the primary key.

 

Pages:Pages are used to organize and store data within the database, including table data, indexes, and other database objects. Note each page in PostgreSQL is 8kb by default but can be modifed. 

1. When a database reads, it will read the entire page, and in that page there will be many rows included. For example, lets say the database reads page 1, that means if page 1 has 50 rows, it will read all 50 rows

 

IO (Input Output): When we are reading from our database we are performing a read request from the disk. For faster queries it is best to minimize the usage of io's as it is expensive to perform. One way to reduce the io usage is to cache frequently accessed data in memory.

Heap: A collection of pages that represent points to your data table it contains everything about the table, all the pages (EVERYTHING!)

   1. Traversing the heap is expensive because there is so much data.

   2. This is where indexes come into play because they help us find which part of the heap we need to read, or what pages of the heap we need to retreive.

 

Index: An index in a database is a data structure seperate from the heap that provides a way to quickly locate specific rows in the heap without having to scan every row sequentially. An index has pointers to the heap

   * You can index one or more columns

   * When searching, the index tells you the exact page to fetch in the heap

   * Like the heap, the index is also stored in pages

   * Indexes use b-tree

 

Django: Django's ORM automatically creates some indexes for certain types of fields, such as primary keys (id fields) and foreign keys.In django, the filter() and .order_by() methods,  automatically translates into SQL queries that utilize indexes.