Database Engineering
brian | Published: March 7, 2024, 12:27 p.m. | Updated: May 25, 2025, 4:41 p.m.
ROW Column
ROW ORIENTED DATABASE:
1. Tables are stored as a row (like the name implies) on the disk
2. A single block io read to the table will retreieve multiple rows and all the columns. When we want to read a row, we will make an operation on disk called "io read", then the hard disk or ssd will read the data in blocks , and that's why you receive multiple rows and all of their columns.
We are working with this data for example
And its stored in blocks like this. Here we see that we can find all of Juans data, in one block.
3.designed to efficiently return data for an entire row, or record, in as few operations as possible.
4.Stores record's data in a single block on the disk, along with related records
5. Good at queries
6. Not good at using aggregation (function that operates on a set of values and returns a single value as a result) Ex: getting the average salary of all employees.
7. Good at writing/reading
8. Row-oriented databases are commonly used for transactional systems and OLTP (Online Transaction Processing) workloads, where individual records or transactions are frequently accessed and modified
Ex: an E-commerce store: processing orders, payments. Banking: Transfering money, receiving, etc. Healthcare: Patient records, appointments, etc..
NOTE: THIS IS WITHOUT USING INDEXES
1. We are trying to get the first name from the employee table where the social security number is 666
2. The search will begin at #1 (the database system iterates through the blocks sequentially ), we will get the first block, and in our example there are only 2 rows (there can be many but just to keep it simple we have 2)
3. We learn that there is no ssn number that matches 666 so we move on to the second block. In the second block we learn that there is also nothing matching the ssn and so we move to the third block
4. In the third block we have found the matching ssn. Now we have to retreieve the name since that's what we are trying to get. Note: We dont have to do any extra reading because like we mentioned before, when we read a block, we get all of the rows and columns, and in this case we have obtained the name already. Note 2: When we pulled that block it gets stored to memory/ram
COLUMN ORIENTED DATABASE:
1. Tables are stored as columns first in the disk.
2. in this example, all of the different IDS are stored in a block, and in another block all the different names are stored in one block, then state, etc....
Lets say we wanted to delete Juan. Well since Juans data is located in many blocks (first name, last name, id, state, etc), we now have to go through all the blocks containing Juans information to be able to delete. As you can see this is very inefficient compared to the row based because in row we would have all of Juans data in one block, and in column based we have his data scattered in different blocks.
3. A single block io read to the table retreives multiple columns with all matching rows
4.only the specific columns needed for the query are retrieved, rather than entire rows. This can lead to significant performance improvements, especially for analytical queries and aggregate operations (e.g obtaining the average age of all people)
5. less efficient for inserting new data and deleting
6. column-oriented databases are optimized for analytical workloads
7. Compression algorithms work best on Column based
1. If we delete delete a row for example 1004, now we have to delete it in evey block. This is why its not efficient at deleting or inserting.
2. Now lets say we want to sum all salaries, it would only take one read. This is great!