The Surprising Power of Humble Column Ordering in PostgreSQL

Hüseyin Demir
4 min readApr 1, 2024

--

Ever heard the one about how rearranging a few columns in a database table can make a table size shrink faster than a wheel of Gouda cheese at a Dutch cheese market?

Hi in this blog post we’re going to discuss a topic which effects table size. Therefore, we have to consider and evaluate while creating a new table in PostgreSQL.

The concept is called column ordering. Column ordering in PostgreSQL effects table size and query performance.

Column Ordering and Padding

Column ordering refers to the arrangement of columns within a table in a specific sequence. While the order of columns might seem trivial at first glance, it can have implications on database performance and storage efficiency, especially in scenarios involving frequent data retrieval and manipulation.

In PostgreSQL, altering the order of columns within a table doesn’t inherently affect the way data is stored on disk. Instead, it primarily influences the readability of queries, ease of maintenance, and potential optimizations such as index usage. It’s important to note that PostgreSQL stores data in a row-oriented manner, meaning each row occupies contiguous storage space regardless of column order.

Padding, on the other hand, refers to the process of adding extra bytes to data values to align them with storage boundaries, typically in fixed-width data types such as CHAR or VARCHAR. This alignment ensures that data retrieval operations can be performed more efficiently, as the database engine can access memory blocks in a structured manner without having to deal with irregular data sizes.

In PostgreSQL, certain data types, such as CHAR and VARCHAR, support padding to align data values with fixed storage sizes. By specifying a character length for these data types, PostgreSQL automatically pads shorter values with spaces to match the defined length, thereby optimizing storage and retrieval operations.

Column Ordering Example

The following table has some issues to solve about column ordering. For example,

  • there is a padding around 7 bytes between is_shipped and user_id
  • there is a padding around 4 bytes between order_total and order_dt
  • there is a padding around 6 bytes between order_type and ship_dt
  • there is a padding around 4 bytes between receive_dt and id
CREATE TABLE user_order (
is_shipped BOOLEAN NOT NULL DEFAULT false,
user_id BIGINT NOT NULL,
order_total NUMERIC NOT NULL,
order_dt TIMESTAMPTZ NOT NULL,
order_type SMALLINT NOT NULL,
ship_dt TIMESTAMPTZ,
item_ct INT NOT NULL,
ship_cost NUMERIC,
receive_dt TIMESTAMPTZ,
tracking_cd TEXT,
id BIGSERIAL PRIMARY KEY NOT NULL
);

Benefits of Column Ordering

  1. Reduced table size
  2. Less time consuming operations(ALTER,VACUUM etc..)

Column Ordering Benchmark

Create a table user_order_default and user_order_tweaked. The tweaked table is being constructed according to column ordering concern. Therefore it is expected that tweaked table is going to be a smaller table with same data and DDL.

CREATE TABLE user_order_default (
is_shipped BOOLEAN NOT NULL DEFAULT false,
user_id BIGINT NOT NULL,
order_total NUMERIC NOT NULL,
order_dt TIMESTAMPTZ NOT NULL,
order_type SMALLINT NOT NULL,
ship_dt TIMESTAMPTZ,
item_ct INT NOT NULL,
ship_cost NUMERIC,
receive_dt TIMESTAMPTZ,
tracking_cd TEXT,
id BIGSERIAL PRIMARY KEY NOT NULL
);

CREATE TABLE user_order_tweaked (
id BIGSERIAL PRIMARY KEY NOT NULL,
user_id BIGINT NOT NULL,
order_dt TIMESTAMPTZ NOT NULL,
ship_dt TIMESTAMPTZ,
receive_dt TIMESTAMPTZ,
item_ct INT NOT NULL,
order_type SMALLINT NOT NULL,
is_shipped BOOLEAN NOT NULL DEFAULT false,
order_total NUMERIC NOT NULL,
ship_cost NUMERIC,
tracking_cd TEXT
);

Populate table with records.

INSERT INTO user_order_default (
is_shipped, user_id, order_total, order_dt, order_type,
ship_dt, item_ct, ship_cost, receive_dt, tracking_cd
)
SELECT true, 1000, 500.00, now() - INTERVAL '7 days',
3, now() - INTERVAL '5 days', 10, 4.99,
now() - INTERVAL '3 days', 'X5901324123479RROIENSTBKCV4'
FROM generate_series(1, 1000000);

INSERT INTO user_order_tweaked (
is_shipped, user_id, order_total, order_dt, order_type,
ship_dt, item_ct, ship_cost, receive_dt, tracking_cd
)
SELECT true, 1000, 500.00, now() - INTERVAL '7 days',
3, now() - INTERVAL '5 days', 10, 4.99,
now() - INTERVAL '3 days', 'X5901324123479RROIENSTBKCV4'
FROM generate_series(1, 1000000);

Check table sizes

SELECT pg_relation_size('user_order_default') AS size_bytes,
pg_size_pretty(pg_relation_size('user_order_default')) AS size_pretty;

SELECT pg_relation_size('user_order_tweaked') AS size_bytes,
pg_size_pretty(pg_relation_size('user_order_tweaked')) AS size_pretty;



size_bytes | size_pretty
------------+-------------
141246464 | 135 MB
(1 row)

size_bytes | size_pretty
------------+-------------
117030912 | 112 MB
Table Size Comparision

Through conducting a comparative test with two tables, one created without considering column ordering and the other meticulously crafted to ensure optimal column ordering, we have gained valuable insights into the significance of this often-overlooked aspect of database design.

Our findings underscore the tangible impact that column ordering can have on table size and overall database performance. The table created with careful consideration of column ordering exhibited a smaller footprint compared to its counterpart. This reduction in size not only translates to more efficient storage utilization but also extends to associated indexes, potentially leading to faster query execution times and improved overall system performance.

By reviewing and optimizing column ordering before creating tables, database administrators and developers can harness the benefits of reduced table size and enhanced query performance. This proactive approach empowers them to design database schemas that are not only more efficient in terms of storage but also more conducive to optimal query execution.

In conclusion, while column ordering may seem like a subtle detail in database design, our test results emphasize its significant implications for database performance and storage efficiency in PostgreSQL. By incorporating thoughtful consideration of column ordering into database schema design practices, organizations can unlock tangible benefits in terms of reduced storage requirements, improved query performance, and ultimately, a more streamlined and responsive database environment.

Embrace the Dark Side as you delve into database optimization! Have questions or need guidance on your journey? Reach out to me on LinkedIn, connect on Twitter or Superpeer. May the Force guide us as we optimize our databases for greater efficiency.

Demir.

--

--

Responses (4)