Category Archives: SQL

Comparing PostgreSQL timestamps without timezones with dates with a timezone

Problem: you have a timestamp field and you need to compare it with something that has a timezone.

Let’s assume your database’s default timezone is UTC (you can check it with show timezone;). In other words, we assume your dates are stored in UTC.

Since a timestamp carries no timezone data, we first need to read it AT TIME ZONE 'UTC', which attaches the timezone. Then we can convert the result to the desired timezone:

select timestamp_field AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York'

Now we have the timezone and the proper daylight-saving offset.
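For a concrete sketch (the timestamp literals are hypothetical, and a UTC server timezone is assumed), note how the offset tracks daylight saving time:

```sql
-- Hypothetical timestamps; assumes the server timezone is UTC
SELECT TIMESTAMP '2021-01-15 18:30:00'
         AT TIME ZONE 'UTC'
         AT TIME ZONE 'America/New_York' AS winter, -- 2021-01-15 13:30:00 (EST, UTC-5)
       TIMESTAMP '2021-07-15 18:30:00'
         AT TIME ZONE 'UTC'
         AT TIME ZONE 'America/New_York' AS summer; -- 2021-07-15 14:30:00 (EDT, UTC-4)
```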

Just for fun, let’s check if the current hour matches the hour from the saved timestamp field in the ‘America/New_York’ timezone:

EXTRACT(HOUR FROM timestamp_field AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York') =
EXTRACT(HOUR FROM now() AT TIME ZONE 'America/New_York')

Notice that we do not read now() in the UTC timezone first: now() already returns a timestamp with time zone (and our DB default timezone is UTC), so it carries all of the timezone data.
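A quick side-by-side sketch of the difference (the column and table names are hypothetical):

```sql
-- A plain timestamp column needs two steps: attach UTC, then convert
SELECT timestamp_field AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York'
  FROM my_table;

-- now() returns a timestamp with time zone, so one conversion is enough
SELECT now() AT TIME ZONE 'America/New_York';
```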

The Cost of PostgreSQL Foreign Keys

Acknowledgments: shout out to Steven Jones at Syncro for helping me better understand how Foreign Keys work in PostgreSQL.

Foreign keys are great for maintaining data integrity. However, they are not free.

As mentioned in this blog post by Shaun Thomas:

In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked that they do not violate the constraint.

https://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/

What this means is that when you modify your parent table, even if you don’t touch any of the referred keys, these triggers are still fired.
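You can see these hidden triggers in the system catalogs. For example, after creating the test tables from the scripts below, something like this lists them on the parent table (the RI_ConstraintTrigger names are generated by PostgreSQL):

```sql
-- Internal (tgisinternal) triggers maintain the foreign keys;
-- a pair of triggers appears for each referencing table
SELECT tgname, tgisinternal
  FROM pg_trigger
 WHERE tgrelid = 'test_fk'::regclass;
```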

In the original post from 2014, the overhead made updates up to 95% slower with 20 foreign keys. Let’s see whether things have changed since then.

These are the slightly updated scripts we’ll use for testing:

CREATE OR REPLACE FUNCTION fnc_create_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i ||
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;


CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT; -- key_count is unused here; the signature just mirrors the other helpers
BEGIN
  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '    blah                '
     WHERE id = i;
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;


CREATE OR REPLACE FUNCTION clean_up_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  DROP TABLE test_fk CASCADE;

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

To validate that the overhead is caused strictly by the presence of the foreign keys, and not by the cost of looking up the child records, we’ll modify the first function after the first benchmark and add an index on each foreign key:

CREATE OR REPLACE FUNCTION fnc_create_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i ||
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';

    EXECUTE 'CREATE INDEX test_fk_ref_index_' || i ||
            ' ON test_fk_ref_' || i || '(test_fk_id)';
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;

We’ll run the benchmark on a Mac i9, 2.3 GHz 8-core, 64 GB RAM.

PostgreSQL version 12

-- without indexes on the foreign keys

select fnc_create_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(0); -- 2.6-2.829
select clean_up_overhead(0);


select fnc_create_check_fk_overhead(20);
SELECT fnc_check_fk_overhead(20); -- 3.186-3.5, ~20% drop
select clean_up_overhead(20);


-- after updating our initial function to add an index for each foreign key:
select fnc_create_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(0); -- 2.6-2.8
select clean_up_overhead(0);

select fnc_create_check_fk_overhead(20);
SELECT fnc_check_fk_overhead(20); -- 3.1, same ~20% drop
select clean_up_overhead(20);


As we can see from the benchmark, update performance on the parent table drops by about 20% after adding 20 tables with a foreign key to it, with or without indexes on the referencing columns. It’s not as bad as the 95% in the original post, but the overhead is still clearly there.

Docker MySQL with a Custom SQL Script for Development

The setup is similar to setting up MariaDB.

Start with a standard docker-compose file. If you are using a custom SQL mode, specify the necessary options via the command key:

version: "3.7"
services:
    mysql:
        build:
            context: .
            dockerfile: dev.dockerfile
        restart: always
        command: --sql_mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE"
        environment:
            MYSQL_ROOT_PASSWORD: root_password
            MYSQL_DATABASE: dev
            MYSQL_USER: dev_user
            MYSQL_PASSWORD: dev_password
        ports:
            - 3306:3306

Add dev.dockerfile:

FROM mysql:8.0.17

ADD init.sql /docker-entrypoint-initdb.d/ddl.sql


Finally, add your init.sql file. Let’s give all privileges to our dev_user and switch the default caching_sha2_password to mysql_native_password (don’t do it unless you rely on older packages that require the less secure mysql_native_password authentication method):

GRANT ALL PRIVILEGES ON *.* TO 'dev_user'@'%';
ALTER USER 'dev_user'@'%' IDENTIFIED WITH mysql_native_password BY 'dev_password';
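As a quick sanity check that the init script ran (assuming you can connect as root), you can inspect the user’s authentication plugin:

```sql
SELECT user, host, plugin
  FROM mysql.user
 WHERE user = 'dev_user';
-- plugin should now read mysql_native_password
```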

If you want to access the database container from other containers that are run separately, you can specify host.docker.internal as the host address of your database.

Removing Duplicate Data from SQL Query Caused by New Line Character

We had a query retrieving data from a linked Oracle server. We needed unique rows only. This is the original query:

SELECT Column
     FROM OPENQUERY( LinkedServer, 'SELECT DISTINCT Column from TABLE;' );

However, this still returned duplicates despite the DISTINCT. As it turned out, some rows contained a newline character. The solution:

SELECT DISTINCT REPLACE(REPLACE(Column, CHAR(13), ''), CHAR(10), '')
     FROM OPENQUERY( LinkedServer, 'SELECT DISTINCT Column from TABLE;' );
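To see why the outer REPLACE-plus-DISTINCT is needed, here is a self-contained T-SQL sketch with made-up values: the two strings below differ only by a trailing CRLF, so a plain DISTINCT keeps both, while stripping CHAR(13) and CHAR(10) collapses them into one row:

```sql
SELECT DISTINCT REPLACE(REPLACE(v, CHAR(13), ''), CHAR(10), '') AS cleaned
  FROM ( VALUES ('abc'), ('abc' + CHAR(13) + CHAR(10)) ) AS t(v);
-- returns a single row: 'abc'
```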