SQL – Other Things

Rails Migration to Change string to boolean (PostgreSQL)

Adam Zolo — Fri, 19 Jan 2024 20:47:21 +0000

When you run the migration to change the the column type from string to boolean, you may encounter this kind of error:

PG::DatatypeMismatch: ERROR:  column "blah" cannot be cast automatically to type boolean
HINT:  You might need to specify "USING blah::boolean".

This just tells you that you need a rule to convert your string to boolean. You can fix with using synthax. For example, if you want all columns to change to false:

change_table :table_name do |t|
  t.change :column_name, :boolean, using: 'false', default: false, null: false
end

Or if you want to convert your existing values from your column, you could do something like this:

change_table :table_name do |t|
  t.change :column_name, :boolean, using: 'cast(column_name as boolean)', default: false, null: false
end

Comparing PostgreSQL timestamps without timezones with dates with a timezone

Adam Zolo — Thu, 12 Jan 2023 23:04:33 +0000

Problem: you have a timestamp field and you need to compare it with something that has a timezone.

Let’s assume your database default timezone is UTC (can check it with show timezone;). Thus, we are making the assumption that your dates are stored in UTC.

Since timestamp does not have any timezone data, we first need to read it in UTC, so it adds the timezone data. Then we can convert it to the desired timezone:

select timestamp_field AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York'

Now, we have the timezone and proper daylight-saving offset.

Just for fun, let’s check if the current hour matches the hour from the saved timestamp field in the ‘America/New_York’ timezone:

EXTRACT ( HOUR FROM timestamp_field at time zone 'UTC' at time zone 'America/New_York') = EXTRACT ( HOUR FROM now() at time zone 'America/New_York')

Notice, that we do not read now() in UTC timezone first, because our DB default timezone is UTC and now() already has all of the timezone data.

The Cost of PostgreSQL Foreign Keys

Adam Zolo — Thu, 10 Feb 2022 22:38:00 +0000

Acknowledgments: shout out to Steven Jones at Syncro for helping me better understand how Foreign Keys work in PostgreSQL.

Foreign keys are great for maintaining data integrity. However, they are not free.

As mentioned in this blog post by Shaun Thomas:

In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked that they do not violate the constraint.
https://bonesmoses.org/2014/05/14/foreign-keys-are-not-free/

What this means is that when you modify your parent table, even if you don’t touch any of the referred keys, these triggers are still fired.

In the original post from 2014, the overhead made the updates up to 95% slower with 20 foreign keys. Let’s see if things changed since then.

These are slightly updated scripts we’ll use for testing

CREATE OR REPLACE FUNCTION fnc_create_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i || 
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
						
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;


CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '    blah                '
     WHERE id = i;
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;


CREATE OR REPLACE FUNCTION clean_up_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  DROP TABLE test_fk CASCADE;

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

To validate that the overhead is caused strictly by the presence of the foreign keys, and not from the cost of looking up the child records, after the first benchmark, we’ll modify the first function and add indexes on each foreign key:

CREATE OR REPLACE FUNCTION fnc_create_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i || 
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
						
		EXECUTE 'CREATE index test_fk_ref_index_' || i ||
            ' on test_fk_ref_' || i || '(test_fk_id)';
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;

We’ll run on on Mac i9, 2.3 GHz 8-Core, 64 GB Ram

PostgreSQL version 12

-- without an index

select fnc_create_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(0); -- 2.6-2.829
select clean_up_overhead(0)


select fnc_create_check_fk_overhead(20);
SELECT fnc_check_fk_overhead(20); -- 3.186-3.5. ~20% drop
select clean_up_overhead(20)


-- after updating our initial function to add an index for each foreign key:
select fnc_create_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(0); -- 2.6-2.8
select clean_up_overhead(0)

select fnc_create_check_fk_overhead(20);
SELECT fnc_check_fk_overhead(20); -- 3.1 same ~20% drop
select clean_up_overhead(20)

As we see from the benchmark, the drop in update performance on a parent table is about 20% after adding 20 tables with a foreign key to the parent. It’s not quite as bad as 95% in the original post, but the overhead is still clearly there.

Docker MySQL with a Custom SQL Script for Development

Adam Zolo — Tue, 04 Jan 2022 15:07:10 +0000

The setup is similar to setting up MariaDB.

Start with standard docker-compose file. If using custom SQL mode, specify the necessary options in the command options:

version: "3.7"
services:
    mysql:
        build:
            context: .
            dockerfile: dev.dockerfile
        restart: always
        command: --sql_mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE"
        environment:
            MYSQL_ROOT_PASSWORD: root_password
            MYSQL_DATABASE: dev
            MYSQL_USER: dev_user
            MYSQL_PASSWORD: dev_password
        ports:
            - 3306:3306

Add dev.dockerfile:

FROM mysql:8.0.17

ADD init.sql /docker-entrypoint-initdb.d/ddl.sql

Finally, add your init.sql file. Let’s give all privileges to our dev_user and switch the default caching_sha2_password to mysql_native_password (don’t do it unless you rely on older packages that require the less secure au

GRANT ALL PRIVILEGES ON *.* TO 'dev_user'@'%';
ALTER USER 'dev_user'@'%' IDENTIFIED WITH mysql_native_password BY 'dev_password';

If you want to access the database container from other containers, while running them separately, you can specify host.docker.internal as the host address of your database.

Getting Min Max ids after splitting SQL table into n equal parts using ntile

Adam Zolo — Wed, 10 Jun 2020 13:11:10 +0000

This will split your_table into 10 equal parts, and give you the minimum and maximum id for each part:

with cte as
(
select
id,
ntile(10) over(order by a.id) as bucket_id
from your_table as a
group by a.id
)

select bucket_id, min(id), max(id)
from cte
group by bucket_id
order by bucket_id

Removing Duplicate Data from SQL Query Caused by New Line Character

Adam Zolo — Mon, 16 Sep 2013 21:08:03 +0000

We had a query retrieving data from a linked Oracle server. We needed unique rows only. This is the original query:

SELECT Column
     FROM OPENQUERY( LinkedServer, 'SELECT DISTINCT Column from TABLE;' );

However, this still returned some duplicates despite using DISTINCT. As it turned out, some rows had a new line character in them. The solution:

SELECT distinct replace(replace(Column,CHAR(13),''),CHAR(10),'')
     FROM OPENQUERY( LinkedServer, 'SELECT DISTINCT Column from TABLE;' )