steebe - HOME

May 9, 2023

Below are helpful commands I don't want to forget for Redshift admin and use. Those interested in a small torching of Redshift as a product are welcome to read the sibling post.


Querying

A Distinct "DISTINCT"

PostgreSQL users might be familiar with the ability to use DISTINCT as an expression evaluator, rather than a clause, to reduce the result set down to records that only include the value evaluated by the expression.

MySQL users might be used to leveraging GROUP BY in a different way than normal to do the same.

I.E: if I have a table, orders:

idcustomerId
11
21
31

Our first customer was keeping the lights on, and proved that the orders table has a many:one relationship with the customers table.

Suppose we need to pull the latest order for each customer, in a world where there is not just a sole customer placing orders. Additionally, the context of the report involves many other fields are being aggregated across dozens of other tables. LIMIT 1 is not coming to the rescue.

Postgres allows for:

SELECT
    DISTINCT ON(customerId) customerId,
    id
FROM orders
ORDER BY customerId, id DESC;

Okay, cool. Weird, but cool. The developer specifies they only want unique customerId values in their results, but want the respective order information along with it. Postgres constructs groups based on the argument provided to DISTINCT ON, then omits any records that break the unique value specified. The ORDER BY gives the developer control over precedence on which distinct records to honor.

MySQL perhaps helps visualize this better, literally making the developer create the groups themselves: ORDER BY:

WITH disctinct_customer_id_orders AS (
    SELECT
        id
    FROM orders
    GROUP BY customerId
)
SELECT * FROM distinct_customer_id_orders ORDER BY id DESC;

It's still not perfect, either, because the ORDER BY should happen in the first query, but hey, MySQL tries.

Redshift does neither...

...yet I don't hate how Redshift achieves the same outcome, strictly from the developer experience side of things:

SELECT
    *
FROM (
    SELECT
        id,
        customerId,
        RANK() OVER (PARTITION BY customerId ORDER BY id DESC) AS customer_id_rank
    FROM orders
) as ranked_orders
WHERE ranked_orders.customer_id_rank = 1;

Where Redshift could have continued reinventing an already finished, rehashed, re-spoked wheel of relational-algebra powered query engines, it instead took the modest route.

The developer is welcome to rewrite GROUP BY on Amazon's behalf. Also, the developer gets a bonus column in their toolkit to play with. It's okay that the result set grew though, because it's powered by a warehouse meant for Big Data!

Administering

Groups & Users

-- Create groups
CREATE GROUP MY_GROUP [WITH USER (user1, user2,...)];

-- View all user groups (as cluster admin)
SELECT * from PG_GROUP;

-- View all users in a group
SELECT usename FROM pg_user, pg_group
WHERE pg_user.usesysid = ANY(pg_group.grolist)
  AND pg_group.groname = '<group-name>';

-- Apply to entire schema
GRANT ALL ON SCHEMA my_schema TO GROUP my_group;

-- Apply to specific tables
GRANT SELECT ON TABLE my_table TO my_user;

-- Apply read-only access to specific (or all) tables in a single schema
GRANT SELECT ON ALL TABLES IN SCHEMA my_schema TO my_user;

-- Create user
CREATE USER my_user PASSWORD '<pass-string>';

-- Create user under specific group
CREATE USER my_user IN GROUP my_group PASSWORD 'pass';

-- Alter existing user
ALTER USER my_user PASSWORD 'new-pass';

Cluster Metadata

-- View tables within a Schema
select distinct tablename from pg_table_def where schemaname = '<schema-name>';

-- View all schemas in a cluster
select * from pg_namespace;

-- View connection activity within a cluster
select * from stl_connection_log
where recordtime > '2021-02-07 00:00:00' and username in ('user1', 'user2')
order by recordtime DESC;