This is a copy of my personal PostgreSQL notes. I write down things I've had to repeatedly look up.
Last updated: 2021-03-05
Order of execution is different than the written order:
I conceptually organize these verbs into:
Generate source data
Filter
COUNT(varname) > 1
, MAX(varname) = 10
, etc.)Reduce / Calculate / Select
Return data
⇧ Back to TOC ◊ PostgreSQL: 13.0
Using [psql](https://www.postgresql.org/docs/current/app-psql.html)
or
[pgcli](https://www.pgcli.com/)
(interchangeable in the examples below), the
prototype:
pgsql -h [HOST_ADDRESS] -p [PORT] -U [USER_NAME] -d [DATABASE_NAME]
⇧ Back to TOC ◊ PostgreSQL: 13.0
SELECT * FROM pg_stat_activity;
Information returned: datid
, datname
, pid
, usesysid
, usename
,
application_name
, client_addr
, client_hostname
, client_port
,
backend_start
, xact_start
, query_start
, state_change
,
wait_event_type
, wait_event
, state
, backend_xid
, backend_xmin
,
query
, and backend_type
.
This command isn't useful at a CLI because the information returned is too large to display in a single screen.
⇧ Back to TOC ◊ PostgreSQL: 13.0
Get the most useful information about all current processes: pid
, age
,
usename
, and query
.
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
This is a useful query to issue at a CLI to see current activity.
⇧ Back to TOC ◊ PostgreSQL: 13.0
SELECT DISTINCT
myvar
FROM
myschema.mytable;
⇧ Back to TOC ◊ PostgreSQL: 13.0
Get the first row, by date, for each grouped id_hash
. There are
two critical components:
SELECT DISTINCT ON (group_variable)
: Groups by group_variable
and keeps
the first row ORDER BY
: Order the elements within the group so that the row of interest
occurs first-- Grouping by id_hash
SELECT DISTINCT ON (id_hash)
id_hash AS id,
journal_date
FROM
myschema.mytable
ORDER BY
id_hash,
-- Earliest date should go first in each group
journal_date DESC,
⇧ Back to TOC ◊ PostgreSQL: 13.0
COUNT(*)
counts all records, COUNT(varname)
only counts records with
non-null values in varname
. The difference is the number of NULL
values
SELECT
count(*) - count(jrn_day) AS numbernull
FROM
sandbox.amo_tbm_alpha_with_emp_attributes17;
⇧ Back to TOC ◊ PostgreSQL: 13.0
-- Create a sequence of dates from 2019-01-01 to 2020-06-01
SELECT
generate_series::date AS cdate
FROM
generate_series(
'2019-01-01'::timestamp,
'2020-06-01'::timestamp,
interval '1 day')
⇧ Back to TOC ◊ PostgreSQL: 13.0
SELECT
t.day::date
FROM
generate_series(timestamp '2004-03-07',
timestamp '2004-08-16',
interval '1 week') AS t(day);
⇧ Back to TOC ◊ PostgreSQL: 12.4
PostgreSQL time differences rely on combinations of DATE_PART()
functions which
extract the timestamp or interval subfield in the unit specified. For example:
date_part('hour', timestamp '2021-02-16 20:38:40')
\(\rightarrow\) 20
.date_part('hour', timestamp '2021-02-16 20:38:40' - timestamp '2020-12-15 12:35:01')
\(\rightarrow\) 63 days 08:03:39
\(\rightarrow\) 8
DATE_PART()
only extracts the field in the specific unit position (e.g. year,
hour, minute, etc.). Building the total time difference involves combining
DATE_PART()
components. For example, to find the total number of minutes
between two dates.
\[ \Delta\,(\text{Total Minutes}) = (\Delta\, \text{Days} \cdot 24 + \Delta \text{Hours}) \cdot 60 + \Delta \cdot \text{Minutes}\]
And the corresponding SQL code.
SELECT
(DATE_PART('day', enddatetime::timestamp - startdatetime::timestamp) * 24 +
DATE_PART('hour', enddatetime::timestamp - startdatetime::timestamp)
) * 60 +
DATE_PART('minute', enddatetime::timestamp - startdatetime::timestamp)
);
In the samples below startdate
and enddate
are dates, e.g. 2021-03-01
,
and starttimestamp
and endtimestamp
are timestamps, e.g. 2021-03-01 12:56:12
.
SELECT
DATE_PART('year', enddate) - DATE_PART('year', startdate);
⇧ Back to TOC ◊ PostgreSQL: 12.4
diffyears
is the result of the in YEARS query above.
SELECT
(DATE_PART('year', enddate - startdate) - DATE_PART('year', startdate)) * 12 +
(DATE_PART('month', enddate) - DATE_PART('month', startdate))
⇧ Back to TOC ◊ PostgreSQL: 12.4
SELECT
TRUNC(DATE_PART('day', startdate - enddate) / 7);
⇧ Back to TOC ◊ PostgreSQL: 12.4
SELECT
DATE_PART('day', enddate - startdate);
⇧ Back to TOC ◊ PostgreSQL: 12.4
diffdays
is the result of the in days query above.
SELECT
DATE_PART('day', enddate - startdate) * 24 + DATE_PART('hour', end - start);
⇧ Back to TOC ◊ PostgreSQL: 12.4
SELECT (
DATE_PART('day', enddatetime::timestamp - startdatetime::timestamp) * 24 +
DATE_PART('hour', enddatetime::timestamp - startdatetime::timestamp)
) * 60 +
DATE_PART('minute', enddatetime::timestamp - startdatetime::timestamp)
);
⇧ Back to TOC ◊ PostgreSQL: 12.4
SELECT (
(
(
DATE_PART('day', enddatetime::timestamp - startdatetime::timestamp) * 24 +
DATE_PART('hour', enddatetime::timestamp - startdatetime::timestamp)
) * 60 +
DATE_PART('minute', enddatetime::timestamp - startdatetime::timestamp)
) * 60 +
DATE_PART('second', enddatetime::timestamp - startdatetime::timestamp)
);
⇧ Back to TOC ◊ PostgreSQL: 12.4
SELECT
WIDTH_BUCKET(nlabels, 1, 5, 1) AS buckets,
COUNT(*)
FROM (
SELECT
claim_id AS id,
COUNT(DISTINCT label) AS nlabels
FROM
myschema.mytable
GROUP BY
claim_id) AS A
GROUP BY
buckets;
⇧ Back to TOC ◊ PostgreSQL: 13.0