Hi Hipmunk! 👋
At Bradfield, we teach computer science to software engineers. I’m going to do my best to squeeze a 3-hour class from our databases course into 45 minutes for you! 😰
The goal of this session will be to understand how PostgreSQL’s query planner decides how to execute queries. We’ll use some sample data and queries, but they’re mostly just a conversation starter: really we’d like to use query plans as a window into understanding how this thing even works. This should hopefully help you make better use of PostgreSQL in the future.
For this exercise, let’s use the following schema:
Department(id integer, name string) Employee(id integer, dep_id integer, manager_id integer, name string, salary: integer) Bonus(id integer, emp_id integer, amount integer, date: datetime)
Here are some possible queries that we might want to play with:
- What is the total bonus amount for each employee?
- Which employees received the highest total compensation last year, and how much was it?
- What is the total bonus amount awarded by department?
- Which employees earn more salary than their managers?
- In which departments did employees earn more total compensations than their managers last year?
- What are the names of the employees who were invalidly awarded their bonuses?
(feel free to pick a subset, or invent your own… these are just ideas)
Download the file scratch.sql. We’ll use this to create our sample data and run queries. As it’s currently written, all the tables are temporary tables, meaning we can just watch for changes to the file and re-run it for a nice fast exploratory workflow.
I’d suggest that you
npm install -g wach which just watches for filesystem events and just runs the bash command you give it. So you could e.g.
wach 'clear; psql -f scratch.sql'
Now, for each query:
- Write a corresponding SQL query
- See if you can pick an ideal query plan by hand
- ANALYZE and EXPLAIN the query, and see if your query plan was picked
- Try adding indexes to see how this changes the query plan
- Advanced: Try increasing the total amount of data, or changing the distribution, and seeing how this affects the chosen plan
- Advanced: Figure out how to query and understand relevant parts of PostgreSQL’s catalog