SQL plans explained with Ruby

Robert Ulejczyk


Limit  (cost=251.68..334.94 rows=10 width=1634)
  ->  Nested Loop  (cost=168.43..5938.23 rows=693 width=1634)
    ->  Bitmap Heap Scan on users  (cost=168.00..2804.95 rows=711 width=1634)
      Recheck Cond: ((lower((username)::text) ~~ 'truerob'::text)
                      OR ((email)::text ~~ 'truerob'::text))
      Filter: ((lower((username)::text) ~~ 'truerob'::text)
                OR ((email)::text ~~ 'truerob'::text))
      ->  BitmapOr  (cost=168.00..168.00 rows=711 width=0)
        ->  Bitmap Index Scan on index_users_lowercase_username
                                             (cost=0.00..4.44 rows=1 width=0)
          Index Cond: (lower((username)::text) = 'truerob'::text)
        ->  Bitmap Index Scan on index_users_email_text_pattern_ops
                                             (cost=0.00..4.56 rows=1 width=0)
          Index Cond: ((email)::text = 'truerob'::text)
    ->  Index Only Scan using index_profiles_on_user_id on profiles
                                             (cost=0.43..4.40 rows=1 width=4)
          Index Cond: (user_id = users.id)

Reading the plan

plan is evaulated from bottom to top
nesting represents dependencies
-> implies action
lack of -> implies additional information
very top line is an action too, but has no ->

Reading the plan


world=# explain select * from city inner join country on country.code = city.countrycode;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Hash Join  (cost=10.38..173.65 rows=4180 width=143)
   Hash Cond: (city.countrycode = country.code)
   ->  Seq Scan on city  (cost=0.00..105.80 rows=4180 width=30)
   ->  Hash  (cost=7.39..7.39 rows=239 width=113)
         ->  Seq Scan on country  (cost=0.00..7.39 rows=239 width=113)

scan whole country table
build a hash of country table
scan city table
use hash join to find correct rows


world=# explain (costs false) select * from city ci inner join country co
on co.code = ci.countrycode inner join countrylanguage l on co.code = l.countrycode;

                     QUERY PLAN                     
----------------------------------------------------
 Hash Join
   Hash Cond: (ci.countrycode = co.code)
   ->  Seq Scan on city ci
   ->  Hash
         ->  Hash Join
               Hash Cond: (l.countrycode = co.code)
               ->  Seq Scan on countrylanguage l
               ->  Hash
                     ->  Seq Scan on country co

scan country table
build a hash of country table
scan countrylanguate table
use hash join to find correct rows
build a has of rows from previous join
scan city table
use hash join to find correct rows

Format in YAML if text is not readable.


world=# explain (format yaml, costs false) select * from city
        inner join country on country.code = city.countrycode;
                     QUERY PLAN                     
----------------------------------------------------
 - Plan:                                           
     Node Type: "Hash Join"                        
     Join Type: "Inner"                            
     Hash Cond: "(city.countrycode = country.code)"
     Plans:                                        
       - Node Type: "Seq Scan"                     
         Parent Relationship: "Outer"              
         Relation Name: "city"                     
         Alias: "city"                             
       - Node Type: "Hash"                         
         Parent Relationship: "Inner"              
         Plans:                                    
           - Node Type: "Seq Scan"                 
             Parent Relationship: "Outer"          
             Relation Name: "country"              
             Alias: "country"

What if I have no join but Postgres plan shows join?


world=# explain (costs false) select * from city, country
        where city.countrycode = country.code;

------------------------------------------------
 Hash Join
   Hash Cond: (city.countrycode = country.code)
   ->  Seq Scan on city
   ->  Hash
         ->  Seq Scan on country
(5 rows)

Technically this is a join.

Postgres knows better.

Accept it and move on.

Basic structure

Countries:


@countries = [
  { code: "NLD", name: "Netherlands" },
  { code: "ALB", name: "Albania" },
  { code: "AND", name: "Andorra" },
  { code: "BEL", name: "Belgium" },
  { code: "GER", name: "Germany" },                 # No match
  { code: "BIH", name: "Bosnia and Herzegovina" }
]

Basic structure

Cities:


@cities = [
  { country_code: "ALB", name: "Tirana" },
  { country_code: "AND", name: "Andorra la Vella" },
  { country_code: "BEL", name: "Schaerbeek" },
  { country_code: "BEL", name: "Brugge" },
  { country_code: "BIH", name: "Zenica" },
  { country_code: "BIH", name: "Banja Luka" },
  { country_code: "BIH", name: "Sarajevo" },
  { country_code: "NLD", name: "Alkmaar" },
  { country_code: "NLD", name: "Ede" },
  { country_code: "POL", name: "Krakow" }            # No match
]

Basic structure

Cities index on code:


@cities_code_index = @cities.inject({}) do |memo, city|
  memo[city[:country_code]] ||= []
  memo[city[:country_code]] << @cities.index(city)
  memo
end

Basic structure

Cities index on name:


@cities_name_index = @cities.inject({}) do |memo, city|
  memo[city[:name]] ||= []
  memo[city[:name]] << @cities.index(city)
  memo
end

SCAN METHODS


SELECT * FROM city WHERE country_code = 'BEL';

SCAN METHODS

Three methods are supported:

Sequential scan
Index scan
Bitmap Index scan

Sequential scan

Reads the data directly from the disk page by page

single page contains multiple records
pages are read sequentially - fastest possible way
no logical ordering of the data
optimal for reading large parts of a table
always works

Sequential scan


results = []
i = 0

@cities.each do |city|
  i = i + 1

  results << city if city[:country_code] == 'BEL'
end

Index scan

Reads the index directly from the disk sequentially, then reads individual records from the heap using random access

index is read sequentially - fastest possible way
data pages (heap) are read in random order (as per index) - slowest possible way
heap page is read immediately
guarantees order and can be read backward (at no extra cost)
can read the same heap page multiple times
optimal for reading small parts of a table

Index scan


results = []
i = 0

@cities_code_index['BEL'].each do |row_position|
  i = i + 1

  results << @cities[row_position]
end

Bitmap Index scan

Reads whole index, builds a bitmap, sorts it and then reads the heap

requires to load whole index to memory
optimize heap reads by sorting the bitmap - ensure a page is read at most one time
heap is read using random order
most of the time requires additional step - index heap scan to filter proper rows
optimal for reading small (but larger than Index scan) parts of the table
allows using multiple indexes
results are unordered
inefficient when ordering is necessary

Bitmap Index scan


SELECT * FROM city
WHERE country_code = 'BEL' OR name = 'Tirana' OR name = 'Brugge';


name_bitmap_tirana = @cities_name_index['Tirana']
name_bitmap_brugge = @cities_name_index['Brugge']
code_bitmap_bel    = @cities_code_index['BEL']

merged_bitmap = name_bitmap_tirana | name_bitmap_brugge | code_bitmap_bel
sorted_bitmap = merged_bitmap.sort

sorted_bitmap.each do |row_position|
  i = i + 1
  results << @cities[row_position]
end

Is it all true?


EXPLAIN SELECT * FROM countrylanguage WHERE language = '--language--';

But first create an index


CREATE INDEX idx_countrylanguage_language ON
             countrylanguage (language text_pattern_ops);
ANALYZE;

Capture the explain results


CREATE OR REPLACE FUNCTION parse_explain(IN query text, 
-- http://stackoverflow.com/questions/7682102/putting-explain-results-into-a-table
OUT startup numeric,
OUT totalcost numeric, 
OUT planrows numeric, 
OUT planwidth numeric,
OUT type text)
AS
$BODY$
DECLARE
    query_explain  text;
    explanation       xml;
    nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain :=e'EXPLAIN(FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
type      := (xpath('/x:explain/x:Query/x:Plan/x:Node-Type/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;

Run it for all countries


with langs as (
  select count(*), language from countrylanguage group by 2 order by 1 desc
),
queries as (
  select sub.planrows as plan_count, sub.type, sub.totalcost, langs.language, count as real_count
  from langs
  inner join lateral ( select totalcost, langs.language, planrows, type
                       from parse_explain(
                       concat(E'select * from countrylanguage where language = \'', langs.language, E'\''))
                     ) sub
  on langs.language = sub.language
)
select language, real_count, plan_count, totalcost, type from queries;


        language | real_count | plan_count | totalcost |       type       
-----------------+------------+------------+-----------+------------------
English          |         60 |         60 |     11.49 | Bitmap Heap Scan
Arabic           |         33 |         33 |     10.94 | Bitmap Heap Scan
Spanish          |         28 |         28 |     10.84 | Bitmap Heap Scan
French           |         25 |         25 |     10.78 | Bitmap Heap Scan
   -- snap --         -- snap --                  -- snap --
Wolof            |          3 |          3 |      9.97 | Bitmap Heap Scan
Shona            |          3 |          3 |      9.97 | Bitmap Heap Scan
Gurma            |          3 |          3 |      9.97 | Bitmap Heap Scan
Hebrew           |          2 |          1 |      8.29 | Index Scan
Arawakan         |          2 |          1 |      8.29 | Index Scan
Teke             |          2 |          1 |      8.29 | Index Scan
Papuan Languages |          2 |          1 |      8.29 | Index Scan

JOINs

Will use only inner join as an example.


SELECT country.code FROM country
INNER JOIN city ON country.code = city.countrycode;

JOINs

Three methods are supported:

Nested Loop
Hash Join
Merge Join

Nested loop


For each row r in R do
   For each row s in S do
      If r and s satisfy the join condition
         Then output the row

Nested loop

has no upstart cost - super efficient for few rows
can get very expensive quickly - number of iterations is product of both tables
explain analyze and pay attention to estimated vs actual rows

Nested loop


results = []
i = 0

@countries.each do |country|
  @cities.each do |city|
    i = i + 1
    results << country if city[:country_code] == country[:code]
  end
end


["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]

Hash Join


for each row R1 in the build table
    begin
        calculate hash value on R1 join key(s)
        insert R1 into the appropriate hash bucket
    end
for each row R2 in the probe table
    begin
        calculate hash value on R2 join key(s)
        for each row R1 in the corresponding hash bucket
            if R1 joins with R2
                return (R1, R2)
    end

Hash Join

requires building a hash structure in advance (time and memory) - upstart cost
the less key columns the better
will always use smaller table to build a hash
support only equality (==) operator

Hash Join


results = []
i = 0

temp_hash = @cities.inject(Hash.new([])) do |memo, city|
  memo[city[:country_code]] = memo[city[:country_code]] + [city]
  memo
end

@countries.each do |country|
  temp_hash[country[:code]].each do
    i = i + 1
    results << country
  end
end


["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]

Merge Join


get first row R1 from input 1
get first row R2 from input 2

while not at the end of either input
    begin
        if R1 joins with R2
            begin
                return (R1, R2)
                get next row R2 from input 2
            end
        else if R1 < R2
            get next row R1 from input 1
        else
            get next row R2 from input 2
    end

Merge Join

at most sum of rows in both tables is read
input tables have to be sorted - can use indexes
does no support inequality (!=) operator

Merge Join


results = []
i = 0

sorted_countries = @countries.sort_by {|country| country[:code]}
sorted_cities    = @cities.sort_by {|city| city[:country_code]}

x, y = 0, 0

loop do
  i = i + 1
  country = sorted_countries[x]
  city    = sorted_cities[y]

  if country[:code] == city[:country_code]
    results << country
    y = y + 1
  elsif country[:code] < city[:country_code]
    x = x + 1
  elsif country[:code] > city[:country_code]
    y = y + 1
  end

  break if (sorted_countries[x].nil? || sorted_cities[y].nil?)
end


["ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH", "NLD", "NLD"]

AGGREGATES

Aggregate function computes a single result from multiple rows

require sorted input by all columns (order doesn't matter)
WHERE filters rows before aggregates are computed
HAVING filters rows after aggregates are computed

AGGREGATES


clear the current aggregate results
clear the current group by columns
for each input row
  begin
    if the input row does not match the current group by columns
      begin
        output the aggregate results
        clear the current aggregate results
        set the current group by columns to the input row
      end
    update the aggregate results with the input row
  end

AGGREGATES


results = []
i = 0

sorted_cities     = @cities.sort_by {|city| city[:country_code]}
current_aggregate = {}

sorted_cities.each do |row|
  i = i + 1

  if (current_aggregate[:country_code] != row[:country_code])
    results << current_aggregate unless current_aggregate[:count].nil?
    current_aggregate = {}
  end

  current_aggregate[:country_code] = row[:country_code]
  current_aggregate[:count]        = current_aggregate[:count].to_i + 1
end

results << current_aggregate

WINDOW FUNCTIONS

Window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.

Out of scope, sorry!

Hints (cheats)

Query Optimizer decides on Scan Method, Join Method and Join Order.

Postgres does not provide syntax for giving hints to it.

But some sort of control is still possible.

Hints - prevent using an index

There are no statistics available for expressions so Postgres assumes default estimations.


world=# explain analyze select * from city where countrycode = 'POL';
                                                          QUERY PLAN                                                           
-----------------------------------------------------------------------------------
 Index Scan using idx_city_countrycode on city  (cost=0.28..22.87 rows=44 width=31)
                                         (actual time=0.031..0.044 rows=44 loops=1)

   Index Cond: (countrycode = 'POL'::bpchar)
 Total runtime: 0.075 ms
(3 rows)


world=# explain analyze select * from city where countrycode = concat('POL');
                                                           QUERY PLAN                                                           
-----------------------------------------------------------------------------------
 Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
            (actual time=6.926..9.500 rows=44 loops=1)

   Filter: ((countrycode)::text = concat('POL'))
   Rows Removed by Filter: 4035
 Total runtime: 9.546 ms
(4 rows)

Hints - suggest number of rows

We know there is only one row matching the condition so we give a hint to fetch only one row.


world=# explain analyze select * from city where countrycode = concat('POL');
-----------------------------------------------------------------------------
 Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
             (actual time=7.154..8.968 rows=44 loops=1)

   Filter: ((countrycode)::text = concat('POL'))
   Rows Removed by Filter: 4035
 Total runtime: 9.011 ms


world=# explain analyze select * from city where countrycode = concat('POL') limit 1;
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.77 rows=1 width=31) (actual time=7.254..7.255 rows=1 loops=1)
   ->  Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
                    (actual time=7.252..7.252 rows=1 loops=1)
         Filter: ((countrycode)::text = concat('POL'))
         Rows Removed by Filter: 2994
 Total runtime: 7.297 ms

Hints - all your base

You can use CTE, Query Optimizer won't inline WITH queries.

It's a feature, not a bug.

Details:

http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/