SQL plans explained with Ruby

Robert Ulejczyk


Limit  (cost=251.68..334.94 rows=10 width=1634)
  ->  Nested Loop  (cost=168.43..5938.23 rows=693 width=1634)
    ->  Bitmap Heap Scan on users  (cost=168.00..2804.95 rows=711 width=1634)
      Recheck Cond: ((lower((username)::text) ~~ 'truerob'::text)
                      OR ((email)::text ~~ 'truerob'::text))
      Filter: ((lower((username)::text) ~~ 'truerob'::text)
                OR ((email)::text ~~ 'truerob'::text))
      ->  BitmapOr  (cost=168.00..168.00 rows=711 width=0)
        ->  Bitmap Index Scan on index_users_lowercase_username
                                             (cost=0.00..4.44 rows=1 width=0)
          Index Cond: (lower((username)::text) = 'truerob'::text)
        ->  Bitmap Index Scan on index_users_email_text_pattern_ops
                                             (cost=0.00..4.56 rows=1 width=0)
          Index Cond: ((email)::text = 'truerob'::text)
    ->  Index Only Scan using index_profiles_on_user_id on profiles
                                             (cost=0.43..4.40 rows=1 width=4)
          Index Cond: (user_id = users.id)
        

Reading the plan

  • plan is evaulated from bottom to top
  • nesting represents dependencies
  • -> implies action
  • lack of -> implies additional information
  • very top line is an action too, but has no ->

Reading the plan


world=# explain select * from city inner join country on country.code = city.countrycode;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Hash Join  (cost=10.38..173.65 rows=4180 width=143)
   Hash Cond: (city.countrycode = country.code)
   ->  Seq Scan on city  (cost=0.00..105.80 rows=4180 width=30)
   ->  Hash  (cost=7.39..7.39 rows=239 width=113)
         ->  Seq Scan on country  (cost=0.00..7.39 rows=239 width=113)
          
  • scan whole country table
  • build a hash of country table
  • scan city table
  • use hash join to find correct rows

world=# explain (costs false) select * from city ci inner join country co
on co.code = ci.countrycode inner join countrylanguage l on co.code = l.countrycode;

                     QUERY PLAN                     
----------------------------------------------------
 Hash Join
   Hash Cond: (ci.countrycode = co.code)
   ->  Seq Scan on city ci
   ->  Hash
         ->  Hash Join
               Hash Cond: (l.countrycode = co.code)
               ->  Seq Scan on countrylanguage l
               ->  Hash
                     ->  Seq Scan on country co
          
  • scan country table
  • build a hash of country table
  • scan countrylanguate table
  • use hash join to find correct rows
  • build a has of rows from previous join
  • scan city table
  • use hash join to find correct rows

Format in YAML if text is not readable.


world=# explain (format yaml, costs false) select * from city
        inner join country on country.code = city.countrycode;
                     QUERY PLAN                     
----------------------------------------------------
 - Plan:                                           
     Node Type: "Hash Join"                        
     Join Type: "Inner"                            
     Hash Cond: "(city.countrycode = country.code)"
     Plans:                                        
       - Node Type: "Seq Scan"                     
         Parent Relationship: "Outer"              
         Relation Name: "city"                     
         Alias: "city"                             
       - Node Type: "Hash"                         
         Parent Relationship: "Inner"              
         Plans:                                    
           - Node Type: "Seq Scan"                 
             Parent Relationship: "Outer"          
             Relation Name: "country"              
             Alias: "country"
          

What if I have no join but Postgres plan shows join?


world=# explain (costs false) select * from city, country
        where city.countrycode = country.code;

------------------------------------------------
 Hash Join
   Hash Cond: (city.countrycode = country.code)
   ->  Seq Scan on city
   ->  Hash
         ->  Seq Scan on country
(5 rows)
          

Technically this is a join.


Postgres knows better.


Accept it and move on.

Basic structure

Countries:


@countries = [
  { code: "NLD", name: "Netherlands" },
  { code: "ALB", name: "Albania" },
  { code: "AND", name: "Andorra" },
  { code: "BEL", name: "Belgium" },
  { code: "GER", name: "Germany" },                 # No match
  { code: "BIH", name: "Bosnia and Herzegovina" }
]
          

Basic structure

Cities:


@cities = [
  { country_code: "ALB", name: "Tirana" },
  { country_code: "AND", name: "Andorra la Vella" },
  { country_code: "BEL", name: "Schaerbeek" },
  { country_code: "BEL", name: "Brugge" },
  { country_code: "BIH", name: "Zenica" },
  { country_code: "BIH", name: "Banja Luka" },
  { country_code: "BIH", name: "Sarajevo" },
  { country_code: "NLD", name: "Alkmaar" },
  { country_code: "NLD", name: "Ede" },
  { country_code: "POL", name: "Krakow" }            # No match
]
          

Basic structure

Cities index on code:


@cities_code_index = @cities.inject({}) do |memo, city|
  memo[city[:country_code]] ||= []
  memo[city[:country_code]] << @cities.index(city)
  memo
end
          

Basic structure

Cities index on name:


@cities_name_index = @cities.inject({}) do |memo, city|
  memo[city[:name]] ||= []
  memo[city[:name]] << @cities.index(city)
  memo
end
          

SCAN METHODS


SELECT * FROM city WHERE country_code = 'BEL';
          

SCAN METHODS

Three methods are supported:

  • Sequential scan
  • Index scan
  • Bitmap Index scan

Sequential scan

Reads the data directly from the disk page by page

  • single page contains multiple records
  • pages are read sequentially - fastest possible way
  • no logical ordering of the data
  • optimal for reading large parts of a table
  • always works

Sequential scan


results = []
i = 0

@cities.each do |city|
  i = i + 1

  results << city if city[:country_code] == 'BEL'
end
          

Index scan

Reads the index directly from the disk sequentially, then reads individual records from the heap using random access

  • index is read sequentially - fastest possible way
  • data pages (heap) are read in random order (as per index) - slowest possible way
  • heap page is read immediately
  • guarantees order and can be read backward (at no extra cost)
  • can read the same heap page multiple times
  • optimal for reading small parts of a table

Index scan


results = []
i = 0

@cities_code_index['BEL'].each do |row_position|
  i = i + 1

  results << @cities[row_position]
end
          

Bitmap Index scan

Reads whole index, builds a bitmap, sorts it and then reads the heap

  • requires to load whole index to memory
  • optimize heap reads by sorting the bitmap - ensure a page is read at most one time
  • heap is read using random order
  • most of the time requires additional step - index heap scan to filter proper rows
  • optimal for reading small (but larger than Index scan) parts of the table
  • allows using multiple indexes
  • results are unordered
  • inefficient when ordering is necessary

Bitmap Index scan


SELECT * FROM city
WHERE country_code = 'BEL' OR name = 'Tirana' OR name = 'Brugge';
          

name_bitmap_tirana = @cities_name_index['Tirana']
name_bitmap_brugge = @cities_name_index['Brugge']
code_bitmap_bel    = @cities_code_index['BEL']

merged_bitmap = name_bitmap_tirana | name_bitmap_brugge | code_bitmap_bel
sorted_bitmap = merged_bitmap.sort

sorted_bitmap.each do |row_position|
  i = i + 1
  results << @cities[row_position]
end
          

Is it all true?


EXPLAIN SELECT * FROM countrylanguage WHERE language = '--language--';
          

But first create an index


CREATE INDEX idx_countrylanguage_language ON
             countrylanguage (language text_pattern_ops);
ANALYZE;
          

Capture the explain results


CREATE OR REPLACE FUNCTION parse_explain(IN query text, 
-- http://stackoverflow.com/questions/7682102/putting-explain-results-into-a-table
OUT startup numeric,
OUT totalcost numeric, 
OUT planrows numeric, 
OUT planwidth numeric,
OUT type text)
AS
$BODY$
DECLARE
    query_explain  text;
    explanation       xml;
    nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain :=e'EXPLAIN(FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
type      := (xpath('/x:explain/x:Query/x:Plan/x:Node-Type/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
          

Run it for all countries


with langs as (
  select count(*), language from countrylanguage group by 2 order by 1 desc
),
queries as (
  select sub.planrows as plan_count, sub.type, sub.totalcost, langs.language, count as real_count
  from langs
  inner join lateral ( select totalcost, langs.language, planrows, type
                       from parse_explain(
                       concat(E'select * from countrylanguage where language = \'', langs.language, E'\''))
                     ) sub
  on langs.language = sub.language
)
select language, real_count, plan_count, totalcost, type from queries;
          

        language | real_count | plan_count | totalcost |       type       
-----------------+------------+------------+-----------+------------------
English          |         60 |         60 |     11.49 | Bitmap Heap Scan
Arabic           |         33 |         33 |     10.94 | Bitmap Heap Scan
Spanish          |         28 |         28 |     10.84 | Bitmap Heap Scan
French           |         25 |         25 |     10.78 | Bitmap Heap Scan
   -- snap --         -- snap --                  -- snap --
Wolof            |          3 |          3 |      9.97 | Bitmap Heap Scan
Shona            |          3 |          3 |      9.97 | Bitmap Heap Scan
Gurma            |          3 |          3 |      9.97 | Bitmap Heap Scan
Hebrew           |          2 |          1 |      8.29 | Index Scan
Arawakan         |          2 |          1 |      8.29 | Index Scan
Teke             |          2 |          1 |      8.29 | Index Scan
Papuan Languages |          2 |          1 |      8.29 | Index Scan
          

JOINs

Will use only inner join as an example.


SELECT country.code FROM country
INNER JOIN city ON country.code = city.countrycode;
          

JOINs

Three methods are supported:

  • Nested Loop
  • Hash Join
  • Merge Join

Nested loop


For each row r in R do
   For each row s in S do
      If r and s satisfy the join condition
         Then output the row
          

Nested loop

  • has no upstart cost - super efficient for few rows
  • can get very expensive quickly - number of iterations is product of both tables
  • explain analyze and pay attention to estimated vs actual rows

Nested loop


results = []
i = 0

@countries.each do |country|
  @cities.each do |city|
    i = i + 1
    results << country if city[:country_code] == country[:code]
  end
end
          

["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
          

Hash Join


for each row R1 in the build table
    begin
        calculate hash value on R1 join key(s)
        insert R1 into the appropriate hash bucket
    end
for each row R2 in the probe table
    begin
        calculate hash value on R2 join key(s)
        for each row R1 in the corresponding hash bucket
            if R1 joins with R2
                return (R1, R2)
    end
          

Hash Join

  • requires building a hash structure in advance (time and memory) - upstart cost
  • the less key columns the better
  • will always use smaller table to build a hash
  • support only equality (==) operator

Hash Join


results = []
i = 0

temp_hash = @cities.inject(Hash.new([])) do |memo, city|
  memo[city[:country_code]] = memo[city[:country_code]] + [city]
  memo
end

@countries.each do |country|
  temp_hash[country[:code]].each do
    i = i + 1
    results << country
  end
end
          

["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
          

Merge Join


get first row R1 from input 1
get first row R2 from input 2

while not at the end of either input
    begin
        if R1 joins with R2
            begin
                return (R1, R2)
                get next row R2 from input 2
            end
        else if R1 < R2
            get next row R1 from input 1
        else
            get next row R2 from input 2
    end
          

Merge Join

  • at most sum of rows in both tables is read
  • input tables have to be sorted - can use indexes
  • does no support inequality (!=) operator

Merge Join


results = []
i = 0

sorted_countries = @countries.sort_by {|country| country[:code]}
sorted_cities    = @cities.sort_by {|city| city[:country_code]}

x, y = 0, 0

loop do
  i = i + 1
  country = sorted_countries[x]
  city    = sorted_cities[y]

  if country[:code] == city[:country_code]
    results << country
    y = y + 1
  elsif country[:code] < city[:country_code]
    x = x + 1
  elsif country[:code] > city[:country_code]
    y = y + 1
  end

  break if (sorted_countries[x].nil? || sorted_cities[y].nil?)
end
          

["ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH", "NLD", "NLD"]
          

AGGREGATES

Aggregate function computes a single result from multiple rows


  • require sorted input by all columns (order doesn't matter)
  • WHERE filters rows before aggregates are computed
  • HAVING filters rows after aggregates are computed

AGGREGATES


clear the current aggregate results
clear the current group by columns
for each input row
  begin
    if the input row does not match the current group by columns
      begin
        output the aggregate results
        clear the current aggregate results
        set the current group by columns to the input row
      end
    update the aggregate results with the input row
  end

          

AGGREGATES


results = []
i = 0

sorted_cities     = @cities.sort_by {|city| city[:country_code]}
current_aggregate = {}

sorted_cities.each do |row|
  i = i + 1

  if (current_aggregate[:country_code] != row[:country_code])
    results << current_aggregate unless current_aggregate[:count].nil?
    current_aggregate = {}
  end

  current_aggregate[:country_code] = row[:country_code]
  current_aggregate[:count]        = current_aggregate[:count].to_i + 1
end

results << current_aggregate
          

WINDOW FUNCTIONS

Window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.


Out of scope, sorry!

Hints (cheats)

Query Optimizer decides on Scan Method, Join Method and Join Order.

Postgres does not provide syntax for giving hints to it.

But some sort of control is still possible.

Hints - prevent using an index

There are no statistics available for expressions so Postgres assumes default estimations.


world=# explain analyze select * from city where countrycode = 'POL';
                                                          QUERY PLAN                                                           
-----------------------------------------------------------------------------------
 Index Scan using idx_city_countrycode on city  (cost=0.28..22.87 rows=44 width=31)
                                         (actual time=0.031..0.044 rows=44 loops=1)

   Index Cond: (countrycode = 'POL'::bpchar)
 Total runtime: 0.075 ms
(3 rows)
          

world=# explain analyze select * from city where countrycode = concat('POL');
                                                           QUERY PLAN                                                           
-----------------------------------------------------------------------------------
 Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
            (actual time=6.926..9.500 rows=44 loops=1)

   Filter: ((countrycode)::text = concat('POL'))
   Rows Removed by Filter: 4035
 Total runtime: 9.546 ms
(4 rows)
          

Hints - suggest number of rows

We know there is only one row matching the condition so we give a hint to fetch only one row.


world=# explain analyze select * from city where countrycode = concat('POL');
-----------------------------------------------------------------------------
 Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
             (actual time=7.154..8.968 rows=44 loops=1)

   Filter: ((countrycode)::text = concat('POL'))
   Rows Removed by Filter: 4035
 Total runtime: 9.011 ms
          

world=# explain analyze select * from city where countrycode = concat('POL') limit 1;
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.77 rows=1 width=31) (actual time=7.254..7.255 rows=1 loops=1)
   ->  Seq Scan on city  (cost=0.00..135.38 rows=20 width=31)
                    (actual time=7.252..7.252 rows=1 loops=1)
         Filter: ((countrycode)::text = concat('POL'))
         Rows Removed by Filter: 2994
 Total runtime: 7.297 ms
          

Hints - all your base

You can use CTE, Query Optimizer won't inline WITH queries.


It's a feature, not a bug.


Details:

http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/

Thank you