Robert Ulejczyk
Limit (cost=251.68..334.94 rows=10 width=1634)
-> Nested Loop (cost=168.43..5938.23 rows=693 width=1634)
-> Bitmap Heap Scan on users (cost=168.00..2804.95 rows=711 width=1634)
Recheck Cond: ((lower((username)::text) ~~ 'truerob'::text)
OR ((email)::text ~~ 'truerob'::text))
Filter: ((lower((username)::text) ~~ 'truerob'::text)
OR ((email)::text ~~ 'truerob'::text))
-> BitmapOr (cost=168.00..168.00 rows=711 width=0)
-> Bitmap Index Scan on index_users_lowercase_username
(cost=0.00..4.44 rows=1 width=0)
Index Cond: (lower((username)::text) = 'truerob'::text)
-> Bitmap Index Scan on index_users_email_text_pattern_ops
(cost=0.00..4.56 rows=1 width=0)
Index Cond: ((email)::text = 'truerob'::text)
-> Index Only Scan using index_profiles_on_user_id on profiles
(cost=0.43..4.40 rows=1 width=4)
Index Cond: (user_id = users.id)
world=# explain select * from city inner join country on country.code = city.countrycode;
QUERY PLAN
-----------------------------------------------------------------------
Hash Join (cost=10.38..173.65 rows=4180 width=143)
Hash Cond: (city.countrycode = country.code)
-> Seq Scan on city (cost=0.00..105.80 rows=4180 width=30)
-> Hash (cost=7.39..7.39 rows=239 width=113)
-> Seq Scan on country (cost=0.00..7.39 rows=239 width=113)
world=# explain (costs false) select * from city ci inner join country co
on co.code = ci.countrycode inner join countrylanguage l on co.code = l.countrycode;
QUERY PLAN
----------------------------------------------------
Hash Join
Hash Cond: (ci.countrycode = co.code)
-> Seq Scan on city ci
-> Hash
-> Hash Join
Hash Cond: (l.countrycode = co.code)
-> Seq Scan on countrylanguage l
-> Hash
-> Seq Scan on country co
Format in YAML if text is not readable.
world=# explain (format yaml, costs false) select * from city
inner join country on country.code = city.countrycode;
QUERY PLAN
----------------------------------------------------
- Plan:
Node Type: "Hash Join"
Join Type: "Inner"
Hash Cond: "(city.countrycode = country.code)"
Plans:
- Node Type: "Seq Scan"
Parent Relationship: "Outer"
Relation Name: "city"
Alias: "city"
- Node Type: "Hash"
Parent Relationship: "Inner"
Plans:
- Node Type: "Seq Scan"
Parent Relationship: "Outer"
Relation Name: "country"
Alias: "country"
What if I have no join but Postgres plan shows join?
world=# explain (costs false) select * from city, country
where city.countrycode = country.code;
------------------------------------------------
Hash Join
Hash Cond: (city.countrycode = country.code)
-> Seq Scan on city
-> Hash
-> Seq Scan on country
(5 rows)
Technically this is a join.
Postgres knows better.
Accept it and move on.
Countries:
@countries = [
{ code: "NLD", name: "Netherlands" },
{ code: "ALB", name: "Albania" },
{ code: "AND", name: "Andorra" },
{ code: "BEL", name: "Belgium" },
{ code: "GER", name: "Germany" }, # No match
{ code: "BIH", name: "Bosnia and Herzegovina" }
]
Cities:
@cities = [
{ country_code: "ALB", name: "Tirana" },
{ country_code: "AND", name: "Andorra la Vella" },
{ country_code: "BEL", name: "Schaerbeek" },
{ country_code: "BEL", name: "Brugge" },
{ country_code: "BIH", name: "Zenica" },
{ country_code: "BIH", name: "Banja Luka" },
{ country_code: "BIH", name: "Sarajevo" },
{ country_code: "NLD", name: "Alkmaar" },
{ country_code: "NLD", name: "Ede" },
{ country_code: "POL", name: "Krakow" } # No match
]
Cities index on code:
@cities_code_index = @cities.inject({}) do |memo, city|
memo[city[:country_code]] ||= []
memo[city[:country_code]] << @cities.index(city)
memo
end
Cities index on name:
@cities_name_index = @cities.inject({}) do |memo, city|
memo[city[:name]] ||= []
memo[city[:name]] << @cities.index(city)
memo
end
SELECT * FROM city WHERE country_code = 'BEL';
Three methods are supported:
Reads the data directly from the disk page by page
results = []
i = 0
@cities.each do |city|
i = i + 1
results << city if city[:country_code] == 'BEL'
end
Reads the index directly from the disk sequentially, then reads individual records from the heap using random access
results = []
i = 0
@cities_code_index['BEL'].each do |row_position|
i = i + 1
results << @cities[row_position]
end
Reads whole index, builds a bitmap, sorts it and then reads the heap
SELECT * FROM city
WHERE country_code = 'BEL' OR name = 'Tirana' OR name = 'Brugge';
name_bitmap_tirana = @cities_name_index['Tirana']
name_bitmap_brugge = @cities_name_index['Brugge']
code_bitmap_bel = @cities_code_index['BEL']
merged_bitmap = name_bitmap_tirana | name_bitmap_brugge | code_bitmap_bel
sorted_bitmap = merged_bitmap.sort
sorted_bitmap.each do |row_position|
i = i + 1
results << @cities[row_position]
end
EXPLAIN SELECT * FROM countrylanguage WHERE language = '--language--';
But first create an index
CREATE INDEX idx_countrylanguage_language ON
countrylanguage (language text_pattern_ops);
ANALYZE;
Capture the explain results
CREATE OR REPLACE FUNCTION parse_explain(IN query text,
-- http://stackoverflow.com/questions/7682102/putting-explain-results-into-a-table
OUT startup numeric,
OUT totalcost numeric,
OUT planrows numeric,
OUT planwidth numeric,
OUT type text)
AS
$BODY$
DECLARE
query_explain text;
explanation xml;
nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain :=e'EXPLAIN(FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
type := (xpath('/x:explain/x:Query/x:Plan/x:Node-Type/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
Run it for all countries
with langs as (
select count(*), language from countrylanguage group by 2 order by 1 desc
),
queries as (
select sub.planrows as plan_count, sub.type, sub.totalcost, langs.language, count as real_count
from langs
inner join lateral ( select totalcost, langs.language, planrows, type
from parse_explain(
concat(E'select * from countrylanguage where language = \'', langs.language, E'\''))
) sub
on langs.language = sub.language
)
select language, real_count, plan_count, totalcost, type from queries;
language | real_count | plan_count | totalcost | type
-----------------+------------+------------+-----------+------------------
English | 60 | 60 | 11.49 | Bitmap Heap Scan
Arabic | 33 | 33 | 10.94 | Bitmap Heap Scan
Spanish | 28 | 28 | 10.84 | Bitmap Heap Scan
French | 25 | 25 | 10.78 | Bitmap Heap Scan
-- snap -- -- snap -- -- snap --
Wolof | 3 | 3 | 9.97 | Bitmap Heap Scan
Shona | 3 | 3 | 9.97 | Bitmap Heap Scan
Gurma | 3 | 3 | 9.97 | Bitmap Heap Scan
Hebrew | 2 | 1 | 8.29 | Index Scan
Arawakan | 2 | 1 | 8.29 | Index Scan
Teke | 2 | 1 | 8.29 | Index Scan
Papuan Languages | 2 | 1 | 8.29 | Index Scan
Will use only inner join as an example.
SELECT country.code FROM country
INNER JOIN city ON country.code = city.countrycode;
Three methods are supported:
For each row r in R do
For each row s in S do
If r and s satisfy the join condition
Then output the row
results = []
i = 0
@countries.each do |country|
@cities.each do |city|
i = i + 1
results << country if city[:country_code] == country[:code]
end
end
["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
for each row R1 in the build table
begin
calculate hash value on R1 join key(s)
insert R1 into the appropriate hash bucket
end
for each row R2 in the probe table
begin
calculate hash value on R2 join key(s)
for each row R1 in the corresponding hash bucket
if R1 joins with R2
return (R1, R2)
end
results = []
i = 0
temp_hash = @cities.inject(Hash.new([])) do |memo, city|
memo[city[:country_code]] = memo[city[:country_code]] + [city]
memo
end
@countries.each do |country|
temp_hash[country[:code]].each do
i = i + 1
results << country
end
end
["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
get first row R1 from input 1
get first row R2 from input 2
while not at the end of either input
begin
if R1 joins with R2
begin
return (R1, R2)
get next row R2 from input 2
end
else if R1 < R2
get next row R1 from input 1
else
get next row R2 from input 2
end
results = []
i = 0
sorted_countries = @countries.sort_by {|country| country[:code]}
sorted_cities = @cities.sort_by {|city| city[:country_code]}
x, y = 0, 0
loop do
i = i + 1
country = sorted_countries[x]
city = sorted_cities[y]
if country[:code] == city[:country_code]
results << country
y = y + 1
elsif country[:code] < city[:country_code]
x = x + 1
elsif country[:code] > city[:country_code]
y = y + 1
end
break if (sorted_countries[x].nil? || sorted_cities[y].nil?)
end
["ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH", "NLD", "NLD"]
Aggregate function computes a single result from multiple rows
clear the current aggregate results
clear the current group by columns
for each input row
begin
if the input row does not match the current group by columns
begin
output the aggregate results
clear the current aggregate results
set the current group by columns to the input row
end
update the aggregate results with the input row
end
results = []
i = 0
sorted_cities = @cities.sort_by {|city| city[:country_code]}
current_aggregate = {}
sorted_cities.each do |row|
i = i + 1
if (current_aggregate[:country_code] != row[:country_code])
results << current_aggregate unless current_aggregate[:count].nil?
current_aggregate = {}
end
current_aggregate[:country_code] = row[:country_code]
current_aggregate[:count] = current_aggregate[:count].to_i + 1
end
results << current_aggregate
Window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
Out of scope, sorry!
Query Optimizer decides on Scan Method, Join Method and Join Order.
Postgres does not provide syntax for giving hints to it.
But some sort of control is still possible.
There are no statistics available for expressions so Postgres assumes default estimations.
world=# explain analyze select * from city where countrycode = 'POL';
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using idx_city_countrycode on city (cost=0.28..22.87 rows=44 width=31)
(actual time=0.031..0.044 rows=44 loops=1)
Index Cond: (countrycode = 'POL'::bpchar)
Total runtime: 0.075 ms
(3 rows)
world=# explain analyze select * from city where countrycode = concat('POL');
QUERY PLAN
-----------------------------------------------------------------------------------
Seq Scan on city (cost=0.00..135.38 rows=20 width=31)
(actual time=6.926..9.500 rows=44 loops=1)
Filter: ((countrycode)::text = concat('POL'))
Rows Removed by Filter: 4035
Total runtime: 9.546 ms
(4 rows)
We know there is only one row matching the condition so we give a hint to fetch only one row.
world=# explain analyze select * from city where countrycode = concat('POL');
-----------------------------------------------------------------------------
Seq Scan on city (cost=0.00..135.38 rows=20 width=31)
(actual time=7.154..8.968 rows=44 loops=1)
Filter: ((countrycode)::text = concat('POL'))
Rows Removed by Filter: 4035
Total runtime: 9.011 ms
world=# explain analyze select * from city where countrycode = concat('POL') limit 1;
-------------------------------------------------------------------------------------
Limit (cost=0.00..6.77 rows=1 width=31) (actual time=7.254..7.255 rows=1 loops=1)
-> Seq Scan on city (cost=0.00..135.38 rows=20 width=31)
(actual time=7.252..7.252 rows=1 loops=1)
Filter: ((countrycode)::text = concat('POL'))
Rows Removed by Filter: 2994
Total runtime: 7.297 ms
You can use CTE, Query Optimizer won't inline WITH queries.
It's a feature, not a bug.
Details:
http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/