Tuesday, March 31, 2015

Amazon Redshift does a Hash Join even when joining on a column that is both the dist key and the sort key

I have a fact table in Redshift with about 1.3 billion rows. Its distribution key is c1 and its sort key is (c1, c2).


I need to join this table with itself on c1 (i.e. c1 from the first instance of the table = c1 from the second instance).


Looking at the query plan, Redshift appears to be doing a Hash Join with DS_DIST_NONE. DS_DIST_NONE is expected, since c1 is both the dist key and the leading sort key column, but for the same reason I expected Redshift to do a Merge Join rather than a Hash Join.
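
For illustration, the setup described above could be sketched roughly as follows. This is only a sketch under assumptions: the table name `fact_table`, the column types, and the `measure` column are hypothetical placeholders, not the actual schema; only the dist key on c1, the sort key on (c1, c2), and the self-join on c1 come from the description.

```sql
-- Hypothetical table; the name, column types, and "measure" are assumed for illustration.
CREATE TABLE fact_table (
    c1      BIGINT NOT NULL,
    c2      BIGINT NOT NULL,
    measure DECIMAL(18, 2)
)
DISTSTYLE KEY
DISTKEY (c1)                 -- rows with the same c1 are stored on the same slice
COMPOUND SORTKEY (c1, c2);   -- c1 is the leading sort key column

-- Self-join on c1; EXPLAIN prints the plan, including the join type
-- (Hash Join, Merge Join, ...) and the distribution step (DS_DIST_NONE, ...).
EXPLAIN
SELECT a.c1, a.c2, b.c2
FROM fact_table AS a
JOIN fact_table AS b
  ON a.c1 = b.c1;
```

The join type and the distribution attribute appear on the join node of the EXPLAIN output, which is where the Hash Join with DS_DIST_NONE shows up in this case.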


I believe this is slowing down my query.


Can anyone please explain why Redshift might be choosing a Hash Join instead of a Merge Join, even though the join column is both the DIST key and the SORT key and the plan shows DS_DIST_NONE?




