I have a fact table in Redshift having about 1.3 Billion rows with DISTribution key c1 and sort key c1, c2.
I need to join this table with itself with a join clause on c1 (i.e. c1 from 1st instance of table = c1 from 2nd instance of table).
As I see query plan of my query, Redshift appears to be doing a Hash Join with DS_DIST_NONE. Though DS_DIST_NONE is expected as I have both dist key and sort key on the column c1, but I expected Redshift to do a Merge Join instead of Hash Join (again because of the same reason).
I believe this is slowing down my query.
Can anyone please explain as to why Redshift may be doing a Hash Join instead of Merge Join (even though I have both DIST Key and SORT key on the joining column) and Redshift is doing DS_DIST_NONE for the query?
Aucun commentaire:
Enregistrer un commentaire