+The optimizer relies on statistics to estimate the number of rows a query will return, which allows it to choose the best plan and the appropriate data movement operation (e.g., a Shuffle Move or a Broadcast Move) to align the data for the join, depending on the tables' distribution types. For example, if the actual number of rows in a table is 60 million but the estimated number of rows (at the control node level) is 1,000, the optimizer may choose a Broadcast Move because its perceived cost is lower than that of a Shuffle Move, given its assumption that the table contains only 1,000 rows. However, once actual execution begins, the engine broadcasts all 60 million rows, which can be an expensive operation considering the data size and not just the row count. Consequently, if the data size is substantial, this can cause performance issues for the query itself and for other queries (e.g., high CPU usage that degrades overall performance, and concurrency issues as queries take longer to execute while new queries keep arriving).
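The cost comparison described above can be sketched with a toy model. The cost formulas and the distribution count of 60 are simplifying assumptions for illustration only, not the engine's actual costing logic:

```python
def choose_move(small_rows: int, large_rows: int, distributions: int) -> str:
    """Pick a data movement strategy for a distribution-incompatible join.

    Toy cost model (an assumption, not the real optimizer's formula):
    - Broadcast: replicate the smaller table to every distribution.
    - Shuffle:   redistribute both tables once on the join key.
    """
    broadcast_cost = small_rows * distributions
    shuffle_cost = small_rows + large_rows
    return "BroadcastMove" if broadcast_cost < shuffle_cost else "ShuffleMove"


# Stale statistics: the optimizer believes the table has 1,000 rows,
# so broadcasting looks cheap next to shuffling the 60M-row side.
print(choose_move(1_000, 60_000_000, 60))        # BroadcastMove

# Accurate statistics: with the true 60M row count, broadcasting would
# copy the table to all 60 distributions, so a shuffle wins instead.
print(choose_move(60_000_000, 60_000_000, 60))   # ShuffleMove
```

The key point the sketch makes: the decision is taken with the *estimated* row count, but the rows actually moved at runtime are the *real* ones, so a 1,000-row estimate against a 60-million-row table turns a "cheap" broadcast into a very expensive one.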