7QC Tools: Scatter Plot — Caution! Misuse of Statistics!
Amrendra Roy
Let's take the scatter plot of X3 vs. Y from the previous example.
We can see that there is a very weak correlation between X3 and Y, because the scatter plot is almost parallel to the X-axis. Now manipulate the Y-axis: instead of starting it at zero, start it at 20, and in another scatter plot start it at 37. See what happens.
This is a misuse of statistics. Initially there seems to be no correlation between X3 and Y, but as the starting point of the Y-axis is raised, a strong correlation seems to appear, even though the R² value remains unchanged.
It is always better to quote the R² value along with the scatter plot, so that the reader is not misled by the choice of axis scale.
The second issue with a scatter plot is that it only shows the correlation between two variables, which may or may not reflect a cause-and-effect relationship. In other words, two variables can be correlated purely by chance even though, in reality, neither affects the other. Hence, after drawing a scatter plot, we need to establish the cause-and-effect relationship between X and Y by deliberately varying X and measuring its effect on Y. This is done more systematically with the help of Design of Experiments (DoE).