7QC Tools: How to Extract More Information from the Scatter Plot?
Amrendra Roy
In earlier blogs we saw how a scatter plot can be used to understand the correlation between two variables.
Now let’s see what we can do to extract more information from the scatter plot. Let’s take the earlier example, X7 vs. Y.
The graph above indicates no correlation, which is also evident from the R² value of 0.0465. However, if we look carefully, most of the observations (inside the blue circle) appear to follow a trend, while the two points outside the blue circle are pulling that trend in their direction!
The question to ask now is whether these two outliers, or influential points, are the result of a typing error or whether some special causes are associated with these observations. Let’s assume that an investigation was carried out and found no typing error; the points appear as exceptions because of some special causes. What should we do now?
Since these two observations are due to special causes, it is appropriate to exclude them and reconstruct the scatter plot as shown below.
The reconstructed scatter plot now shows a trend of negative correlation between the two variables.
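To see how much two influential points can change the picture, here is a minimal Python sketch that computes the correlation with and without them. The data below is made up purely for illustration (it is not the X7 and Y data from this blog), and the choice of which points count as special causes is an assumption standing in for the investigation described above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustrative data (NOT the X7 and Y values from this blog).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [20, 19, 17, 16, 15, 13, 12, 10, 30, 32]

# Assumption: an investigation attributed the last two points to special causes.
special_cause_idx = {8, 9}

x_kept = [v for i, v in enumerate(x) if i not in special_cause_idx]
y_kept = [v for i, v in enumerate(y) if i not in special_cause_idx]

r_all = pearson_r(x, y)
r_kept = pearson_r(x_kept, y_kept)

print(f"All points:         r = {r_all:.3f}, R^2 = {r_all ** 2:.3f}")
print(f"Special causes out: r = {r_kept:.3f}, R^2 = {r_kept ** 2:.3f}")
```

On this made-up data, R² is low when all points are included and rises sharply once the two points are excluded, with a negative r for the remaining observations. Excluding points is only justified when the investigation has actually confirmed a special cause.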
Another way of extracting information is to divide the scatter plot into four quadrants by drawing lines at the mean of X and the mean of Y. We can then focus on the quadrant with the highest concentration of observations.
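The quadrant idea can be sketched in a few lines of Python. The function name `quadrant_counts`, the quadrant labels, and the data are all hypothetical and chosen only for illustration; points lying exactly on a mean line are assigned to the upper or right side, which is an arbitrary convention.

```python
from collections import Counter

def quadrant_counts(xs, ys):
    """Count observations in each quadrant formed by the mean of X and the mean of Y."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    counts = Counter()
    for x, y in zip(xs, ys):
        horiz = "right" if x >= mean_x else "left"   # side of the mean-of-X line
        vert = "upper" if y >= mean_y else "lower"   # side of the mean-of-Y line
        counts[f"{vert}-{horiz}"] += 1
    return counts

# Hypothetical illustrative data (NOT the data from this blog).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [20, 19, 17, 16, 15, 13, 12, 10]

counts = quadrant_counts(x, y)
for name in ("upper-left", "upper-right", "lower-left", "lower-right"):
    print(f"{name:12s}: {counts[name]}")
```

When most points fall in the upper-left and lower-right quadrants, as they do here, the variables tend to move in opposite directions; a concentration in the lower-left and upper-right quadrants points to a positive relationship.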
Another way of analyzing a scatter plot is to augment it with regression analysis, which gives a quantitative equation describing the relationship between the variables. This can easily be done in an Excel sheet and will be discussed in a subsequent blog.
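As a small preview, and using Python rather than the Excel approach mentioned above, the sketch below fits a straight line Y = a + bX by ordinary least squares. The function name `fit_line` and the data are hypothetical, used only to show the shape of the calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical illustrative data (NOT the data from this blog).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [20, 19, 17, 16, 15, 13, 12, 10]

slope, intercept = fit_line(x, y)
print(f"Y = {intercept:.3f} + ({slope:.3f}) * X")
```

A negative slope here matches the negative correlation seen in the scatter plot; the equation additionally says how much Y changes, on average, for a one-unit change in X.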