## Rice Cooking and Inferential Statistics

Every one of us has cooked rice in the kitchen at some point. What we actually do is draw a sample of 4-5 grains of rice from the pot, press them between our fingers, and decide whether the rice in the cooking pot is cooked or not. Isn't it surprising? Based on the properties of a few rice grains, we make a decision about the whole pot of rice! That's inferential statistics.

The same hypothesis testing goes on in your mind when you buy 5 kg of tomatoes based on a sample of 5-6 tomatoes that you tested before taking the 5 kg. After reaching home, you find that some of them have to be thrown away immediately.

There will always be some degree of uncertainty whenever we use a sample to predict a property of the population.

What we do unknowingly is develop a hypothesis in our mind that the degree of cooking of the sample is equal to the degree of cooking of the rice in the pot; this is called the null hypothesis. Then we draw a sample from the pot to test the hypothesis. There can be two possible outcomes:

1. The rice is well cooked, in which case we accept the null hypothesis.
2. The rice is not well cooked (it can be either undercooked or overcooked). This becomes the alternative hypothesis.

Now we subject the sample of rice to a test by pressing it between our fingers. This is called the test statistic. Based on the test statistic, we can make one of the following two decisions:

1. There is not enough evidence to reject the null hypothesis (we accept that the rice in the pot is well cooked). We accept the null hypothesis only indirectly, because there is a chance that the sample statistic is wrong (Type II or β error).
2. Reject the null hypothesis (the rice in the pot is either overcooked or undercooked).

You might be wondering why we can't directly decide to accept or reject the null hypothesis. Why do we make the indirect statement above about the null hypothesis?

This is because there will always be some uncertainty in making the correct decision. Hypothesis testing predicts the characteristics of the population from sample information, so we must allow for the possibility of error due to sampling. Such errors may occur because of sampling issues (e.g. the sample was taken from the top of the pot, but the rice at the bottom was burnt). Two kinds of errors can be made in the hypothesis testing above.
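The sampling-error idea can be made concrete with a tiny simulation. All numbers below (doneness scores, thresholds, sample size) are hypothetical, chosen only to illustrate the null hypothesis and the decision step:

```python
import random
import statistics

random.seed(1)

# Hypothetical "doneness" scores (0-10) for every grain in the pot
# (the population). Here the pot really is well cooked: mean ~ 8.
pot = [random.gauss(8.0, 1.0) for _ in range(10_000)]

# We judge the whole pot from a sample of just 5 grains.
sample = random.sample(pot, 5)
sample_mean = statistics.mean(sample)

# H0: the pot is well cooked (mean doneness >= 7).
# We reject H0 only if the sample looks clearly undercooked;
# a well-cooked pot can still yield a bad sample (sampling error).
decision = "fail to reject H0" if sample_mean >= 7 else "reject H0"
print(decision, round(sample_mean, 2))
```

Rerunning with different seeds occasionally rejects H0 even though the pot is fine, which is exactly the kind of error hypothesis testing must allow for.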

In the above examples we have access to the population, whether it is a pot of rice or 5 kg of tomatoes. Now imagine scenarios where we don't have access to the population's information:

1. Your office requires 500 diaries for its employees; based on a sample of 1-2 diaries provided by some of the vendors, an order is released to one of them.
2. You are in production planning and need to place a raw-material order for an entire month of production. The QC department approved the 2-3 test samples your vendor provided, and based on those samples you released the purchase order for the entire month to that vendor.
3. A lot of 50,000 LEDs has reached your warehouse, to be used later in an electronic gadget. Have you ever wondered that testing just a few of those LEDs in QC results in acceptance or rejection of the whole lot?
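The LED example is classic acceptance sampling. As a rough sketch (the sample sizes and defect rate below are made-up numbers), if we test n LEDs and accept the lot only when all of them pass, the probability of accepting a lot whose true defect rate is p is (1 − p)^n:

```python
def prob_accept(n: int, p: float) -> float:
    """Probability of accepting the lot when we test n units and
    accept only if all n pass, given a true defect rate p."""
    return (1.0 - p) ** n

# A 2% defective lot is still accepted ~82% of the time with n = 10...
print(round(prob_accept(10, 0.02), 3))   # 0.817
# ...but only ~13% of the time with n = 100.
print(round(prob_accept(100, 0.02), 3))  # 0.133
```

The sample size controls how much uncertainty we carry into the accept/reject decision.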

Summary:

Statistics gives us a handle that enables us to make decisions in the presence of uncertainty.

Hence, in order to draw inferences from data, one should take into account not only the mean but also the variance embedded in the system. Six Sigma helps us reduce known sources of variation, thereby reducing the margin of error so that we come close to the population parameters.


## Concept of Process Robustness — 1

Before we understand the concept of robustness, we must understand how variance gets transmitted to the output (y).

Let's consider three scenarios in which we measure the effect of factor A on the response.

We can see that as we increase factor A, the response Y increases in all cases. Also, in all three cases factor A fluctuates by the same amount, A2 - A1.

Case 1: A small fluctuation in factor A (from A1 to A2), which is quite possible at commercial scale, results in a fluctuation ΔY1 in the response. The size of ΔY1 depends on the slope of the response. Since the slope in case 1 is the smallest, the fluctuation ΔY1 caused by the fluctuation in factor A is also the smallest.

As the slope increases from case 1 to case 2, the fluctuation (variance) transmitted to the response Y (denoted ΔY2) increases, even though the variation in factor A remains constant (A2 - A1).

Matters become worse as the slope increases further (case 3).
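The transmission of variance can be sketched numerically (the slopes below are made-up values): for the same input fluctuation A2 − A1, the response fluctuation is ΔY = slope × (A2 − A1):

```python
# Same fluctuation of factor A in all three cases (A2 - A1).
delta_A = 2.0

# Hypothetical slopes, flattest (case 1) to steepest (case 3).
slopes = {"case-1": 0.5, "case-2": 2.0, "case-3": 5.0}

# Fluctuation transmitted to the response: delta_Y = slope * delta_A.
delta_Y = {case: slope * delta_A for case, slope in slopes.items()}
print(delta_Y)  # {'case-1': 1.0, 'case-2': 4.0, 'case-3': 10.0}
```

The same 2-unit wobble in the input produces a 10× larger wobble in the response once the slope is steep.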

If we want a robust process at commercialization, we should focus not only on the output of the DoE but also on the slope of the response.

Suppose we are optimizing a reaction at two different temperatures, as shown below.

Even though temperature 1 can give a better yield if we keep increasing A, I would prefer to take a hit on yield and commercialize the process at temperature 2, because there the slope of the response is negligible, and hence the yield of the process would be consistent at commercial scale.

Summary

1. A small change (deviation) in an input can cause large fluctuations in the response, depending on the slope.
2. The higher the slope of the response, the larger the fluctuation in the response caused by a slight change in a reaction parameter.


## But How Do Six-Sigma Tools Compress the Variation?

To understand this, let's take the following equation:

Now, if I ask you the value of y for x1 = 3 and x2 = 7, the value of y would be 38.

The point to note here is that you were able to calculate the value of y because you have a mathematical equation describing the relationship between y and x1 & x2.

Similarly, in six-sigma we find the variables (x) that impact the response (y) and then find a quantitative relationship between them. In six-sigma language we describe this as "y is a function of x1, x2, …".
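The original equation is not reproduced above, so here is a purely hypothetical stand-in with the same property (it returns 38 for x1 = 3, x2 = 7), just to show what "y is a function of x1, x2" means in code:

```python
def y(x1: float, x2: float) -> float:
    # Hypothetical transfer function -- the article's actual equation
    # is not shown; this one merely reproduces y = 38 for (3, 7).
    return x1 + 5 * x2

print(y(3, 7))  # 38
```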

For example

The time taken to reach the office (y) is a function of the following variables:

1. When did he sleep last night? (x1)
2. Did he have drinks last night? (x2)
3. When did he wake up? (x3)
4. When did he start from home? (x4)
5. How was the traffic in the morning? (x5)
6. How fast was he driving? (x6)
7. Which route did he take? (x7)

Let's assume for the time being that x2, x4, x5 and x7 were found to be important during the six-sigma investigation[1], and that the relationship between the time taken to reach the office and these four factors can, for now, be described arbitrarily as

Using the above equation, the response (time taken to reach the office) can be optimized.
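The arbitrary equation itself is not shown above; as a sketch of how such a relationship is obtained in practice, here is a least-squares fit on entirely fabricated commute data (every coefficient and data point below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up history of 30 commutes. Columns stand for x2 (drinks last
# night), x4 (departure time), x5 (traffic index), x7 (route choice)
# -- the factors assumed important in the text.
X = rng.random((30, 4))

# Hypothetical "true" effects, used only to generate the fake data.
true_coefs = np.array([5.0, 0.8, 12.0, -3.0])
y = 25 + X @ true_coefs + rng.normal(0, 0.5, 30)

# Least-squares fit of y = b0 + b1*x2 + b2*x4 + b3*x5 + b4*x7.
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coefs, 1))  # recovers values close to [25, 5, 0.8, 12, -3]
```

In six-sigma practice this fitting step is what ANOVA and regression deliver at the end of the investigation.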

[1] This investigation is usually done using a famous methodology called DMAIC, which I hope everyone is acquainted with, followed by ANOVA and regression to obtain the equation.


## 6sigma is like a clamp that compresses the variability

We have seen that we can't change the garage's width (the customer's specifications); the only way out is to adjust the process variability (the car's width) to the customer's specification. This is done by continuous improvement of the process using 6sigma tools.

6sigma tools are like a clamp in which we gradually tighten the screw (continuous improvement) to compress a thing (the variability in the process)!


## What do we mean by garage’s width = 12σ and car’s width = 6σ?

Right now we are not in a position to go into the details of the standard normal distribution, so for the time being let's assume that my manufacturing process is stabilized, which is represented by the symmetrical curve shown below.

The main characteristic of this curve is that 99.7% of the product falls between the LCL and UCL, i.e. within ±3σ of the mean (μ). Only 0.3%, or 3000 ppm, of the products fall beyond ±3σ and are defective. So the width of the car is equivalent to the width of the process = UCL - LCL = voice of the process (VOP) = 6σ = ±3σ.

The second point is that the curve never touches the x-axis → there will always be some probability of failure, however far you move from the mean (the probability may be negligible, but it will be there).

Now let's overlap the above process curve with the customer's specifications (= 12σ = ±6σ), i.e. the garage's specifications.

We can see that there is a safety margin of 3σ on both sides of the process control limits (LCL & UCL). In layman's terms, in order to produce a defective product my process has to deviate by another 3σ, which is a very remote possibility. Statistically, ±6σ (the position of the LSL & USL) from the mean would account for only ~3.4 ppm failures (don't bother about the calculation right now, just understand the concept). For this to happen, someone would have to disturb the process deliberately. Compare this failure rate of 3.4 ppm at the ±6σ level with 3000 ppm at the ±3σ level!

Even if the mean of the process drifts by ±1.5σ, there is enough margin of safety and quality will not be impacted; in regular production, a drift of ±1.5σ is quite common.
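For the curious, the percentages quoted above can be checked with the standard normal CDF, using nothing but the standard library (the famous 3.4 ppm figure corresponds to a 6σ process whose mean has drifted by the customary 1.5σ):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Fraction of output within +/-3 sigma of the mean: ~99.73%,
# i.e. roughly 0.3% (a few thousand ppm) outside.
within_3s = phi(3) - phi(-3)

# A 6-sigma process whose mean has drifted by 1.5 sigma: the nearer
# spec limit is then only 4.5 sigma away, giving the famous 3.4 ppm.
ppm_6s_shifted = (1 - phi(4.5)) * 1e6

print(round(within_3s, 4), round(ppm_6s_shifted, 1))  # 0.9973 3.4
```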


## Now it’s important to understand the concept of sigma or the standard deviation

We have seen that we need to restrict the width of the car for a given width of the garage. This is analogous to the width of the process (voice of the process, VOP) vs. the width of the customer's specification (voice of the customer, VOC). The width of the process is measured in terms of the standard deviation, denoted by σ (sigma).

The target of the 6sigma methodology is to reduce this variance (the width of the car) to such an extent that even by mistake it does not cross the customer's specification (does not hit the wall of the garage).

Before we work towards reducing σ, we should get to know this monster well, as we will encounter it at every step of the 6sigma journey.

There are two very important characteristics of any data set: the location and the spread.

Location represents the point in the data set where the data cluster the most → the mean and the median.

Spread represents the variability in the data set: some observations will be above the mean and some will be below it. The standard deviation σ measures the average spread of the data from the mean, in either direction.

The office arrival times for the last 5 days, together with the average, are given below; the deviation of each observation from the mean is also captured.

Let's calculate the average deviation.

Note that the sum of all positive deviations equals the sum of all negative deviations, which indicates that the mean divides the data into two equal halves.

The sum of all the deviations is itself zero, hence we need some other way to calculate the average deviation about the mean.

To circumvent this issue, a very simple idea was used:

square the negative number → positive number → take the square root of this number → ± the parent number

Hence the squares of all the deviations are calculated and summed up to give the sum of squares (SS)[1]. This SS is then divided by the total number of observations to give the average variance around the mean.[2] The square root of this variance gives the standard deviation s, the most common measure of variability.
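The arrival-time table is not reproduced here, so the numbers below are made up; the steps, however, are exactly the ones just described (deviations → sum of squares → variance → σ):

```python
# Hypothetical arrival times (minutes past 9:00) for 5 days;
# the article's actual table is not reproduced here.
times = [5, 12, 20, 3, 10]

n = len(times)
mean = sum(times) / n                      # location of the data
deviations = [t - mean for t in times]     # these sum to zero
ss = sum(d ** 2 for d in deviations)       # sum of squares (SS)
variance = ss / n                          # average squared deviation
sigma = variance ** 0.5                    # standard deviation

print(round(sum(deviations), 10), round(sigma, 2))  # 0.0 5.97
```

Note how the raw deviations cancel to zero, which is why the squaring step is needed at all.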

What this typically means is that, on average, the data lie 7.42 units (= 1 standard deviation, ±1σ) on either side of the mean in the given data set. The mean of the data set is at zero standard deviations.

If a process is stabilized and normally distributed, then the empirical rule holds: about 68% of the observations fall within ±1σ, about 95% within ±2σ, and 99.7% within ±3σ of the mean.

Now we can understand why we have taken 12σ as the width of the garage and 6σ as the width of the car!

The concept of σ is the most important one in understanding 6sigma. If we can understand it, we won't have any problem understanding the downstream topics. One important point to note here is that the calculation of σ depends on the type of data, or the data distribution, we are handling.

The calculation of the mean and σ differs depending on whether we are dealing with a normal distribution, a binomial distribution, a Poisson distribution, etc. The importance of this will be realized when we study the various types of control charts. For now, we just have to remember that we must calculate the mean and σ according to the distribution.
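As a quick illustration, here are the textbook mean and σ for the binomial and Poisson distributions (the parameter values below are made up):

```python
from math import sqrt

# Binomial (e.g. number of defectives in n = 50 units, p = 0.1):
# mean = n*p, sigma = sqrt(n*p*(1-p)).
n, p = 50, 0.1
binom_mean, binom_sigma = n * p, sqrt(n * p * (1 - p))
print(binom_mean, round(binom_sigma, 2))   # 5.0 2.12

# Poisson (e.g. defects per unit, lambda = 4):
# mean = lambda, sigma = sqrt(lambda).
lam = 4
poisson_mean, poisson_sigma = lam, sqrt(lam)
print(poisson_mean, poisson_sigma)         # 4 2.0
```

A normal-distribution σ, by contrast, is estimated directly from the data as shown in the previous section.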

[1] Popularly known as the sum of squares, this is the most widely used term in ANOVA and regression analysis.

[2] SS divided by its degrees of freedom → the mean sum of squares (MSE); these concepts will appear in ANOVA and regression analysis.


## Now Let’s start talking about 6sigma

We have seen that variation is a part of life; we need to learn to live with it. At most, we can make an effort to reduce it using 6sigma tools.

This happens because you can't control everything involved in a process. There are some uncontrollable factors, known as "common causes" in six-sigma. For example:

1. If you are producing a part to be used in automobiles, there will be variation in the product specification because of the wear and tear of machines, changes of operators, etc.

If we repeat any process 100 times, the outputs will not all have the same specifications, although it may happen that all 100 are within the desired specification. If we plot a histogram of the product specification from a stabilized process, it would look like the curve below.

We can see that most of the products cluster around the mean, and as we move away from the mean, the number of products decreases.

The width of the customer's specification is analogous to the garage's width, and the process variation is analogous to the car's width. If you don't have proper control of your process (driving), you are going to crash your process (car) against the customer's specifications (garage walls).

Now I feel that everyone agrees that variation is a part of life and we need to learn to cope with it. The best we can do is minimize it using some proven methodology, so that whatever we produce (products or services) always meets the customer's specifications, or has enough safety margin. This proven methodology for reducing variability is called 6sigma.


## How are the garage/car example and the six-sigma (6σ) process related?

Let the width of the garage (D) and that of the car (d) be measured in some unit called sigma (σ). Further,

the width of the garage is sacrosanct (= 12σ; assume this for the time being). Then the following three cases can occur, depending on the ratio D/d = Cp.

The process is at the six-sigma level when Cp = 2; this is represented by case 'C' below. Note that there is a margin of safety (= 3σ) on both sides of the car before it touches the garage. The ideal width of the car is taken as 6σ (don't ask why, right now!).

Process capability Cpk: this is measured in terms of the σ distance between the center of the car (C1) and the walls of the garage. Cpk tells us how far the car is from the left wall or the right wall of the garage (the customer's specifications).

Hence, there are two Cpk values,

and Cpk is given by the smaller of the two, i.e. Cpk = min[(USL − μ)/3σ, (μ − LSL)/3σ].
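These are the standard capability formulas; a minimal sketch with hypothetical spec limits and process parameters:

```python
def cp(usl: float, lsl: float, sigma: float) -> float:
    """Cp: ratio of specification width (garage) to process width (car)."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl: float, lsl: float, mean: float, sigma: float) -> float:
    """Cpk: distance from the process center to the nearer spec limit,
    in units of 3 sigma."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

# Hypothetical process: spec width 12 sigma, process width 6 sigma.
print(cp(usl=12, lsl=0, sigma=1))             # 2.0 -> the six-sigma case 'C'
print(cpk(usl=12, lsl=0, mean=6, sigma=1))    # 2.0 (car perfectly centered)
print(cpk(usl=12, lsl=0, mean=7.5, sigma=1))  # 1.5 (mean shifted by 1.5 sigma)
```

Cp ignores where the car is parked; Cpk drops as soon as the car drifts toward one wall, which is why both are reported.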
