Understanding the Difference Between Long and Short Term Sigma

We have seen that the main difference between Cpk and Ppk is the way in which the value of sigma (the standard deviation) is calculated.

In Cpk, the value of sigma comes from the control chart and is usually given by the formula

σ-short = R̄ / d₂

where R̄ is the average of the absolute moving range (obtained as the difference of two consecutive points when the data is arranged in time order). The term d₂ is a statistical constant that depends on the sample size (for moving ranges of two consecutive points, d₂ = 1.128).

This sigma-short is affected by the time order of the data, i.e., every time you change the time order, sigma-short changes.

In Ppk, on the other hand, sigma is calculated using the traditional formula

σ-long = √( Σ(xᵢ − x̄)² / (n − 1) )

and is also called the overall sigma or sigma-long.

In this case, sigma-long is not affected by the time order of the data points, which is why it is called the overall standard deviation.

Usually, sigma-short is less than sigma-long.

Let’s do a simulation in R to check whether sigma-short is really affected by the time order or not.

# setting the seed for reproducibility
set.seed(100)
# load library qcc
library(qcc)
# generate a normal sample of 50 data points
d <- rnorm(50, mean = 10, sd = 1)
# generate a data frame for storing the sigma-short and sigma-long output
out <- data.frame(sigma_short = numeric(10), sigma_long = numeric(10))
# generate a blank matrix of 10 rows and 50 columns to store 10 random
# re-orderings, each having the same 50 data points
sam <- matrix(nrow = 10, ncol = 50, byrow = TRUE)
# generate 10 random re-orderings of the normal sample generated above
for (i in 1:10) {
  sam[i, ] <- sample(d, 50, replace = FALSE)           # i-th re-ordering, stored in sam
  q <- qcc(sam[i, ], type = "xbar.one", plot = FALSE)  # I-MR chart of the i-th sample
  out$sigma_short[i] <- q$std.dev                      # sigma-short of the i-th sample
  out$sigma_long[i]  <- sd(sam[i, ])                   # sigma-long of the i-th sample
}
# print the data frame containing sigma-short and sigma-long of all 10 samples
print(out)

Table-1: Short and long sigma generated from the same simulated data but with different time order.

sigma_short   sigma_long
1.1168596     1.09059
1.1462365     1.09059
1.1023853     1.09059
0.9902320     1.09059
1.1419678     1.09059
1.2173854     1.09059
0.9941954     1.09059
1.0408088     1.09059
1.1038588     1.09059
1.2275286     1.09059

It is evident from the simulation that sigma-short does get affected by the time order of the data, while sigma-long does not. Therefore, the sigma calculated from the control chart (short sigma) and the overall sigma are different.

For more on Cpk and Ppk, see the links below:

Car Parking & Six-Sigma

What Taguchi Loss Function has to do with Cpm?

What do we mean by garage’s width = 12σ and car’s width = 6σ?


7QC Tools — The Control Charts



This is the most important topic covered in the 7QC tools. But in order to understand it, just remember the following points for the moment, as we can’t go into the details right now:

  1. Two things that we must understand beyond doubt:
    1. There are the customer’s specifications, LSL & USL (the lower and upper specification limits).
    2. Similarly, there is the process capability, expressed by LCL & UCL (the lower and upper control limits).
    3. The process capability and the customer’s specifications are two independent things; however, it is desired that UCL − LCL < USL − LSL. The only way we can achieve this relationship is by decreasing the variation in the process, as we can’t do anything about the customer’s specifications (they are sacrosanct).
  2. If a process is stable, it will follow the bell-shaped curve called the normal curve. It means that if we plot all the historical data obtained from a stable process, it will give a symmetrical curve. The σ represents the standard deviation (a measure of variation).
  3. The main characteristic of the normal curve is that a fixed proportion of the data falls within each multiple of σ: about 68% of the total data lies within ±1σ, 95% within ±2σ, and 99.7% within ±3σ.
  4. Any process is affected by two types of input variables or factors. Input variables that can be controlled are called assignable or special causes (e.g., person, material, unit operation, and machine), and factors that are uncontrollable are called noise factors or common causes (e.g., fluctuations in environmental factors such as temperature and humidity during the year).
  5. From point number 3, we can conclude that as long as the data is within ±3σ, the process is considered stable, and whatever variation is present is because of the common causes of variation. Any data point beyond ±3σ represents an outlier, indicating that the given process has deviated, i.e., there is an assignable or special cause of variation which needs immediate attention.
  6. The measures of the mean (μ) and σ used for calculating the control limits depend on the type and the distribution of the data used for preparing the control chart.

Having gone through the above points, let’s go back to point number 2. In that graph, the entire data set is plotted after all the data has been collected. But these data were collected over time! If we now add a time axis to the graph and plot the data with respect to time, we get a run chart.


The run chart thus obtained, with the ±3σ limits added, is known as a control chart. It represents the data with respect to time, and ±3σ gives the upper and lower control limits of the process. We can also plot the customer’s specification limits (USL & LSL) on this graph if desired. Now we can apply points 4 and 5 to interpret the control chart, or use the Western Electric Rules for a more detailed interpretation.
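As a sketch of how these control limits are computed for an individuals (I-MR) chart, here is a minimal Python example. The data values are made up for illustration; it uses the moving-range estimate σ ≈ R̄/d₂ with the constant d₂ = 1.128 for ranges of two consecutive points.

```python
# Sketch: ±3σ control limits for an individuals (I-MR) chart.
# The data below are made-up illustration values, not from the post.
data = [10.2, 9.8, 10.5, 10.1, 9.6, 10.3, 9.9, 10.4, 10.0, 9.7]

mean = sum(data) / len(data)

# average moving range: mean of |x[i] - x[i-1]| over consecutive points
mr_bar = sum(abs(b - a) for a, b in zip(data, data[1:])) / (len(data) - 1)

d2 = 1.128                 # statistical constant for ranges of n = 2 points
sigma_short = mr_bar / d2  # short-term sigma estimated from the moving range

ucl = mean + 3 * sigma_short   # upper control limit
lcl = mean - 3 * sigma_short   # lower control limit

print(round(lcl, 3), round(mean, 3), round(ucl, 3))
```

Any point falling outside [lcl, ucl] would be flagged as an outlier per point 5 above.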

The Control Charts and the Continuous Improvement

A given process can only be improved if tools are available for timely detection of an abnormality due to an assignable cause. This timely, online signal of an abnormality (an outlier) in the process is achieved by plotting the process data points on an appropriate statistical control chart. But these control charts can only tell that there is a problem in the process; they cannot tell anything about its cause. Investigating and identifying the assignable causes associated with the abnormal signal allows timely corrective and preventive actions, which ultimately reduce the variability in the process and gradually take the process to the next level of improvement. This iterative cycle of continuous improvement continues until abnormalities are no longer observed in the process and whatever variation remains is because of common causes only.

It is not necessarily true that all deviations on control charts are bad (e.g., an impurity trending towards the LCL, or reduced patient waiting time, is good for the process). Regardless of whether a deviation is good or bad for the process, the outlier points must be investigated. Reasons for good deviations should then be incorporated into the process, and reasons for bad deviations eliminated from it. This is repeated until the process comes under statistical control. Gradually, it will be observed that the natural control limits become much tighter than the customer’s specification, which is the ultimate aim of any process improvement program like Six Sigma.

The significance of these control charts is evident from the fact that, since their introduction in the 1920s by Walter A. Shewhart, they have been used extensively across the manufacturing industry and have become an intrinsic part of the 6σ process.


To conclude, statistical control charts not only help in estimating the process control limits but also raise an alert when the process goes out of control. These alerts trigger investigation through root cause analysis, leading to process improvements that decrease the variability in the process and ultimately yield a statistically controlled process.


Why & How Cpm came into existence? Weren’t Cp & Cpk enough to trouble us?


In the earlier post (see “What is the Taguchi Loss Function?”) we ended the discussion stating that Cp needs to be penalized for the deviation of the process mean from the specification mean.

If you are producing goods near the LSL or the USL, the chances of rejection increase, which in turn increases the chances of reprocessing and rework, thereby increasing the cost. Even if you manage to pass quality at the borderline, your customer has to adjust his process to accommodate your product, increasing his set-up time and the cost involved in readjusting his process. Moreover, the variance from your product and the variance from the customer’s process simply add up, giving a final product with more variance (remember, variance has an additive property).

So we need to produce goods and services at the center of the specification, which means that we should know the position of the process mean with respect to the center of the customer’s specifications. Hence another index, called Cpm, was introduced, which compensates for the deviation of the process mean from the specification mean.

For calculating Cpm, the Cp formula is modified so that the total variance of the system becomes

τ² = σ² + (μ − T)²

where μ = process mean and T = specification mean (the target).

Hence the Cp formula

Cp = (USL − LSL) / 6σ

is modified to

Cpm = (USL − LSL) / (6 √(σ² + (μ − T)²))
This is necessary because if we can keep the process mean and the specification mean close to each other, the chances of touching the specification limits are lower, which in turn reduces the chances of reprocessing and lets us control the process better.

If μ = T (with T at the center of the specification), then Cpm = Cpk = Cp.
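A minimal Python sketch of the three indices makes this concrete. The specification limits, target, and process values below are assumed purely for illustration:

```python
import math

def cp(usl, lsl, sigma):
    # Cp: spec width versus process spread, ignoring the mean's position
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    # Cpk: distance of the mean from the nearest spec limit
    return min(usl - mu, mu - lsl) / (3 * sigma)

def cpm(usl, lsl, mu, sigma, target):
    # Cpm: total variation inflated by the (mu - target)^2 penalty
    return (usl - lsl) / (6 * math.sqrt(sigma**2 + (mu - target)**2))

usl, lsl = 16.0, 4.0           # assumed customer specification limits
target = (usl + lsl) / 2       # specification mean T = 10
sigma = 1.0                    # assumed process sigma

# centred process (mu = T): all three indices agree
print(cp(usl, lsl, sigma), cpk(usl, lsl, 10.0, sigma), cpm(usl, lsl, 10.0, sigma, target))
# off-centre process (mu = 12): Cpk and Cpm are penalized while Cp is not
print(cp(usl, lsl, sigma), cpk(usl, lsl, 12.0, sigma), cpm(usl, lsl, 12.0, sigma, target))
```

Running the sketch, the centred case gives 2.0 for all three indices, while in the off-centre case only Cp stays at 2.0.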

Related Posts

What Taguchi Loss Function has to do with Cpm?

Car Parking & Six-Sigma

What’s the big deal, let’s rebuild the garage to fit the bigger car!

How the garage/car example and the six-sigma (6σ) process are related?

Now Let’s start talking about 6sigma

What do we mean by garage’s width = 12σ and car’s width = 6σ?

Kindly provide feedback for our continuous journey

What Taguchi Loss Function has to do with Cpm?


The traditional way of quality control can be called the “GOAL-POST” approach, where the possible outcomes are goal or no-goal. Similarly, QA used to focus only on the end product’s quality, with two possible outcomes: pass or fail.


Later on, Taguchi introduced the concept of producing products with quality targeted at the center of the customer’s specifications. He stated that as we move away from the center of the specification, we incur cost either at the producer’s end or at the consumer’s end in the form of rework and reprocessing. Holistically, it is a loss to the society.
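Taguchi’s idea is usually quantified by the quadratic loss function L(y) = k·(y − T)²: zero loss at the target and a steadily growing loss as you drift toward either limit, in contrast to the goal-post view where loss is zero everywhere inside the spec. A minimal Python sketch, with k, the target, and the limits all assumed for illustration:

```python
def goal_post_loss(y, lsl, usl, scrap_cost):
    # traditional view: no loss inside the specification, full loss outside
    return 0.0 if lsl <= y <= usl else scrap_cost

def taguchi_loss(y, target, k):
    # quadratic loss: grows as soon as y moves away from the target
    return k * (y - target) ** 2

lsl, usl, target = 4.0, 16.0, 10.0   # assumed specification and target
k = 0.5                              # assumed cost coefficient

for y in (10.0, 13.0, 15.9):
    print(y, goal_post_loss(y, lsl, usl, 100.0), taguchi_loss(y, target, k))
```

Note that a value of 15.9 “passes” under the goal-post view but already carries a large Taguchi loss, which is exactly the borderline-quality situation described above.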


For example;

Consider buying a ready-made suit: it is very difficult to find one that perfectly matches your body’s contours, so you end up going for alterations, which incurs cost. Whereas if you get a suit stitched by a tailor so that it fits your body contour (specification), it does not incur any extra cost in rework.

Let’s revise what we learned in the “car parking” example (see links below). Cp only focuses on how far the process control limits (UCL & LCL) are from the customer’s specification limits (USL & LSL); it doesn’t take into account the deviation of the process mean from the specification mean. Hence we require another index that can penalize Cp for this deviation, and this new index is called Cpm.

Related Posts

Why & How Cpm came into existence? Weren’t Cp & Cpk enough to trouble us?

Car Parking & Six-Sigma

What’s the big deal, let’s rebuild the garage to fit the bigger car!

How the garage/car example and the six-sigma (6σ) process are related?

Now Let’s start talking about 6sigma

What do we mean by garage’s width = 12σ and car’s width = 6σ?

Kindly provide feedback for our continuous journey

What are Seven QC Tools & How to Remember them?

Understanding and using hard-core statistics for continuous improvement is difficult for shop-floor people. To overcome this issue, it was felt necessary to present statistics in graphical form so that everyone can understand it.

The 7QC tools made quality control much simpler, so that it can be comprehended easily by all. Statistics is no longer the prerogative of a few experts in the company; it can be percolated down the ranks, irrespective of whether someone has a statistical background or not.

The 7QC tools are a collection of statistical tools that need not be applied in a particular sequence. However, to understand and remember them, we can connect them with each other:

  1. Flow chart
  2. Cause & Effect diagram
  3. Control charts
  4. Check list
  5. Histogram
  6. Pareto Chart
  7. Scatter Plot

One can easily remember the list by using following relationship between the above tools (you can develop some other relationship).


If you want to remember the 7QC tools, remember the following sequence of events used in continuous improvement.

For starting any continuous improvement program, the first step is defining the problem (the quality characteristic ‘Y’ to be addressed). Once we have defined the problem, we need to understand the process in depth using a Process Flow Diagram to find the problem areas and the non-value-adding steps.

From the process flow diagram, find the probable sources of variation (X) affecting the desired output (Y) using a Cause & Effect Diagram.

Once we have identified the probable causes (X), start monitoring the ‘X’s and ‘Y’ using appropriate Control Charts. This will eliminate some of the ‘X’s that came from the cause and effect diagram. Make a note of the ‘X’s that really affect ‘Y’.

Once you have the real ‘X’s that affect ‘Y’, prepare a plan for data collection using a Check List to support the cause and effect relationship.

The data thus collected using the check list is then arranged in graphical form using a Histogram, to give a quantitative, pictorial view of the effect of each ‘X’.

The bars of the histogram constructed above are then re-arranged in descending order to give a Pareto Chart. This arranges the causes (X) in descending order of their effect on ‘Y’. Take the list of ‘X’s (usually the top 3) having a prominent effect on ‘Y’ for continuous improvement.

Finally, show a quantitative relationship between the top three ‘X’s and ‘Y’ using a Scatter Plot, in the laboratory or by collecting more data from the plant, and propose the improvement strategy by providing the best conditions for the ‘X’s so that ‘Y’ remains within the desired limits.
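The histogram-to-Pareto step can be sketched in a few lines of Python. The defect categories and counts below are made up for illustration: sort the causes by frequency and track the cumulative percentage to pick the vital few.

```python
# Made-up defect counts per cause (X), for illustration only
counts = {"scratches": 12, "misalignment": 48, "porosity": 7,
          "cracks": 25, "discoloration": 8}

total = sum(counts.values())

# re-arrange the histogram bars in descending order -> Pareto chart
pareto = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

cum = 0.0
cumulative = []
for cause, n in pareto:
    cum += 100.0 * n / total
    cumulative.append((cause, n, round(cum, 1)))
    print(f"{cause:15s} {n:3d}  cumulative {cum:5.1f}%")
```

With these numbers, the top two causes already account for 73% of the defects, so they would be the ones taken up for improvement.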

Related Blogs

7QC Tools: Flow Chart, Know Your Process Thoroughly

7QC Tools: Fish Bone or Ishikawa Diagram

7QC Tools: How to Extract More Information from the Scatter Plot?

7QC Tools: How to Draw a Scatter Plot?

7QC Tools: Scatter Plot — Caution! Misuse of Statistics!

7QC Tools: Scatter Plot

7QC Tools — How to Prioritize Your Work Using Pareto Chart?

7QC Tools — How to Interpret a Histogram?

7QC Tools — How to Draw a Histogram?

7QC Tools — Histogram of Continuous Data

7QC Tools — Histogram of Discrete Data

7QC tools — Check List

Excellent Templates for 7QC tools from ASQ

Kindly do provide feedback for continuous improvement

Understanding the Monster “Variance” part-1


This is one of the ways of calculating the variability in a data set. Variance helps us understand how the data is arranged around the mean. To calculate it, we need the deviation of each observation in the data set from the mean.

For example, below are the times I took during the week to reach the office, with the deviation of each observation from the mean time.


The next step is to calculate the average deviation from the mean using the well-known formula for an average, (1/n) Σ(xᵢ − x̄).

Note that the sum of all positive deviations equals the sum of all negative deviations, which indicates that the mean divides the data set into two equal halves. As a result, the sum of all deviations is zero, so we need some other way to express the average deviation about the mean.

To avoid this issue, a very simple idea was used:

negative number → square the number → positive number → take the square root → magnitude of the parent number

Hence the squares of all the deviations are calculated and summed up to give the sum of squares (simply SS) [1]. This SS is then divided by the number of observations minus one to give the variance s² around the mean [2]. The square root of this variance gives the standard deviation s, the most common measure of variability.


What it physically means is that, on average, the data deviates by 7.42 units, or one standard deviation (±1s), in either direction in the given data set.


The above discussion concerns the sample standard deviation, represented by s. For a population, the variance is represented by σ² and the standard deviation by σ.


The sample variance s² is an estimator of the population variance σ². The standard deviation is easier to interpret than the variance because the standard deviation is measured in the same units as the data.
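The whole calculation can be sketched in Python using made-up commute times (not the data from the post): the deviations from the mean cancel out, so we square them, sum to get SS, divide by n − 1 degrees of freedom (as in footnote [2]) for s², and take the square root for s.

```python
import math

# Made-up commute times in minutes, for illustration only
x = [30.0, 45.0, 38.0, 52.0, 35.0]
n = len(x)
mean = sum(x) / n

deviations = [v - mean for v in x]
# deviations about the mean cancel out, so their plain average is useless
assert abs(sum(deviations)) < 1e-9

ss = sum(d ** 2 for d in deviations)  # sum of squares (SS)
var = ss / (n - 1)                    # sample variance s^2
sd = math.sqrt(var)                   # sample standard deviation s

print(round(var, 2), round(sd, 2))
```

So for this toy data set the observations deviate from the mean by about one standard deviation (roughly 8.6 minutes) in either direction.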

[1] Popularly known as the sum of squares, this is the most widely used term in ANOVA and regression analysis.

[2] SS divided by its degrees of freedom gives the mean sum of squares (MS); these concepts appear again in ANOVA & regression analysis.

Related articles:

Why it is so Important to Know the Monster “Variance”? — part-2

You just can’t knock down this Monster “Variance” —- Part-3

Is this information useful to you?

Kindly provide your feedback

ANOVA by Prof. Hunter


We are excited about the quality of the videos available on YouTube on almost every topic. Look at this video on ANOVA by none other than Prof. Hunter himself. The video was shot in 1966, in black & white, but experience the content.