Why Is the Standard Normal Distribution Table So Important?


Abstract

Since we are now entering the technical/statistical part of the subject, it is better to understand the concept first.

For many business decisions, we need to calculate the likelihood or probability of an event occurring. A histogram, together with the relative frequency of a dataset, can be used for this to some extent. But then, for every problem we come across, we need to draw the histogram and compute the relative frequency to find the probability using the area under the curve (AUC).

To overcome this limitation, the standard normal distribution (also called the Z-distribution; it is the Gaussian distribution with mean 0 and standard deviation 1) was developed. The AUC, or probability, between any two points on this distribution is well documented in statistical tables and can easily be found using an Excel sheet.

But to use the standard normal distribution table, we need to convert the parent dataset (irrespective of its unit of measurement) into the standard normal distribution using the Z-transformation. Once that is done, we can look up the probabilities in the standard normal distribution table.

picture11


From my experience, I found that books in the “statistics for business & economics” category are much better for understanding 6sigma concepts than pure statistics books. Try any of these books as a reference guide.


Introduction

Let’s understand this with an example.

A company is preparing a job description for a manager-level position, and the most important criterion is the years of experience a person should possess. They collected a sample of ten managers from their company; the data is tabulated below along with its histogram.

picture5

As an HR person, I want to know the mean years of experience of a manager and the various probabilities discussed below.

Average experience = 3.9 years

What is the probability that X ≤ 4 years?

What is the probability that X ≤ 5 years?

What is the probability that 3 < X ≤ 5 years?

In order to calculate the above probabilities, we need to calculate the relative frequency and cumulative frequency

picture6

Now we can answer above questions

What is the probability that X ≤ 4 years? = 0.7 (see cumulative frequency)

What is the probability that X ≤ 5 years? = 0.9

What is the probability that 3 < X ≤ 5 years? = (probability X ≤ 5) – (probability X ≤ 3) = 0.9 – 0.3 = 0.6, i.e. 60% of the managers have more than 3 and up to 5 years of experience.
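The relative and cumulative frequency calculation above can be sketched in a few lines of Python. The raw data table is only available as an image, so the `experience` list below is a hypothetical dataset, chosen to be consistent with the figures quoted in the text (ten managers, mean 3.9 years, σ = 1.37 and bar heights 1, 2, 4, 2, 1). The helper name `prob_at_most` is also just an illustration.

```python
from collections import Counter

# Hypothetical raw data (the original table is an image): ten managers,
# chosen to match the stated mean of 3.9 years and bar heights 1, 2, 4, 2, 1.
experience = [1, 3, 3, 4, 4, 4, 4, 5, 5, 6]

counts = Counter(experience)
n = len(experience)

# Relative frequency of each observed value
rel_freq = {value: counts[value] / n for value in sorted(counts)}


def prob_at_most(x):
    """P(X <= x), i.e. the cumulative relative frequency up to x."""
    return sum(f for value, f in rel_freq.items() if value <= x)


p_le_4 = prob_at_most(4)
p_le_5 = prob_at_most(5)
p_3_to_5 = prob_at_most(5) - prob_at_most(3)  # P(3 < X <= 5)

print(round(p_le_4, 2), round(p_le_5, 2), round(p_3_to_5, 2))
```

With this assumed data the three values come out as 0.7, 0.9 and 0.6, matching the answers read from the cumulative frequency column.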

Area under the curve (AUC) as a measure of probability:

Width of a bar in the histogram = 1 unit

Height of the bar = frequency of the class

Area under the curve for a given bar = 1 × frequency of the class

Total area under the curve (AUC) = total area under all bars = 1×1+1×2+1×4+1×2+1×1 = 10

Fraction of the total area under the curve for the classes 3 < x ≤ 5 = (AUC of 3rd class + AUC of 4th class) / total AUC = (4+2)/10 = 0.6 = probability of finding x between 3 and 5 (excluding 3)

Now, what about the probability of (3.2 < x ≤ 4.3)? It is difficult to calculate by this method, as it requires the use of calculus.

Yes, we can use calculus to calculate the various probabilities or AUC for this problem. But are we going to repeat this whole exercise for each and every problem we come across?

With God’s grace, our ancestors gave us the solution in the form of Z-distribution or Standard normal distribution or Gaussian distribution, where the AUC between any two points is already documented.

The standard normal distribution is widely used in scientific measurements and for drawing statistical inferences. The normal curve is a perfectly symmetrical, bell-shaped curve.

The standard normal probability distribution has the following characteristics:

  1. A normal curve is defined by two parameters, the mean µ and the standard deviation σ, which determine the location and shape of the distribution. For the standard normal curve, µ = 0 and σ = 1.
  2. The highest point on the normal curve is at the mean which is also the median and mode.
  3. The normal distribution is symmetrical and tails of the curve extend to infinity i.e. it never touches the x-axis.
  4. Probabilities of the normal random variable are given by the AUC. The total AUC for normal distribution is 1. The AUC to the right of the mean = AUC to the left of mean = 0.5.
  5. Percentage of observations within a given interval around the mean in a standard normal distribution is shown below

picture6b
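The percentages in characteristic 5 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes the figure shows the commonly quoted ±1σ, ±2σ and ±3σ intervals; the figure itself may show other intervals.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def within(k):
    """Probability that a standard normal value lies within +/- k sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Assumed intervals: 1, 2 and 3 standard deviations around the mean
coverage = {k: within(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k}σ of the mean: {100 * p:.2f}%")
```

This gives roughly 68.27%, 95.45% and 99.73%, the usual "68-95-99.7" rule.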

The AUC for the standard normal distribution has been calculated for a wide range of z-values, as the cumulative probability p(Z ≤ z), and is available in tables that can be used for calculating probabilities.

picture7

Note: be careful whenever you use such a table, as some tables give the area for ≤ z and some give the area between two z-values.

Let’s try to calculate some probabilities using the above table.

Problem-1:

Probability p(z ≥ 1.25). This problem is depicted below

picture9

Look for z = 1.2 in the vertical column and then look for 0.05 (the second decimal place) in the horizontal row of the z-table: p(z ≤ 1.25) = 0.8944

Note! The z-distribution table given above gives the cumulative probability p(z ≤ 1.25), but here we want p(z ≥ 1.25). Since the total probability or AUC = 1, p(z ≥ 1.25) is given by 1 – p(z ≤ 1.25)

Therefore

p(z ≥ 1.25) = 1 – p(z ≤ 1.25) = 1 – 0.8944 = 0.1056

Problem-2:

Probability p(z ≤ -1.25). This problem is depicted below

picture8

Note! The above z-distribution table doesn’t contain -1.25, but p(z ≤ -1.25) = p(z ≥ 1.25) because the standard normal curve is symmetrical.

Therefore

Probability p(z ≤ -1.25) = 0.1056

Problem-3:

Probability p(-1.25 ≤ z ≤ 1.25). This problem is depicted below

picture10

Since the total AUC is 1, this can be calculated by subtracting the AUC of the two yellow tail regions from one.

p(-1.25 ≤ z ≤ 1.25) = 1- {p(z ≤ -1.25) + p(z ≥ 1.25)} = 1 – (2 x 0.1056) = 0.7888
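The three table look-ups above can be reproduced without a printed table. This is a minimal sketch using `statistics.NormalDist`, whose `cdf` method returns the cumulative probability p(Z ≤ z), the same quantity the table in this post tabulates.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Problem-1: upper tail, p(z >= 1.25) = 1 - p(z <= 1.25)
p1 = 1 - z.cdf(1.25)

# Problem-2: lower tail, p(z <= -1.25); equals Problem-1 by symmetry
p2 = z.cdf(-1.25)

# Problem-3: central area, p(-1.25 <= z <= 1.25)
p3 = z.cdf(1.25) - z.cdf(-1.25)

print(f"{p1:.4f} {p2:.4f} {p3:.4f}")
```

The results agree with the table values to about four decimal places; the last digit of Problem-3 can differ slightly (0.7887 versus 0.7888) because the table value 0.8944 is itself rounded.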

From the above discussion, we learnt that a standard normal distribution table (which is readily available) could be used for calculating the probabilities.

Now comes the real problem! Somehow I have to convert my original dataset into the standard normal distribution, so that calculating any probabilities becomes easy. In simple words, my original dataset has a mean of 3.9 years with σ = 1.37 years and we need to convert it into the standard normal distribution with a mean of 0 and σ = 1.

The formula for converting any normal random variable x with mean µ and standard deviation σ to the standard normal distribution is the z-transformation, z = (x – µ) / σ, and the value so obtained is called the z-score.

picture22

Note that the numerator in the above equation is the distance of a data point from the mean. This distance is divided by σ, giving the distance of the data point from the mean in terms of σ, i.e. we can now say that a particular data point is 1.25σ away from the mean. The data has become unitless!

picture12

Let’s do it for the above example discussed earlier

picture11

Note: The Z-distribution table is used only in cases where the number of observations is ≥ 30. Here we are using it to demonstrate the concept; strictly, we should be using the t-distribution in this case.

We can say that managers with 4 years of experience are 0.073σ away from the mean, on the right-hand side, whereas managers with 3 years of experience are 0.657σ away from the mean on the left-hand side (z = -0.657).

Now, if you look at the distribution of the Z-scores, it resembles the standard normal distribution with mean = 0 and standard deviation =1.
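The z-transformation of the manager data can be sketched as follows. As before, the `experience` list is a hypothetical dataset (the original table is an image), chosen so that the sample mean is 3.9 and the sample standard deviation is 1.37, as stated in the text.

```python
from statistics import mean, stdev

# Hypothetical raw data consistent with the stated mean (3.9) and sigma (1.37)
experience = [1, 3, 3, 4, 4, 4, 4, 5, 5, 6]

mu = mean(experience)      # 3.9
sigma = stdev(experience)  # sample standard deviation, about 1.37

# z-transformation: distance from the mean, measured in units of sigma
z_scores = [(x - mu) / sigma for x in experience]

z_for_4_years = (4 - mu) / sigma
z_for_3_years = (3 - mu) / sigma

print(round(z_for_4_years, 3), round(z_for_3_years, 3))
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))
```

The transformed values have mean 0 and standard deviation 1 by construction, whatever the unit of the original data.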

But one question still needs to be answered: what is the advantage of converting a given data set into the standard normal distribution?

There are three advantages. First, it enables us to calculate the probability between any two points instantaneously. Secondly, once you convert your original data into the standard normal distribution, you end up with a unitless distribution (both numerator and denominator in the Z-transformation formula have the same units)! Hence, it becomes possible to compare an orange with an apple. For example, I wish to compare the variation in the salary of the employees with the variation in their years of experience. Since salary and experience have different units of measurement, it is not possible to compare them directly, but once both distributions are converted to the standard normal distribution, we can compare them (now both are unitless).

The third advantage is that, while solving problems, we need not convert the whole dataset to z-scores; only the values of interest need converting, as the following example shows.

One hundred historical batches from the plant gave a mean yield of 88% with a standard deviation of 2.1. Now I want to know the various probabilities.

Probability of batches having yield between 85% and 90%

Step-1: Transform the yield (x) data into z-scores

What we are looking for is the probability of yield between 85 and 90% i.e. p(85 ≤ x ≤ 90)

picture23

picture24

Step-2: Always draw a rough sketch of the standard normal curve and mark the area you are interested in

picture28

Step-3: Use the Z-distribution table for calculating probabilities.

The Z-distribution table given above can be used in following way to calculate p(-1.43 ≤ z ≤ 0.95)

Diagrammatically, p(-1.43 ≤ z ≤ 0.95) = p(z ≤ 0.95) – p(z ≤ -1.43), is represented below

picture29

p(-1.43 ≤ z ≤ 0.95) = p(z ≤ 0.95) – p(z ≤ -1.43)= 0.83-0.076 = 0.75

That is, 75% of the batches, or equivalently a probability of 0.75, will have a yield between 85 and 90%.

It can also be interpreted as “the probability that a single batch has a yield between 85 and 90, given that the population mean is 88% with a standard deviation of 2.1”. (For the mean of a sample of n batches, the standard error σ/√n would replace σ.)
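The three steps above can be sketched in Python, assuming (as the worked example does) that batch yield is normally distributed with mean 88 and standard deviation 2.1. The variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 88.0, 2.1  # historical mean yield and standard deviation

# Step-1: convert only the two limits of interest into z-scores
z_low = (85 - mu) / sigma   # about -1.43
z_high = (90 - mu) / sigma  # about  0.95

# Step-3: p(z_low <= z <= z_high) = p(z <= z_high) - p(z <= z_low)
std_normal = NormalDist()
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Same answer without an explicit z-transformation
yield_dist = NormalDist(mu, sigma)
p_direct = yield_dist.cdf(90) - yield_dist.cdf(85)

print(round(z_low, 2), round(z_high, 2), round(p_between, 2))
```

Both routes give a probability of about 0.75, in line with the table-based result.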

Probability of yield ≥ 90%

What we are looking for is the probability of yield ≥ 90% i.e. p(x ≥ 90)

picture25

= p(z ≥ 0.95)

picture15

Diagrammatically, p(z ≥ 0.95) = 1 – p(z ≤ 0.95) is represented below

picture16

p(x ≥ 90) = p(z ≥ 0.95) = 1 – p(z ≤ 0.95) = 1 – 0.83 = 0.17, i.e. there is only a 17% probability of getting a yield ≥ 90%

Probability of yield ≤ 90%

This is very easy, just subtract p(x ≥ 90) from 1

Therefore,

p(x ≤ 90) = 1- p(x ≥ 90) = 1- 0.17 = 0.83 or 83% of the batches would be having yield ≤ 90%.
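The upper-tail probability and its complement can be checked the same way. This sketch again assumes normally distributed yield with mean 88 and standard deviation 2.1.

```python
from statistics import NormalDist

yield_dist = NormalDist(mu=88.0, sigma=2.1)

p_le_90 = yield_dist.cdf(90)  # p(x <= 90), the cumulative probability
p_ge_90 = 1 - p_le_90         # p(x >= 90), the complement

print(round(p_ge_90, 2), round(p_le_90, 2))
```

This gives about 0.17 and 0.83, the same pair of values obtained from the table.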

Now let’s work the problem in reverse: I want to know the yield that corresponds to a cumulative probability of 0.85, i.e. the value x for which p(X ≤ x) = 0.85.

Graphically it can be represented as

picture18

Since the table we are using gives the probability for ≤ z, we first need to find the z-value corresponding to a probability of 0.85. Let’s look into the z-distribution table and find the probability closest to 0.85

picture17

The probability of 0.8508 corresponds to a z-value of 1.04

Now we have a z-value of 1.04 and we need to find the corresponding x-value (i.e. yield) using the Z-transformation formula

picture22

picture27

Solving for x

x = 90.18

Therefore, there is a 0.85 probability of getting a yield ≤ 90.18% (as the z-distribution table we are using gives the probability for ≤ z); hence, there is only a 0.15 probability that the yield would be greater than 90.18%.
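The reverse look-up can be sketched with `NormalDist.inv_cdf`, which returns the z-value for a given cumulative probability. The code then rearranges the z-transformation to x = µ + z·σ. Mean 88 and standard deviation 2.1 are taken from the example; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 88.0, 2.1

# z-value whose cumulative probability p(Z <= z) is 0.85
z_85 = NormalDist().inv_cdf(0.85)

# Invert the z-transformation z = (x - mu) / sigma to recover the yield
x_85 = mu + z_85 * sigma

print(round(z_85, 2), round(x_85, 2))
```

The exact z-value is about 1.036, which the printed table approximates as 1.04; the resulting yield is about 90.18 either way.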

Above problem can be represented by following diagram

picture19

Exercise:

The historical data shows that the average time taken to complete the BB exam is 135 minutes with a standard deviation of 15 minutes.

Find the probability that

  1. Exam is completed in less than 140 minutes
  2. Exam is completed between 135 and 145 minutes
  3. Exam takes more than 150 minutes

Summary:

This article shows the limitations of the histogram and relative frequency methods in calculating probabilities, as they have to be drawn for every problem. To overcome this challenge, a standardized method based on the standard normal distribution is adopted, where the AUC between any two points on the curve gives the corresponding probability, which can easily be calculated using an Excel sheet or a z-distribution table. The only thing we need to do is convert the given data into the standard normal distribution using the Z-transformation. This also enables us to compare two unrelated things, as the Z-distribution is unitless with mean = 0 and standard deviation = 1. If the population standard deviation is known, we can use the z-distribution; otherwise we have to work with the sample’s standard deviation and use Student’s t-distribution.

7QC Tools: Case Study on Interpreting the Control Charts

A process was running in a chemical plant. The final stage of the process was crystallization, which gave the pure product. Two crystallizers were used for the purpose, each operated by a different individual. The SOP says that the crystallizer has to be maintained between 30 and 40°C for 110 to 140 minutes. The data for a month is captured below

picture109

In order to understand the process, an I-MR control chart was plotted (for simplicity, the moving-range chart is not shown).

picture110
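For readers who want to reproduce an individuals chart without Minitab, the limits can be sketched as below. The `yields` list is hypothetical data invented for illustration (the plant data is only available as an image); the constants 2.66 and 3.267 are the standard I-MR chart factors for moving ranges of two consecutive points. The alternation count is just one simple way to flag the up-and-down pattern, not a formal control chart rule.

```python
from statistics import mean

# Hypothetical batch yields alternating between two crystallizers
yields = [88.1, 91.4, 87.9, 91.0, 88.4, 91.6, 88.0, 90.8, 88.3, 91.2]

# Moving range: absolute difference between consecutive batches
moving_ranges = [abs(b - a) for a, b in zip(yields, yields[1:])]

center = mean(yields)
mr_bar = mean(moving_ranges)

# Individuals chart limits: center +/- 2.66 * average moving range
ucl = center + 2.66 * mr_bar
lcl = center - 2.66 * mr_bar

# Upper limit of the moving-range chart
ucl_mr = 3.267 * mr_bar

# Count how often consecutive points sit on opposite sides of the center line
above = [y > center for y in yields]
alternations = sum(a != b for a, b in zip(above, above[1:]))

print(f"CL={center:.2f} LCL={lcl:.2f} UCL={ucl:.2f} alternations={alternations}")
```

In this made-up series every consecutive pair crosses the center line, which is the kind of alternating pattern discussed next.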

As we learned from the earlier blog, points alternating above and below the central line represent some sort of stratification (see the short connecting arms and the concentration of data points in zones B and C).

We plotted the histogram of the above data set and kept on increasing the number of classes. What we saw was the emergence of a bimodal distribution as we kept on increasing the number of classes.

picture112

So, one thing was sure: there were two processes running in the plant. The question to be answered now was “What is causing this stratification?”

We started with the crystallizers. As soon as we plotted a simple run chart of the process with groups using Minitab®, we could see the difference: crystallizer-2 was always giving a better yield. This should not happen, because both crystallizers were identical and were connected to the same utilities. Then we thought that the different operators might be the reason for this behavior, as this was the only factor that differed between the two crystallizers.

picture114

When we plotted the same run chart with grouping, this time using the operator as the grouping variable, we got the same result as with the crystallizers: operator-2, working on crystallizer-2, was producing more product. This run chart is not shown here.

We then drilled down further into the operating procedure adopted by the two operators. We studied the temperature and the maintenance time using a scatter plot. The results are shown below

picture115

Finally, it was found that operator-2 was maintaining the crystallizer-2 at the lower end of the prescribed temperature and for longer duration. Hence, specification for temperature and the maintenance time was revised.

7QC Tools: Interpretation of Control Charts Made Easy


picture106

Visual Inspection of the Control Charts for Unnatural Patterns

Besides the well-known rules above, there are patterns on control charts that every quality professional needs to understand. Let’s understand these patterns using the following examples. It is easier to understand them if we can imagine the type of distribution of the data displayed on the control chart.

Case-1: Non-overlapping distribution

As a production-in-charge, I am using two different grades of raw material with different quality attributes (non-overlapping but at the edge of the specification limits) and I am assuming that the quality attributes of the final product will be normally distributed i.e. I am assuming that most of final product will hit the center of the process control limits.

If the quality of the raw material determines the quality of the final product, then my assumption about the output is wrong, because the distribution of the final product quality would take a bimodal shape with only a few data points at the junction of the two distributions. The same information would be reflected on the control chart, with a high concentration of data points near the control limits and few or no points near the center. Here is the control chart of the final product

picture96

In this completely non-overlapping distribution, there will be unusually long connecting arms in the control chart, and an absence of points near the central line.

If we plot the histogram of this data set and go on increasing the number of classes, the two distributions separate.

picture97

So, whenever we see a control chart with the data points concentrated towards the control limits and no points at the center, we should immediately suspect a mixture of two non-overlapping distributions. Remember: long connecting arms and few data points at the center of the control chart.

Case-2: Partially overlapping distribution

Assume this scenario: A product is being produced in my facility in two shifts by two different operators. Each day I have two batches, one in each shift. There is a well written batch manufacturing record indicating that the temperature of the reactor should be between 50 to 60 °C. The control chart of a quality attribute of the product is represented by following control chart.

picture98

picture99

We can see that the data points on the control chart are arranged in an alternating fashion around the central line. The first batch (from the 1st shift) is below the central line and the next batch (from the 2nd shift) is above the central line. This control chart shows that even though we are following the same manufacturing process, there is a slight difference in how it is run. It was found that the 1st shift in-charge was operating towards 50 °C and the 2nd shift in-charge was operating towards 60 °C. This type of alternating arrangement is an indication of stratification (due to operators, machines etc.) and is characterized by short connecting arms.

There are the cases of partially overlapping distribution resulting in a bimodal distribution, which means that there will be few points in the central region of the control charts but, majority of the data points would be distributed in zone C or B. In such cases, it would be appropriate to plot the histogram with groups (like operator, shift etc).

Case-3: Significant Overlapping distribution

If there is significant overlap between the two input distributions then it would be difficult to differentiate them in the final product and the combined distribution would give a picture of a single normal distribution. Suppose the operators in the above case-2 were performing the activity at 55 °C and 60 °C respectively. This would result in an overlapping distribution as shown below

picture100

Case-4: Mixture of unequal proportion

As a shift-in-charge, I am running short of the production target. To meet it, I mixed the current batch with some material produced earlier for another customer with a slightly different specification. I hoped that it wouldn’t be caught by QA! The final control chart of the process looked like

picture101

We can see from the control chart that if two distributions are mixed in unequal proportions, the combined distribution is asymmetrical. In this case, one half of the control chart (here the lower half) has most of the data points and the other half has fewer.

Case-5: Cyclic trends

If one observes a repetition of the trend on the control chart, then there is a cyclic effect, like sales per month of the year: sales in some specific months are higher than in others.

picture104

Case-6: Gradual shift in the trend

A gradual change in the process is indicated by the change in the location of the data points on the control charts. This chart is most commonly encountered during the continuous improvement programs when we compare the process performance before and after the improvement program.

picture105

If it is observed that this shift is gradual on the control charts, then there must be a reason for the same, like wear and tear of machine, problem with the calibration of the gauges etc.

Case-7: Trend

If one observes that the data points on the control chart are gradually moving up or down, then it is a case of trend. This is usually caused by a gradual shift in the operating procedure due to wear and tear of machines, gauges going out of calibration etc.

picture103

Summary of unnatural patterns on the control charts

| Unnatural pattern | Pattern description | Symptom in control chart |
| --- | --- | --- |
| Large shift (strays, freaks) | Sudden and high change | Points near or beyond the control limits |
| Smaller sustained shift | Sustained smaller change | Series of points on the same side of the central line |
| Trends | Continuous change in one direction | Steadily increasing or decreasing run of points |
| Stratification | Small differences between values in a long run, absence of points near the control limits | A long run of points near the central line on both sides |
| Mixture | Saw-tooth effect, absence of points near the central line | A run of consecutive points on both sides of the central line, all far from the central line |
| Systematic variation or stratification | Regular alternation of high and low values | A long run of consecutive points alternating up and down |
| Cycle | Recurring periodic movement | Cyclic recurring patterns of points |

For the case study see next blog

7QC Tools — Pareto Chart, How to Prioritize Your Work?

Pareto Principle: It has been found that 80% of the trouble (defects) comes from 20% of the causes. Hence, a 6sigma or continuous improvement program focuses on controlling these 20% of the causes so that 80% of the defects are brought under control. This helps in setting the priority (focus area) for a 6sigma project.

We have seen that the histogram provides the frequency (number of observations) in a given class. However, the classes are not arranged in descending order.

Picture9

If the bars of a histogram are arranged in descending order of frequency, the sorted histogram is called a Pareto Chart. The cumulative frequency is also plotted along with the frequency on the Pareto Chart.

Picture12

Picture13

In above example, if we can control reasons C, D and G then we can reduce the failures by 82% (see cumulative frequency).
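The sorting and cumulative-frequency steps behind a Pareto chart can be sketched as follows. The defect counts in `defects` are hypothetical (the real counts are in the image), chosen only so that reasons C, D and G together account for 82% of failures, as stated above. The 80% cut-off is the usual Pareto rule of thumb.

```python
# Hypothetical defect counts per reason for failure
defects = {"A": 6, "B": 5, "C": 40, "D": 27, "E": 4, "F": 3, "G": 15}

# Sort reasons in descending order of frequency
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)
total = sum(defects.values())

# Build (reason, count, cumulative %) rows
pareto = []
running = 0
for reason, count in ordered:
    running += count
    pareto.append((reason, count, 100 * running / total))

# Keep adding reasons until the cumulative share reaches 80%
vital_few = []
for reason, count, cum_pct in pareto:
    vital_few.append(reason)
    if cum_pct >= 80:
        break

for reason, count, cum_pct in pareto:
    print(f"{reason}: {count:3d}  cumulative {cum_pct:5.1f}%")
print("vital few:", vital_few)
```

With these assumed counts the vital few are C, D and G, and the remaining four reasons together contribute only 18% of the failures.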

Hence, a Pareto Chart helps you simplify a bigger problem by identifying the vital few significant variables, and enables you to focus your limited resources on them.

Cumulative Frequency:

Picture14

Picture15

Related Blogs

7QC Tools: Flow Chart, Know Your Process Thoroughly

7QC Tools: Fish Bone or Ishikawa Diagram

7QC Tools: How to Extract More Information from the Scatter Plot?

7QC Tools: How to Draw a Scatter Plot?

7QC Tools: Scatter Plot — Caution! Misuse of Statistics!

7QC Tools: Scatter Plot

7QC Tools — How to Interpret a Histogram?

7QC Tools — How to Draw a Histogram?

7QC Tools — Histogram of Continuous Data

7QC Tools — Histogram of Discrete Data

7QC tools — Check List

Excellent Templates for 7QC tools from ASQ

What are Seven QC Tools & How to Remember them?

Kindly do provide feedback for continuous improvement

7QC Tools — How to Interpret a Histogram?

Different data sets give histograms of different shapes, as shown below. Always remember to draw the histogram of your dataset first, before taking any decision or starting a continuous improvement project.

Picture15

Symmetrical histogram: A given process is stable and is normally distributed around the mean. If this kind of histogram is seen during a six-sigma project then probably you are required to reduce the variance in the process to reduce the width of the histogram so that you move away from the customer’s specifications.

Left or right skewed histogram: In both of these cases, the maximum number of data points is scattered around the median (caution: not the mean!), which is the real measure of central tendency for skewed data. Efforts in six-sigma should be on identifying the causes of the skewness and eliminating them.

Picture11

Taking the mean as the measure of central tendency in these cases would be misleading, as the mean is affected by the extreme values in the data set.

If you are supposed to work on this type of data, find the reasons for those extreme observations and eliminate them. These are the low hanging fruit which we must take advantage of.

Picture17

Let’s take the example of a process whose yield is represented by the following left-skewed histogram

Picture19

The median tells us that 50% of the batches have a yield at or above some value (say 85%). However, if we present the mean as the measure of central tendency, we would tell management that the mean yield of the process is just 80%. Is that the right approach? We could then simply find the reason for those extreme low values, eliminate it, and tell management that by applying 6sigma we have increased the yield to 85%!

Bimodal histogram: It means that the data set contains observations from two different populations. If this happens during any process, then it must be assumed that two processes are running or two operators are working differently (which they are not supposed to do).

Picture16

Note: If the number of classes is small (<5), you can’t see the bimodal histogram. Just increase the number of classes to 7, 9, 11 etc.; it will become evident if a bimodal process is running.
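The gap between mean and median on skewed data is easy to demonstrate. The `yields` list below is hypothetical, made up so that a few low batches pull the mean down to 80 while the median stays at 85, mirroring the figures used in the example.

```python
from statistics import mean, median

# Hypothetical left-skewed batch yields: a few unusually low batches
yields = [55, 60, 70, 84, 85, 85, 86, 88, 92, 95]

mean_yield = mean(yields)      # pulled down by the low batches
median_yield = median(yields)  # unaffected by how extreme they are

print(f"mean = {mean_yield}, median = {median_yield}")
```

Half of the batches reach 85 or more, yet the mean reports 80, because the three low batches carry disproportionate weight in the average.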

Picture16

Be it management or a 6sigma practitioner, always draw the histogram of your data OR by default take median as the measure of the central tendency as it is not affected by the extreme values. Also see the effect of increasing the number of classes on the shape of the histogram for any deviation in the standard operating procedure.

Lesson learnt: while analyzing your data, always ask:

Where is the center? Which measure of central tendency should I consider for my data set, based on the histogram?


7QC Tools — How to Draw a Histogram?

The first approach is to use the readymade histogram template from ASQ.

The second approach is to use an Excel sheet; the steps involved are as follows

Excel → data → data analysis → Histogram

Picture1

The third approach is to draw it manually, which involves the following steps

Step-1: Calculate the range (maximum-minimum)

Step-2: Divide range by number of classes you want (usually 5 or 7). This will give you class width.

Step-3: Defining class intervals. This is done starting from the minimum value in the data set and by adding the class width (step-2) to it. This will give you 1st class interval.

Second class interval is obtained by adding class width to the upper class limit of the first class interval.

Step-4: Step-3 is repeated 5 to 7 times depending on the number of class interval (step-2)

Step-5: segregate the data according to the class interval, this will give the frequency of each class.

Step-6: plot the bar graph between classes and the frequency to give histogram.
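The six manual steps can be sketched in Python as below. The `data` list is hypothetical yield data invented for illustration, and the text-based bars stand in for a real plot. One design choice to note: each class here includes its lower limit and excludes its upper limit, except the last class, which also includes the maximum value.

```python
# Hypothetical data set (e.g. percentage yields)
data = [74, 76, 78, 79, 80, 80, 81, 82, 83, 84, 85, 88, 90, 92, 94]
n_classes = 5

# Step-1: range = maximum - minimum
lowest, highest = min(data), max(data)

# Step-2: class width = range / number of classes
width = (highest - lowest) / n_classes

# Step-3 and Step-4: class limits, starting from the minimum value
edges = [lowest + i * width for i in range(n_classes + 1)]

# Step-5: assign each value to its class to get the frequencies
freq = [0] * n_classes
for x in data:
    index = min(int((x - lowest) / width), n_classes - 1)
    freq[index] += 1

# Step-6: plot class versus frequency (text bars instead of a chart)
for i, count in enumerate(freq):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f}: {'#' * count}")
```

For this made-up data the class width is 4 and the frequencies are 2, 5, 4, 1 and 3.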

Fourth Approach: Using Minitab

Step-1: Go to Graph in menu section and then click histogram

Picture27

Step-2: Once you click Histogram, a menu will appear where you are required to select the type of histogram desired.

Picture28

Step-3: After selecting the type of histogram, another menu will appear which needs to be filled

Picture29

After filling the above data, click OK to get the histogram

Picture30


7QC Tools — Histogram of Continuous Data


Let’s consider the percentage yield of a process, which ranges from 74 to 94. This is a case of continuous data.

As we did for discrete data, we have constructed the histogram of the yield data by dividing the yield into some sub-classes followed by putting the data into the class it belongs.

Picture10

Note:

Unlike discrete data, the bars of the present histogram are touching each other as this is a case of continuous data.

The above histogram tells us that most of the data is clustered within 78-84. Looking at the graph, it appears that the batches with yield > 88 are outliers, but they are genuine. What should we do to improve the process?

What we can do is compare the process conditions of the batches with a yield of 74-82 with those of the batches with a yield of 88-94 and find out the difference.

Hence, histogram gave a direction for continuous improvement.

Let’s go a step ahead and plot the customer’s specification (LSL & USL) along with the histogram as shown below. This gives you the idea about process capability and outliers.

Picture11

 


7QC Tools — Histogram of Discrete Data

A histogram is a pictorial view of the data set; it is like a portrait of your data. It tells you what your data looks like and where most of the data is clustered (and so whether the mean or the median is the true measure of central tendency). In short, it tells you about the distribution of the data set.

A histogram divides the data set into small sub-units called “classes”. The data is then arranged according to the class it belongs to. In this way we have the “class ranges” along with their frequencies (i.e. how many data points there are in each class). Finally, a bar graph of class versus frequency is plotted, which is known as a histogram.

For example

In the check list example we collected data about the reason for failure (= class) and the number of defects due to that reason (= frequency).

Picture5

Now, if we plot the bar graph of the above data, we get the histogram shown below. It appears that defects due to reasons C, D and G are more prominent, and these three require immediate attention.

Picture9

Note:

The above data set is a discrete variable; hence, the bars of the histogram do not touch each other.
