M10. Dispersion, Skewness, Correlation and Regression:

1. List of Formulae of partition values:

I. Individual Series:
1) Mean$(\bar{X})$ = $\frac{\sum X}{n}$;           [∵ n = total number of observations]. 
2) Median $(M_d)$ = Value of the $\left ( \frac{n+1}{2} \right )^{th}$ item (with the items arranged in ascending order). 
3) Mode = The value that is repeated the maximum number of times. 
4) $Q_1$ = Value of the $\left ( \frac{n+1}{4} \right )^{th}$ item. 
5) $Q_2$ = Value of the $\left ( \frac{n+1}{2} \right )^{th}$ item.               [$Q_2 $ = Median] 
6) $Q_3$ = Value of the $\left ( \frac{3(n+1)}{4} \right )^{th}$ item. 
7) $D_3$ = Value of the $\left ( \frac{3(n+1)}{10} \right )^{th}$ item.           [called $3^{rd}$ decile] 
8) $P_{30}$ = Value of the $\left ( \frac{30(n+1)}{100} \right )^{th}$ item.         [called $30^{th}$ percentile]
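
The position rules above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `partition_value`, the sample list `marks`, and the choice to interpolate linearly between neighbouring items when the position is fractional are not part of the original notes.

```python
def partition_value(data, k, parts):
    """Value of the (k(n+1)/parts)-th item of the data arranged in ascending order.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    Fractional positions are resolved by linear interpolation (an assumption).
    """
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1) / parts            # 1-based position, possibly fractional
    pos = min(max(pos, 1.0), float(n))   # keep the position inside the data
    lower = int(pos)                     # whole part of the position
    frac = pos - lower                   # fractional part of the position
    if lower >= n:
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# Hypothetical individual series with n = 11 observations
marks = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]

q1 = partition_value(marks, 1, 4)        # 3rd item
median = partition_value(marks, 2, 4)    # 6th item
q3 = partition_value(marks, 3, 4)        # 9th item
d3 = partition_value(marks, 3, 10)       # 3.6th item
p30 = partition_value(marks, 30, 100)    # 3.6th item, same as D3
print(q1, median, q3, d3, p30)
```

Note that $D_3$ and $P_{30}$ share the same position, $\frac{3(n+1)}{10} = \frac{30(n+1)}{100}$, so they always coincide.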

II. Discrete Series:
1) Mean$(\bar{X})$ = $\frac{\sum fX}{N}$;           [∵ N = $\sum f$ = total frequency]. 
2) Median $(M_d)$ = Value of the $\left ( \frac{N+1}{2} \right )^{th}$ item. 
3) Mode = The value with the maximum frequency. 
4) $Q_1$ = Value of the $\left ( \frac{N+1}{4} \right )^{th}$ item. 
5) $Q_2$ = Value of the $\left ( \frac{N+1}{2} \right )^{th}$ item.               [$Q_2 $ = Median] 
6) $Q_3$ = Value of the $\left ( \frac{3(N+1)}{4} \right )^{th}$ item. 
7) $D_4$ = Value of the $\left ( \frac{4(N+1)}{10} \right )^{th}$ item.           [called $4^{th}$ decile] 
8) $P_{20}$ = Value of the $\left ( \frac{20(N+1)}{100} \right )^{th}$ item.         [called $20^{th}$ percentile]
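
For a discrete series, the required item is usually located through the cumulative frequencies. The sketch below assumes the common textbook convention of taking the first value whose cumulative frequency reaches the computed position; the function name `discrete_partition` and the sample data are hypothetical.

```python
from itertools import accumulate


def discrete_partition(values, freqs, k, parts):
    """Value of the (k(N+1)/parts)-th item of a discrete frequency series."""
    pairs = sorted(zip(values, freqs))           # arrange values in ascending order
    N = sum(freqs)                               # total frequency
    pos = k * (N + 1) / parts                    # position of the required item
    cum_freqs = accumulate(f for _, f in pairs)  # running cumulative frequency
    for (x, _), cf in zip(pairs, cum_freqs):
        if cf >= pos:                            # first c.f. that reaches the position
            return x
    return pairs[-1][0]


# Hypothetical discrete series: N = 29, cumulative frequencies 3, 10, 22, 27, 29
X = [10, 20, 30, 40, 50]
f = [3, 7, 12, 5, 2]

mean = sum(x * w for x, w in zip(X, f)) / sum(f)   # sum(fX) / N
mode = X[f.index(max(f))]                          # value with the maximum frequency
q1 = discrete_partition(X, f, 1, 4)                # 7.5th item
median = discrete_partition(X, f, 2, 4)            # 15th item
q3 = discrete_partition(X, f, 3, 4)                # 22.5th item
d4 = discrete_partition(X, f, 4, 10)               # 12th item
p20 = discrete_partition(X, f, 20, 100)            # 6th item
print(mean, mode, q1, median, q3, d4, p20)
```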

III. Continuous Series:
1) Mean$(\bar{X})$ = $\frac{\sum fX}{N}$;           [∵ N = $\sum f$ = total frequency]. 
2) Median $(M_d)$ = $l + \frac{\frac{N}{2}-c.f}{f}*h$  [∵ The median class is the class interval in which the $\left ( \frac{N}{2} \right )^{th}$ item lies] 
[∵ $l$ = lower limit of the median class; $N$ = total frequency; $f$ = frequency of the median class; $c.f.$ = cumulative frequency preceding the median class; $h$ = class size]. 
3) Mode$(M_0)$ = $l+\frac{\Delta_1}{\Delta_1+\Delta_2}*h$   [∵ $\Delta_1$ = $f_1 - f_0$;  $\Delta_2$ = $f_1 - f_2$] 
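[∵ $l$ = lower limit of the modal class; $f_1$ = Maximum frequency (frequency of the modal class); $f_0$ = frequency preceding modal class; $f_2$ = frequency following modal class] 
4) Mode = 3Median - 2Mean.          [empirical relation, used when the mode is ill-defined] 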
5) $Q_1$ = $l + \frac{\frac{N}{4}-c.f}{f}*h$ 
6) $Q_2$ = $l + \frac{\frac{N}{2}-c.f}{f}*h$          [$Q_2 $ = Median] 
7) $Q_3$ = $l + \frac{\frac{3N}{4}-c.f}{f}*h$ 
8) $D_8$ = $l + \frac{\frac{8N}{10}-c.f}{f}*h$
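
The interpolation formulas for a continuous series can be sketched as follows. The function names, the tuple representation of class intervals, and the frequency table are assumptions chosen for illustration; the classes are taken to be exclusive and of equal size.

```python
def grouped_partition(classes, freqs, k, parts):
    """l + ((kN/parts - c.f.) / f) * h for a continuous (grouped) series."""
    N = sum(freqs)
    target = k * N / parts                  # N/2 for median, 3N/4 for Q3, 8N/10 for D8
    cf = 0                                  # cumulative frequency preceding the class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= target:      # class in which the target item lies
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    return classes[-1][1]


def grouped_mode(classes, freqs):
    """l + (d1 / (d1 + d2)) * h, with d1 = f1 - f0 and d2 = f1 - f2."""
    i = freqs.index(max(freqs))             # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    lower, upper = classes[i]
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * (upper - lower)


# Hypothetical frequency table with N = 40
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]

q1 = grouped_partition(classes, freqs, 1, 4)       # N/4 = 10 lies in 10-20
median = grouped_partition(classes, freqs, 2, 4)   # N/2 = 20 lies in 20-30
q3 = grouped_partition(classes, freqs, 3, 4)       # 3N/4 = 30 lies in 30-40
d8 = grouped_partition(classes, freqs, 8, 10)      # 8N/10 = 32 lies in 30-40
mode = grouped_mode(classes, freqs)                # modal class is 20-30
print(q1, median, q3, d8, mode)
```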

2. Method of Measuring Dispersion:
I. - Range = $L$ - $S$;  [∵ Largest item - Smallest item]. 
    - Coefficient of Range = $\frac{L-S}{L+S}$ 
II. - Semi-interquartile Range (Quartile Deviation) = $\frac{Q_3 - Q_1}{2}$ 
     - Coefficient  of  Q. D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
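
As a quick numerical check of the range and quartile deviation formulas, here is a small sketch. The function names and the numbers are hypothetical, and the quartiles are assumed to have been found already by the position rule for an individual series.

```python
def range_measures(data):
    """Return (Range, Coefficient of Range) using L = largest and S = smallest item."""
    L, S = max(data), min(data)
    return L - S, (L - S) / (L + S)


def quartile_deviation(q1, q3):
    """Return (Quartile Deviation, Coefficient of Q.D.) from the two quartiles."""
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


# Hypothetical data; Q1 = 12 and Q3 = 36 are its 3rd and 9th ordered items
marks = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]

rng, coeff_range = range_measures(marks)        # L = 53, S = 5
qd, coeff_qd = quartile_deviation(12, 36)
print(rng, coeff_range, qd, coeff_qd)
```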
III. Mean Deviation (M. D.)/Average Deviation:
1) M. D. from Mean = $\frac{\sum \left | X-\bar{X} \right |}{n}$          [for Individual series] 
                                  = $\frac{\sum f\left | X-\bar{X} \right |}{N}$        [for Discrete and Continuous series] 
2) M. D. from Median = $\frac{\sum \left | X-M_d \right |}{n}$      [for Individual series] 
                                     = $\frac{\sum f\left | X-M_d \right |}{N}$    [for Discrete and Continuous series] 
3) Coefficient of M.D. from Mean = $\frac{M.D\;from\;Mean}{Mean}$  
4) Coefficient of M.D. from Median = $\frac{M.D\;from\;Median}{Median}$
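
The mean deviation formulas can be illustrated with a short sketch. The helper `mean_deviation` and both data sets are assumptions made for this example; for a continuous series, `values` holds the class mid-values.

```python
from statistics import median


def mean_deviation(values, about, freqs=None):
    """sum(f * |X - A|) / N, where A is the mean or the median.

    With freqs=None every item has frequency 1, which gives sum(|X - A|) / n.
    """
    if freqs is None:
        freqs = [1] * len(values)
    N = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / N


# Series without frequencies (hypothetical)
data = [2, 4, 6, 8, 10]
x_bar = sum(data) / len(data)
md_mean = mean_deviation(data, x_bar)
md_median = mean_deviation(data, median(data))
coeff_md_mean = md_mean / x_bar

# Frequency series (hypothetical): mid-values with their frequencies
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]
grouped_mean = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
md_grouped = mean_deviation(mids, grouped_mean, freqs)
print(md_mean, md_median, coeff_md_mean, md_grouped)
```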

IV. Standard Deviation (S.D.):
1) Standard Deviation $(\sigma)$ = $\sqrt{\frac{\sum(X-\bar{X})^2}{n}} = \sqrt{\frac{\sum X^2}{n}-\left (\frac{\sum X}{n}\right )^2}$   [for Individual series] 
2) Standard Deviation $(\sigma)$ = $\sqrt{\frac{\sum f(X-\bar{X})^2}{N}} = \sqrt{\frac{\sum fX^2}{N}-\left (\frac{\sum fX}{N}\right )^2}$   [for Discrete and Continuous series] 
3) Variance $(\sigma^2)$ = $\frac{\sum f(X-\bar{X})^2}{N}$ = $\frac{\sum fX^2}{N}-\left (\frac{\sum fX}{N}\right )^2$  
4) Coefficient of S.D. = $\frac{S.D.}{Mean}$ = $\frac{\sigma}{\bar{X}}$ 
5) Coefficient of Variation  =  $\frac{S.D.}{Mean}$ * 100   =  $\frac{\sigma}{\bar{X}} * 100$
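
The two forms of the variance (the definition and the shortcut) should always agree. The sketch below checks this on a small hypothetical data set; the function name `variance_both_ways` is an assumption, and the divisor is $N$ (population form), as in the formulas above.

```python
from math import sqrt


def variance_both_ways(values, freqs=None):
    """Return the variance from the definition and from the shortcut form."""
    if freqs is None:
        freqs = [1] * len(values)          # series without frequencies
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / N
    by_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / N
    by_shortcut = sum(f * x * x for x, f in zip(values, freqs)) / N - mean ** 2
    return mean, by_definition, by_shortcut


data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations
mean, var_def, var_short = variance_both_ways(data)
sigma = sqrt(var_def)                      # standard deviation
coeff_sd = sigma / mean                    # coefficient of S.D.
cv = coeff_sd * 100                        # coefficient of variation, in percent
print(mean, var_def, var_short, sigma, cv)
```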

3. Skewness:
(A measure of central tendency gives information about the concentration of the items around the central value; skewness measures the lack of symmetry of the distribution about that value.) 
I. Measures of Skewness:
1) Karl Pearson's coefficient of Skewness $(S_k(P))$ = $\frac{Mean\; -\;Mode}{Std.\; Deviation}$ = $\frac{\bar{X}-M_0}{\sigma}$ 
(It is also called the Pearsonian coefficient of Skewness). 
2)  $S_k(P)$ = $\frac{3(Mean\; -\;Median)}{Std.\; Deviation}$ = $\frac{3(\bar{X}-M_d)}{\sigma}$ 
[∵ Mean - Mode = 3(Mean - Median)]
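
Both forms of Karl Pearson's coefficient can be computed with Python's standard `statistics` module, as in the sketch below. The data set is hypothetical, and `pstdev` is used because the formulas above divide by $n$.

```python
from statistics import mean, median, mode, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical observations

x_bar = mean(data)                    # 5
m_d = median(data)                    # 4.5
m_0 = mode(data)                      # 4
sigma = pstdev(data)                  # population S.D. (divisor n)

sk_mode = (x_bar - m_0) / sigma       # (Mean - Mode) / S.D.
sk_median = 3 * (x_bar - m_d) / sigma # 3(Mean - Median) / S.D.
print(sk_mode, sk_median)
```

The two results differ here (0.5 and 0.75) because Mean - Mode = 3(Mean - Median) is an empirical relation that holds only approximately, for moderately skewed distributions. Both are positive, which indicates a positively skewed distribution.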

4. Correlation:
1) Karl Pearson's correlation Coefficient (r) = $\frac{Cov(X,Y)}{\sqrt{Var(X)}.\sqrt{Var(Y)}}$ 
                                                                       = $\frac{\Sigma(X-\bar{X}).(Y-\bar{Y})}{\sqrt{\Sigma(X-\bar{X})^2}.\sqrt{\Sigma(Y-\bar{Y})^2}}$ 
                                                                       = $\frac{\Sigma{xy}}{\sqrt{\Sigma x^2} .\sqrt{\Sigma y^2}}$ 
[∵ Where $x = X - \bar{X}$ and $y = Y - \bar{Y}$]
2) Karl Pearson's correlation Coefficient (r) = $\frac{n\Sigma{XY}-\Sigma{X} .\Sigma{Y}}{\sqrt{n \Sigma{X^2}- (\Sigma{X})^2}\sqrt{n \Sigma{Y^2}- (\Sigma{Y})^2}}$
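
The deviation form and the direct (raw-sum) form give the same value of $r$. The following sketch verifies this on a small hypothetical data set; the function names are assumptions made for illustration.

```python
from math import sqrt


def pearson_deviation_form(X, Y):
    """r = sum(xy) / (sqrt(sum(x^2)) * sqrt(sum(y^2))), with x = X - mean(X), y = Y - mean(Y)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    x = [a - x_bar for a in X]
    y = [b - y_bar for b in Y]
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson_direct_form(X, Y):
    """r = (n*sum(XY) - sum(X)*sum(Y)) / (sqrt(n*sum(X^2) - sum(X)^2) * sqrt(n*sum(Y^2) - sum(Y)^2))."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


X = [1, 2, 3, 4, 5]                    # hypothetical paired observations
Y = [2, 4, 5, 4, 5]
r1 = pearson_deviation_form(X, Y)      # 6 / sqrt(10 * 6)
r2 = pearson_direct_form(X, Y)         # 30 / sqrt(50 * 30)
print(r1, r2)
```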

5. Regression:
Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
Graph: Regression line of $y$ on $x$
Line of Regression: In a scatter diagram, the points cluster around a curve called the regression curve.
If the curve is a straight line, it is called the regression line. 
The regression line can be expressed by two different algebraic equations, as follows: 
i) Regression equation of $y$ on $x$ is $y = a + bx$; where b is known as regression coefficient $(b_{yx})$ of $y$ on $x$. 
ii) Regression equation of $x$ on $y$ is $x = a + by$; where b is known as regression coefficient $(b_{xy})$ of $x$ on $y$. 
1) Correlation Coefficient between the two variables $x$ and $y$ is; 
$r = \pm\sqrt{b_{yx}.b_{xy}}$ ................................ (i)   [$r$ takes the common sign of $b_{yx}$ and $b_{xy}$]
2) Regression equation of $y$ on $x$: 
$y - \bar{y} = b_{yx}.(x - \bar{x})$ .................................... (ii)  
[Note: Let the regression equation of $y$ on $x$ be, 
$y = a + bx$                                   (i)
Summing both sides over the $n$ observations; 
⇒ $\Sigma{y} = na + b\Sigma{x}$
Dividing both sides by $n$; 
⇒ $\frac{\Sigma{y}}{n} = a + b\frac{\Sigma{x}}{n}$
⇒ $\bar{y} = a + b\bar{x} $           (ii) 
Subtracting (ii) from (i), and writing $b = b_{yx}$; 
⇒ $y - \bar{y} = b_{yx}(x - \bar{x})$             (iii) 
This equation (iii) is the regression equation of $y$ on $x$. ]
3) Regression coefficient $(b_{yx})$ of $y$ on $x$ is, 
$b_{yx} = \frac{n\Sigma{xy} - \Sigma{x}.\Sigma{y}}{n\Sigma{x^2} - (\Sigma{x})^2}$ .................................... (iii)
4) Regression equation of $x$ on $y$: 
$x - \bar{x} = b_{xy}.(y - \bar{y})$ .................................... (iv)
5) Regression coefficient $(b_{xy})$ of $x$ on $y$ is, 
$b_{xy} = \frac{n\Sigma{xy} - \Sigma{x}.\Sigma{y}}{n\Sigma{y^2} - (\Sigma{y})^2}$ .................................. (v)
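
The regression coefficients and both regression lines can be computed as in the sketch below. The function name, the variable names, and the data are hypothetical; the sign of $r$ is taken from the common sign of the two regression coefficients.

```python
from math import copysign, sqrt


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    numerator = n * sxy - sx * sy
    b_yx = numerator / (n * sxx - sx ** 2)   # regression coefficient of y on x
    b_xy = numerator / (n * syy - sy ** 2)   # regression coefficient of x on y
    return b_yx, b_xy


xs = [1, 2, 3, 4, 5]                         # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

b_yx, b_xy = regression_coefficients(xs, ys)

# y on x:  y - y_bar = b_yx (x - x_bar)  =>  y = a_yx + b_yx * x
a_yx = y_bar - b_yx * x_bar
# x on y:  x - x_bar = b_xy (y - y_bar)  =>  x = a_xy + b_xy * y
a_xy = x_bar - b_xy * y_bar

# r is the geometric mean of the two coefficients, carrying their common sign
r = copysign(sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, a_yx, a_xy, r)
```

For this data the line of $y$ on $x$ is $y = 2.2 + 0.6x$ and the line of $x$ on $y$ is $x = -1 + y$; both pass through the point $(\bar{x}, \bar{y}) = (3, 4)$.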