What is a Normal Distribution?
Ever since I was first introduced to probability in class 10, I carried the idea that the probability of getting any given number on a die is fixed at 1/6. Later, while studying higher mathematics in college, I came across a topic on ‘Sampling Terms’. It showed that as we increase the sample size from 10 to 100, 1,000, and on up to 10,000 and plot the results on a chart, an interesting pattern appears: the outcomes form a bell-shaped curve, generally referred to as the ‘normal distribution’. That finally corrected my younger self. The theoretical probability stays at 1/6, but what we actually observe in a sample varies around it within a range rather than landing on one constant value.
Let’s delve a little bit deeper into the concept of normal distribution…
To see its usefulness in daily life, consider a few examples: the heights of people in a large population, the number of times we check our smartphones each day, and the motion and position of a user's fingers relative to a position on the operating system's display. Each of these is the outcome of random, independent events, yet each can be modelled reasonably well by a normal distribution with its own mean and variance.
What does having different means and variances denote?
The sum of the values of a variable (height, count, etc.) divided by the number of samples is known as the mean. Variance measures how far the data is spread out from the mean: it is the average of the squared deviations from the mean, and its square root is the standard deviation. For a better understanding of the terminology, I found this website useful.
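As a small illustration of these two terms, here is a sketch that computes them by hand. The heights are invented numbers chosen only to keep the arithmetic easy, and the variable names are my own.

```python
import statistics

# Hypothetical heights in centimetres, invented purely for illustration.
heights_cm = [160, 165, 170, 175, 180]

# Mean: sum of the values divided by how many values there are.
mean = sum(heights_cm) / len(heights_cm)

# Population variance: average squared distance of each value from the mean.
variance = sum((h - mean) ** 2 for h in heights_cm) / len(heights_cm)

# Standard deviation: square root of the variance, in the same units as the data.
std_dev = variance ** 0.5

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))

# The standard library gives the same population variance.
print("statistics.pvariance:", statistics.pvariance(heights_cm))
```

Two data sets can share the same mean and still have very different variances, which is why a normal distribution needs both numbers to be pinned down.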
Central Limit Theorem
Why then is it that the normal distribution alone may be used to model these seemingly unconnected events? The Central Limit Theorem, one of the most useful theorems in the field of probability, holds the key to the solution.
You'll hear about the Central Limit Theorem quite a bit in probability and statistics, particularly in hypothesis testing. The Central Limit Theorem (CLT) essentially states that when independent random samples are taken repeatedly, the distribution of the sample means tends toward a normal distribution as the sample size grows, provided the population has a finite variance.
Even though the distribution's initial shape was uniform, the graphic above shows that as the value of n (the sample size) increased, the distribution began to resemble a normal distribution graph.
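The same effect can be reproduced with a short simulation. The sketch below is my own illustration, not the source of the graphic: it rolls a fair die (a uniform distribution), averages n rolls at a time, and prints a rough text histogram of those averages. The function names, the number of trials, and the seed are all arbitrary choices.

```python
import random
import statistics
from collections import Counter

def sample_means(n, trials=5_000, seed=0):
    """Return `trials` averages, each taken over `n` rolls of a fair die."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.randint(1, 6) for _ in range(n)])
            for _ in range(trials)]

def text_histogram(values, bins=11, low=1.0, high=6.0, width=40):
    """Print a crude horizontal histogram of `values` between `low` and `high`."""
    step = (high - low) / bins
    # Clamp the top edge (a value of exactly `high`) into the last bin.
    counts = Counter(min(int((v - low) / step), bins - 1) for v in values)
    peak = max(counts.values())
    for b in range(bins):
        bar = "#" * round(width * counts[b] / peak)
        print(f"{low + b * step:4.2f} | {bar}")

for n in (1, 2, 30):
    means = sample_means(n)
    print(f"n = {n}: mean of sample means = {statistics.fmean(means):.2f}, "
          f"spread = {statistics.pstdev(means):.2f}")
    text_histogram(means)
```

With n = 1 the bars should look roughly flat, and by n = 30 they should bunch into a narrow bell around 3.5, which is the behaviour the theorem describes.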
Understanding Central Limit Theorem with the help of a Financial Case Study
An investor wants to estimate the return of the Nifty-500 stock market index, which comprises 500 top-performing stocks. Because the index is so large, the investor cannot analyze each stock individually, and he needs to finish his analysis by the evening, so he asks a friend who is a data scientist for help. His friend suggests random sampling and walks him through the steps needed to estimate the overall return of the index.
The investor chooses each sample of roughly 30 to 50 equities at random from the index. He makes sure the samples are random and that stocks chosen in earlier samples are returned to the pool before later samples are drawn, in order to prevent bias.
If the first sample produces an average return of 8.9%, the next sample may produce an average return of 10%. Due to the nature of randomized sampling, each sample will produce a different result.
As per the Central Limit Theorem (CLT), as he draws more samples, the sample means start to trace out a distribution centred on the mean of the whole index (the population). The larger each sample, the closer the distribution of the sample means comes to a normal distribution.
The investor can make more informed decisions by using the average of the sample means to estimate the return of the entire index of 500 stocks.
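The investor's procedure can be sketched in a few lines. Everything numeric here is an assumption made for illustration: the 500 returns are randomly generated rather than real Nifty-500 data, the sample size of 40 and the 200 samples are arbitrary, and each sample is drawn from the full list so that earlier picks are available again.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical annual returns (%) for 500 stocks; invented, not real index data.
# A shifted exponential gives a deliberately skewed, non-normal population.
returns = [rng.expovariate(1 / 15) - 5 for _ in range(500)]
true_mean = statistics.fmean(returns)

def draw_sample_means(population, sample_size=40, num_samples=200):
    """Average return of each random sample; every sample uses the full list."""
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

means = draw_sample_means(returns)
estimate = statistics.fmean(means)

print(f"true index mean        : {true_mean:.2f}%")
print(f"average of sample means: {estimate:.2f}%")
print(f"spread of sample means : {statistics.stdev(means):.2f}")
```

Individual sample means differ from one another, much like the 8.9% and 10% in the example, but their average should sit close to the mean of all 500 returns.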
Key Takeaways
The central limit theorem (CLT) asserts that regardless of the distribution of the population, the distribution of sample "means" approaches a normal distribution as the sample size increases.
Sample sizes of 30 or more are often considered sufficient for the CLT to hold.
A crucial component of the CLT is that the average of the sample "means" approaches the population's "mean", while the standard deviation of the sample "means" equals the population's standard deviation divided by the square root of the sample size.
The features of a population may be predicted more precisely with a suitably large sample size.
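A quick simulation can check how the mean and spread of the sample means relate to the population's. This is a sketch under my own assumptions: the population is a randomly generated, skewed set of numbers, samples are drawn with replacement, and the helper name `observed_vs_predicted` is hypothetical. It compares the observed spread of the sample means with the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

rng = random.Random(7)

# A skewed, clearly non-normal population (invented data).
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def observed_vs_predicted(n, num_samples=2_000):
    """Return (mean of sample means, their spread, sigma / sqrt(n))."""
    means = [statistics.fmean(rng.choices(population, k=n))
             for _ in range(num_samples)]
    return statistics.fmean(means), statistics.pstdev(means), sigma / math.sqrt(n)

print(f"population mean = {mu:.3f}, population std = {sigma:.3f}")
for n in (5, 30, 100):
    avg, spread, predicted = observed_vs_predicted(n)
    print(f"n = {n:3d}: mean of means = {avg:.3f}, "
          f"observed spread = {spread:.3f}, sigma/sqrt(n) = {predicted:.3f}")
```

The mean of the sample means should stay near the population mean for every n, while the spread shrinks as n grows, which is why larger samples give more precise estimates.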
Why is the Minimum Sample Size for the Central Limit Theorem 30?
In statistics, a sample size of 30 is a common rule of thumb rather than a strict minimum. With around 30 observations, the distribution of the sample mean is usually close enough to normal for confidence intervals to be reasonably trustworthy, although strongly skewed populations may need more. The likelihood that your sample will be representative of your population increases with the sample size.
What is the formula for Central Limit Theorem?
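The Central Limit Theorem doesn't have a single formula of its own, but it relies on the sample mean and standard deviation. If the population has mean μ and standard deviation σ, the means of samples of size n are approximately normally distributed with mean μ and standard deviation σ/√n, known as the standard error. As sample means are gathered from the population, this standard error determines how widely they are spread along the probability distribution curve.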
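One common way to put the sample mean and standard deviation to work is the standard error, the standard deviation divided by the square root of the sample size, which gives an approximate 95% interval around a sample mean. The sketch below assumes a sample of 40 stocks with a standard deviation of 12%; both numbers are hypothetical, and only the 8.9% average comes from the case study.

```python
import math
from statistics import NormalDist

sample_mean = 8.9   # average return (%) of one sample, from the case study
sample_std = 12.0   # hypothetical standard deviation (%) within that sample
n = 40              # hypothetical number of stocks in the sample

# Standard error: how much the sample mean is expected to vary between samples.
standard_error = sample_std / math.sqrt(n)

# z-value that leaves 2.5% in each tail of a standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

low = sample_mean - z * standard_error
high = sample_mean + z * standard_error

print(f"standard error: {standard_error:.2f}")
print(f"approximate 95% interval: {low:.2f}% to {high:.2f}%")
```

The interval leans on the normal approximation from the CLT, so it is only as trustworthy as that approximation is for the sample at hand.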
Now let's do a mental exercise: think about some daily life scenarios that could be represented through the normal distribution and the CLT (Central Limit Theorem). Also, feel free to experiment and apply the insights you derive from analyzing them. Even if the data you encounter does not follow a normal distribution, you can still apply the CLT to the sample means to gain insights about the population behind your samples. Thanks for taking the time to read my article. Since you have made it this far, feel free to connect with me on social media with any queries, concerns, or feedback.