Ever since I was first introduced to probability in class 10, I carried forward the idea that the probability of rolling any given number on a die is fixed at 1/6. Later, while pursuing higher mathematics in college, I came across the topic of sampling. It shows that as we increase the sample size from 10 to 100, 1,000, and so on up to 10,000, an interesting pattern emerges: plotting the results produces a bell-shaped curve, generally referred to as the normal distribution. I finally corrected my younger self and concluded that empirical estimates of a probability fluctuate within an interval around the theoretical value rather than matching it exactly.

Let's delve a little deeper into the concept of the normal distribution.

To understand its utility in daily life, consider a few examples: the heights of people in a large population, the number of times we check our smartphones each day, or the positions a user's fingers trace across a touchscreen. All of these are random, independent, spontaneous events, yet each can be modeled with a normal distribution, just with different means and variances.

The sum of the observed values (heights, counts, etc.) divided by the number of samples is known as the mean, while the variance measures how far the data is spread out from the mean (its square root is the standard deviation). For a better understanding of the terminology, I found this website useful.
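As a quick sketch, here is how those two definitions look in code, using a small made-up sample of heights (the numbers are purely illustrative):

```python
# A small, made-up sample of heights in centimetres.
heights = [160, 165, 170, 175, 180]

# Mean: sum of the values divided by the number of samples.
mean = sum(heights) / len(heights)

# Variance: average squared distance from the mean.
variance = sum((h - mean) ** 2 for h in heights) / len(heights)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

print(mean, variance, round(std_dev, 2))  # 170.0 50.0 7.07
```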

Why, then, can the normal distribution alone be used to model these seemingly unconnected events? The Central Limit Theorem, one of the most useful theorems in probability, holds the key.

You'll hear about the Central Limit Theorem quite a bit in probability and statistics, particularly in hypothesis testing. The Central Limit Theorem (CLT) essentially states that as sample size rises, the distribution of sample means tends to resemble a normal distribution when independent random samples are taken repeatedly.

Even though the distribution's initial shape was uniform, the graphic above shows that as the value of n (the sample size) increased, the distribution began to resemble a normal distribution graph.
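This behavior is easy to reproduce yourself. The sketch below (a minimal simulation I am adding for illustration, not part of the original figure) draws repeated samples from a uniform distribution and shows that the sample means cluster ever more tightly around the true mean as n grows:

```python
import random
import statistics

random.seed(42)  # for reproducibility

def sample_means(n, num_samples=2000):
    """Draw num_samples samples of size n from Uniform(0, 1), return their means."""
    return [statistics.mean(random.random() for _ in range(n))
            for _ in range(num_samples)]

# The true mean of Uniform(0, 1) is 0.5. As n grows, the spread of the
# sample means shrinks and their histogram looks increasingly bell-shaped.
for n in (1, 10, 100):
    means = sample_means(n)
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each n reproduces the progression toward the bell curve described above.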

Suppose an investor wants to estimate the return of the Nifty-500 stock market index, which comprises 500 top-performing stocks. Given the size of the index, the investor cannot analyze each stock independently, and he needs to complete his analysis by the evening. So he asked a friend, a data scientist, to help him out. His friend suggested random sampling and walked him through the steps required to estimate the overall return of the index.

The investor picks each sample of 30 to 50 equities at random from among the stocks. He made sure the samples were random and that previously chosen stocks were returned to the pool (sampling with replacement) in order to prevent bias.

If the first sample produces an average return of 8.9%, the next sample may produce an average return of 10%. Due to the nature of randomized sampling, each sample will produce a different result.

And as per the Central Limit Theorem (CLT), as he picks more and more samples, the sample means will begin to form a picture of the whole index (the population). The distribution of those sample means will move towards a normal distribution as the number of samples increases.
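The investor's procedure can be sketched in a few lines. The returns below are synthetic numbers generated for illustration, not real Nifty-500 data:

```python
import random
import statistics

random.seed(0)

# Synthetic annual returns (%) for 500 stocks -- illustrative, not real figures.
population = [random.gauss(9.0, 4.0) for _ in range(500)]
true_mean = statistics.mean(population)

# Repeatedly draw samples of 40 stocks *with replacement* and record each mean.
sample_means = [statistics.mean(random.choices(population, k=40))
                for _ in range(1000)]

estimate = statistics.mean(sample_means)
print(f"true index return: {true_mean:.2f}%, sampling estimate: {estimate:.2f}%")
```

`random.choices` samples with replacement, matching the procedure the investor was told to follow; the average of the sample means lands very close to the true index return.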

The investor can make more informed decisions by using the sample index's average return to estimate the return of the entire index of 500 stocks.

The central limit theorem (CLT) asserts that regardless of the distribution of the population, the distribution of sample "means" approaches a normal distribution as the sample size increases.

Sample sizes of 30 or more are frequently considered adequate for the CLT to hold.

A crucial component of the CLT is that the average of the sample means approaches the population mean, while the standard deviation of the sample means shrinks towards the population standard deviation divided by the square root of the sample size (the standard error).
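Both facts are easy to check empirically. The simulation below (a sketch over a synthetic, deliberately non-normal population) compares the mean and spread of the sample means against the population mean and the standard error sigma / sqrt(n):

```python
import random
import statistics

random.seed(1)

# A synthetic, non-normal population: uniform between 0 and 20. Illustrative only.
population = [random.uniform(0, 20) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_std = statistics.pstdev(population)

n = 100
means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]

# Mean of the sample means vs. the population mean -- nearly identical.
print(round(statistics.mean(means), 2), round(pop_mean, 2))

# Spread of the sample means vs. the standard error pop_std / sqrt(n).
print(round(statistics.stdev(means), 3), round(pop_std / n ** 0.5, 3))
```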

The features of a population may be predicted more precisely with a suitably high sample size.

In statistics, a sample size of 30 is quite normal. A sample size of 30 frequently narrows the confidence interval around the population data set enough that statements based on your findings are justified. The likelihood that your sample is representative of your population increases with the sample size.

The Central Limit Theorem doesn't have a formula of its own, but it relies on the sample mean and standard deviation. As sample means are gathered from the population, the standard deviation governs how tightly they cluster along the resulting probability distribution curve.

Now let's do a mental exercise: think about some daily-life scenarios that could be represented through the normal distribution and the Central Limit Theorem (CLT). Also, feel free to experiment and apply the insights you derive after analyzing them. Even if you encounter a sample whose values do not form a normal distribution, you can still apply the CLT to gain wonderful insights from your universe of samples. Thanks for giving your valuable time by reading my article. Since you have made it this far, feel free to connect with me on social media for any queries, concerns, and feedback.

If you liked the article, please help share and spread the word. Also, don't forget to connect with me on Twitter.

Deep networks have outperformed traditional ML techniques in numerous areas, including speech, natural language, vision, and gaming, often with significantly higher accuracy.

For instance, the graph below illustrates the image classification accuracy of various approaches on the ImageNet dataset; the blue points denote traditional machine learning (ML) methods, while the red points denote deep convolutional neural network (CNN) methods. Here, deep learning beats traditional ML decisively.

When there is more data, deep networks scale considerably better than traditional ML algorithms. The graph below is a straightforward yet potent illustration of this. When trying to increase accuracy with a deep network, the best recommendation is frequently simply more data. This quick and simple fix doesn't work nearly as well with traditional ML algorithms, where more sophisticated techniques are frequently needed to increase accuracy.

Unlike traditional ML methods, deep learning approaches can be used in a variety of domains and applications. First, using pre-trained deep networks for various applications within the same domain is now efficient thanks to transfer learning.

For instance, in computer vision, object recognition and segmentation networks frequently use feature extraction front-ends that were trained on pre-trained image classification networks. The full model's training is facilitated by using these pre-trained networks as front-ends, which frequently leads to better performance in a shorter amount of time.

In addition, as speech recognition uses many of the same deep learning concepts and methods as natural language processing, understanding how to apply deep networks to NLP isn't too difficult given the same foundational knowledge.

Classical ML methods frequently necessitate significant feature engineering. Typically, exploratory data analysis is performed on the dataset first. Dimensionality reduction may then be applied for ease of processing. Finally, only the best features are passed on to the ML algorithm. When employing a deep network, none of this is required: data can be passed straight to the network and usually achieves good performance right away. This avoids the lengthy and difficult feature-engineering stage of the process.

Deep learning is a branch of machine learning that uses neural network ideas to handle highly computational use cases involving multidimensional data. It automates the feature extraction process with very little human participation. A neural network is essentially a collection of neurons and the connections between them. A neuron is a function with many inputs and just one output. Its job is to accept all of the numbers from its inputs, apply a function to them, and deliver the result to the output.

To grasp neural networks, we must break them down and understand the perceptron, a neural network's most fundamental building block. A perceptron is a single-layer neural network used to classify linearly separable data. It comprises four vital parts:

- Inputs
- Weights and Bias
- Summation Function
- Activation or transformation Function

The underlying logic of a perceptron is as follows: the inputs (x) from the input layer are multiplied by the weights (w) applied to them. The multiplied values are added to form the weighted sum. The weighted sum is then passed to the appropriate activation function, which converts it into the desired output.
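The logic above fits in a few lines. This is a minimal sketch, using a step function as the activation and hand-picked weights (an assumption for illustration; a real perceptron learns its weights):

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0  # step activation function

# A perceptron computing logical AND -- weights and bias chosen by hand.
weights, bias = [1.0, 1.0], -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, weights, bias))  # fires only for [1, 1]
```

AND is linearly separable, which is exactly the class of problems a single perceptron can handle.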

When an input variable is fed into the network, its weight is allocated at random. The weight of each input reflects its significance in predicting the outcome. The bias parameter, on the other hand, lets you shift the activation function curve to produce a more precise output.

Once the weights are allocated to the inputs, the product of each input and its weight is calculated. Adding all of these products gives the weighted sum; the summation function accomplishes this.

The activation functions' principal goal is to map the weighted sum to the output. Transformation functions include activation functions such as tanh, ReLU, sigmoid, and others.
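The three activation functions named above can be sketched directly from their standard definitions:

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def tanh(z):
    """Squashes any real number into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through unchanged; zeroes out negatives."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

Which one to use depends on the layer and the task; for example, sigmoid is a natural fit when the output should be read as a probability.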

Think of a situation where you need to create a CNN that can divide images into two categories:

- Class A: Contains dog pictures
- Class B: Images of creatures other than dogs

So how do you build a neural network that can distinguish between dogs and other animals?

The first step in any such pipeline is to pre-process and convert the input. In our scenario, each image will be divided into pixels according to its dimensions.

For instance, if the image has a resolution of 60 by 40 pixels, there will be 2400 pixels altogether. The input layer of the neural network receives these pixels' representations as matrices.
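Concretely, the 60-by-40 example above flattens like this (a sketch with a dummy all-zero image standing in for real pixel data):

```python
# A hypothetical 60x40 grayscale image: one brightness value (0-255) per pixel.
height, width = 60, 40
image = [[0 for _ in range(width)] for _ in range(height)]

# Flatten the 2-D pixel grid into a single input vector for the network.
input_vector = [pixel for row in image for pixel in row]
print(len(input_vector))  # 2400 input values, one per pixel
```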

The perceptrons in a CNN work similarly to the neurons in our brains: they take in input and process it, passing it from the input layer through the hidden layers and on to the output layer.

Each input is given a starting random weight as it is transmitted from the input layer to the hidden layer. The inputs are then multiplied by their corresponding weights, and the sum is supplied as input to the next hidden layer.

Each perceptron here also has a bias value assigned to it, analogous to the weights on its inputs. Additionally, each perceptron's result passes through an activation or transformation function that decides whether or not it will fire.

An activated perceptron transmits its data to the next layer. In this way, the data travels through the neural network until it reaches the output layer (forward propagation).

The output layer determines whether the data belongs to class A or class B by deriving a probability.
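For a two-class problem like ours, that final step can be sketched with a sigmoid over the network's raw output score (the logit value below is a made-up number for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical raw score (logit) from the network's final layer for one image.
logit = 1.2
p_dog = sigmoid(logit)   # probability of Class A (dog)
p_other = 1 - p_dog      # probability of Class B (not a dog)

label = "Class A (dog)" if p_dog >= 0.5 else "Class B (other)"
print(round(p_dog, 3), label)
```

With more than two classes, a softmax over the output scores plays the same role.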

