
"A line that is drawn to pass as close as possible to all the plotted points on a scatter graph is called the line of best fit." When there are more than 2 data points, it is usually impossible to draw a single straight line that passes exactly through all of them. However, we can usually find a line (or curve) that is a good approximation to the data. To find it, we minimize the error: we choose a line, use it to predict a y value for each x, and then find the prediction error.

Since we have the actual (observed) values, we can easily find the error in each prediction: it is the observed value minus the predicted value. Our ultimate goal is to find the line with the minimal error.

The line for which the error between the predicted values and the observed values is minimal is called the best fit line or the regression line. For the least-squares equations below, "minimal" means the smallest sum of the squared errors. These errors are also called residuals, and they can be visualized as the vertical lines from each observed data value to the regression line.

The most convenient form for our line is slope-intercept form, y = mx + b. Therefore, we only need to compute m and b to determine the best fit line. Those values can be computed by the following equations:

$$m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{\sum y - m\sum x}{N}$$

It is important for us to keep our numbers straight, so we have defined the variables used above. For our example, here's how you would calculate them:

N - The total number of elements (or trials in your experiment).
Σx - The sum of all the values in the x column.
Σy - The sum of all the values in the y column.
Σxy - The sum of the products x_n·y_n, where each x_n and y_n pair is recorded at the same time.
Σx² - The sum of the squares of each value in the x column.
Σy² - The sum of the squares of each value in the y column.
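The procedure can be sketched in Python as shown below. This is a minimal sketch under three assumptions: the data arrive as two equal-length lists, and the helper names `best_fit_line` and `residuals` and the sample values are hypothetical, chosen only for illustration. The sketch computes N, Σx, Σy, Σxy and Σx². It then applies the least-squares formulas m = (NΣxy - ΣxΣy) / (NΣx² - (Σx)²) and b = (Σy - mΣx) / N. Finally, it measures each residual as the observed y minus the predicted y.

```python
# Hypothetical sample data for illustration only (not from the original example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 4.0]


def best_fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)                                    # N
    sum_x = sum(xs)                                # sum of x
    sum_y = sum(ys)                                # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of x_n * y_n
    sum_x2 = sum(x * x for x in xs)                # sum of x_n squared
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # All x values are identical, so the line would be vertical.
        raise ValueError("slope is undefined when all x values are equal")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


def residuals(xs, ys, m, b):
    """Observed minus predicted for each point: y_n - (m*x_n + b) (hypothetical helper)."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


m, b = best_fit_line(xs, ys)
res = residuals(xs, ys, m, b)
sse = sum(r * r for r in res)  # the quantity least squares minimizes

print(f"m = {m:.3f}, b = {b:.3f}")
print("residuals:", [round(r, 3) for r in res])
print(f"sum of squared residuals = {sse:.3f}")
```

For the hypothetical data above, this gives m = 0.7 and b = 2, so the fitted line is y = 0.7x + 2. The residuals are -0.7, 0.6, 0.9 and -0.8, and they sum to zero, as least-squares residuals do when the line has an intercept. Σy² is not needed to compute m or b, so the sketch does not calculate it.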
