Understanding Linear Regression in C#: A Hands-On Implementation

19 Mar 2024 • Wayne Thompson

Linear regression is one of the foundational algorithms in statistics and machine learning. It’s used to discover relationships between variables and to make predictions. If you’re exploring machine learning concepts or need a way to analyze trends in your C# applications, understanding linear regression is a great starting point.

What is Linear Regression?

In essence, linear regression attempts to draw the best-fitting straight line through a set of data points, where "best" usually means the line that minimizes the sum of squared vertical distances between the points and the line. This line represents the underlying trend in the data. Imagine you have house prices paired with square footage – linear regression can help you model how price changes with size.

Implementing Linear Regression in C#

A basic implementation of linear regression in C# comes down to calculating the slope and y-intercept of the best-fit line. If you are interested in the math behind it, read the "Least Squares Regression" page on Math is Fun.
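For reference, the closed-form least-squares solution can be written out as below. The symbols m (slope), b (intercept), and the means x-bar and y-bar are notation chosen here rather than taken from the original post; they correspond to the quantities computed by the code in this article.

```latex
\begin{aligned}
(m, b) &= \arg\min_{m,\,b} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
b &= \bar{y} - m\,\bar{x}
\end{aligned}
```

The numerator and denominator of m are the two running sums accumulated in the loop, and b follows directly from the two means.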

Here’s a simple example:

using System;
using System.Linq; // provides the Average() extension method used below

namespace LinearRegression
{
    class LinearRegression
    {
        /// <summary>
        /// Calculates the slope of the best-fit line
        /// </summary>
        /// <param name="x">Array of x values</param>
        /// <param name="y">Array of y values</param>
        /// <returns>The slope of the regression line</returns>
        public double CalculateSlope(double[] x, double[] y)
        {
            double xMean = x.Average();
            double yMean = y.Average();

            double numerator = 0;
            double denominator = 0;

            for (int i = 0; i < x.Length; i++)
            {
                numerator += (x[i] - xMean) * (y[i] - yMean);
                denominator += (x[i] - xMean) * (x[i] - xMean);
            }

            return numerator / denominator;
        }

        /// <summary>
        /// Calculates the y-intercept of the best-fit line
        /// </summary>
        /// <param name="x">Array of x values</param>
        /// <param name="y">Array of y values</param>
        /// <param name="slope">The slope of the regression line</param>
        /// <returns>The y-intercept of the regression line</returns>
        public double CalculateIntercept(double[] x, double[] y, double slope)
        {
            double xMean = x.Average();
            double yMean = y.Average();

            return yMean - slope * xMean;
        }

        /// <summary>
        /// Predicts a y value based on a given x value
        /// </summary>
        /// <param name="x">The input x value</param>
        /// <param name="slope">The slope of the regression line</param>
        /// <param name="intercept">The y-intercept of the regression line</param>
        /// <returns>The predicted y value</returns>
        public double Predict(double x, double slope, double intercept)
        {
            return slope * x + intercept;
        }
    }
}

The LinearRegression class encapsulates our algorithm in three methods:
CalculateSlope: Finds the slope of our best-fit line.
CalculateIntercept: Finds where the line crosses the y-axis.
Predict: Uses the line’s equation to predict y-values for new x-values.
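These methods assume that x and y have the same length, that there are at least two points, and that the x values are not all identical. If every x is the same, the denominator in CalculateSlope is zero (or rounding noise) and the slope is NaN or meaningless. A minimal sketch of a guard that could be called before fitting is shown below; RegressionGuard and Validate are hypothetical names introduced for illustration and are not part of the original class.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original LinearRegression class.
static class RegressionGuard
{
    // Throws if the inputs cannot produce a well-defined best-fit line.
    public static void Validate(double[] x, double[] y)
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
        if (y == null) throw new ArgumentNullException(nameof(y));

        if (x.Length != y.Length)
            throw new ArgumentException("x and y must have the same length.");

        if (x.Length < 2)
            throw new ArgumentException("At least two data points are required.");

        // If every x is identical, the sum of squared deviations from the
        // mean is zero, so the slope would be 0 / 0.
        if (x.All(v => v == x[0]))
            throw new ArgumentException("x values must not all be identical.");
    }
}
```

Calling RegressionGuard.Validate(x, y) before CalculateSlope turns a silent NaN or an index error into an exception with a clear message. Whether to throw or to return a sentinel value is a design choice; throwing is assumed here.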

Using Our Algorithm

using System;
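// No "using LinearRegression;" here: the namespace and the class share a name,
// so the class is referenced by its fully qualified name below.

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            double[] x = { 1, 2, 3, 4, 5 };
            double[] y = { 2, 5, 7, 9, 12 };

            // The bare name LinearRegression resolves to the namespace, not the class.
            var regression = new LinearRegression.LinearRegression();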

            double slope = regression.CalculateSlope(x, y);
            double intercept = regression.CalculateIntercept(x, y, slope);

            Console.WriteLine("Equation: y = {0}x + {1}", slope.ToString("F2"), intercept.ToString("F2"));

            double prediction = regression.Predict(6, slope, intercept);
            Console.WriteLine("Prediction for x = 6: y = {0}", prediction.ToString("F2"));
        }
    }
}

In this example, we:

Define sample data (x and y).
Create a LinearRegression object.
Calculate the slope and intercept of the regression line.
Print the line’s equation.
Make a prediction for a new input value.
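Working through the sample data by hand gives a quick check on the numbers. The arithmetic below uses x-bar and y-bar for the means of the x and y arrays, a notation assumed here for brevity.

```latex
\begin{aligned}
\bar{x} &= 3, \qquad \bar{y} = 7 \\
\sum (x_i - \bar{x})(y_i - \bar{y}) &= 10 + 2 + 0 + 2 + 10 = 24 \\
\sum (x_i - \bar{x})^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
\text{slope} &= 24 / 10 = 2.4 \\
\text{intercept} &= 7 - 2.4 \cdot 3 = -0.2 \\
\text{prediction at } x = 6 &= 2.4 \cdot 6 - 0.2 = 14.2
\end{aligned}
```

The fitted line is therefore y = 2.4x - 0.2, so the printed values should round to 2.40, -0.20, and 14.20 (the decimal separator depends on the current culture).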

Checking against R

More to come

Git repo at https://github.com/waynethompson/MlAlgorithms
