ML.NET & Components

Anıl Eladağ
6 min read · Nov 15, 2021


Machine Learning for .NET applications & developers

In this article, I will talk about what ML.NET is and what the library components we will come across while using it do. With ML.NET, we can add machine learning capabilities to our .NET applications and make predictions or decisions using built-in algorithms.

ML.NET lets us build machine learning models. So what’s the difference between a “model” and an “algorithm”?

An algorithm runs on data (learns from data, or fits a dataset) and creates a machine learning “model”.

  • It can be described with math formulas or pseudocode.
  • Its effectiveness can be analyzed and measured.

Some examples of machine learning algorithms:

  • k-Means
  • Decision Tree
  • Linear Regression

Academics, or you yourself, can come up with a new machine learning algorithm (entirely new or a variation of an older one), and people can then use it in their projects.

A model is the output of the algorithm, or algorithms, run on data. It represents what was “learned” from the data. But don’t think of it as the result; think of it as the program that will give you the result.

After the algorithm is run, the model is saved; it holds information about rules, numbers, or other algorithm-specific data structures.

Examples may be more descriptive:

  • The result of a linear regression algorithm, that is, a vector of coefficients with specific values, is a model.
  • The if-else statements with specific values produced by a decision tree algorithm are a model.

So an ML.NET model specifies the steps needed to transform input data into a prediction. We can create custom models by specifying algorithms, or we can import and use pre-trained TensorFlow or ONNX models.

ML.NET includes lots of pre-loaded algorithms and a GUI for building a model. With those algorithms, you can make predictions without much effort for tasks such as:

  • Classification/Categorization
  • Regression/Predict Continuous Values
  • Anomaly Detection
  • Recommendations
  • Time Series/Sequential Data
  • Image Classification

Code Flow & Components

To create a prediction engine, you first need an input class, an output class, and an MLContext object.

In ML.NET, every operation runs over a context, an instance of the MLContext class.
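As a sketch of those input and output types (the column names and sentiment schema here are assumptions chosen to match the sentiment example at the end of this article, not part of any official API):

```csharp
using Microsoft.ML.Data;

// Hypothetical input class: maps columns of a data source to properties.
public class ModelInput
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}

// Hypothetical output class: receives the trainer's prediction columns.
public class ModelOutput
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Probability { get; set; }
    public float Score { get; set; }
}
```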

https://docs.microsoft.com/en-us/dotnet/machine-learning/how-does-mldotnet-work#code-workflow
ML.NET Code Work Flow

So the steps below are the general flow for running a prediction:

  1. Collect & load data
  2. Transform Data
  3. Apply a machine learning algorithm
  4. Train the model on the pipeline
  5. Evaluate the model and improve the efficiency
  6. Save the model
  7. Load the model
  8. Make predictions using data

On the components side, I’ll try to cover some of the most used ones, but ML.NET contains lots of catalogs. “A catalog is a factory for data loading and saving, transforms, trainers, and model operation components”.

Each catalog contains methods for creating a different type of component: for example, mlContext.Data is a DataOperationsCatalog, mlContext.Transforms is a TransformsCatalog, mlContext.Model is a ModelOperationsCatalog, and task catalogs such as mlContext.BinaryClassification or mlContext.Regression expose trainers and evaluators.

Now we can inspect those components. Assume that we have a context instance like the one below, since every component needs a context.

MLContext mlContext = new MLContext();

Loading Data

ML.NET offers multiple options for data loading.

Loading From Text Files

You can load a file or multiple files using the “LoadFromTextFile” method, and when you need to load files from multiple directories you can use the “CreateTextLoader” method.

You can define an input class, e.g. ModelInput, and use it with the generic overloads of these methods, or you can pass a schema as a parameter instead.
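A minimal sketch of both methods (file names, separator, and the ModelInput class are assumptions for illustration):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Load a single tab-separated file into an IDataView.
IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(
    "data.tsv", separatorChar: '\t', hasHeader: true);

// For multiple files (possibly across directories), create a
// reusable loader and pass it one or more paths.
TextLoader loader = mlContext.Data.CreateTextLoader<ModelInput>(
    separatorChar: '\t', hasHeader: true);
IDataView multiFileData = loader.Load("data/part1.tsv", "data/part2.tsv");
```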

Loading From In-memory Objects

Assume you have some hot, warm, or cold storage and have collected data from it, or you received some JSON/XML data and deserialized it. You can load such in-memory collections using the “LoadFromEnumerable” method.
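For example (the ModelInput class and sample values are assumptions, standing in for whatever you deserialized):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext();

// In-memory records, e.g. deserialized from JSON/XML.
var records = new List<ModelInput>
{
    new ModelInput { SentimentText = "Great product", Sentiment = true },
    new ModelInput { SentimentText = "Terrible support", Sentiment = false }
};

IDataView data = mlContext.Data.LoadFromEnumerable(records);
```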

Loading From Relational Database

ML.NET offers relational database support through the “System.Data.SqlClient” NuGet package. You need to specify a SQL command for loading from the database.
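A sketch of database loading (the connection string, query, and table are placeholders; ModelInput is the hypothetical input class from earlier):

```csharp
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Placeholder connection details and query.
string connectionString =
    "Data Source=localhost;Initial Catalog=MyDb;Integrated Security=True";
string sqlCommand =
    "SELECT SentimentText, CAST(Sentiment AS BIT) AS Label FROM Reviews";

// Create a loader typed on the input class, then load from the source.
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<ModelInput>();
var dbSource = new DatabaseSource(
    SqlClientFactory.Instance, connectionString, sqlCommand);

IDataView data = loader.Load(dbSource);
```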

All of these methods return an object that implements the “IDataView” interface. With it, ML.NET provides a flexible and efficient way to describe numeric or tabular data.

Transforming Data

ML.NET offers a lot of transform functionality for data, but as I mentioned before, I’ll try to cover the most used ones. For detailed information, please read Microsoft’s documentation. Some of the methods are ONNX exportable, some are not, so read the documentation carefully before using them.

Filter Data

To filter data based on the value of a column, you can use the “FilterRowsByColumn” method. It is accessible from the DataOperationsCatalog.

mlContext.Data.FilterRowsByColumn(data, "Price", lowerBound: 200000, upperBound: 1000000);

Replace Missing Values

Some of your numeric data may be empty; in that case you can use “ReplaceMissingValues” from the TransformsCatalog. This method has two overloads: one does an in-place replacement, the other creates a new column with the replaced values.

Notice that there is a ReplacementMode parameter which supports various replacement strategies.
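Both overloads can be sketched as follows (the column names are assumptions for illustration):

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();

// In-place: missing "Temperature" values are replaced with the column mean.
var inPlace = mlContext.Transforms.ReplaceMissingValues(
    "Temperature",
    replacementMode: MissingValueReplacingEstimator.ReplacementMode.Mean);

// New column: keep the original and write replaced values elsewhere.
var newColumn = mlContext.Transforms.ReplaceMissingValues(
    "TemperatureFilled", "Temperature",
    replacementMode: MissingValueReplacingEstimator.ReplacementMode.DefaultValue);
```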

Min-Max Normalization

Sometimes you need to normalize your data, and in that case you can use various normalization algorithms from the NormalizationCatalog. The “NormalizeMinMax” method is one of them.

Assume that our input data contains a numeric Temperature column with the values 400 and 200. To normalize it, we apply NormalizeMinMax to that column.

With this, the original temperature values [400, 200] are converted to [1, 0.5] using the min-max normalization formula.
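A runnable sketch of that example (the TempData class is hypothetical, holding only the Temperature feature):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext();

// Two hypothetical samples: 400 and 200.
var samples = new List<TempData>
{
    new TempData { Temperature = 400f },
    new TempData { Temperature = 200f }
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// NormalizeMinMax comes from the NormalizationCatalog on mlContext.Transforms.
IDataView normalized = mlContext.Transforms.NormalizeMinMax("Temperature")
    .Fit(data)
    .Transform(data);

public class TempData
{
    public float Temperature { get; set; }
}
```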

Type Conversion

If you need to convert a column’s type, you can use the “ConvertType” method. It works on numeric, boolean, text, and DateTime data types, converting the data to the type specified in the outputKind parameter.

You can convert a single column, or several columns at once, e.g. all of them to the Double type.
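Both forms can be sketched like this (column names are assumptions for illustration):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Convert a single column to Double, writing to a new column.
var single = mlContext.Transforms.Conversion.ConvertType(
    "TemperatureDouble", "Temperature", DataKind.Double);

// Convert several columns in place at once via InputOutputColumnPair.
var many = mlContext.Transforms.Conversion.ConvertType(
    new[]
    {
        new InputOutputColumnPair("Temperature"),
        new InputOutputColumnPair("Humidity")
    },
    DataKind.Double);
```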

Concatenate Inputs

This method concatenates one or more input columns into a new output column. Concatenation is necessary because trainers take feature vectors as inputs. If you have columns of different types, they must be converted to the same type using the ConvertType method before using the Concatenate method.

For example, we can first convert a TimeStamp column and then concatenate the columns into a single vector column.
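That two-step pipeline can be sketched as follows (the TimeStamp, Temperature, and Humidity columns are assumptions based on the article’s example):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Convert TimeStamp so every column shares one type, then pack all
// inputs into a single "Features" vector column that trainers expect.
var pipeline = mlContext.Transforms.Conversion.ConvertType(
        "TimeStampSingle", "TimeStamp", DataKind.Single)
    .Append(mlContext.Transforms.Concatenate(
        "Features", "TimeStampSingle", "Temperature", "Humidity"));
```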

One Hot Encoding

Not all algorithms operate on categorical or label data; some require numerical inputs. If you have categorical data and your pipeline needs numeric input, you can use the “OneHotEncoding” method. It takes a finite set of values and converts them to a binary representation.
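A minimal sketch (the SensorType column name is an assumption, reused from the article’s later example):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Encode the categorical "SensorType" column as a binary indicator vector.
var encoder = mlContext.Transforms.Categorical.OneHotEncoding(
    "SensorTypeEncoded", "SensorType");
```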

Key Value Mapping

It’s often used to map labels to integer values for training, then back to their original values. The “MapValueToKey” function adds a lookup dictionary to the model, and with that dictionary the “MapKeyToValue” function can perform the reverse transform.

For example, we can take a “SensorType” column, let ML.NET build a dictionary in the background, and create a new column named “SensorTypeCategory” using the “MapValueToKey” method. When we need the original categories, we can retrieve them using the “MapKeyToValue” method.

In some cases you need to limit this mapping because you want to use your own predefined categories. You can create a list of lookups and pass it as a parameter to the “MapValueToKey” method. In this case, any value in your data that is not contained in the lookup dictionary gets mapped to the missing value, 0.
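Both directions, including a predefined lookup, can be sketched like this (the SensorLookup class and category names are assumptions for illustration):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext();

// Predefined lookup of allowed categories; values outside this list
// get mapped to the missing key (0).
IDataView lookup = mlContext.Data.LoadFromEnumerable(new List<SensorLookup>
{
    new SensorLookup { Value = "Thermometer" },
    new SensorLookup { Value = "Barometer" }
});

// Map string categories to integer keys, constrained by the lookup.
var toKey = mlContext.Transforms.Conversion.MapValueToKey(
    "SensorTypeCategory", "SensorType", keyData: lookup);

// MapKeyToValue reverses the transform to recover the original labels.
var toValue = mlContext.Transforms.Conversion.MapKeyToValue(
    "SensorTypeRestored", "SensorTypeCategory");

public class SensorLookup
{
    public string Value { get; set; }
}
```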

Train & Evaluate Model

Up to this point we have seen how to load and transform data: loading data gives us an IDataView instance, and the transform methods give us an estimator pipeline. Now we can actually split our data into train and test sets and select our machine learning algorithm. After that, we can fit the estimator to our data and obtain the transformed data.

Once we have our pipeline, we can split and transform our data. This is also useful for debugging the process: we can convert the transformed data to an enumerable and print it to the console, so we can see what we are actually doing.

While training, we need metrics to evaluate our model, so we use the “Evaluate” method from the catalog of the chosen machine learning algorithm. You can get detailed information about evaluation metrics here.
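The whole split-train-evaluate flow can be sketched for a binary sentiment model (the file name, the ModelInput class, and the choice of SdcaLogisticRegression as trainer are assumptions, not the article’s exact setup):

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext();

// Assumed: sentiment data in a tab-separated file matching ModelInput.
IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(
    "sentiment.tsv", separatorChar: '\t', hasHeader: true);

// 1. Split into train/test sets (20% held out for evaluation).
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// 2. Featurize the text column, then append a trainer.
var pipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", nameof(ModelInput.SentimentText))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Label", featureColumnName: "Features"));

// 3. Train, then evaluate on the held-out test set.
ITransformer model = pipeline.Fit(split.TrainSet);
IDataView predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(
    predictions, labelColumnName: "Label");

Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
```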

Using The Model

We trained our model, so now we can save it and we can make predictions.

After this we need to load our model. At this step we can create a prediction engine with model too.

Now finally, we can make predictions.
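The save, load, and predict steps can be sketched together (the file names are placeholders, the model is re-created minimally here so the sketch stands alone, and ModelInput/ModelOutput are the hypothetical classes described earlier):

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext();

// Minimal stand-in for the training step: load data and fit a pipeline.
IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(
    "sentiment.tsv", separatorChar: '\t', hasHeader: true);
ITransformer model = mlContext.Transforms.Text
    .FeaturizeText("Features", nameof(ModelInput.SentimentText))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression())
    .Fit(data);

// 1. Save the trained model together with its input schema.
mlContext.Model.Save(model, data.Schema, "model.zip");

// 2. Load it back (possibly in another process).
ITransformer loadedModel = mlContext.Model.Load("model.zip", out _);

// 3. Create a prediction engine and predict a single example.
var engine = mlContext.Model
    .CreatePredictionEngine<ModelInput, ModelOutput>(loadedModel);
ModelOutput result = engine.Predict(
    new ModelInput { SentimentText = "I loved it!" });
Console.WriteLine($"Positive: {result.Prediction}");
```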

So that's it; if you look at the example, it’s actually a basic sentiment prediction model, and we built it easily. Before I conclude, I would like to mention a small but important feature: Visual Studio’s Machine Learning plugin, which adds a GUI where you can create a model in a few simple steps, with built-in AutoML support. With AutoML, it selects the best algorithm for you.

If you want to use AutoML in your API (it really takes the burden of algorithm selection and hyperparameter configuration off you), you can read Microsoft’s documentation.
