19 posts tagged with "c#"

Multi armed bandit exercise 2.5 with C#

May 1, 2025 · 6 min read

Recently I tried to code the 10 armed testbed example from chapter 2 of Sutton and Barto Reinforcement Learning: an introduction book.

The chapter continues introducing new theory elements and strategies to improve the approach shown in the 10 armed example. In particular, one of the points is about non-stationary problems.

The 10 armed testbed was a stationary problem, the probability distributions of the different actions don't change over time. If you remember the sample, at the beginning of the round we computed 10 random values, those values are then used to be the mean of a normal distribution from which we will pick the rewards at each step. The constant part is that this normal distributions don't change from a step to another, they stay the same for the whole round execution.

The focus of the exercise is to understand how the estimated reward computation impacts the performance of the $\epsilon$ -greedy strategy. In the 10 armed testbed, the estimate reward was computed averaging the rewards obtained from each action when selected. Note that this approach consider each reward with the same relative value, however in a non-stationary problem, where probability distributions change over time we would like to give more weight or importance to more recent rewards because they represent more realistically the current distribution the reward is generated from.

The text of the exercise is

Design and conduct an experiment to demonstrate the difficulties that sample-average methods have for nonstationary problems. Use a modified version of the 10-armed testbed in which all the $q_{*}(a)$ start out equal and then take independent random walks (say by adding a normally distributed increment with mean 0 and standard deviation 0.01 to all the $q_{*}(a)$ on each step). Prepare plots like Figure 2.2 for an action-value method using sample averages, incrementally computed, and another action-value method using a constant step-size parameter, $\alpha$ = 0.1. Use $\epsilon = 0.1$ and longer runs, say of 10,000 steps.

Figure 2.2 refers to the average reward graph and the best arm selection rate graph, the same graphs I produced in the previous post. The $\epsilon=0.1$ refers to the $\epsilon$ -greedy strategy to be used, both in the case of sample averages and in the constant step-size parameter.

Ten armed testbed for the Bandit problem with C#

April 27, 2025 · 10 min read

I'm continuing my attempt to reproduce examples from Reinforcement Learning: An Introduction book using C#.

In a previous post I reproduced the tic-tac-toe example with some improvements and clarification with respect to the original text. I think it's worth taking a look at it.

Today I'm reproducing the ten armed testbed for the Bandit problem, in particular I want to reproduce the two graphs showing the average reward improvements and the selection rate of the best arm.

The problem, as stated in the book is the following:

You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action selections, or time steps.

Asynchronous request reply pattern without polling

April 4, 2025 · 6 min read

The asynchronous request reply pattern is useful in cases where an application has a client application running in a browser where the user starts an operation and is waiting for an immediate feedback and possibly a success notification. An issue arises only if the requested operation is long running, even a few seconds is unacceptable to return a feedback to a waiting user. Microsoft has a nice documentation about this pattern which essentially propose to have 3 endpoints:

a POST endpoint to start the long running operation: POST /operations
a GET endpoint to check for the status of the operation GET /operations/{id}/status
a GET endpoint pointing to the created resource GET /operations/{id}

The actual path for the endpoints might vary and aren't explicitely provided in the Microsoft docs, I'm proposing those for the sake of the post, based on your requirements and the semantic of your API you might want have different ones. The interaction flow between the client and the backend is this one:

For example the POST /operations can return something like

{
  "id": 1,
  "links": [
    {
      "rel":"status_check",
      "href":"https://api.contoso.com/operations/1/status", 
      "action":"GET" 
    },
  ]
}

This flow allows the backend to perform the asynchronous/long running operation and any client can poll the status endpoint and notify the user when the operation is completed. The big advantage of this setup is that it is easy to implement, for example websockets could be an option but they are usually harder to use and you will probably need a third party library to use them. A downside of the polling approach is that a client can execute a lot of retries in a short amount of time, Microsoft suggests to provide a Retry-After header that the client should honor and wait for the indicated time, if you own the client this is an easy requirement to satisfy. While this pattern has a clear use case with browser based clients, for a machine to machine integration I would prefer implementing webhooks bot as a producer and as a consumer. However a company with a low tech maturity / capacity can implementing the polling approach very easily.

In my opinion there is another approach to this pattern, for a browser client, that avoids having the polling on the endpoint and doesn't require websockets. If you have a dedicated BFF for a client app this might be the solution for you.

Tic-tac-toe reinforcement learning with C#

March 16, 2025 · 9 min read

A couple of weeks ago I wanted to take a look at reinforcement learning and possibly work on a very simple sample in C#. In search for a book to learn some basics I found Reinforcement Learning: An Introduction suggested in multiple places. The book is available for free as a PDF on the linked website, so I thought it would be a good starting point.

The book offers in the very first chapter, a tic-tac-toe example where an algorithm is described, albeit with not too much details. I decided to try to implement a C# version of that. Plenty of implementations are available online and the authors offer a lisp version on their website, so there is a wide range of option to explore and evaluate.

Kaleidoscope tutorial with C# using LLVMSharp and llvm 18

February 20, 2025 · 4 min read

A few years ago I worked on reproducing the Kaleidoscope tutorial using LLVMSharp, a library that exposes C# bindings for LLVM. I updated multiple times the project to support the latest versions of llvm and the LLVMSharp bindings. In the last week I was busy moving to llvm 18 and I encountered a few difficulties.

First, the newest version of LLVMSharp available on nuget is 16.0.0, however the corresponding repo is updated to support version 18.0.0 of llvm. Luckily for me, there is a nighly nuget feed where a release candidate of LLVMSharp supporting version 18.0.0 is available. Please note that at the time of writing the latest llvm version is 19, LLVMSharp is updated inconsistently and it has been since the beginning for what I could see.

TreeSitter C# bindings - new languages

November 18, 2023 · 3 min read

I updated my TreeSitter.Bindings packages to support additional languages. I wanted to check how solid the bindings were and if I could implement a more complex sample with the respect to the json one I started with.

The tree-sitter documentation provides a multilanguage sample which involves three languages:

embedded template
ruby
html

Implementing APIs with the visitor pattern

June 16, 2023 · 4 min read

I want to leverage my visitor pattern source generator to implement a simple minimal api.

I aim to:

Have a request and a request handler for my endpoint. I will not use mediatr or any similar library and I will not use any real storage, only some in memory data structure to showcase the visitor pattern approach.
The request handler Handle method returns an interface, every subtype represents a different type of result, a success, and one type for each error (provided) emitted by the handled.
For each subtype we want to be able to return a possibly different http response.

TreeSitter C# bindings

April 9, 2023 · 6 min read

In a previous experiment I made I used the LLVMSharp library and I was quite curious on how the bindings are made. In the readme is it stated they are generated using che ClangSharp library, this one auto-generate hitself from the headers of Clang C header.

This functionality is exposed through a dotnet tool: ClangSharpPInvokeGenerator. So I wanted to try and hack my way into parsing the tree-sitter headers and use the generated code to run a very small sample using C#, in particular I was aiming for the one that's in the getting started section of the docs

Visitor pattern source generator

March 19, 2023 · 5 min read

I am a big fan of the visitor pattern, I think it is a very good approach for adding behaviors to a group of classes without modifying all of them.

For example if we are building a compiler we may have an abstract syntax tree that represents the code that the compiler is compiling. Two different visitors can be, for example:

a type checker
a code emitter

Using EF6 with SQLite on dotnet core

May 19, 2022 · 2 min read

In this post we will see how to use EF6 (not EF Core) in a .NET Core application with SQLite. Why we would like to do this? Maybe we are migrating a project from the full framework to .NET Core but we are not ready to migrate also EF6 to EF Core. I will not show how to work with migrations.

First of all, we need to install two packages in our project:

System.Data.SQLite.EF6
System.Data.SQLite