Multi armed bandit exercise 2.5 with C#

· 5 min read

Recently I tried to code the 10 armed testbed example from chapter 2 of Sutton and Barto's book Reinforcement Learning: An Introduction.

The chapter continues introducing new theory elements and strategies to improve the approach shown in the 10 armed example. In particular, one of the points is about non-stationary problems.

The 10 armed testbed was a stationary problem: the probability distributions of the different actions don't change over time. If you remember the sample, at the beginning of the round we computed 10 random values, and those values were then used as the means of the normal distributions from which we pick the rewards at each step. The constant part is that these normal distributions don't change from one step to another; they stay the same for the whole round.

The focus of the exercise is to understand how the estimated reward computation impacts the performance of the ϵ-greedy strategy. In the 10 armed testbed, the estimated reward was computed by averaging the rewards obtained from each action when it was selected. Note that this approach gives every reward the same weight. However, in a non-stationary problem, where the probability distributions change over time, we would like to give more weight to recent rewards because they better represent the distribution the reward is currently generated from.
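To make the difference concrete, here is a minimal TypeScript sketch (not the C# code of this post) of the two ways of updating an estimate: the incremental sample average and a constant step size. The function names and the toy reward sequence, where the true mean jumps from 0 to 1 halfway through, are my own assumptions for illustration.

```typescript
// Sample-average update: the step size is 1/n, so every reward seen so far
// carries the same weight in the estimate.
// n is the number of times the action has been selected, including this one.
function sampleAverageUpdate(estimate: number, reward: number, n: number): number {
  return estimate + (reward - estimate) / n;
}

// Constant step-size update: the newest reward gets weight alpha and
// older rewards decay by a factor (1 - alpha) at every step.
function constantStepUpdate(estimate: number, reward: number, alpha: number): number {
  return estimate + alpha * (reward - estimate);
}

// Hypothetical, noise-free reward sequence: 50 rewards of 0, then 50 rewards of 1.
let avg = 0;
let recent = 0;
for (let step = 1; step <= 100; step++) {
  const reward = step <= 50 ? 0 : 1;
  avg = sampleAverageUpdate(avg, reward, step);
  recent = constantStepUpdate(recent, reward, 0.1);
}

// avg ends close to 0.5 (the mean of the whole history),
// recent ends close to 1 (the current value).
console.log(avg, recent);
```

The sample average is still anchored to the old rewards, while the constant step-size estimate has almost forgotten them.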

The text of the exercise is:

Design and conduct an experiment to demonstrate the difficulties that sample-average methods have for nonstationary problems. Use a modified version of the 10-armed testbed in which all the q_*(a) start out equal and then take independent random walks (say by adding a normally distributed increment with mean 0 and standard deviation 0.01 to all the q_*(a) on each step). Prepare plots like Figure 2.2 for an action-value method using sample averages, incrementally computed, and another action-value method using a constant step-size parameter, α = 0.1. Use ϵ = 0.1 and longer runs, say of 10,000 steps.

Figure 2.2 refers to the average reward graph and the best arm selection rate graph, the same graphs I produced in the previous post. The ϵ = 0.1 refers to the ϵ-greedy strategy to be used, both with sample averages and with the constant step-size parameter.
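As a rough idea of what one round of this experiment looks like, here is a TypeScript sketch written under my own assumptions (it is not the C# implementation of the post). The names `runRound`, `gaussian`, `argmax` and `zeros` are hypothetical helpers, rewards are assumed to be drawn from a normal distribution with mean q_*(a) and standard deviation 1 as in the 10 armed testbed, and ties are broken by picking the lowest index.

```typescript
type Rng = () => number; // uniform random number in [0, 1)

// Standard normal sample via the Box-Muller transform.
function gaussian(rng: Rng): number {
  const u = 1 - rng(); // in (0, 1], avoids log(0)
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Index of the largest value; ties go to the lowest index.
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

function zeros(n: number): number[] {
  const result: number[] = [];
  for (let i = 0; i < n; i++) result.push(0);
  return result;
}

// Runs one non-stationary round and returns the fraction of steps on which
// the currently best arm was selected.
// alpha = null means sample averages, a number means a constant step size.
function runRound(steps: number, epsilon: number, alpha: number | null, rng: Rng = Math.random): number {
  const k = 10;
  const q = zeros(k);         // true action values, all start out equal
  const estimates = zeros(k); // estimated action values
  const counts = zeros(k);    // how many times each arm was selected
  let optimalPicks = 0;

  for (let t = 0; t < steps; t++) {
    // epsilon-greedy selection: explore with probability epsilon
    const action = rng() < epsilon ? Math.floor(rng() * k) : argmax(estimates);
    if (action === argmax(q)) optimalPicks++;

    const reward = q[action] + gaussian(rng);
    counts[action]++;
    const stepSize = alpha === null ? 1 / counts[action] : alpha;
    estimates[action] += stepSize * (reward - estimates[action]);

    // Independent random walk of every true value (standard deviation 0.01)
    for (let a = 0; a < k; a++) q[a] += 0.01 * gaussian(rng);
  }
  return optimalPicks / steps;
}

console.log("sample averages:", runRound(10000, 0.1, null));
console.log("constant step 0.1:", runRound(10000, 0.1, 0.1));
```

A single round is noisy, so to get plots comparable to Figure 2.2 the results should be averaged over many independent rounds.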

Ten armed testbed for the Bandit problem with C#

· 9 min read

I'm continuing my attempt to reproduce examples from the book Reinforcement Learning: An Introduction using C#.

In a previous post I reproduced the tic-tac-toe example with some improvements and clarifications with respect to the original text. I think it's worth taking a look at it.

Today I'm reproducing the ten armed testbed for the Bandit problem. In particular, I want to reproduce the two graphs showing the average reward improvements and the selection rate of the best arm.

The problem, as stated in the book, is the following:

You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action selections, or time steps.

CSV parsing with typescript

· 2 min read

A few years ago I needed to troubleshoot some imports based on csv files and I had one big issue: depending on the language the user was using on Windows, Excel would export csv files with different separators. Being located in Italy, this meant that the separator was ;. The same was true when trying to read csv files with Excel: the separator is chosen based on the language of the OS.

I prefer to set up everything in English because I think it is easier to troubleshoot issues when the error is not localized to Italian.

So I couldn't read csv files by just opening them in Excel because of the separator. Excel also formats fields depending on the content: it might hide leading zeros if it thinks a column is filled with numbers, dates are problematic, and so on. I wanted to be able to see the actual content of the csv in a tabular format without fighting with Excel over the separator or the automatic formatting of columns.

I decided to go ahead and build a simple parser, hostable on a website, that would read a csv and just print the values without any custom formatting and with the possibility to choose the separator. Of course today I would go with a VS Code extension; I'm sure there are plenty doing something similar better than I did in a few hours, but at the time this small parser was enough to support my need.

CSV is not a well defined or well agreed format; however, there is an IETF RFC (RFC 4180) that documents the format. Following that, I implemented a parser in TypeScript that reads a csv and creates an HTML table with the content, manipulating the DOM directly.
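To give an idea of what such a parser involves, here is a minimal TypeScript sketch of the quoting rules described in the RFC, with a configurable separator. It is not the code from the repo: the function name `parseCsv` is hypothetical and the DOM part (building the HTML table) is left out.

```typescript
// Parses csv text into rows of raw string fields.
// Fields may be wrapped in double quotes; inside quotes, the separator and
// line breaks are literal and a double quote is escaped by doubling it.
function parseCsv(text: string, separator: string = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    if (inQuotes) {
      if (c === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += c;
      }
    } else if (c === '"' && field === "") {
      inQuotes = true; // opening quote at the start of a field
    } else if (c === separator) {
      row.push(field);
      field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text.charAt(i + 1) === "\n") i++; // CRLF is one line break
      row.push(field);
      field = "";
      rows.push(row);
      row = [];
    } else {
      field += c;
    }
  }

  // Flush the last record when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Values are kept as plain strings, so leading zeros survive.
console.log(parseCsv('code;name\r\n007;"Rossi; Mario"', ";"));
```

Because nothing is converted to numbers or dates, the output shows exactly what is in the file, which was the whole point of the tool.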

The repo is this one: https://github.com/davidelettieri/yacv

Online demo: https://davidelettieri.github.io/yacv/

Just today I revisited the code, adding some basic tests and updating the TypeScript version.

Asynchronous request reply pattern without polling

· 6 min read

The asynchronous request reply pattern is useful when an application has a client running in a browser, where the user starts an operation and expects immediate feedback and possibly a success notification. An issue arises only if the requested operation is long running: even a few seconds is an unacceptable delay before returning feedback to a waiting user. Microsoft has nice documentation about this pattern, which essentially proposes to have 3 endpoints:

  • a POST endpoint to start the long running operation: POST /operations
  • a GET endpoint to check for the status of the operation: GET /operations/{id}/status
  • a GET endpoint pointing to the created resource: GET /operations/{id}

The actual paths for the endpoints might vary and aren't explicitly provided in the Microsoft docs. I'm proposing these for the sake of the post; based on your requirements and the semantics of your API, you might want to have different ones. The interaction flow between the client and the backend is this one:

For example the POST /operations can return something like

{
  "id": 1,
  "links": [
    {
      "rel": "status_check",
      "href": "https://api.contoso.com/operations/1/status",
      "action": "GET"
    }
  ]
}

This flow allows the backend to perform the asynchronous/long running operation while any client can poll the status endpoint and notify the user when the operation is completed. The big advantage of this setup is that it is easy to implement: websockets could be an option, but they are usually harder to use and you will probably need a third party library for them. A downside of the polling approach is that a client can execute a lot of retries in a short amount of time. Microsoft suggests providing a Retry-After header that the client should honor by waiting for the indicated time; if you own the client, this is an easy requirement to satisfy. While this pattern has a clear use case with browser based clients, for a machine to machine integration I would prefer implementing webhooks, both as a producer and as a consumer. However, a company with low tech maturity / capacity can implement the polling approach very easily.
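On the client side, honoring Retry-After while polling could look like the following TypeScript sketch. Everything here is an assumption made for illustration: the names `retryAfterToMs`, `pollStatus` and `StatusResult` are hypothetical, and how a response is mapped to "completed" depends on your API, so the HTTP call is abstracted behind a `checkStatus` callback.

```typescript
// Converts a Retry-After header value to a delay in milliseconds.
// The header can hold either a number of seconds or an HTTP date.
function retryAfterToMs(value: string | null, fallbackMs: number, now: number = Date.now()): number {
  if (value === null) return fallbackMs;
  const seconds = Number(value);
  if (value.trim() !== "" && isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const date = Date.parse(value);
  return isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

// Hypothetical shape of one status check, produced by the caller.
interface StatusResult {
  completed: boolean;        // true when the operation has finished
  retryAfter: string | null; // raw Retry-After header, if any
}

const sleepMs = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Polls until the operation completes or maxAttempts is reached.
// Resolves to true when completed, false when giving up.
async function pollStatus(
  checkStatus: () => Promise<StatusResult>,
  sleep: (ms: number) => Promise<void> = sleepMs,
  maxAttempts: number = 30,
  fallbackMs: number = 1000
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await checkStatus();
    if (result.completed) return true;
    // Wait for the time indicated by the server before retrying.
    await sleep(retryAfterToMs(result.retryAfter, fallbackMs));
  }
  return false;
}
```

In a browser, `checkStatus` would typically wrap a `fetch` call to the status endpoint and read the header with `response.headers.get("Retry-After")`. Capping the number of attempts keeps a misbehaving backend from turning the client into an endless loop.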

In my opinion there is another approach to this pattern, for a browser client, that avoids polling the status endpoint and doesn't require websockets. If you have a dedicated BFF for a client app this might be the solution for you.

Tic-tac-toe reinforcement learning with C#

· 9 min read

A couple of weeks ago I wanted to take a look at reinforcement learning and possibly work on a very simple sample in C#. In search of a book to learn some basics I found Reinforcement Learning: An Introduction suggested in multiple places. The book is available for free as a PDF on the linked website, so I thought it would be a good starting point.

The book offers, in the very first chapter, a tic-tac-toe example where an algorithm is described, albeit without much detail. I decided to try to implement a C# version of that. Plenty of implementations are available online and the authors offer a Lisp version on their website, so there is a wide range of options to explore and evaluate.

Kaleidoscope tutorial with C# using LLVMSharp and llvm 18

· 3 min read

A few years ago I worked on reproducing the Kaleidoscope tutorial using LLVMSharp, a library that exposes C# bindings for LLVM. I updated the project multiple times to support the latest versions of llvm and the LLVMSharp bindings. In the last week I was busy moving to llvm 18 and I encountered a few difficulties.

First, the newest version of LLVMSharp available on nuget is 16.0.0; however, the corresponding repo is updated to support version 18.0.0 of llvm. Luckily for me, there is a nightly nuget feed where a release candidate of LLVMSharp supporting version 18.0.0 is available. Please note that at the time of writing the latest llvm version is 19. LLVMSharp is updated inconsistently, and as far as I can see it has been that way since the beginning.

Trying to implement Lox as Racket language module

· 5 min read

Given my previous attempt at building a basic scaffolding for creating a new Racket language, I thought it was worth a shot to implement Lox, from the Crafting Interpreters book, as a Racket language module.

Racket has a huge ecosystem and extensive documentation. Based on my own research and preferences, I decided to use a lex/yacc source generator available in Racket itself and documented here https://docs.racket-lang.org/lex-yacc-example/index.html. There are plenty of resources on how to build languages with Racket, both free and paid.

I used only free resources and my feeling is that the amount of information required to approach this project is A LOT: I needed to get familiar with Lisp syntax, libraries for parsing, macros, Racket language modules, and a different set of tools for coding/debugging.

Another feeling I have is that the documentation, even if extensive, does not provide all the information required to build new languages for Racket, or at the very least this information is hard to find. I didn't find it, despite trying multiple times.

Fedora, switch audio channels with pipewire and wireplumber 0.5

· One min read

This is an update on this post: on wireplumber v0.5, lua configuration files are not supported anymore. To switch channels on v0.5 you have to create the following file:

.config/wireplumber/wireplumber.conf.d/51-change-channels.conf
monitor.alsa.rules = [
  {
    matches = [
      {
        node.name = "<name of the node>"
      }
    ]
    actions = {
      update-props = {
        audio.position = "FR,FL"
      }
    }
  }
]

Check the previous post to understand how to retrieve the name of the node, if needed.

Use cilium service mesh on AKS

· 6 min read
warning

On 2025-08-02 I updated the repo corresponding to this post to use updated versions of Kubernetes, Cilium and Gateway API. The post is not updated accordingly. I followed the procedure explained here again to confirm that everything still works as expected. The most notable change is that we don't need the experimental channel of Gateway API except for one resource, as described in the Cilium v1.17.0 docs.

Azure BYOCNI configuration allows the use of cilium as CNI; in addition to that, it is possible to configure cilium service mesh.

Cilium service mesh has several functionalities such as ingress controller, gateway api, mtls, etc. My objective here is to use k8s gateway api. In order to enable cilium service mesh we have to replace kube-proxy with cilium itself; to do so we need to enable the kube proxy configuration feature on aks, which is currently in preview.

Cilium supports gateway api v1 from version 1.15, which is the one that I'm installing today. In particular I will install the gateway api v1 experimental channel. This will allow us to configure the underlying infrastructure (an azure load balancer) if needed.

Fedora, switch audio channels with pipewire

· 4 min read
warning

The lua configuration files are valid only for wireplumber v<0.5. Please check here for the configuration required on v=0.5. This post remains valid for what concerns retrieving the node name required in the config file.

I bought a pair of Creative Pebble V3 speakers and, given my desk setup and the cables of the 2 speakers, I needed to switch the left and right audio channels in order to set up the speakers correctly.

Now, I'm running Fedora Workstation and I never had to troubleshoot, manage, or change any audio settings besides adjusting volume when needed. I found out this task is not as easy as it seems, probably because of a mixture of lack of documentation and lack of skills on my side.