Why code_change wouldn't work on my GenServer
I had a GenServer whose state I wanted to change during a hot upgrade release, so I dutifully reached for code_change/3 as per the documentation, but no matter how hard I tried, I couldn't get it to work.
I read and re-read all the documentation I could find on releases and hot upgrades and tried and tried again, but my callback was never called.
I quite like Dave Thomas' method of splitting the API from the server implementation, so my code looked something like this:
defmodule MyStore do
  def child_spec(opts) do
    %{
      id: MyStore.Server,
      start: {MyStore, :start_link, [opts]},
      type: :worker,
      restart: :permanent,
      shutdown: 500
    }
  end

  def start_link(args \\ nil, opts \\ []) do
    GenServer.start_link(MyStore.Server, args, opts)
  end

  def put(pid, key, value) do
    GenServer.call(pid, {:put, key, value})
  end

  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  defmodule Server do
    use GenServer
    require Logger

    @impl true
    def init(_opts) do
      {:ok, []}
    end

    @impl true
    def handle_call({:put, key, value}, _from, server_state) do
      server_state = [{key, value} | server_state]
      {:reply, :ok, server_state}
    end

    def handle_call({:get, key}, _from, server_state) do
      {:reply, Keyword.get(server_state, key), server_state}
    end

    @vsn "1"
    @impl true
    def code_change(from_vsn, server_state, _extra) do
      Logger.info("code_change from: #{inspect(from_vsn)}")
      {:ok, server_state}
    end
  end
end
A very simple and contrived example of a store running on a GenServer, with the obvious flaw that it's implemented as a keyword list instead of the more obvious map. So the idea is to change the state via a hot upgrade.
Adding the following code_change/3 code before the original implementation should do the trick, along with updating the server API to use the map.
defmodule Server do
  use GenServer
  require Logger

  @impl true
  def init(_opts) do
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, key, value}, _from, server_state) do
    server_state = Map.put(server_state, key, value)
    {:reply, :ok, server_state}
  end

  def handle_call({:get, key}, _from, server_state) do
    {:reply, Map.get(server_state, key), server_state}
  end

  @vsn "2"
  @impl true
  # Ignoring downgrading for this example
  def code_change("1", server_state, _extra) do
    Logger.info("code_change from: #{inspect(server_state)}")
    {:ok, Map.new(server_state)}
  end

  def code_change(from_vsn, server_state, _extra) do
    Logger.info("code_change from: #{inspect(from_vsn)}")
    {:ok, server_state}
  end
end
All good. So have you found out what's wrong yet? Neither had I.
So far as I can tell, there is nothing wrong with my code. The problem isn't even visible here; it becomes apparent when you look at the supervisor and how Erlang finds the processes it's going to run code_change/3 against.
During an application upgrade, the release handler works through the supervision tree and suspends the processes that need updating. It then runs the code_change/3 function on the module for each process, resumes the processes, and finalises the release.
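The suspend, code_change, resume cycle can be tried by hand with Erlang's :sys module, which as far as I can tell is what the release handler relies on under the hood. The following is only a sketch under a few assumptions: Demo.Server is a made-up module, no new code is actually loaded, and the old version "1" is passed in by hand rather than read from @vsn.

```elixir
defmodule Demo.Server do
  use GenServer

  # Start with "old style" state: a keyword list.
  @impl true
  def init(_opts), do: {:ok, [a: 1]}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}

  # Convert keyword-list state to a map; leave anything else alone.
  @impl true
  def code_change(_old_vsn, state, _extra) when is_list(state), do: {:ok, Map.new(state)}
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

{:ok, pid} = GenServer.start_link(Demo.Server, nil)

# A process must be suspended before its code_change/3 can be invoked.
:ok = :sys.suspend(pid)
:ok = :sys.change_code(pid, Demo.Server, "1", [])
:ok = :sys.resume(pid)

IO.inspect(GenServer.call(pid, :state))
```

Note that :sys.change_code/4 is handed a pid and a module. Something has to decide which pids go with which module, and that is the part that went wrong for me.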
The appup file for the example above would look something like this:
{"2",
[{"1", [{update, 'Elixir.MyStore.Server', {advanced, []}}]}],
[{"1", [{update, 'Elixir.MyStore.Server', {advanced, []}}]}]
}.
That looks fine. We want the upgrade to run MyStore.Server.code_change/3.
When the store is started under a dynamic supervisor, the response from which_children/1 is
[{:undefined, #PID<0.161.0>, :worker, [MyStore]}]
This is the same result that Erlang gets when it retrieves all supervised processes in get_supervised_procs/0, which is "…the magic function. It finds all process in the system and which modules they execute as a call_back or process module."
{:undefined, #PID<0.161.0>, :worker, [MyStore]} is included in the results of :release_handler_1.get_supervised_procs() (which I was super happy to find was an exported function, thank you Erlang) and there we have the problem: ==Erlang thinks that MyStore is the module that is being executed as the call_back or process module, not MyStore.Server.==
Because MyStore is not listed as changing in the appup file, code_change/3 is not called on it, and because MyStore.Server isn't listed as a module of a running process, code_change/3 isn't called on that module either. The process is left with its state unchanged, so the next call to it finds state in the old format and the process crashes 💣.
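To see where that module list comes from, here is a small sketch that can be run as a script. The Sketch.Store modules are hypothetical stand-ins for MyStore. It relies on the documented child spec behaviour that, when no :modules key is given, the supervisor falls back to the module named in the :start tuple. The second spec sets :modules explicitly, which I have not tried in a real release, so treat it as an assumption rather than a tested alternative.

```elixir
defmodule Sketch.Store do
  # API module: starts a process whose callbacks live in Sketch.Store.Server.
  def start_link(opts \\ []) do
    GenServer.start_link(Sketch.Store.Server, :ok, opts)
  end

  defmodule Server do
    use GenServer

    @impl true
    def init(:ok), do: {:ok, %{}}
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# No :modules key, so the module from :start (Sketch.Store) is reported.
default_spec = %{id: :store, start: {Sketch.Store, :start_link, [[]]}}
{:ok, _pid} = DynamicSupervisor.start_child(sup, default_spec)

# Explicit :modules key naming the callback module instead.
explicit_spec = Map.put(default_spec, :modules, [Sketch.Store.Server])
{:ok, _pid} = DynamicSupervisor.start_child(sup, explicit_spec)

# which_children/1 makes no ordering promise, so sort before inspecting.
reported =
  for {_id, _pid, _type, modules} <- DynamicSupervisor.which_children(sup), do: modules

IO.inspect(Enum.sort(reported))
```

The first child is reported with the API module and the second with the callback module, even though both processes run exactly the same GenServer code.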
After a lot of code spelunking I had identified the problem, and the solution is quite a simple change: move start_link/2 into MyStore.Server and update the child_spec accordingly.
defmodule MyStore do
  def child_spec(opts) do
    %{
      id: MyStore.Server,
      start: {MyStore.Server, :start_link, [opts]},
      type: :worker,
      restart: :permanent,
      shutdown: 500
    }
  end

  #...

  defmodule Server do
    use GenServer
    require Logger

    # The process is now started from the callback module itself
    def start_link(args \\ nil, opts \\ []) do
      GenServer.start_link(__MODULE__, args, opts)
    end

    #...
  end
end
Now the output of :release_handler_1.get_supervised_procs() looks like this:
[#...
{:undefined, #PID<0.161.0>, :worker, [MyStore.Server]}]
and code_change/3 is correctly called.
I always appreciate gaining a deeper understanding of how the underlying toolset of a system works, and I hope that when you are searching for "why code_change isn't called on my GenServer" you'll get this helpful result ;-)