Why code_change wouldn't work on my GenServer
I had a GenServer whose state I wanted to change during a hot-upgrade release, so I dutifully reached for code_change/3, as per the documentation. No matter how hard I tried, I couldn't get it to work.
I read and re-read all the documentation I could find on releases and hot upgrades, and tried again and again, but my callback was never called.
I quite like Dave Thomas's method of splitting the API from the server implementation, so my code looked something like this:
defmodule MyStore do
  def child_spec(opts) do
    %{
      id: MyStore.Server,
      start: {MyStore, :start_link, [opts]},
      type: :worker,
      restart: :permanent,
      shutdown: 500
    }
  end

  def start_link(args \\ nil, opts \\ []) do
    GenServer.start_link(MyStore.Server, args, opts)
  end

  def put(pid, key, value) do
    GenServer.call(pid, {:put, key, value})
  end

  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  defmodule Server do
    use GenServer
    require Logger

    @impl true
    def init(_opts) do
      {:ok, []}
    end

    @impl true
    def handle_call({:put, key, value}, _from, server_state) do
      server_state = [{key, value} | server_state]
      {:reply, :ok, server_state}
    end

    def handle_call({:get, key}, _from, server_state) do
      {:reply, Keyword.get(server_state, key), server_state}
    end

    @vsn "1"
    @impl true
    def code_change(from_vsn, server_state, _extra) do
      Logger.info("code_change from: #{inspect(from_vsn)}")
      {:ok, server_state}
    end
  end
end
A very simple and contrived example of a store running on a GenServer, with the obvious flaw that its state is a keyword list rather than the more natural map. So the idea is to change the state via a hot upgrade.
Adding the following code_change/3 clause before the original implementation should do the trick, along with updating the server callbacks to use the map.
defmodule Server do
  use GenServer
  require Logger

  @impl true
  def init(_opts) do
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, key, value}, _from, server_state) do
    server_state = Map.put(server_state, key, value)
    {:reply, :ok, server_state}
  end

  def handle_call({:get, key}, _from, server_state) do
    {:reply, Map.get(server_state, key), server_state}
  end

  @vsn "2"
  @impl true
  # Ignoring downgrading for this example
  def code_change("1", server_state, _extra) do
    Logger.info("code_change from: #{inspect(server_state)}")
    # The old put prepended, so the newest value for a key comes first.
    # Map.new/1 keeps the last duplicate, hence the reverse.
    {:ok, server_state |> Enum.reverse() |> Map.new()}
  end

  def code_change(from_vsn, server_state, _extra) do
    Logger.info("code_change from: #{inspect(from_vsn)}")
    {:ok, server_state}
  end
end
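One detail of the conversion is worth checking in isolation. Because put prepends to the keyword list, the newest value for a key sits at the front, whereas Map.new/1 keeps the last occurrence of a duplicated key. A minimal sketch with made-up data (the variable names are only for illustration, not part of the store) compares a plain Map.new/1 with reversing the list first:

```elixir
# State as the version 1 server would hold it after put(:a, 1), put(:b, 1),
# then put(:a, 2): the newest entry is at the head of the list.
old_state = [a: 2, b: 1, a: 1]

# Keyword.get/2 returns the first match, i.e. the newest value.
newest = Keyword.get(old_state, :a)

# Map.new/1 keeps the last duplicate, which here is the oldest value.
naive = Map.new(old_state)

# Reversing first makes the newest entry the last one Map.new/1 sees.
converted = old_state |> Enum.reverse() |> Map.new()

IO.inspect({newest, naive, converted})
```

With a single put per key the two conversions agree; they only differ once a key has been overwritten.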
All good. So have you spotted what's wrong yet? Neither had I.
So far as I can tell, there is nothing wrong with this code. The problem isn't even visible here; it only becomes apparent when you look at the supervisor and how Erlang finds the processes it's going to run code_change/3 against.
During an application upgrade, the release handler works through the supervision tree and suspends the processes that need updating. It then runs the code_change/3 function of the module for each process, resumes the processes, and finalises the release.
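The suspend, code_change, resume sequence can be tried by hand with the :sys module, without building a release. This is only a sketch of the mechanism under a few assumptions: SketchStore and its :dump call are hypothetical names invented for the illustration, and the "1" version string is passed in manually rather than read from the old module's @vsn as a real upgrade would.

```elixir
defmodule SketchStore do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(initial), do: {:ok, initial}

  # Hypothetical helper call so the state can be inspected from outside.
  @impl true
  def handle_call(:dump, _from, state), do: {:reply, state, state}

  # Convert the old keyword-list state into a map.
  @impl true
  def code_change("1", state, _extra) when is_list(state) do
    {:ok, state |> Enum.reverse() |> Map.new()}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

# Start with state shaped like the "old" version of the server.
{:ok, pid} = SketchStore.start_link(a: 1)

# A process has to be suspended before it will handle a change_code request.
:ok = :sys.suspend(pid)
:ok = :sys.change_code(pid, SketchStore, "1", [])
:ok = :sys.resume(pid)

state_after = GenServer.call(pid, :dump)
IO.inspect(state_after)
```

Note that :sys.change_code/4 is told explicitly which module to call. In a real upgrade that module has to be discovered from the supervision tree, which is where things went wrong for me.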
The appup file for the example above would look something like this:
{"2",
[{"1", [{update, 'Elixir.MyStore.Server', {advanced, []}}]}],
[{"1", [{update, 'Elixir.MyStore.Server', {advanced, []}}]}]
}.
That looks fine. We want the upgrade to run MyStore.Server.code_change/3.
When the store is started under a dynamic supervisor, the response from which_children/1 is
[{:undefined, #PID<0.161.0>, :worker, [MyStore]}]
This is the same result that Erlang gets when it retrieves all supervised processes in get_supervised_procs/0, which is “…the magic function. It finds all process in the system and which modules they execute as a call_back or process module.”
{:undefined, #PID<0.161.0>, :worker, [MyStore]} is included in the results of :release_handler_1.get_supervised_procs() (which I was super happy to find was an exported function, thank you Erlang), and there we have the problem: Erlang thinks that MyStore is the module being executed as the call_back or process module, not MyStore.Server. The module list comes from the child spec, and when no :modules key is given it defaults to the module named in the start tuple, which here is MyStore.
Because MyStore is not listed as changing in the appup file, code_change/3 is not called on it, and because MyStore.Server isn't listed as a module of a running process, code_change/3 isn't called on that module either. The process is left with its state unchanged, so the next call to it finds state in the wrong shape and the process crashes 💣.
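The mismatch is easy to reproduce outside a release. The following is a minimal sketch, assuming a throwaway pair of modules (Demo.Api and Demo.Server are hypothetical stand-ins for MyStore and MyStore.Server) started under an anonymous DynamicSupervisor:

```elixir
defmodule Demo.Server do
  use GenServer

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule Demo.Api do
  # The start tuple names Demo.Api, even though the callbacks live in Demo.Server.
  def child_spec(opts) do
    %{id: Demo.Server, start: {Demo.Api, :start_link, [opts]}}
  end

  def start_link(opts), do: GenServer.start_link(Demo.Server, opts)
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _pid} = DynamicSupervisor.start_child(sup, Demo.Api.child_spec([]))

# The fourth element is the module list the supervisor reports for the child.
[{:undefined, _child, :worker, modules}] = DynamicSupervisor.which_children(sup)
IO.inspect(modules)
```

The reported list contains Demo.Api, the module from the start tuple, and says nothing about Demo.Server.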
After a lot of code spelunking I had identified the problem, and the solution is quite a simple change: move start_link/2 into MyStore.Server and update the child_spec accordingly.
defmodule MyStore do
  def child_spec(opts) do
    %{
      id: MyStore.Server,
      start: {MyStore.Server, :start_link, [opts]},
      type: :worker,
      restart: :permanent,
      shutdown: 500
    }
  end

  #...

  defmodule Server do
    use GenServer
    require Logger

    def start_link(args \\ nil, opts \\ []) do
      GenServer.start_link(Server, args, opts)
    end

    #...
  end
end
Now the output of :release_handler_1.get_supervised_procs() looks like this:
[#...
 {:undefined, #PID<0.161.0>, :worker, [MyStore.Server]}]
and code_change/3 is correctly called.
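Child specs also accept a :modules key, which overrides the default taken from the start tuple, so a possible alternative is to leave start_link where it was and name the callback module explicitly. The sketch below only checks what the supervisor reports; it is an assumption, untested here against a real release, that this alone is enough for the release handler. Alt.Api and Alt.Server are hypothetical names.

```elixir
defmodule Alt.Server do
  use GenServer

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule Alt.Api do
  def child_spec(opts) do
    %{
      id: Alt.Server,
      # start_link stays in the API module...
      start: {Alt.Api, :start_link, [opts]},
      # ...but the callback module is declared explicitly.
      modules: [Alt.Server]
    }
  end

  def start_link(opts), do: GenServer.start_link(Alt.Server, opts)
end

{:ok, alt_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _pid} = DynamicSupervisor.start_child(alt_sup, Alt.Api.child_spec([]))

[{:undefined, _child, :worker, alt_modules}] = DynamicSupervisor.which_children(alt_sup)
IO.inspect(alt_modules)
```

Moving start_link into the server module, as above, has the advantage that the default does the right thing without anyone having to remember the extra key.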
I always appreciate gaining a deeper understanding of how the underlying toolset of a system works, and I hope that when you are searching for “why code_change isn't called on my GenServer” you'll get this helpful result ;-)