We will start by creating a new project to learn more about Elixir protocols:
mix new learn --module Learn
cd learn
Note: I'm using Erlang 21.3 and Elixir 1.8.1, and I have never coded in Elixir before :)
I searched for Elixir protocols and found the official documentation, which has an
example of a Size protocol.
I added it to the lib/learn.ex file and added some calls in the hello function
to try it. It ended up looking like this:
defmodule Learn do
  @moduledoc """
  Documentation for Learn.
  """

  @doc """
  Hello world.

  ## Examples

      iex> Learn.hello()

  """
  def hello do
    Learn.Size.size("asd")
    Learn.Size.size(%{})
    Learn.Size.size({1, 2, 3})
  end

  defprotocol Size do
    @doc "Calculates the size (and not the length!) of a data structure"
    def size(data)
  end

  defimpl Size, for: BitString do
    def size(string), do: byte_size(string)
  end

  defimpl Size, for: Map do
    def size(map), do: map_size(map)
  end

  defimpl Size, for: Tuple do
    def size(tuple), do: tuple_size(tuple)
  end
end
Compiled the project:
mix compile
Opened an Elixir shell:
iex
Wrote a little script to decompile all beam files to Erlang (warning: Elixir
flavored Erlang ahead!):
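for f <- :filelib.wildcard('./_build/dev/lib/*/*/*.beam') do
  # read the abstract code (the Erlang AST) stored in the beam file
  result = :beam_lib.chunks(f, [:abstract_code])
  {:ok, {_, [{:abstract_code, {_, ac}}]}} = result
  # pretty-print the forms as Erlang source
  code = :erl_prettypr.format(:erl_syntax.form_list(ac))
  # write it next to the beam file, with an .erl extension
  out_path = :string.replace(f, '.beam', '.erl')
  :file.write_file(out_path, code)
end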
The results:
$ tree
.
├── _build
│   └── dev
│       └── lib
│           └── learn
│               ├── consolidated
│               │   ├── Elixir.Collectable.beam
│               │   ├── Elixir.Collectable.erl
│               │   ├── Elixir.Enumerable.beam
│               │   ├── Elixir.Enumerable.erl
│               │   ├── Elixir.IEx.Info.beam
│               │   ├── Elixir.IEx.Info.erl
│               │   ├── Elixir.Inspect.beam
│               │   ├── Elixir.Inspect.erl
│               │   ├── Elixir.Learn.Size.beam
│               │   ├── Elixir.Learn.Size.erl
│               │   ├── Elixir.List.Chars.beam
│               │   ├── Elixir.List.Chars.erl
│               │   ├── Elixir.String.Chars.beam
│               │   └── Elixir.String.Chars.erl
│               └── ebin
│                   ├── Elixir.Learn.beam
│                   ├── Elixir.Learn.erl
│                   ├── Elixir.Learn.Size.beam
│                   ├── Elixir.Learn.Size.BitString.beam
│                   ├── Elixir.Learn.Size.BitString.erl
│                   ├── Elixir.Learn.Size.erl
│                   ├── Elixir.Learn.Size.Map.beam
│                   ├── Elixir.Learn.Size.Map.erl
│                   ├── Elixir.Learn.Size.Tuple.beam
│                   ├── Elixir.Learn.Size.Tuple.erl
│                   └── learn.app
From the result it seems that it "consolidates" the protocols into the
consolidated folder and then puts the modules in ebin (with protocol
implementations named after the protocol plus the type they handle).
It's also clear that all Elixir modules are prefixed with Elixir., and that
if I declare a protocol inside a module the protocol "belongs" to the module:
in this case the "fully qualified name" of the protocol is Elixir.Learn.Size.
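The naming can be checked from Elixir itself. This is a minimal sketch; it only relies on module names being atoms, so none of the modules need to exist for it to run:

```elixir
# Aliases such as Learn.Size are plain atoms with an "Elixir." prefix.
IO.inspect(Atom.to_string(Learn.Size))          # "Elixir.Learn.Size"
IO.inspect(Learn.Size == :"Elixir.Learn.Size")  # true

# Implementation modules are named by joining the protocol and the type.
impl_name = Module.concat(Learn.Size, BitString)
IO.inspect(Atom.to_string(impl_name))           # "Elixir.Learn.Size.BitString"
```

This matches the file names in ebin: Elixir.Learn.Size.BitString.beam is the protocol name plus the type name.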
Let's start exploring what code is generated by inspecting the main module we
wrote (I will clean up unneeded code from the examples):
-module('Elixir.Learn').
-export([hello/0]).

hello() ->
    'Elixir.Learn.Size':size(<<"asd">>),
    'Elixir.Learn.Size':size(#{}),
    'Elixir.Learn.Size':size({1, 2, 3}).
We can see that calling a function from a protocol implies calling the desired
function on the consolidated module for the protocol itself.
Let's now see what the Elixir.Learn.Size module does:
-module('Elixir.Learn.Size').
-export(['__protocol__'/1, impl_for/1, 'impl_for!'/1, size/1]).

'impl_for!'(__@1) ->
    case impl_for(__@1) of
      __@2 when __@2 =:= nil orelse __@2 =:= false ->
          erlang:error('Elixir.Protocol.UndefinedError':exception([{protocol,
                                                                    'Elixir.Learn.Size'},
                                                                   {value,
                                                                    __@1}]));
      __@3 -> __@3
    end.

size(__@1) -> ('impl_for!'(__@1)):size(__@1).

struct_impl_for(_) -> nil.

impl_for(#{'__struct__' := __@1})
    when erlang:is_atom(__@1) ->
    struct_impl_for(__@1);
impl_for(__@1) when erlang:is_tuple(__@1) ->
    'Elixir.Learn.Size.Tuple';
impl_for(__@1) when erlang:is_map(__@1) ->
    'Elixir.Learn.Size.Map';
impl_for(__@1) when erlang:is_bitstring(__@1) ->
    'Elixir.Learn.Size.BitString';
impl_for(_) -> nil.

'__protocol__'(module) -> 'Elixir.Learn.Size';
'__protocol__'(functions) -> [{size, 1}];
'__protocol__'('consolidated?') -> true;
'__protocol__'(impls) ->
    {consolidated,
     ['Elixir.Map', 'Elixir.BitString', 'Elixir.Tuple']}.
The exported function for the protocol (size/1) does a simple thing: it asks
the impl_for!/1 function for the module that knows how to handle
Learn.Size.size/1 for the given argument and then calls that module's
size/1 function:
size(__@1) -> ('impl_for!'(__@1)):size(__@1).
impl_for!/1 just calls impl_for/1 with the argument and handles the case
where the value doesn't have a known implementation: in that case it raises an
exception (Elixir.Protocol.UndefinedError); otherwise it just returns the
module name.
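The same lookup can be tried from the Elixir side, since impl_for/1 and impl_for!/1 are exported. This is a sketch that assumes it runs as a standalone script, with the protocol declared at the top level as Size (not Learn.Size) to keep it short:

```elixir
defprotocol Size do
  def size(data)
end

defimpl Size, for: BitString do
  def size(string), do: byte_size(string)
end

defimpl Size, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

# impl_for/1 returns the implementing module, or nil when there is none.
IO.inspect(Size.impl_for("asd"))  # Size.BitString
IO.inspect(Size.impl_for(42))     # nil

# Doing by hand what Size.size/1 does: look up the module, then call it.
mod = Size.impl_for!({1, 2, 3})
IO.inspect(mod.size({1, 2, 3}))   # 3

# impl_for!/1 raises Protocol.UndefinedError instead of returning nil.
result =
  try do
    Size.impl_for!(42)
  rescue
    e in Protocol.UndefinedError -> {:error, e.protocol, e.value}
  end

IO.inspect(result)                # {:error, Size, 42}
```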
impl_for/1 starts by checking if the argument is an Elixir struct,
which underneath is just a map with a "well known" key __struct__ that
contains the type of the struct as an atom:
impl_for(#{'__struct__' := __@1})
when erlang:is_atom(__@1) ->
If it's a struct, it calls struct_impl_for/1 with the struct type as argument:
struct_impl_for(__@1);
In our example, there's no struct that implements this protocol so the implementation of struct_impl_for/1 is simple:
struct_impl_for(_) -> nil.
After that it tries to find the implementation for non-struct types (mostly Erlang types). It uses guards to check the type of the argument, and if none match it returns nil, like struct_impl_for/1:
impl_for(__@1) when erlang:is_tuple(__@1) ->
'Elixir.Learn.Size.Tuple';
impl_for(__@1) when erlang:is_map(__@1) ->
'Elixir.Learn.Size.Map';
impl_for(__@1) when erlang:is_bitstring(__@1) ->
'Elixir.Learn.Size.BitString';
impl_for(_) -> nil.
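The order of the clauses matters: a struct is also a map, but the struct clause comes first, so a struct without its own implementation does not fall through to the Map one. Here is a small sketch of that, assuming a standalone script, a top-level Size protocol and a hypothetical Point struct that is not part of the project:

```elixir
defprotocol Size do
  def size(data)
end

defimpl Size, for: Map do
  def size(map), do: map_size(map)
end

# Hypothetical struct with no Size implementation.
defmodule Point do
  defstruct x: 0, y: 0
end

# Built at runtime with struct/1.
point = struct(Point)

IO.inspect(is_map(point))         # true
IO.inspect(Size.impl_for(%{}))    # Size.Map
IO.inspect(Size.impl_for(point))  # nil
```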
Now that we have the module that handles the protocol function for each type, let's see their implementations:
Elixir.Learn.Size.BitString:
size(_string@1) -> erlang:byte_size(_string@1).
Elixir.Learn.Size.Map:
size(_map@1) -> erlang:map_size(_map@1).
Elixir.Learn.Size.Tuple:
size(_tuple@1) -> erlang:tuple_size(_tuple@1).
Now that we have the basic call and dispatch sequence, let's try adding two
structs and implementing this protocol for them to see how it works.
I added two structs to lib/learn.ex, one on the Learn module itself and one in a nested User module:
defstruct name: "John", age: 27

defmodule User do
  defstruct name: "John", age: 27
end
Added calls to Size.size/1 in the hello/0 function:
def hello do
  Learn.Size.size("asd")
  Learn.Size.size(%{})
  Learn.Size.size({1, 2, 3})
  Learn.Size.size(%User{age: 27, name: "John"})
  Learn.Size.size(%Learn{age: 27, name: "John"})
end
And implemented the protocol Size for both structs:
defimpl Size, for: Learn do
  def size(learn), do: learn.age + 1
end

defimpl Size, for: User do
  def size(user), do: user.age + 2
end
I compiled with mix compile and pasted the script into iex again. Let's
see what changed.
The hello function now looks like this:
hello() ->
    'Elixir.Learn.Size':size(<<"asd">>),
    'Elixir.Learn.Size':size(#{}),
    'Elixir.Learn.Size':size({1, 2, 3}),
    'Elixir.Learn.Size':size(#{age => 27,
                               name => <<"John">>,
                               '__struct__' => 'Elixir.Learn.User'}),
    'Elixir.Learn.Size':size(#{age => 27,
                               name => <<"John">>,
                               '__struct__' => 'Elixir.Learn'}).
This confirms that Elixir structs are maps with a special __struct__ key.
Checking the generated files, there's a new file for our User struct
(Elixir.Learn.User.erl); the other struct is defined inside
Elixir.Learn.erl.
The module code relevant for the struct doesn't have anything specific to the
protocols it implements:
-module('Elixir.Learn.User').
-export(['__struct__'/0, '__struct__'/1]).

'__struct__'() ->
    #{'__struct__' => 'Elixir.Learn.User', age => 27,
      name => <<"John">>}.

'__struct__'(__@1) ->
    'Elixir.Enum':reduce(__@1,
                         #{'__struct__' => 'Elixir.Learn.User', age => 27,
                           name => <<"John">>},
                         fun ({__@2, __@3}, __@4) ->
                                 maps:update(__@2, __@3, __@4)
                         end).
Almost the same code is inside Elixir.Learn.erl for the other struct.
This shows that each struct has two "constructors": one without arguments that
returns a struct with the default values for all fields, and one that merges
its arguments into the default values.
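Both generated functions can be called directly from Elixir. This is a sketch that assumes a standalone script with a top-level User module (in the project it is Learn.User):

```elixir
defmodule User do
  defstruct name: "John", age: 27
end

# __struct__/0 returns the defaults, __struct__/1 overrides some fields.
defaults = User.__struct__()
custom = User.__struct__(name: "Jane")

IO.inspect(defaults.name)       # "John"
IO.inspect(custom.name)         # "Jane"
IO.inspect(custom.age)          # 27
IO.inspect(custom.__struct__)   # User

# struct/2 builds the same value at runtime.
IO.inspect(custom == struct(User, name: "Jane"))  # true
```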
Let's see what changed on the consolidated protocol module:
struct_impl_for('Elixir.Learn.User') ->
    'Elixir.Learn.Size.Learn.User';
struct_impl_for('Elixir.Learn') ->
    'Elixir.Learn.Size.Learn';
struct_impl_for(_) -> nil.
Each struct type returns the module where the protocol is implemented,
let's see both implementations:
Elixir.Learn.Size.Learn.User.erl:
size(_user@1) ->
    case _user@1 of
      #{age := __@1} -> __@1;
      __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
      __@1 -> __@1:age()
    end
      + 2.
Elixir.Learn.Size.Learn.erl:
size(_learn@1) ->
    case _learn@1 of
      #{age := __@1} -> __@1;
      __@1 when erlang:is_map(__@1) ->
          erlang:error({badkey, age, __@1});
      __@1 -> __@1:age()
    end
      + 1.
Summary:
Each Elixir protocol is compiled to its own module, whose content is the
consolidated dispatch logic for it.
This logic is created by collecting all the defimpl statements for the protocol and adding
a function clause to the struct_impl_for/1 function if the target type is an
Elixir struct, or a clause to the impl_for/1 function if the target type is
any other type.
The impl_for!/1 function returns the module that has the protocol
implementation for the provided type.
Each protocol function asks for the module via impl_for!/1 and calls the
function with the same name on it, passing the given arguments.
This is just guessing, but the module indirection must be there to allow
hot code reloading of protocol implementations for each type independently, without
also having to reload the protocol consolidation. The struct_impl_for function
is there to destructure the map only once.
I don't see traces of dynamic dispatch in case a module is loaded with a
protocol implementation that was not known at consolidation time, I need
to research this further.
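A small experiment that might help with that research. This is only a sketch: it assumes the code runs as a plain script (for example with elixir and a hypothetical sketch.exs file), where as far as I know nothing consolidates the protocol, and it uses a top-level Size protocol and a made-up Integer implementation. It says nothing about what happens in a consolidated Mix build:

```elixir
defprotocol Size do
  def size(data)
end

# Protocol.consolidated?/1 reports whether the protocol was consolidated.
consolidated = Protocol.consolidated?(Size)
IO.inspect(consolidated)

# No implementation for integers exists yet.
before_impl = Size.impl_for(1)

# An implementation defined after the protocol has already been used.
defimpl Size, for: Integer do
  def size(int), do: length(Integer.digits(int))
end

after_impl = Size.impl_for(1)

IO.inspect({before_impl, after_impl})
IO.inspect(Size.size(1234))
```

In this non-consolidated setup the lookup finds the implementation that was defined later, so the dispatch there seems to be resolved at call time.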
An extra guess about this logic that reads the age field from the struct:
case _learn@1 of
  #{age := __@1} -> __@1;
  __@1 when erlang:is_map(__@1) ->
      erlang:error({badkey, age, __@1});
  __@1 -> __@1:age()
end
Maybe it's there because Elixir allows calling a struct "method" without parentheses, and
that's why it looks for the field first and for a function with the same name second?
I'm not entirely sure since my Elixir knowledge is basically nonexistent :)
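To poke at that guess from the Elixir side, here is a minimal sketch of the first two clauses using plain maps: the field is returned when the key exists, and the badkey error shows up in Elixir as a KeyError when it doesn't. The last clause (the call to age() on a non-map value) is not exercised here:

```elixir
user = %{name: "John", age: 27}

# First clause: the key exists, so the value is returned.
IO.inspect(user.age)  # 27

# Second clause: the value is a map without the key.
no_age = Map.delete(user, :age)

result =
  try do
    no_age.age
  rescue
    e in KeyError -> {:missing, e.key}
  end

IO.inspect(result)    # {:missing, :age}
```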
If you have any questions or corrections, I'm @warianoguerra; my other accounts are here: https://keybase.io/marianoguerra