Better Params Parsing

by Jeremy D. Frens on July 17, 2016
part of the Fractals in Elixir series


This week’s episode of Fractals in Elixir has little to do with fractals and more to do with plain old Elixir. I cleaned up the way I’m processing the parameters for a fractal.

My code this week is in my mandelbrot repo on GitHub tagged blog_2016_07_17. Links to the previous articles are available in the README.

The problems

So I’ve been fighting a few problems pretty much from the beginning:

  • JSON inputs suck: I’d rather be parsing YAML.
  • “Options” is not a good name: other people have defined its meaning and connotation.
  • Flags and an input file: so dang confusing until it wasn’t!

YAML input

So, I admit it: I like YAML. I parsed YAML inputs for my Haskell version of the fractals program. I tried parsing YAML when I wrote my first Elixir version, but it didn’t go well. Then I discovered that poison parsed JSON just fine.

I didn’t want to maintain both YAML and JSON versions, so I found some code that converts YAML to JSON, and I created a rake task around that. But I had to remember to run that rake task every time I changed a YAML file, and it was annoying trying to read the JSON files because they were all flat and very explicitly stringified.

More recently I discovered yaml_elixir. The switch from poison to yaml_elixir was very simple. Instead of this:

json = filename |> File.read! |> Poison.Parser.parse!

I now parse YAML with this:

yaml = YamlElixir.read_from_file(filename)

Options, configs, specs, and params (oh my!)

So back in my Haskell program, I specified options for a fractal. The word/metaphor “options” may or may not have been a good decision for Haskell, but it was a bad idea for Elixir.

In Elixir and OTP, many functions take an options keyword list (e.g., GenServer.start_link/3). I found that while I was writing my supervisors and servers, I was having trouble keeping straight what options referred to: my options for a fractal or options for the OTP server.

I thought about using the term “config”, but Elixir programs already have a “config”, which works more at a system level.

I thought about using the term “specification”, but it’s kind of long, and it would be too easily confused with testing “specs” especially since I’m using espec for testing.

I thought about using the term “params”. I’m not sure I like the way “parameters” is often abbreviated as “params” [1], and it’s very difficult to make plural [2]. However, “params” is common enough that I figured most developers would understand what it means: “params” are values that drive the program’s computation.

I went through the whole program and changed Options to Params and options to params. This turned out to be pretty easy and straightforward.

Params come from one Enum

In my original Elixir program, I intended for flags specified on the command line to override params [3] set in an input file. Instead, flags were only used to set a chunk size and change concurrency options, and they could not be specified in an input file.

Part of the problem was that the params needed to be built from two different sources: flags and an input file. I sort of had this code [4]:

def main(args) do
  case OptionParser.parse(args) do
    {flags, [params_filename, output_filename], _} ->
      Params.parse(flags, params_filename, output_filename)
      |> Params.open_output_file
      |> Params.set_next_pid(self())
      |> main_helper
    _ ->
      usage()
  end
end

  • args are the command-line arguments.
  • OptionParser is a standard Elixir library to parse command-line arguments.
  • flags are the flags specified on the command line.
  • params_filename and output_filename are positional arguments from the command line.
  • Params.open_output_file and Params.set_next_pid inject more params.
  • main_helper does the rest of the real work.
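
To make the shape of that tuple concrete, here is a minimal sketch of what OptionParser.parse hands back. The --chunk-size switch, its :integer type, and the two filenames are assumptions for illustration; the real program’s flags may differ.

```elixir
# Hypothetical command line: one flag followed by two positional arguments.
args = ["--chunk-size", "500", "params.yml", "image.ppm"]

# :strict declares the allowed switches and their types (assumed here).
{flags, positional, invalid} =
  OptionParser.parse(args, strict: [chunk_size: :integer])

IO.inspect(flags)       # [chunk_size: 500]
IO.inspect(positional)  # ["params.yml", "image.ppm"]
IO.inspect(invalid)     # []
```

The third element, which the code above ignores with _, collects switches that could not be parsed.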

Everything here just seems like special handling to me. I feed parsed raw params (flags) and unparsed raw params (params_filename) and just a filename (output_filename) to Params.parse. But that one call into the Params module isn’t enough; I have to do some ad-hoc additions to open an output file and set the next pid.

I tried a variety of things to clean this up. The breakthrough came when I realized that params_filename and output_filename and next_pid were not separate entities to pass to Params.parse. They were just raw params that weren’t specified as flags on the command line. Positional arguments are awfully convenient on the command line; some values are internal; wherever they come from, everything is a raw param.

def main(args) do
  case OptionParser.parse(args) do
    {flags, [params_filename, output_filename], _} ->
      flags
      |> Keyword.put(:params_filename, params_filename)
      |> Keyword.put(:output_filename, output_filename)
      |> Keyword.put(:next_pid, self())
      |> Params.parse
      |> main_helper
    _ ->
      usage()
  end
end

Opening the file was really related to the filename itself and should be part of parsing that raw param.

Keyword.put/3 puts the key-value pair at the beginning of the list and overrides any value that key might already have in the flags.
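
A small sketch of that behavior, using made-up flag values (the keys and filenames here are only for illustration):

```elixir
# Hypothetical flags, including an :output_filename that should be replaced.
flags = [chunk_size: 1000, output_filename: "from_flag.ppm"]

# Keyword.put/3 drops any existing entries for the key, then prepends the new one.
updated = Keyword.put(flags, :output_filename, "positional.ppm")

IO.inspect(updated)
# [output_filename: "positional.ppm", chunk_size: 1000]
```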

Parsing raw params, not a Params struct

Params.parse used to parse the input file and the flags separately and with two different functions. It was a lot of special handling, as if they were two very different things.

Parsing the input file centered around a Params struct:

def parse_input_file(params, json) do
  %{params |
    fractal:     parse_fractal(json["fractal"]),
    size:        parse_size(json["size"]),
    color:       parse_color(json["color"]),
    seed:        parse_seed(json["seed"]),
    upper_left:  parse_complex(json["upperLeft"]),
    lower_right: parse_complex(json["lowerRight"]),
    c:           parse_complex(json["c"], %Complex{real: 1.0, imag: 0.0}),
    z:           parse_complex(json["z"], %Complex{real: 0.0, imag: 0.0}),
    r:           parse_complex(json["r"], %Complex{real: 0.0, imag: 0.0}),
    p:           parse_complex(json["p"], %Complex{real: 0.0, imag: 0.0})
  }
end

I had a different function for dealing with the flags:

def parse_flags(params, user_flags) do
  flags = Keyword.merge(@default_flags, user_flags)
  %{params |
    chunk_size: Keyword.fetch!(flags, :chunk_size)
  }
end

parse/3 brought these together:

def parse(flags, params_filename, image_filename) do
  %Params{image_filename: image_filename}
  |> parse_file(params_filename)
  |> parse_flags(flags)
end

Default values are a mess. Some values like fractal and color have no default value (although they could and probably should). c, z, r, and p have default values because parse_complex/2 has an optional second parameter for a default value. chunk_size has a default value because it’s a flag, and I have default values for flags. Three different ways to handle default values.

Ultimately, the problem was that I let the Params struct drive the code. But that’s the thing I should be transforming and accumulating. The incoming raw params should be driving this code.

Iterating over a Keyword list or Map is the same as far as Enum is concerned; they both implement the Enumerable protocol. Flags come in a Keyword list while a parsed YAML file results in a Map. So I just need a Params.parse that handles an Enumerable:

def parse(raw_params, params \\ default()) do
  raw_params
  |> Enum.reduce(params, &parse_attribute/2)
  |> precompute
end

  • raw_params is the Enumerable.
  • params is a Params struct, an accumulator.
  • default returns a Params struct with default values in it.

Enum.reduce passes each key-value tuple to parse_attribute which has three clauses.
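
Here is a stripped-down sketch of that idea, not the real Params module. ReduceSketch, its two fields, and its default values are all hypothetical; the point is only that one reduce handles both a keyword list and a map.

```elixir
defmodule ReduceSketch do
  # Hypothetical struct with just two attributes and made-up defaults.
  defstruct fractal: nil, chunk_size: 1000

  # raw_params can be any Enumerable that yields {key, value} tuples.
  def parse(raw_params, params \\ %__MODULE__{}) do
    Enum.reduce(raw_params, params, &parse_attribute/2)
  end

  # The update syntax only accepts keys the struct already has.
  defp parse_attribute({attribute, value}, params) do
    %{params | attribute => value}
  end
end

# A keyword list (like flags) and a map (like parsed YAML) go through the same code.
from_flags = ReduceSketch.parse([chunk_size: 500])
from_file = ReduceSketch.parse(%{fractal: :julia}, from_flags)

IO.inspect(from_file.fractal)     # :julia
IO.inspect(from_file.chunk_size)  # 500
```

One side effect of the update syntax in this sketch: an attribute that isn’t a field of the struct raises a KeyError instead of being silently accepted.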

Parsing general attributes

The last and most general clause of parse_attribute handles most attributes:

defp parse_attribute({attribute, value}, params) do
  %{params | attribute => parse_value(attribute, value)}
end

parse_value has five clauses. Here’s one of them:

defp parse_value(:fractal, value) do
  String.to_atom(String.downcase(value))
end

So parse_attribute({:fractal, "Mandelbrot"}, params) ends up adding :fractal mapped to :mandelbrot in a new Params built off of params.

Parsing the size is a bit interesting since it has to go from "1024x768" to %Size{width: 1024, height: 768}:

defp parse_value(:size, value) do
  [_, width, height] = Regex.run(~r/(\d+)x(\d+)/, value)
  %Size{
    width:  String.to_integer(width),
    height: String.to_integer(height)
  }
end
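
To see what that pattern match is working with, here is a small sketch. The first part shows what Regex.run returns for a well-formed size. SizeSketch is a hypothetical variation, not the code in the repo: it anchors the regex and returns :error on bad input instead of failing the match, and it uses a plain tuple as a stand-in for the Size struct.

```elixir
# Regex.run returns the whole match followed by each capture group.
IO.inspect(Regex.run(~r/(\d+)x(\d+)/, "1024x768"))
# ["1024x768", "1024", "768"]

defmodule SizeSketch do
  # Hypothetical, more defensive variant; {width, height} stands in for %Size{}.
  def parse(value) do
    case Regex.run(~r/\A(\d+)x(\d+)\z/, value) do
      [_, width, height] ->
        {:ok, {String.to_integer(width), String.to_integer(height)}}

      nil ->
        # Regex.run returns nil when the input does not match.
        :error
    end
  end
end

IO.inspect(SizeSketch.parse("1024x768"))     # {:ok, {1024, 768}}
IO.inspect(SizeSketch.parse("1024 by 768"))  # :error
```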

Several attributes are complex numbers:

@complex_attributes [:upper_left, :lower_right, :c, :p, :r, :z]

defp parse_value(attribute, value)
    when attribute in @complex_attributes do
  Complex.parse(value)
end

:seed and :chunk_size don’t need any special parsing, and there’s a catch-all case for them:

defp parse_value(_attribute, value), do: value
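
The order of the clauses matters: the specific ones have to come before the catch-all. Here is a self-contained sketch of the dispatch. ValueSketch is hypothetical, the functions are public only so they can be called from outside, and a {:complex, value} tuple stands in for Complex.parse since that module isn’t reproduced here. The complex-number string is a made-up input format.

```elixir
defmodule ValueSketch do
  @complex_attributes [:upper_left, :lower_right, :c, :p, :r, :z]

  # Most specific clauses first.
  def parse_value(:fractal, value) do
    value |> String.downcase() |> String.to_atom()
  end

  # The guard is expanded at compile time from the module attribute.
  def parse_value(attribute, value) when attribute in @complex_attributes do
    # Stand-in for Complex.parse(value).
    {:complex, value}
  end

  # Catch-all: anything not matched above passes through untouched.
  def parse_value(_attribute, value), do: value
end

IO.inspect(ValueSketch.parse_value(:fractal, "Mandelbrot"))  # :mandelbrot
IO.inspect(ValueSketch.parse_value(:c, "1.0+0.0i"))          # {:complex, "1.0+0.0i"}
IO.inspect(ValueSketch.parse_value(:seed, 12345))            # 12345
```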

I love how simple and short these functions are, worried only about parsing one kind of data—and no default values!

Parsing the output filename

For the output filename, I need to open an output stream:

defp parse_attribute({:output_filename, filename}, params) do
  %{params | output_pid: File.open!(filename, [:write])}
end

I should probably also record the output filename in the official params, but the stream is absolutely necessary. [5]
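
As a reminder of how that stream behaves, here is a minimal sketch using only the standard library. The temporary path and the text written are arbitrary; nothing here is taken from the real program’s output format.

```elixir
# Hypothetical output location in the system temp directory.
path = Path.join(System.tmp_dir!(), "params_sketch_output.txt")

# File.open!/2 returns a pid for the IO device, or raises if the file can't be opened.
output_pid = File.open!(path, [:write])

# Anything that knows the pid can write to the file.
IO.write(output_pid, "some image data\n")

# Closing the device makes sure everything has been written out.
:ok = File.close(output_pid)

contents = File.read!(path)
File.rm!(path)

IO.inspect(contents)  # "some image data\n"
```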

Parsing the input file

Magic!

defp parse_attribute({:params_filename, filename}, params) do
  yaml = filename |> YamlElixir.read_from_file |> symbolize
  parse(yaml, params)
end

Reading and parsing the YAML file is straightforward, simple, and oh-so-powerful: call Params.parse recursively. I pass in the params that have accumulated so far to the recursive call so that more params can be added.

I’m often struck by how powerful and simple recursion can be. Suddenly any raw attribute can be specified as a flag or in an input file. And since the solution is recursion, the input can be recursive. Yes, I can specify a params_filename in my input files, and the settings accumulate across all flags and input files.
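
Here is a sketch of that accumulation that runs without any YAML files. RecursiveParamsSketch is hypothetical: an in-memory table of keyword lists stands in for files on disk, a plain map stands in for the Params struct, and values are stored unparsed.

```elixir
defmodule RecursiveParamsSketch do
  # Hypothetical stand-in for files on disk: filename => raw params.
  @files %{
    "base.yml" => [fractal: "Mandelbrot", size: "512x384"],
    "big.yml" => [params_filename: "base.yml", size: "1024x768"]
  }

  def parse(raw_params, params \\ %{}) do
    Enum.reduce(raw_params, params, &parse_attribute/2)
  end

  # A params file is just more raw params: recurse with what has accumulated so far.
  defp parse_attribute({:params_filename, filename}, params) do
    @files |> Map.fetch!(filename) |> parse(params)
  end

  # Everything else is stored as-is in this sketch.
  defp parse_attribute({attribute, value}, params) do
    Map.put(params, attribute, value)
  end
end

params = RecursiveParamsSketch.parse([params_filename: "big.yml", chunk_size: 500])

IO.inspect(params.fractal)     # "Mandelbrot"
IO.inspect(params.size)        # "1024x768"
IO.inspect(params.chunk_size)  # 500
```

The :fractal comes from base.yml, the :size from big.yml (which is applied after the file it includes), and the :chunk_size from the top-level list.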

Outstanding issues

Overriding params is somewhat unpredictable. It mostly works the way I want (flags override anything in the files). But due to the way that Maps sort their keys, you may get strange overriding if you parse one file from another file.
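
A small sketch of why the container matters. In a reduce, the last value seen for a key wins. A keyword list is traversed in the order it was written; a map is traversed in whatever order the runtime chooses, which I treat here as an implementation detail rather than something to rely on. The keys and values are made up.

```elixir
# Last value seen for a key wins, just like accumulating params in a reduce.
last_wins = fn raw_params ->
  Enum.reduce(raw_params, %{}, fn {key, value}, acc -> Map.put(acc, key, value) end)
end

# Keyword list: written order is traversal order, so the second :size wins.
ordered = last_wins.([size: "512x384", size: "1024x768"])
IO.inspect(ordered)

# Map: the traversal order is up to the runtime, so whether :params_filename is
# handled before or after :size is not under the file author's control.
traversal = Enum.map(%{size: "1024x768", params_filename: "base.yml"}, fn {key, _} -> key end)
IO.inspect(traversal)
```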

Only the last params_filename in a file will be honored. It might be nice to specify more than one.

output_filename can be specified more than once in input files, but only the last one encountered will get any data. All the others will have a file handle opened for them that won’t be explicitly closed.

I have no protection from an infinite recursion reading input files.
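
One possible guard, sketched here as an assumption rather than something in the repo: carry a set of filenames already visited alongside the accumulated params and refuse to read a file twice. GuardedParamsSketch, the in-memory file table, and the error message are all hypothetical.

```elixir
defmodule GuardedParamsSketch do
  # Hypothetical stand-in for two files that include each other.
  @files %{
    "a.yml" => [params_filename: "b.yml", fractal: "Julia"],
    "b.yml" => [params_filename: "a.yml", size: "512x384"]
  }

  # The accumulator is {params, seen}, where seen is the set of files already read.
  def parse(raw_params, acc \\ {%{}, MapSet.new()}) do
    Enum.reduce(raw_params, acc, &parse_attribute/2)
  end

  defp parse_attribute({:params_filename, filename}, {params, seen}) do
    if MapSet.member?(seen, filename) do
      raise ArgumentError, "params file included more than once: #{filename}"
    else
      @files |> Map.fetch!(filename) |> parse({params, MapSet.put(seen, filename)})
    end
  end

  defp parse_attribute({attribute, value}, {params, seen}) do
    {Map.put(params, attribute, value), seen}
  end
end

result =
  try do
    GuardedParamsSketch.parse([params_filename: "a.yml"])
    :no_error
  rescue
    ArgumentError -> :cycle_detected
  end

IO.inspect(result)  # :cycle_detected
```

This version is stricter than it needs to be: it also rejects a file that is legitimately included from two different places. Tracking only the files on the current include path would allow that, at the cost of a little more bookkeeping.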

Ultimately, all of these issues are not that important. I have much cleaner code, and it’s more powerful. So I’m going to declare params handling done for now.

Next week

I started to implement a few more fractals, but I discovered that I need to tweak the cutoff magnitude and the number of iterations in order for some of the fractals to look good. I’m handling these numbers inconsistently across several modules, and I’d like to be able to tweak them for each run, so I’m moving them into the Params.

Also, I’m looking to solve some of my code duplication issues, especially in the escape-time modules.

Footnotes

  1. I also don’t like “config” as an abbreviation for “configuration”.

  2. At one point, I was going to have a list of params and had trouble naming it: list_of_params? Inconsistent, too long, and includes a datatype in the name. params_collection? Also inconsistent, too long, and includes a vague datatype in the name. paramses? Too Gollum.

  3. When I made the “one Enum” changes described in this section, the params were still called “options”. So you’ll see “options” used in the commit where the change was made, but I don’t think that transformation is that interesting. Look at the tag for this week to see the finished product.

  4. I really worked this code over so that it sucks in the right way for my story here, but if you go looking for it in my repo, it’s even worse.

  5. Params has a close function for closing any resources opened by the Params. For now, that’s just the one output file. If it’s not closed, the output file might not get properly flushed.
