MeleeNeat: Novelty Search and Super Smash Brothers Melee
MeleeNeat is a stream/project I started at the start of the year (2021) applying a homebrew NEAT (Neuroevolution of Augmenting Topologies) implementation to the game Super Smash Bros. Melee. In short, NEAT is a neuroevolution algorithm that evolves a population of neural networks to solve a particular problem. There are plenty of great write-ups for the casual consumer, so I won't belabor the details. In a typical machine learning problem, algorithms are set up to aim at a given target or objective. The objective acts as a compass that tells the algorithm which way to go, and specifically for NEAT, which neural networks performed better at the goal than others. This generally makes sense and has served many problems well. However, some problems (Smash Brothers included) are not really definable by a single objective. Or, if they are, you'd need some sort of base intuition to pair with that single objective. This is because of something known as 'deception' in the search space. Deception in a problem space often gives the impression of progress but instead leads to a dead end. Even multi-objective arrangements don't pan out to describe the game, or any problem with deception, in the way the objective designer hopes.
I personally spent nearly four months experimenting with various constructions of objectives and schemes to create the right environment for two competing AIs to learn the game in a way that was interesting to watch. Almost nothing of value ever appeared, and when reasonable models did show up, they quickly faded in favor of an exploit in the objective space. In retrospect, I would describe the objective search as fragile, prone to derailment, and liable to converge to a terminal bad state.
In order to address this problem of deception, we can do away with the objective function altogether (at least as a way of grading performance that drives the evolutionary selection) and instead replace it with an algorithm called Novelty Search. Novelty Search is an algorithm that can be applied to NEAT (among other algorithms) to search for new behavior. While its ambitions are certainly not as grand, it emulates a potential driving factor of evolution: search by development. In a more general way, the algorithm operates under a philosophy of:
Sometimes going nowhere in particular can get you exactly where you want to be.
The quick and dirty idea is to define some sort of behavior space, track the agent's behavior during its evaluation, and then compute the average distance from the new behavior vector to its k nearest neighbors (KNN) among all previously seen behavior vectors.
This measuring of behavior allows us to describe spaces of activity we're interested in exploring, without conditioning the exploration on some sort of arbitrary benchmark. It's true the behavior metric itself could be argued to be arbitrary, but the functional impact of saying "do different things in this space" is a much different directive than "maximize this specific condition."
In the novelty search paper, the authors use a maze to illustrate novelty search and the problem of deception. In that case, a robot must move from the start to the finish of a maze. The agent has a fixed number of steps it can take before its turn expires. The final location where the agent ends up acts as the representation of its behavior, so each agent produces a vector of two values, (x, y). The behavior of the finishing agent is compared against all ending locations seen thus far. The closest K (a chosen parameter) behaviors are selected, their distances from the new behavior are averaged, and that average becomes the agent's score.
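As a rough sketch of that scoring step (the function name, K value, and archive contents here are mine, not from the paper or the project), the novelty of a new (x, y) ending position is just the mean Euclidean distance to its K nearest neighbors among previously seen positions:

```python
import math

def novelty(behavior, archive, k=15):
    """Mean Euclidean distance from `behavior` to its k nearest
    neighbors among previously seen behaviors."""
    if not archive:
        return float("inf")  # the first behavior is maximally novel
    dists = sorted(math.dist(behavior, seen) for seen in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Hypothetical ending positions from earlier agents.
archive = [(1.0, 2.0), (1.5, 2.2), (8.0, 9.0)]
print(novelty((1.2, 2.1), archive, k=2))  # near old behavior: low score
print(novelty((9.5, 0.5), archive, k=2))  # far from old behavior: high score
```

An agent that lands in a well-sampled region scores low and is devalued as a parent for the next generation, which is exactly the effect described next.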
The maze acts both as a hard problem and as a convenient representation of what deception looks like, though deception is not confined to mazes. If the objective were defined as "the closer the robot is to the exit, the better the score," many cul-de-sacs exist that would provide higher scores than the longer, but eventually correct, path to the exit. If one were to watch novelty search unfold over time, they'd see the agents pour out of the starting area like water. Once a space is adequately sampled (for solutions), new solutions that land there are devalued as candidates for the next generation. This creates something like a promise of never-ending production of new behavior.
In MeleeNeat the behavior metric is represented by three structures: a vector of all actions performed throughout the agent's turn, a vector of the causal action each time damage was dealt, and a set of vectors that are the sequences of actions used to recover back on stage. This leaves us with three structures: allActions, damageActions, and recoveryActions.
Actions are things such as walk, run, down tilt attack, jump, land, die at the bottom of the stage, and so forth. In total there are about 400 actions, per LibMelee. When the agent's turn is over, the action vector is measured against the closest K behaviors seen in the past, and the average distance becomes the agent's novelty score. In order to compare two sequences of actions, the vectors of action IDs are converted into corresponding UTF-8 characters and the Levenshtein edit distance is applied. Levenshtein distance measures the minimal number of operations (insertions, deletions, or substitutions) required to turn one string into another. A more common usage might be comparing a word against a dictionary in a spell check application.
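To make the comparison concrete, here is a minimal sketch (the action IDs and sequences are made up for illustration; the project presumably pulls the real IDs from LibMelee's action enum) of encoding action IDs as characters and computing the edit distance with the classic dynamic programming formulation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def encode(action_ids):
    # Map each action ID to a single character; with ~400 actions,
    # every ID fits comfortably in one Unicode code point.
    return "".join(chr(a) for a in action_ids)

# Hypothetical action ID sequences from two agents' turns.
agent_a = encode([12, 12, 47, 301, 5])
agent_b = encode([12, 47, 47, 301, 9])
print(levenshtein(agent_a, agent_b))  # -> 2 (two substitutions)
```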
In addition to grading novelty on the three behaviors outlined above, there is also the evaluation ruleset. MeleeNeat uses a timer-based breadcrumb idea where the agent starts with a given play clock T. Whenever the agent does damage, the enemy dies, or the agent recovers to the stage after being knocked off, the agent's clock is reset. This effectively guides the suitable space to search. I tend to think of it as the following:
We don’t care how they do the things we’re interested in, but we do care that they are done.
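A minimal sketch of that play clock, with a hypothetical environment API (env, events, and the clock value are all stand-ins, not the project's real interface):

```python
PLAY_CLOCK_FRAMES = 60 * 20  # hypothetical: 20 seconds at 60 fps

def run_turn(agent, env):
    """Run one agent's turn; rewarding events reset the clock."""
    clock = PLAY_CLOCK_FRAMES
    while clock > 0 and not env.game_over():
        events = env.step(agent.act(env.state()))  # hypothetical API
        if events.did_damage or events.enemy_died or events.recovered_to_stage:
            clock = PLAY_CLOCK_FRAMES  # breadcrumb: earn more play time
        else:
            clock -= 1
```

The effect is that turns end quickly unless the agent keeps doing one of the things we care about.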
The distance formula currently in use is allActionDistance + damageActionsDistance * 10 + recoveryActionsDistance * 2.
I think there are two things worth mentioning here. First, the damage and recovery behaviors are in effect subsequences of the full action sequence, but they offer a refinement of that space. Second, these joined distances provide a way to weight our interest in the novelty space. Recovery actions tend to form longer sequences and can occur in succession. The multiplier of 2 is arbitrary, but serves to elevate this behavior space over the all-action space. The logic is similar for damageActions: these can be sparse (especially in the early phases) and are highly related to the goal we're interested in. So again, the 10 is arbitrary, but given the sparser and more valuable nature of damage interactions, an order of magnitude felt reasonable.
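As code, the combination is just a weighted sum (a sketch; the function and argument names are mine):

```python
def combined_distance(all_actions_d, damage_actions_d, recovery_actions_d):
    # Damage interactions are sparse and closest to the goal, so they get
    # an order of magnitude more weight; recovery sequences get a smaller
    # boost over the general action space.
    return all_actions_d + damage_actions_d * 10 + recovery_actions_d * 2
```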
Lastly, after the first 100 generations we apply a minimal criteria of doing damage during the agent's turn. A minimal criteria is basically a Boolean flag; when it is false, the novelty search rejects the behavior and scores it a 0. With the objective of the game somewhat embedded in various parts of the experiment (the behavior space and the agent's play time), the AI can quickly satisfy this criteria, and it seems to serve as an extra guard against less fruitful search directions.
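Tying it together, here is a sketch of the final scoring gate (the field and function names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TurnResult:
    damage_dealt: float
    all_actions_distance: float
    damage_actions_distance: float
    recovery_actions_distance: float

def score(result: TurnResult, generation: int, criteria_start: int = 100) -> float:
    # Minimal criteria: after generation 100, reject behaviors that
    # dealt no damage, no matter how novel they were.
    if generation >= criteria_start and result.damage_dealt == 0:
        return 0.0
    # Otherwise, the weighted novelty distance described above.
    return (result.all_actions_distance
            + result.damage_actions_distance * 10
            + result.recovery_actions_distance * 2)
```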
I'll hopefully continue to do some write-ups to explain various parts of the project that are more opaque to the viewer at the moment. In the coming weeks I'm looking to try various representations of behavior, implement a HyperNEAT variant, and improve the stream interface to provide more visual clarity for viewers. Thanks for reading; feel free to reach out to @MeleeNeat on Twitter or comment here if you are left with questions.
Follow-up: Novelty Search, Minimal Constraints and Hyperspace Islands