
Ramon Saraiva

A bard fisherman's tale

Fishing is one of the character professions that most MMORPGs implement. When New World was still in beta, AGS (Amazon Game Studios) announced that they were adding a fishing mechanic to the game. Along with the announcement came a video of what it would look like, showcasing the experience of catching fish and treasure boxes in salt and fresh water.

When you talk to different people about fishing in games, they land on different sides of the spectrum: some like it and find it relaxing, others hate the experience because it's usually boring. No matter which side you're on, we can all agree that fishing is either profitable or yields good resources for crafting, especially for cooking.

I personally dislike the experience as most games don't create any challenge or danger for it, so you're just sitting there repeating the same actions over and over, but I still wanted to try it out at least to see if it would be a profitable profession in New World.

In between two beta playtests, I downloaded the fishing experience video file and realized I could possibly automate the whole thing. There were specific visual patterns that could be used as triggers for a series of inputs and state transitions for catching a fish.

Before starting to write any code, I listed the sequence of states and possible strategies for transitioning between them:

It's easy to map these states and transitions onto an FSM (Finite State Machine), where a finite number of nodes describes the possible states of the system and a finite number of arcs represents the transitions, which may or may not change the state.

fishing finite state machine

With the states and the transition strategies defined, I began thinking about how to design these entities so that I could reuse them in the future for other use cases, not necessarily in the same game. I came up with a design similar to this:

erd

The Application defines its states and strategies and uses a state controller that manages transitions and possible rollbacks. It also has a WindowCapture instance responsible for capturing frames, which are represented by MemoryImage instances. A State has an execution Strategy and can contain ExternalImage assets that are used for template matching within a frame.
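To make the roles of the state controller, State and Strategy a bit more concrete, here is a minimal sketch of how they could fit together. It is not the original implementation: the `step` and `rollback` method names, the idea that a strategy returns the name of the next state, and the three demo states are all assumptions made for illustration, and frames are stood in for by plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumed signature: a strategy inspects a frame and returns the name of the
# next state, or None to stay in the current state.
Strategy = Callable[[object], Optional[str]]


@dataclass
class State:
    name: str
    strategy: Strategy


@dataclass
class StateController:
    states: Dict[str, State]
    current: str
    history: List[str] = field(default_factory=list)

    def step(self, frame: object) -> str:
        """Run the current state's strategy and transition if it asks to."""
        next_name = self.states[self.current].strategy(frame)
        if next_name is not None and next_name != self.current:
            if next_name not in self.states:
                raise KeyError(f"unknown state: {next_name}")
            self.history.append(self.current)
            self.current = next_name
        return self.current

    def rollback(self) -> str:
        """Return to the previous state, if there is one."""
        if self.history:
            self.current = self.history.pop()
        return self.current


def build_demo_controller() -> StateController:
    # Hypothetical states; a real frame would be an image, not a string.
    states = {
        "casting": State("casting", lambda frame: "waiting"),
        "waiting": State("waiting", lambda frame: "reeling" if frame == "bite" else None),
        "reeling": State("reeling", lambda frame: "casting" if frame == "caught" else None),
    }
    return StateController(states=states, current="casting")
```

Keeping the strategies as plain callables means the controller knows nothing about fishing, which is what would make the same skeleton reusable for a different set of states.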

For capturing frames of the game, I tried different approaches and benchmarked their fps (frames per second). One approach was using winapi directly, locating the game window and doing bitmap operations (WinAPIWindowCapture). The other was to use a library called pyautogui (PAGWindowCapture). Initially I thought that using winapi directly would yield better fps, but pyautogui was actually faster.
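A comparison like this can be done with a small timing helper that is independent of the capture backend. The sketch below is an assumption about how such a benchmark might look, not the one used here; `measure_fps` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def measure_fps(capture: Callable[[], object], duration_s: float = 1.0) -> float:
    """Call `capture` repeatedly for about `duration_s` seconds and return frames per second."""
    frames = 0
    start = time.perf_counter()
    while True:
        capture()  # grab one frame; the result is discarded
        frames += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return frames / elapsed
```

Any zero-argument callable works as `capture`, for example `lambda: pyautogui.screenshot(region=(left, top, width, height))` for the pyautogui variant, so both implementations can be measured with the same loop.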

This sequence diagram shows the role each entity plays during the full lifecycle of the application:

sequence diagram

The Strategy of each state is responsible for doing any image processing required to decide whether a template is contained in a frame. For that, I used opencv, specifically opencv-python. There is pretty good documentation about how to do template matching in Python, but briefly, you pass two arrays of data, the template and the frame (in this case numpy arrays), along with the method you want to use, and then compare the resulting match scores against a threshold. The best method really depends on the type of image you're using, but in this case I used TM_CCOEFF_NORMED, which worked really well.

I composed a sprite map with all the templates required by the states and their strategies, so I only needed to load one image from disk and then crop each template using its X and Y offsets, width and height.

fishing sprite map
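Cropping templates out of a sprite map comes down to array slicing once the image is loaded (for instance with `cv2.imread`). The sketch below assumes each template is described by a hypothetical `(x, y, width, height)` tuple; the region names and coordinates are made up.

```python
from typing import Dict, Tuple

import numpy as np

# (x, y, width, height) in sprite-map pixels; an assumed layout for illustration.
Region = Tuple[int, int, int, int]


def crop_templates(sprite_map: np.ndarray,
                   regions: Dict[str, Region]) -> Dict[str, np.ndarray]:
    """Cut every named region out of the sprite map."""
    templates = {}
    for name, (x, y, w, h) in regions.items():
        # NumPy images are indexed [row, column], i.e. [y, x].
        templates[name] = sprite_map[y:y + h, x:x + w].copy()
    return templates
```

For example, `crop_templates(sprite_map, {"hook": (2, 1, 5, 3)})` yields a template that is 3 rows high and 5 columns wide.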

In the list of states I mentioned that the success state was slightly different because the animation had a glowing, transparent background, so depending on the zone you're fishing in, the colors could be completely different and template matching would not work as expected. To solve that, in that state's strategy I applied an HSV filter to both the frame and the template, so the background colors wouldn't matter much, as we're only considering the shape of the glow.

success hsv filter

As you can see in the image, the filter ignores most background noise and makes opencv's match template functionality way more precise for this specific case.

About a year after the game was released, AGS announced a new profession for the game. Players can now play instruments and perform different songs in groups or solo, providing different types of buffs to whoever tips a specific amount of gold at the end of the performance. The classical guitar songs were recorded by John Oeth and are available here; the one playing in the showcase video at the beginning of this post is called Midnight Harmony. The mini game for playing instruments is similar to "guitar hero"-like games, where specific keys slide towards a destination and you need to trigger the right input when they get there.

Guess what? Using the same design implemented for the fishing profession, I was able to write different states and strategies and endlessly play every instrument and song in the game without even touching the keyboard. It took me less than 2 hours to write the strategies for these states, including the time it took to crop all the keys and arrange them in a sprite map.

bard sprite map

There are not many different states for playing instruments in the game. Once you start playing a song, you just need to hit the right inputs at the right time, so only two states were actually necessary, Starting and Playing:

bard finite state machine

The trick to pressing the keys at the right time was simply to offset the viewport of the WindowCapture, so it processed frames a bit ahead of when it actually needed to press the inputs. Of course this all depends on how fast the keys slide towards the destination, but they move at pretty much the same speed for all instruments and songs, so no big deal.
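The size of that offset can be estimated from the key speed and the delay between capturing a frame and the input reaching the game. The sketch below only shows the arithmetic; the function names, the `(left, top, width, height)` region layout, the example numbers and the direction of the shift are all assumptions.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


def lead_distance(speed_px_per_s: float, latency_s: float) -> int:
    """Pixels a key travels during the capture-to-input delay."""
    return round(speed_px_per_s * latency_s)


def shift_region(region: Region, dx: int, dy: int = 0) -> Region:
    """Move the captured region by (dx, dy) while keeping its size."""
    left, top, width, height = region
    return (left + dx, top + dy, width, height)
```

For instance, with keys moving at 600 pixels per second and a delay of 50 ms, `lead_distance(600, 0.05)` gives 30, so the capture region would be shifted 30 pixels against the direction of travel, e.g. `shift_region((100, 200, 64, 64), -30)`.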

Automating this kind of thing with image processing is usually safe, because anti-cheat clients look for specific applications that are known to be used for botting. I usually just build these for fun and share them with some friends if they want to try them out.

Although this is probably bannable in multiple games, I've never received a single warning. Automation isn't always based on image processing either: in Beep beep I was actually monitoring the network and executing actions based on specific packet patterns, so there are multiple ways to automate this kind of thing without even directly touching the game client.