My recap from Google Hash Code 2019

This was my second attempt at # Hash Code contest, a team programming competition to solve an engineering problem proposed by Google. Every programming language or library can be used to produce the results.

Problem statement for Online Qualification Round 2019 is available at Photo slideshow.

Some notes and hints are included in this blog post, just to help with the preparation of next year contest.

Input & Output files

Google is producing input data for the problem using text files where data is delimited by spaces and lines. In order to submit the results, an equivalent format is described in the problem statement.

Advice: Prepare a generic file reader and file writer tool.

This year our team started with a generic Read & Write file tool ready to be used, so we saved around 30 minutes to focus on the problem.

We also prepared a small script just to package the source code in ZIP format quickly.

Solving the problem

Before thinking about bonus and scorings, it’s better to start by solving the problem. This year the problem was based on a massive comparison of lists of strings (tags) .

We developed a SimpleEngine class according to a simple strategy:

  • Compile slides from Vertical Photos trying to pair elements with no tags (or very few tags) in common
  • Take one slide and find the first one having a MAX Value in Google Formula. You can get MAX Value dividing the max count of tags by 2.
  • Repeat previous step till no more slides left

This implementation will produce following results:

 A - Example                    2
B - Lovely landscapes 202,569
C - Memorable moments 1,472
D - Pet pictures 401,434
E - Shiny selfies 406,415

What gives a total score of 1,011,892

You can check our source code at https://github.com/angelborroy/google-hashcode-2019

Thinking about going further

Despite this algorithm solves the problem, it’s far from an optimal scoring. It looks like this year a perfect scoring was 1,250,000 points.

What we are missing is that more than one slide can give us the MAX Value from a singular slide, so some backtracking algorithm should be used to find the best choice.

From my point of view, the problem was harder than last year. Not because of the complexity, as it was easier, but for the computational cost. There are a huge amount of calculations required to find the optimal path and even using a modern laptop 4 hours is not enough.

Anyway I’ve enjoyed again at the contest this year and of course I’ll be ready to do it better next one!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s