Why algorithms matter

Published March 29, 2018

I was working on learning algorithms & data structures. Reading the comments, many people were angry their solution worked but didn't pass all the tests. Performance on large data sets are why these matter though. Here are a couple of my attempts, plus a third I found online afterwards.

Here's the problem I was tasked to solve. Find the first duplicated number in an array that contains numbers from 1 to the length of the array.

Idea #1

For each number, run a for loop looking for a duplicate & recording their index.

int FirstDuplicate(int[] a)
{
    int duplicate = -1;
    int duplicateIndex = a.Length;
    for (int i = 1; i < a.Length; i++) {
        for (int j = (i + 1); j < a.Length; j++)
        {
            if (a[i] == a[j] && j < duplicateIndex)
            {
                duplicate = a[j];
                duplicateIndex = j;
            }
        }
    }
}

Idea #1 Result

Speed: O(n²)
Memory: Effecient, only 1 array and 2 variables.

This worked for small sets of data. But when tested against anything large, it became really slow. If you've never heard of O(n²) read this intro to Big O Notation. The n² means the time it takes our function will double with each item we add to the data set.

This was a typical first attempt by others. Some were furious that they didn't get to move on after this attempt. They felt they solved the question. Their answer solved the problem & passed all the tests except for the largest data sets.

Large data sets are the reason for learning algorithms though!

Idea #2

After my first attempt, I decided to try a hash set. A hash set is a method of storing unique values. I created an empty hash set to store numbers in. I used the hash set’s ability to quickly look for a duplicate.

int FirstDuplicate(int[] a)
{
  HashSet<int> duplicates = new HashSet<int>();
  foreach (var number in a)
  {
    if ( duplicates.Contains(number))
    {
      return number;
    }
    duplicates.Add(number);
  }
  return -1;
}

Idea #2 Result

Speed: O(n)
Memory: Less effecient than attempt #1 as we're storing 2 lists but not terrible.

With this attempt I passed all the tests. It's also easier to read than nested loops. With a speed of O(n) we see a linear progression by increasing our data set instead of exponential like with the first attempt. We're not doubling the length of time to run it anymore with each new item!

Idea #3 - How Did Others Solve

After passing I wanted to see how others did this. I found this creative way that works in very speficic cases.

Since all numbers are positive & less than the length of the array, loop through the array turning them negative until a negative number is found.


int FirstDuplicate(int[] a)
{
  for (int i = 0; i < a.Length; i++)
  {
    if (a[Math.Abs(a[i]) - 1] > 0)
    {
      a[Math.Abs(a[i]) -1] = -a[Math.Abs(a[i]) -1];
    } else {
      return Math.Abs(a[i]);
    }
  }
  return -1;
}

Idea #3 Result

Speed: O(n) Slightly outperforms Idea #2 in tests.
Memory: Most effecient as just 1 array is stored.

This one takes a bit of thought to read and understand but it does run slightly faster than my second idea. It also handles memory better by only storing one array.

This attempt also goes to show that reading the question diligently can provide clues on ways to improve performance.

One Last Thing...

If you have a question or see a mistake, please comment below.

If you found this post helpful, please share it with others. It's the best thanks I can ask for & it gives me momentum to keep writing!

Algorithms | Hash | Performance | Arrays