An Anagram Detection Example

To explore different orders of magnitude, let’s consider four different solutions to the problem of detecting if a string is an anagram. One string is an anagram of another if the second is a rearrangement of the letters in the first. For example, 'heart' and 'earth' are anagrams.

For the sake of simplicity, let’s assume that the two strings in question use symbols from the set of 26 lowercase alphabetic characters. Our goal is to write a boolean function that will take two strings and return whether they are anagrams.

Solution 1: Checking Off

Our first solution to the anagram problem will check whether each character in the first string occurs in the second. As we perform these checks, we’ll “check off” characters. If we can check off each character, then the two strings must be anagrams.

We can check off a character by replacing it with the special Python value None. Since strings in Python are immutable, we need to convert the second string to a list. Each character from the first string will be checked against the characters in this list and, if found, checked off by replacement.

An implementation of this strategy might look like this:

def anagram_checking_off(s1, s2):
    if len(s1) != len(s2):
        return False

    to_check_off = list(s2)

    for char in s1:
        for i, other_char in enumerate(to_check_off):
            if char == other_char:
                to_check_off[i] = None
                break
        else:
            return False

    return True

anagram_checking_off('abcd', 'dcba')  # => True
anagram_checking_off('abcd', 'abcc')  # => False

To analyze this algorithm, note that each of the nn characters in s1 causes an iteration of up to nn characters in the list from s2. Each of the nn positions in the list will be visited once to match a character from s1.

So the number of visits becomes the sum of the integers from 1 to nn. We recognized earlier that this can be written as

i=1ni=n(n+1)2=12n2+12n \sum_{i=1}^{n} i = \frac {n(n+1)}{2} = \frac {1}{2}n^{2} + \frac {1}{2}n
As nn gets larger, the n2n^{2} term will dominate the nn term and the 12\frac {1}{2} constant can be ignored. Therefore, this solution is O(n2)O(n^{2}).

Solution 2: Sort and Compare

A second solution uses the fact that, even though s1 and s2 are different, they are only anagrams if they consist of the same characters. If the strings are anagrams, sorting them both alphabetically should produce the same string.

First, we use the Python builtin function sorted to return an iterable of sorted characters for each string. Then, we use itertools.izip_longest to iterate over the sorted characters of both strings until the end of the longer string.

Here is a possible implementation using this strategy:

from itertools import izip_longest


def anagram_sort_and_compare(s1, s2):
    for a, b in izip_longest(sorted(s1), sorted(s2)):
        if a != b:
            return False
    return True

anagram_sort_and_compare('abcde', 'edcba')  # => True
anagram_sort_and_compare('abcde', 'abcd')  # => False

At first glance, this algorithm seems like it’s O(n)O(n), since there’s only one iteration to compare nn characters after sorting. However, the two sorted method calls have their own cost, typically O(nlogn)O(n\log n). Since that function dominates O(n)O(n), this algorithm will also be O(nlogn)O(n\log n).

Solution 3: Brute Force

A brute force technique for solving a problem typically tries to exhaust all possibilities. We can apply this technique to the anagram problem by generating a list of all possible strings using the characters from s1. If s2 occurs in that list, then s1 and s2 are anagrams.

There’s a problem with this approach, though: when generating all possible strings from s1, there are nn possible first characters, n1n-1 possible second characters, n2n-2 possible third characters, and so on. The total number of candidate strings is n(n1)(n2)...321n(n-1)(n-2)...321, which is n!n!. Although some of the strings may be duplicates, the program won’t know that and will still generate n!n! strings.

It turns out that n!n! grows even faster than 2n2^{n} as n gets large. If s1 were 20 characters long, there would be 20!20! or 2,432,902,008,176,640,000 possible candidate strings. If we processed one candidate every second, it would take 77,146,816,596 years to process the entire list. This is probably not going to be a good solution.

Solution 4: Count and Compare

Our final solution uses the fact that any two anagrams have the same number of a’s, the same number of b’s, the same number of c’s, and so on. First, we generate character counts for each string. If these counts match, the two strings are anagrams.

Since there are 26 possible characters, we can use a list of 26 counters for each string. Each time we see a particular character, we’ll increment the counter at that character’s position. If the two lists are identical at the end, the strings must be anagrams.

Here is a possible implementation of this strategy:

def anagram_count_compare(s1, s2):
    c1 = [0] * 26
    c2 = [0] * 26

    for char in s1:
        pos = ord(char) - ord('a')
        c1[pos] += 1

    for char in s2:
        pos = ord(char) - ord('a')
        c2[pos] += 1

    for a, b in zip(c1, c2):
        if a != b:
            return False
    return True

anagram_count_compare('apple', 'pleap')  # => True
anagram_count_compare('apple', 'applf')  # => False

Again, the solution has many iterations. However, unlike the first solution, none of them are nested. The first two iterations count characters and are O(n)O(n). The third iteration always takes 26 steps since there are 26 possible characters. Adding everything gives T(n)=2n+26T(n)=2n+26 steps, which is O(n)O(n). We have found a linear order of magnitude algorithm for solving this problem.

This implementation could be written more succinctly by using collections.Counter, which constructs a dict-like object mapping elements in an iterable to the number of occurrences of that element in the iterable. Behold:

from collections import Counter


def anagram_with_counter(s1, s2):
    return Counter(s1) == Counter(s2)

anagram_with_counter('apple', 'pleap')  # => True
anagram_with_counter('apple', 'applf')  # => False

Note that anagram_with_counter is also O(n)O(n), but we only know this because we understand the implementation of collections.Counter.

One final thought about space requirements: although the last solution was able to run in linear time, it only did so by using additional storage for the two lists of character counts. In other words, this algorithm sacrificed space in order to gain time.

This is a common tradeoff. In this case, the amount of extra space isn’t significant; however, if the underlying alphabet had millions of characters, there would be more cause for concern.

On many occasions, you’ll need to choose between time and space. When given a choice of algorithms, it’s up to you as a software engineer to determine the best use of computing resources for a given problem.

Solution 1: Checking Off

Our first solution to the anagram problem will check to see that each character in the first string actually occurs in the second. If it is possible to “checkoff” each character, then the two strings must be anagrams. Checking off a character will be accomplished by replacing it with the special JavaScript value null. However, since strings in JavaScript are immutable, the first step in the process will be to convert the second string to an array. Each character from the first string can be checked against the characters in the list and if found, checked off by replacement. An implementation of this strategy may look like this:

function anagramCheckingOff (string1, string2) {
  if (string1.length !== string2.length) return false

  const string2ToCheckOff = string2.split('')

  for (let i = 0; i < string1.length; i++) {
    let letterFound = false
    for (let j = 0; j < string2ToCheckOff.length; j++) {
      if (string1[i] === string2ToCheckOff[j]) {
        string2ToCheckOff[j] = null
        letterFound = true
        break
      }
    }
    if (!letterFound) return false
  }

  return true
}

assert.equal(true, anagramCheckingOff('abcd', 'dcba'))
assert.equal(false, anagramCheckingOff('abcd', 'abcc'))

To analyze this algorithm, we need to note that each of the n characters in s1 will cause an iteration through up to n characters in the list from s2. Each of the n positions in the list will be visited once to match a character from s1. The number of visits then becomes the sum of the integers from 1 to n. We recognized earlier that this can be written as

i=1ni=n(n+1)2 \sum_{i=1}^{n} i = \frac {n(n+1)}{2}
=12n2+12n = \frac {1}{2}n^{2} + \frac {1}{2}n
As nn gets large, the n2n^{2} term will dominate the nn term and the 12\frac {1}{2} can be ignored. Therefore, this solution is O(n2)O(n^{2}).

Solution 2: Sort and Compare

Another solution to the anagram problem will make use of the fact that even though string1 and string2 are different, they are anagrams only if they consist of exactly the same characters. So, if we begin by sorting each string alphabetically, from a to z, we will end up with the same string if the original two strings are anagrams. Below is a possible implementation of this strategy. First, we convert each string to an array using the string method split, and then we use the array method sort which lexographically sorts an array in place and then returns the array. Finally, we loop through the first string, checking to make sure that both strings contain the same letter at every index.

function anagramSortAndCompare (string1, string2) {
  if (string1.length !== string2.length) return false

  const sortedString1 = string1.split('').sort()
  const sortedString2 = string2.split('').sort()

  for (let i = 0; i < sortedString1.length; i++) {
    if (sortedString1[i] !== sortedString2[i]) return false
  }

  return true
}

assert.equal(true, anagramSortAndCompare('abcde', 'edcba'))
assert.equal(false, anagramSortAndCompare('abcde', 'abcd'))

At first glance you may be tempted to think that this algorithm is O(n)O(n), since there is one simple iteration to compare the n characters after the sorting process. However, the two calls to the Python sorted method are not without their own cost. Sorting is typically either O(n2)O(n^{2}) or O(nlogn)O(n\log n), so the sorting operations dominate the iteration. In the end, this algorithm will have the same order of magnitude as that of the sorting process.

Solution 3: Brute Force

A brute force technique for solving a problem typically tries to exhaust all possibilities. For the anagram detection problem, we can simply generate a list of all possible strings using the characters from s1 and then see if s2 occurs. However, there is a difficulty with this approach. When generating all possible strings from s1, there are nn possible first characters, n1n-1 possible characters for the second position, n2n-2 for the third, and so on. The total number of candidate strings is n(n1)(n2)...321n(n-1)(n-2)...321, which is n!n!. Although some of the strings may be duplicates, the program cannot know this ahead of time and so it will still generate n!n! different strings.

It turns out that n!n! grows even faster than 2n2^{n} as n gets large. In fact, if s1 were 20 characters long, there would be 20!20! or 2,432,902,008,176,640,000 possible candidate strings. If we processed one possibility every second, it would still take us 77,146,816,596 years to go through the entire list. This is probably not going to be a good solution.

Solution 4: Count and Compare

Our final solution to the anagram problem takes advantage of the fact that any two anagrams will have the same number of a’s, the same number of b’s, the same number of c’s, and so on. In order to decide whether two strings are anagrams, we will first count the number of times each character occurs. Since there are 26 possible characters, we can use a array of 26 counters, one for each possible character. Each time we see a particular character, we will increment the counter at that position. In the end, if the two arrays of counters are identical, the strings must be anagrams. Here is a possible implementation of the strategy:

function anagramCountCompare (string1, string2) {

  function getLetterPosition (letter) {
    return letter.charCodeAt() - 'a'.charCodeAt()
  }

  const string1LetterCounts = new Array(26).fill(0)
  const string2LetterCounts = new Array(26).fill(0)

  for (let i = 0; i < string1.length; i++) {
    const letterPosition = getLetterPosition(string1[i])
    string1LetterCounts[letterPosition]++
  }

  for (let i = 0; i < string2.length; i++) {
    const letterPosition = getLetterPosition(string2[i])
    string2LetterCounts[letterPosition]++
  }

  for (let i = 0; i < string1LetterCounts.length; i++) {
    if (string1LetterCounts[i] !== string2LetterCounts[i]) {
      return false
    }
  }

  return true
}

assert.equal(true, anagramCountCompare('apple', 'pleap'))
assert.equal(false, anagramCountCompare('apple', 'applf'))

Again, the solution has a number of iterations. However, unlike the first solution, none of them are nested. The first two iterations used to count the characters are both based on nn. The third iteration, comparing the two lists of counts, always takes 26 steps since there are 26 possible characters in the strings. Adding it all up gives us T(n)=2n+26T(n)=2n+26 steps. That is O(n)O(n). We have found a linear order of magnitude algorithm for solving this problem.

Those with greater familiarity with JavaScript may note that the counting strategy we implemented in anagramCountCompare could be much more succinctly approached using the reduce method of arrays. Note that we are required to add an additional check that the strings are of the same length since our dictionary comparison loop will not account for string2 having additional characters.

function anagramCountCompareWithReduce (string1, string2) {

  function letterCountReducer (letterCounts, letter) {
    if (letterCounts[letter]) {
      letterCounts[letter]++
    } else {
      letterCounts[letter] = 1
    }
    return letterCounts
  }

  const string1LetterCounts = string1.split('').reduce(letterCountReducer, {})
  const string2LetterCounts = string2.split('').reduce(letterCountReducer, {})


  for (let letter in string1LetterCounts) {
    if (string1LetterCounts[letter] !== string2LetterCounts[letter]) {
      return false
    }
  }

  return string1.length === string2.length
}

assert.equal(true, anagramCountCompareWithReduce('apple', 'pleap'))
assert.equal(false, anagramCountCompareWithReduce('apple', 'applf'))

It is worth noting that anagramCounterCompareWithReduce is also O(n)O(n), but that it is impossible to determine this without understanding the implementation of Array.reduce method.

Before leaving this example, we need to say something about space requirements. Although the last solution was able to run in linear time, it could only do so by using additional storage to keep the two dictionaries of character counts. In other words, this algorithm sacrificed space in order to gain time.

This is a common tradeoff. On many occasions you will need to make decisions between time and space trade-offs. In this case, the amount of extra space is not significant. However, if the underlying alphabet had millions of characters, there would be more concern. As a software engineer, when given a choice of algorithms, it will be up to you to determine the best use of computing resources given a particular problem.

Practical Algorithms and Data Structures

Introduction