http://jdevelopment.nl/efficient-determine-string-number/



Introduction

After cloning the M4N projects from Mercurial, one of the first classes I checked was the M4N Common Utils class. I saw the following method:

public static boolean isNumber(String string) {
    try {
        Long.parseLong(string);
    } catch (Exception e) {
        return false;
    }
    return true;
}

It’s the easiest way to determine whether a String represents a valid number. But it’s not the cheapest or most efficient way when the String doesn’t represent a valid number at all, because an Exception has to be created. Creating an Exception is relatively expensive: among other things, the whole stack trace has to be collected during its creation. Even when no exception is thrown, the method does another unnecessary job: computing the long value from the digits found.
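To make the cost argument a bit more tangible, here is a minimal sketch showing that a failed parse hands back an exception object with a captured stack trace. The class and method names (ParseFailureDemo, framesCapturedOnFailure) are made up for this illustration; the number of frames depends on the JVM and the call depth.

```java
// Illustrative only: a failed Long.parseLong() produces an exception object
// that carries a captured stack trace, which is work the caller never uses.
class ParseFailureDemo {

    /** Returns the number of stack frames recorded on failure, or 0 if parsing succeeds. */
    static int framesCapturedOnFailure(String input) {
        try {
            Long.parseLong(input);
            return 0;
        } catch (NumberFormatException e) {
            // The trace was already collected when the exception was constructed.
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("frames for \"foo\": " + framesCapturedOnFailure("foo"));
        System.out.println("frames for \"123\": " + framesCapturedOnFailure("123"));
    }
}
```

The deeper the call stack at the point of the failed parse, the more frames have to be recorded.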

Alternatives

We could copy the source of Long#parseLong() and modify it so that it returns false rather than throwing an Exception and so that it doesn’t compute a long value. It makes use of Character#digit() to obtain the String’s character as a digit. We can replace this with Character#isDigit().

public static boolean isNumber(String string) {
    if (string == null || string.isEmpty()) {
        return false;
    }
    int i = 0;
    if (string.charAt(0) == '-') {
        if (string.length() > 1) {
            i++;
        } else {
            return false;
        }
    }
    for (; i < string.length(); i++) {
        if (!Character.isDigit(string.charAt(i))) {
            return false;
        }
    }
    return true;
}

Since we're looking for a certain pattern in a String, we could also use a regex for this. True, regular expressions are not the holy grail, but their speed may turn out to be quite acceptable. We want to allow an optional minus sign in front (-?) followed by digits only (\d+), so the regex pattern ends up looking like this:

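// Requires: import java.util.regex.Pattern;
// Note the doubled backslash: "\d" is not a valid escape in a Java string literal.
private static final Pattern numberPattern = Pattern.compile("-?\\d+");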

public static boolean isNumber(String string) {
    return string != null && numberPattern.matcher(string).matches();
}

Because compiling the pattern is an expensive task on its own, we want to do it only once, so we declare it as a static final field.
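The three approaches do not define "digit" in exactly the same way, which matters when comparing them. The sketch below (class and method names are made up for this illustration) probes two inputs where they can disagree: Character#isDigit() and Long#parseLong() accept non-ASCII Unicode digits, while \d in a default-mode Pattern matches only 0-9. In addition, Long#parseLong() on Java 7 and later accepts a leading plus sign, so the outcome for an input such as "+123" depends on the Java version in use.

```java
import java.util.regex.Pattern;

// Illustrative only: inputs on which the three isNumber() variants can disagree.
class DigitDefinitionDemo {
    private static final Pattern NUMBER_PATTERN = Pattern.compile("-?\\d+");

    static boolean withParseLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean withIsDigit(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        if (s.charAt(0) == '-') {
            if (s.length() == 1) {
                return false;
            }
            i++;
        }
        for (; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static boolean withRegex(String s) {
        return s != null && NUMBER_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // ARABIC-INDIC DIGITS ONE, TWO, THREE
        for (String s : new String[] { "123", arabicIndic, "+123" }) {
            System.out.printf("%s parseLong=%s isDigit=%s regex=%s%n",
                s, withParseLong(s), withIsDigit(s), withRegex(s));
        }
    }
}
```

If only ASCII digits should count, comparing each character against '0' and '9' instead of calling Character#isDigit() is one way to bring the loop-based variant in line with the regex.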

(Micro) Benchmarking

Now we want to benchmark those three approaches. This can basically be done by obtaining System#nanoTime() as the start time, executing the piece of code (preferably for a fixed number of iterations), obtaining System#nanoTime() once again as the end time, and finally calculating the difference between the two. However, there are some gotchas in this approach. You may end up benchmarking the JVM/HotSpot/JIT in use rather than the code. The JIT, for example, may apply optimizations which lead to misleading benchmark results. Here are two articles which tell a bit more about the gotchas:

The most important considerations are that we'd like to put the code we want to benchmark in its own method with a return value (which we in turn shouldn't ignore!) and that we also want to execute that method a number of times beforehand to trigger the JIT optimizations (to "warm up" the JVM).

public static void main(String... args) {
    // Prepare.
    String[] strings = { 
        null, "foo", "123", "+123", "-123", "0", "--123", "12345678901234567890"
    };
    int iterations = 1000000;
    boolean result = false;

    // Show the isNumber() results for each of the strings.
    for (String string : strings) {
        System.out.printf("String: %s isNumberWithParseLong: %s WithIsDigit:"
            + " %s WithRegex: %s%n", string, isNumberWithParseLong(string),
                isNumberWithIsDigit(string), isNumberWithRegex(string));
    }

    // JVM warmup.
    System.out.print("Warming up JVM .. ");
    for (int i = 0; i < iterations / 10; i++) {
        for (String string : strings) {
            result ^= isNumberWithParseLong(string);
            result ^= isNumberWithIsDigit(string);
            result ^= isNumberWithRegex(string);
        }
    }
    System.out.println("Finished! Now the benchmarks ..");

    // Benchmark isNumber() with Long#parseLong().
    long st1 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithParseLong(string);
        }
    }
    long et1 = System.nanoTime();
    System.out.printf("isNumberWithParseLong: %d ms%n", (et1 - st1) / 1000000);

    // Benchmark isNumber() with Character#isDigit().
    long st2 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithIsDigit(string);
        }
    }
    long et2 = System.nanoTime();
    System.out.printf("isNumberWithIsDigit: %d ms%n", (et2 - st2) / 1000000);

    // Benchmark isNumber() with regex.
    long st3 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithRegex(string);
        }
    }
    long et3 = System.nanoTime();
    System.out.printf("isNumberWithRegex: %d ms%n", (et3 - st3) / 1000000);
    
    // Print the result. This lets the JIT know that we're interested in the
    // result, so that it doesn't optimize any of the calls away.
    System.out.println(result);
}
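The three timing loops above are identical apart from the method being called, so they could be folded into one helper. The following is only a sketch under assumptions: the names MiniBench and time, and the use of java.util.function.Predicate (Java 8 or later), are not part of the original benchmark. Routing every call through a shared Predicate call site can itself influence what the JIT does, so the numbers are not directly comparable with those of the hand-written loops.

```java
import java.util.function.Predicate;

// Illustrative only: a reusable timing loop for the isNumber() variants.
class MiniBench {

    /**
     * Runs the check over all inputs for the given number of iterations and
     * returns the elapsed time in nanoseconds. Results are XOR-ed into sink[0]
     * so that the calls have an observable effect.
     */
    static long time(Predicate<String> check, String[] inputs, int iterations, boolean[] sink) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (String input : inputs) {
                sink[0] ^= check.test(input);
            }
        }
        return System.nanoTime() - start;
    }

    static boolean isNumberWithParseLong(String string) {
        try {
            Long.parseLong(string);
        } catch (Exception e) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = { null, "foo", "123", "-123" };
        boolean[] sink = { false };
        Predicate<String> check = MiniBench::isNumberWithParseLong;

        time(check, inputs, 100_000, sink); // warm-up run, timing discarded
        long nanos = time(check, inputs, 1_000_000, sink);

        System.out.printf("isNumberWithParseLong: %d ms%n", nanos / 1_000_000);
        System.out.println(sink[0]); // use the result so it cannot be discarded
    }
}
```

For measurements that need to hold up to scrutiny, a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) handles warm-up, forking and dead-code elimination more carefully than a hand-rolled loop.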

On my current machine, a Dell Latitude E5500 with a Core 2 Duo P8400, the results look like this:

String: null isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: foo isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: 123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: +123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: -123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: 0 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: --123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: 12345678901234567890 isNumberWithParseLong: false WithIsDigit: true WithRegex: true
Warming up JVM .. Finished! Now the benchmarks ..
isNumberWithParseLong: 9392 ms
isNumberWithIsDigit: 369 ms
isNumberWithRegex: 2763 ms
false

As you can see, Character#isDigit() is in this particular benchmark about 25 times faster than Long#parseLong(). True, this benchmark also covers the corner cases. In a lot of cases we expect valid numbers. If you remove the invalid numbers from the String[], you'll see that the difference isn't 25 times anymore, but only about 2.5 times.

Long Overflow

Maybe you've also noticed that there's a 12345678901234567890 string which is invalid according to Long#parseLong() (because it overflows), but valid according to the others. In practice, numbers are rarely that long, but if this has to be taken into consideration in the new isNumber() method to ensure its robustness, then it's worth the effort to call Long#parseLong() anyway when the string's length is equal to or greater than the number of digits in Long.MAX_VALUE. We'll change our winning isNumber() method accordingly:

private static final int NUMBER_MAX_LENGTH = String.valueOf(Long.MAX_VALUE).length();

public static boolean isNumber(String string) {
    if (string == null || string.isEmpty()) {
        return false;
    }
    if (string.length() >= NUMBER_MAX_LENGTH) {
        try {
            Long.parseLong(string);
        } catch (Exception e) {
            return false;
        }
    } else {
        int i = 0;
        if (string.charAt(0) == '-') {
            if (string.length() > 1) {
                i++;
            } else {
                return false;
            }
        }
        for (; i < string.length(); i++) {
            if (!Character.isDigit(string.charAt(i))) {
                return false;
            }
        }
    }
    return true;
}
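If the exception should be avoided for long inputs as well, the range check can be done on the digit string itself. This is only a sketch of that alternative, not the method used above: the names LongRangeCheck and fitsInLong are made up, and it deliberately accepts ASCII digits only, which differs from Character#isDigit(). It relies on the fact that two digit strings of equal length compare lexicographically in the same order as numerically.

```java
// Illustrative only: an exception-free check that a string is a decimal
// number within the range of long. Accepts ASCII digits and an optional '-'.
class LongRangeCheck {
    private static final String MAX_MAGNITUDE = "9223372036854775807"; // Long.MAX_VALUE
    private static final String MIN_MAGNITUDE = "9223372036854775808"; // magnitude of Long.MIN_VALUE

    static boolean fitsInLong(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        boolean negative = s.charAt(0) == '-';
        int start = negative ? 1 : 0;
        if (start == s.length()) {
            return false; // a lone "-"
        }
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        // Leading zeros do not affect the value, so skip them (keep one digit).
        while (start < s.length() - 1 && s.charAt(start) == '0') {
            start++;
        }
        String limit = negative ? MIN_MAGNITUDE : MAX_MAGNITUDE;
        int digits = s.length() - start;
        if (digits != limit.length()) {
            return digits < limit.length();
        }
        // Same length: lexicographic order equals numeric order.
        return s.substring(start).compareTo(limit) <= 0;
    }

    public static void main(String[] args) {
        String[] samples = { "123", "-123", "9223372036854775807",
            "9223372036854775808", "-9223372036854775808", "12345678901234567890" };
        for (String sample : samples) {
            System.out.printf("%s -> %s%n", sample, fitsInLong(sample));
        }
    }
}
```

The substring call allocates only when the input is exactly as long as the limit, so the common short-input path stays allocation-free.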

It became quite a piece of code, but it's at least faster in the majority of use cases. It is, however, not very beneficial in a web application with a 200 ms response time and only one or two isNumber() calls.

Bauke Scholtz

6 comments to “Efficient way to determine if a String is a Number”

  1. Imam says: 

    How about just using a regular expression?

  2. development says: 

    >How about just using a regular expression?

    That’s an option that’s mentioned in the post, see isNumberWithRegex(string) ;)

  3. Roy says: 

    For better understanding and comparison, you should compare each test case (input string) in three methods. I have given a different variant of your benchmark below:

    package main;

    import java.util.regex.Pattern;

    public class IsNumberTest {
    private static final Pattern numberPattern = Pattern.compile("-?\\d+");

    public static boolean isNumberWithParseLong(String string) {
    try {
    Long.parseLong(string);
    } catch (Exception e) {
    return false;
    }
    return true;
    }

    public static boolean isNumberWithRegex(String string) {
    return string != null && numberPattern.matcher(string).matches();
    }

    public static boolean isNumberWithIsDigit(String string) {
    if (string == null || string.isEmpty()) {
    return false;
    }
    int i = 0;
    if (string.charAt(0) == '-') {
    if (string.length() > 1) {
    i++;
    } else {
    return false;
    }
    }
    int n = string.length();
    for (; i < n; i++) {
    if (!Character.isDigit(string.charAt(i))) {
    return false;
    }
    }
    return true;
    }

    public static void main(String... args) {
    // Prepare.
    String[] strings = {
    null, "foo", "123", "+123", "-123", "0", "--123", "12345678901234567890"
    };
    int iterations = 1000000;
    boolean result = false;

    // Let for each of the strings show the isNumber() results.
    for (String string : strings) {
    System.out.printf("String: %s isNumberWithParseLong: %s WithIsDigit:"
    + " %s WithRegex: %s%n", string, isNumberWithParseLong(string),
    isNumberWithIsDigit(string), isNumberWithRegex(string));
    }

    // JVM warmup.
    System.out.print("Warming up JVM .. ");
    for (int i = 0; i < iterations / 10; i++) {
    for (String string : strings) {
    result ^= isNumberWithParseLong(string);
    result ^= isNumberWithIsDigit(string);
    result ^= isNumberWithRegex(string);
    }
    }
    System.out.println("Finished! Now the benchmarks ..");

    for (String string : strings) {
    // Benchmark isNumber() with Long#parseLong().
    long st1 = System.nanoTime();

    for (int i = 0; i < iterations; i++) {
    result ^= isNumberWithParseLong(string);
    }
    long et1 = System.nanoTime();
    System.out.printf("isNumberWithParseLong: %d ms%n", (et1 - st1) / 1000000);

    // Benchmark isNumber() with Character#isDigit().
    long st2 = System.nanoTime();

    for (int i = 0; i < iterations; i++) {

    result ^= isNumberWithIsDigit(string);
    }
    long et2 = System.nanoTime();
    System.out.printf("isNumberWithIsDigit: %d ms%n", (et2 - st2) / 1000000);

    // Benchmark isNumber() with regex.
    long st3 = System.nanoTime();

    for (int i = 0; i < iterations; i++) {
    result ^= isNumberWithRegex(string);
    }
    long et3 = System.nanoTime();
    System.out.printf("isNumberWithRegex: %d ms%n", (et3 - st3) / 1000000);
    }
    // Print the result. This way we let the JIT know that we're interested in the
    // result so that it doesn't optimize the one or other away, for the case that.
    System.out.println(result);
    }

    }

    Here is the output I am seeing on my PC:
    String: null isNumberWithParseLong: false WithIsDigit: false WithRegex: false
    String: foo isNumberWithParseLong: false WithIsDigit: false WithRegex: false
    String: 123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
    String: +123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
    String: -123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
    String: 0 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
    String: --123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
    String: 12345678901234567890 isNumberWithParseLong: false WithIsDigit: true WithRegex: true
    Warming up JVM .. Finished! Now the benchmarks ..
    isNumberWithParseLong: 1161 ms
    isNumberWithIsDigit: 6 ms
    isNumberWithRegex: 2 ms
    isNumberWithParseLong: 1463 ms
    isNumberWithIsDigit: 12 ms
    isNumberWithRegex: 216 ms
    isNumberWithParseLong: 66 ms
    isNumberWithIsDigit: 28 ms
    isNumberWithRegex: 300 ms
    isNumberWithParseLong: 1454 ms
    isNumberWithIsDigit: 13 ms
    isNumberWithRegex: 215 ms
    isNumberWithParseLong: 63 ms
    isNumberWithIsDigit: 28 ms
    isNumberWithRegex: 307 ms
    isNumberWithParseLong: 25 ms
    isNumberWithIsDigit: 12 ms
    isNumberWithRegex: 244 ms
    isNumberWithParseLong: 1455 ms
    isNumberWithIsDigit: 12 ms
    isNumberWithRegex: 244 ms
    isNumberWithParseLong: 1912 ms
    isNumberWithIsDigit: 160 ms
    isNumberWithRegex: 762 ms
    false

    As you can see the difference in performance is dependent on the input string.

  4. none@example.org says: 

    StringUtils.isNumber -> apache commons

  5. development says: 

    @Roy, that’s a very useful alternative, thanks a lot for sharing this!

    As the article hinted already a little, the performance indeed differs based on the input String (the article mentions valid/invalid numbers).

    If you’re really doing high performance computations, you might pick a version that performs best on your expected input.

    In general, from your numbers above we see that parseLong is often *much* slower, although in a few occasions it’s faster than regex (66 vs 300, 63 vs 307 and 25 vs 244). Incidentally, isDigit is the winner in precisely those cases.

    Better yet, isDigit is still the overall winner. In only one case does it end up second, and then the difference is very minor (6 vs 2). Maybe a more extensive benchmark should include even more different kinds of input strings.

    @none
    StringUtils.isNumber could be interesting to include in this test. Thanks!

  6. Jenita says: 

    At last! Someone who understands! Thanks for posting!