Saturday, August 8, 2009

Validate IPv4 in Java using Regular Expressions

Validating an IPv4 address is an easy task in Java. An IPv4 address consists of 4 parts, each ranging from 0 to 255. So in order to validate an IP, we need to split it into 4 parts (on the “.” character) and check that each part (octet) is a numeric value in the range 0 to 255. Here is an example implementing IP validation:

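package com.bashan.blog.ip;
import org.apache.commons.lang.StringUtils;
public class IpValidate {
  public static boolean isValidIp2(String ip) {
    // split() drops trailing empty strings, so "1.2.3.4." must be rejected explicitly.
    String[] octets = ip.split("\\.");
    if (ip.endsWith(".") || octets.length != 4) {
      return false;
    }
    for (String octet : octets) {
      // In Commons Lang 2.x, isNumeric("") returns true, so empty octets (as in "1..2.3")
      // are rejected here; the length limit also keeps parseInt from overflowing.
      if (octet.length() == 0 || octet.length() > 3 || !StringUtils.isNumeric(octet)) {
        return false;
      }
      // isNumeric accepts digits only, so the value cannot be negative.
      if (Integer.parseInt(octet) > 255) {
        return false;
      }
    }
    return true;
  }
}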

Note that this code uses the StringUtils.isNumeric function from Apache Commons Lang. It is possible to avoid this dependency by wrapping the Integer.parseInt call in a “try” and “catch” block and catching NumberFormatException.
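A minimal sketch of that try/catch variant could look like the following. The class name IpValidateNoDeps is made up for illustration and is not part of the original code. One assumption worth stating: Integer.parseInt accepts a leading “+” or “-” sign, so the sketch checks that each octet starts with a digit before parsing.

```java
// Sketch only: IpValidateNoDeps is a hypothetical class name, not part of the original code.
class IpValidateNoDeps {
  public static boolean isValidIp(String ip) {
    // A limit of -1 keeps trailing empty strings, so "1.2.3." yields 4 parts
    // and the empty last part is rejected below.
    String[] octets = ip.split("\\.", -1);
    if (octets.length != 4) {
      return false;
    }
    for (String octet : octets) {
      // Integer.parseInt accepts a leading sign ("+1", "-0"), so require a digit first.
      if (octet.isEmpty() || octet.charAt(0) < '0' || octet.charAt(0) > '9') {
        return false;
      }
      try {
        int num = Integer.parseInt(octet);
        if (num < 0 || num > 255) {
          return false;
        }
      } catch (NumberFormatException e) {
        // Not a number, or too large to fit in an int.
        return false;
      }
    }
    return true;
  }
}
```

With this sketch, "192.168.0.1" is accepted, while "256.1.1.1", "1..2.3", "1.2.3." and "+1.2.3.4" are rejected.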

When dealing with text validation, the first thing that usually comes to mind is regular expressions. But is that a good solution for validating an IP? The answer is a bit more complex than it looks. Regular expressions are a great tool for validating and extracting data from text, but they are not well suited to numerical ranges. Checking only whether a text value is in the range 0 to 255 requires the following regular expression:

(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

This expression is quite straightforward. It checks whether the given text is one of the following:

  • 25 followed by a digit from 0 to 5 (250 to 255).
  • OR 2 followed by a digit from 0 to 4 and then a digit from 0 to 9 (200 to 249).
  • OR an optional 0 or 1, followed by a digit from 0 to 9, optionally followed by another digit from 0 to 9 (0 to 199, with or without leading zeros).

This whole expression only checks whether a single number is between 0 and 255!
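One way to convince yourself that the expression covers exactly the range 0 to 255 is to try it on every number from 0 to 999. The sketch below does that; OctetPatternCheck and isOctet are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Sketch only: OctetPatternCheck is a hypothetical helper class for illustration.
class OctetPatternCheck {
  static final Pattern OCTET = Pattern.compile("(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

  // True when the whole text matches the octet expression.
  static boolean isOctet(String text) {
    return OCTET.matcher(text).matches();
  }

  public static void main(String[] args) {
    // Every plain decimal number from 0 to 999 should match only when it is at most 255.
    for (int i = 0; i <= 999; i++) {
      if (isOctet(String.valueOf(i)) != (i <= 255)) {
        throw new AssertionError("Unexpected result for " + i);
      }
    }
    System.out.println(isOctet("007")); // true: leading zeros are accepted
    System.out.println(isOctet("256")); // false
    System.out.println(isOctet(""));    // false: at least one digit is required
  }
}
```

Note that the expression also accepts leading zeros such as "007", which may or may not be what you want in an IP.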

So, is it really worth constructing such a complex expression just to validate an IP?

We will consider 2 main things in order to answer that question:

  • Is the code for validating an IP using a regular expression really simpler?
  • Does it perform better?

To answer the first question we will simply look at the complete function for validating an IP using regular expressions:

package com.bashan.blog.ip;
import java.util.regex.Pattern;
public class IpValidate {
  public static final String _255 = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
  public static final Pattern pattern = Pattern.compile("^(?:" + _255 + "\\.){3}" + _255 + "$");
  public static boolean isValidIp(String ip) {
    return pattern.matcher(ip).matches();
  }
}

As you can see, the function itself is much simpler and shorter. The only complex part is the regular expression, and even that is simplified by reusing the expression for a number between 0 and 255.
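To see how the combined pattern behaves, here is a small sketch that runs it against a few sample strings. IpPatternDemo and the sample inputs are made up for illustration; the pattern itself is the one from the function above.

```java
import java.util.regex.Pattern;

// Sketch only: IpPatternDemo is a hypothetical class name used for illustration.
class IpPatternDemo {
  static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
  // Three "octet." groups followed by a final octet, as in the function above.
  static final Pattern IP = Pattern.compile("^(?:" + OCTET + "\\.){3}" + OCTET + "$");

  static boolean isValidIp(String ip) {
    return IP.matcher(ip).matches();
  }

  public static void main(String[] args) {
    String[] samples = {"192.168.0.1", "256.0.0.1", "1.2.3", "1.2.3.4.", "01.02.03.004", " 1.2.3.4"};
    for (String sample : samples) {
      System.out.println("\"" + sample + "\" -> " + isValidIp(sample));
    }
  }
}
```

Of these samples, only "192.168.0.1" and "01.02.03.004" are accepted, so leading zeros pass this check. Since matches() succeeds only when the entire input matches, the ^ and $ anchors are not strictly needed here, although they do no harm.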

And what about performance? To find out, we will build a small test program. The program will contain 2 methods:

  • isValidIp1 – Validate IP using regular expression.
  • isValidIp2 – Validate IP by splitting a string and checking its parts.

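Each method will be called 10 million times with different random IPs. Each octet is drawn at random from 0 to 305, so an IP is valid with probability (256/306)^4, which is about 49%. Approximately half of the IPs will therefore be valid and the rest will be invalid. The time for each series of calls will be measured for comparison.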

This is our test program:

package com.bashan.blog.ip;
import org.apache.commons.lang.StringUtils;
import java.util.Date;
import java.util.Random;
import java.util.regex.Pattern;
public class IpValidate {
  public static final String _255 = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
  public static final Pattern pattern = Pattern.compile("^(?:" + _255 + "\\.){3}" + _255 + "$");
  private final static int NUM_TESTS = 10000000;
  private static final Random random = new Random();
  public static boolean isValidIp1(String ip) {
    return pattern.matcher(ip).matches();
  }
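  public static boolean isValidIp2(String ip) {
    // split() drops trailing empty strings, so "1.2.3.4." must be rejected explicitly.
    String[] octets = ip.split("\\.");
    if (ip.endsWith(".") || octets.length != 4) {
      return false;
    }
    for (String octet : octets) {
      // In Commons Lang 2.x, isNumeric("") returns true, so empty octets (as in "1..2.3")
      // are rejected here; the length limit also keeps parseInt from overflowing.
      if (octet.length() == 0 || octet.length() > 3 || !StringUtils.isNumeric(octet)) {
        return false;
      }
      // isNumeric accepts digits only, so the value cannot be negative.
      if (Integer.parseInt(octet) > 255) {
        return false;
      }
    }
    return true;
  }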
  private static String getRandomIp() {
    return random.nextInt(306) + "." + random.nextInt(306) + "." +
        random.nextInt(306) + "." + random.nextInt(306);
  }
  public static void main(String[] args) {
    int countValid = 0;
    Date date = new Date();
    for (int i = 0; i < NUM_TESTS; i++) {
      if (isValidIp1(getRandomIp())) {
        countValid++;
      }
    }
    System.out.println("\"Regular Expression\" Test:");
    System.out.println("Time in ms: " + (new Date().getTime() - date.getTime()));
    System.out.println("Valid ips: " + countValid + "/" + NUM_TESTS);
    countValid = 0;
    date = new Date();
    for (int i = 0; i < NUM_TESTS; i++) {
      if (isValidIp2(getRandomIp())) {
        countValid++;
      }
    }
    System.out.println();
    System.out.println("\"Split and check range\" Validation Test: ");
    System.out.println("Time in ms: " + (new Date().getTime() - date.getTime()));
    System.out.println("Valid ips: " + countValid + "/" + NUM_TESTS);
  }
}

And this is a sample output:

"Regular Expression" Test:
Time in ms: 12353
Valid ips: 4898508/10000000
"Split and check range" Validation Test:
Time in ms: 18963
Valid ips: 4899584/10000000

We can easily see that the regular expression IP validation, despite its complex expression, is significantly more efficient: in this run the “split and check range” version took more than 50% longer (18963 ms versus 12353 ms)!
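Keep in mind that each timed loop in the test program also generates the random IPs, and that the first loop runs before the JVM has warmed up. The sketch below is one possible way to separate these effects: it generates the inputs up front and runs an untimed warm-up pass before measuring. All names in it (IpBenchmarkSketch, byRegex, bySplit, randomIps, countValid, timeMillis) are hypothetical, the split-based check is a plain Java stand-in that does not use Commons Lang, and the input count is reduced to 1 million. It is still only a rough measurement; a dedicated harness such as JMH would be more reliable.

```java
import java.util.Random;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Sketch only: IpBenchmarkSketch and its helpers are hypothetical names, not the original test program.
class IpBenchmarkSketch {
  static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
  static final Pattern PATTERN = Pattern.compile("(?:" + OCTET + "\\.){3}" + OCTET);

  static boolean byRegex(String ip) {
    return PATTERN.matcher(ip).matches();
  }

  // Split-based check without Commons Lang; accepts 1 to 3 ASCII digits per octet.
  static boolean bySplit(String ip) {
    String[] octets = ip.split("\\.", -1);
    if (octets.length != 4) {
      return false;
    }
    for (String octet : octets) {
      if (octet.isEmpty() || octet.length() > 3) {
        return false;
      }
      int num = 0;
      for (int i = 0; i < octet.length(); i++) {
        char c = octet.charAt(i);
        if (c < '0' || c > '9') {
          return false;
        }
        num = num * 10 + (c - '0');
      }
      if (num > 255) {
        return false;
      }
    }
    return true;
  }

  // Same distribution as the test program: each octet is drawn from 0 to 305.
  static String[] randomIps(int count, long seed) {
    Random random = new Random(seed);
    String[] ips = new String[count];
    for (int i = 0; i < count; i++) {
      ips[i] = random.nextInt(306) + "." + random.nextInt(306) + "."
          + random.nextInt(306) + "." + random.nextInt(306);
    }
    return ips;
  }

  // Counts how many of the inputs the validator accepts.
  static int countValid(Predicate<String> validator, String[] ips) {
    int valid = 0;
    for (String ip : ips) {
      if (validator.test(ip)) {
        valid++;
      }
    }
    return valid;
  }

  // Runs one untimed warm-up pass, then times a second pass over the same inputs.
  static long timeMillis(Predicate<String> validator, String[] ips) {
    countValid(validator, ips);
    long start = System.nanoTime();
    countValid(validator, ips);
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    String[] ips = randomIps(1_000_000, 42L);
    System.out.println("Regular expression, ms: " + timeMillis(IpBenchmarkSketch::byRegex, ips));
    System.out.println("Split and check, ms: " + timeMillis(IpBenchmarkSketch::bySplit, ips));
  }
}
```

Because both methods see exactly the same inputs, they should also report the same number of valid IPs, which makes a handy sanity check.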
