Detecting literal IP address from string
I'm creating a network-heavy program, and at various points, for various purposes, I need to distinguish literal IP addresses from hostnames.
I'm aware that it's possible by:
1. Reading the RFCs and writing appropriate regexes.
2. Creating an InetAddress and comparing the input with getHostName().
However, #1 doesn't scale to potential future addressing schemes (please, I don't want this to become a discussion about the chances of those arising, so let's settle for "never say never"), and #2 would trigger either a lookup or a reverse lookup, which is particularly undesirable when processing large lists of addresses purely for bookkeeping.
So my question is: is there a scalable way to identify literal IP addresses without triggering a lookup?
Alternatively, is there a way to create an InetAddress from a String that guarantees no lookup? (It would either throw, or return a zero-length address, if the string is not a literal.)
# 1
What is a literal IP address?
If you're trying to get an IP address from a hostname, you have to do a lookup. The result might come from something like /etc/hosts, or it might come from DNS or something else.
I don't know what you mean about using a regex to get an IP from a hostname.
# 2
A java.net.InetAddress is only resolved via DNS when somebody calls one of the methods that does so. If you just construct an InetAddress from a literal string, you can compare addresses without fear of DNS lookups, and you also get support for all present and future IP address syntaxes.
ejpa at 2007-7-12 1:09:46
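A minimal sketch of the point above, assuming the inputs really are literals: the `InetAddress.getByName` Javadoc says that when a literal IP address is supplied only the validity of its format is checked, and `equals()` and `getHostAddress()` work on the stored bytes, so none of this touches DNS. For a real host name, getByName would still perform a lookup. The class and helper names are illustrative.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LiteralCompare {
    // Compares two address literals. Only call this with literals: for a literal,
    // getByName() checks the format and builds the address without a DNS query,
    // but for a host name it would perform a lookup.
    static boolean sameAddress(String literalA, String literalB) {
        try {
            // equals() compares the raw address bytes, so this is purely local.
            return InetAddress.getByName(literalA).equals(InetAddress.getByName(literalB));
        } catch (UnknownHostException e) {
            return false; // not a valid literal (or name resolution failed)
        }
    }

    // getHostAddress() only formats the stored bytes; it never looks anything up.
    static String normalize(String literal) {
        try {
            return InetAddress.getByName(literal).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameAddress("::1", "0:0:0:0:0:0:0:1"));      // true
        System.out.println(sameAddress("192.168.1.1", "192.168.1.2"));  // false
        System.out.println(normalize("1080:0:0:0:8:800:200C:417A"));    // 1080:0:0:0:8:800:200c:417a
    }
}
```

Different spellings of the same IPv6 address compare equal, because only the bytes matter.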

# 3
I don't think I was clearly understood.
What I need is a function like:
boolean isLiteralIP(String ip);
So that:
boolean ret;
ret = isLiteralIP("192.168.1.1"); // ret = true
ret = isLiteralIP("205.123.43.3"); // ret = true
ret = isLiteralIP("www.yahoo.com"); // ret = false
ret = isLiteralIP("sdajfdla"); // ret = false
ret = isLiteralIP("1080:0:0:0:8:800:200C:417A"); // ret = true
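For reference, the behaviour specified above can be met by a purely string-based check, i.e. approach #1 from the question. This is a minimal sketch under stated assumptions: it accepts only the strict dotted-quad IPv4 form and the RFC 4291 IPv6 text forms (including "::" compression and an embedded IPv4 tail), and it rejects IPv6 zone IDs such as "fe80::1%eth0" and the shorter IPv4 forms like "127.1" that InetAddress also accepts. That gap is exactly the kind of divergence from java.net the question worries about. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class LiteralCheck {

    // Pure string check: never touches the network.
    static boolean isLiteralIP(String s) {
        return s != null && (isIPv4(s) || isIPv6(s));
    }

    // Strict dotted quad: exactly four decimal parts, each 0..255.
    static boolean isIPv4(String s) {
        String[] parts = s.split("\\.", -1); // -1 keeps empty parts, e.g. in "1..2.3"
        if (parts.length != 4) return false;
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) return false;
            for (char c : p.toCharArray()) {
                if (c < '0' || c > '9') return false; // ASCII digits only
            }
            if (Integer.parseInt(p) > 255) return false;
        }
        return true;
    }

    // RFC 4291 text form: 8 hex groups, at most one "::", optional dotted-quad tail.
    static boolean isIPv6(String s) {
        int dc = s.indexOf("::");
        boolean compressed = dc >= 0;
        if (compressed && s.indexOf("::", dc + 1) >= 0) return false; // only one "::" allowed
        String head = compressed ? s.substring(0, dc) : s;
        String tail = compressed ? s.substring(dc + 2) : "";

        List<String> groups = new ArrayList<>();
        if (!head.isEmpty()) groups.addAll(Arrays.asList(head.split(":", -1)));
        if (!tail.isEmpty()) groups.addAll(Arrays.asList(tail.split(":", -1)));

        int units = 0; // explicitly written 16-bit units
        for (int i = 0; i < groups.size(); i++) {
            String g = groups.get(i);
            // An embedded IPv4 address may only be the final group, e.g. "::ffff:1.2.3.4".
            boolean last = i == groups.size() - 1 && (!compressed || !tail.isEmpty());
            if (last && g.indexOf('.') >= 0) {
                if (!isIPv4(g)) return false;
                units += 2;
            } else if (isHexGroup(g)) {
                units += 1;
            } else {
                return false;
            }
        }
        // "::" stands for at least one zero group, so at most 7 explicit units then.
        return compressed ? units <= 7 : units == 8;
    }

    // One group: 1 to 4 ASCII hex digits.
    private static boolean isHexGroup(String g) {
        if (g.isEmpty() || g.length() > 4) return false;
        for (char c : g.toCharArray()) {
            if ("0123456789abcdefABCDEF".indexOf(c) < 0) return false;
        }
        return true;
    }
}
```

With this class, the five example calls above return true, true, false, false and true, and no call touches the network. Note that a hostname made only of hex letters, such as "cafe", is still correctly rejected because it lacks the eight groups an uncompressed IPv6 literal needs.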
A possible implementation is:
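import java.net.InetAddress;
import java.net.UnknownHostException;

boolean isLiteralIP(String ip) {
    try {
        // For a host name, getHostName() returns the stored input unchanged; for a
        // literal it returns the reverse-lookup result, so "different" means "literal".
        // Caveat: if the reverse lookup finds no name, getHostName() returns the
        // address text itself, which can equal the input.
        return !InetAddress.getByName(ip).getHostName().equals(ip);
    }
    catch (UnknownHostException e) {
        return false;
    }
}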
This works because getByName stores the hostname it was given for non-literal input and returns it from getHostName() (for such addresses, a reverse lookup is done only by getCanonicalHostName()). For a literal IP, getHostName() performs a reverse lookup, so the result generally won't match the given string. For invalid strings or non-existent hosts, getByName throws, and the function returns false.
The problem with this solution is that it performs a reverse lookup for a literal IP and a forward lookup for a non-literal one. This should be a pure string check that never makes a network request of any sort (particularly not a slow, blocking one).
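The stored-name behaviour that the getHostName() approach relies on can be checked offline. `InetAddress.getByAddress(String, byte[])` attaches a name to raw bytes without consulting a name service, and `getHostName()` then returns that name. The sketch also uses `toString()`, whose Javadoc says an unresolved host name is printed as an empty string and no reverse lookup is made, so an address built from a literal prints with a leading "/". That sidesteps the reverse lookup but not the forward lookup getByName still performs for a real host name, so it is only a partial workaround. Class and method names are illustrative.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class StoredNameDemo {
    static final byte[] ADDR = {(byte) 192, (byte) 168, 1, 1};

    // getByAddress(String, byte[]) attaches the name without any lookup, and
    // getHostName() then returns the remembered name without contacting DNS.
    static String rememberedName(String name) {
        try {
            return InetAddress.getByAddress(name, ADDR).getHostName();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e); // only for an illegal address length
        }
    }

    // toString() has the form "name/literal"; per its Javadoc an unresolved name
    // is shown as an empty string and no reverse lookup is performed.
    static String describeLiteral(String literal) {
        try {
            return InetAddress.getByName(literal).toString(); // literal: format check only
        } catch (UnknownHostException e) {
            return null;
        }
    }

    static String describeNamed(String name) {
        try {
            return InetAddress.getByAddress(name, ADDR).toString();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }
}
```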
An alternative is to keep regexes like "([0-9]+)(\.[0-9]+){3}", but this also has several obvious problems:
1. The RFCs don't come with regexes. They describe the syntax, more or less, but getting from the RFC to a working regex that covers the entire range of allowed forms is non-trivial.
2. There's no guarantee that java.net's string parsing accepts exactly what the regex accepts, and where the two differ, the java.net implementation takes precedence for Java programming purposes (particularly if you want to avoid any possibility of a lookup in a getByName call). The Javadocs don't come with regexes either, and give no guarantee that what works in one version of Java will work in the next (Sun might decide to "fix" their internal parsing to include or exclude certain forms at any point).
3. It doesn't scale. For example, the regex above doesn't support IPv6 at all. Future Java implementations are quite likely, if not guaranteed, to support any future IP versions (as I said, "never say never"). Hence, a scalable solution should use some interface in java.net or some other Sun-sponsored API.
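Item 1 is easy to underestimate even for IPv4. A character class needs a hyphen for a range: `[0..9]` matches only the characters '0', '.' and '9', so that pattern rejects 192.168.1.1 yet accepts 0.0.0.0. Writing `[0-9]` fixes that but still accepts 999.999.999.999. The sketch below adds a per-octet range check; it still covers only the strict dotted-quad form and nothing of IPv6, which is item 3's point. The names are illustrative.

```java
import java.util.regex.Pattern;

public class Ipv4Regex {
    // [0..9] is a character class containing only '0', '.' and '9'.
    static final Pattern BROKEN = Pattern.compile("([0..9]+)(\\.[0..9]+){3}");
    // [0-9] is the digit range, but nothing limits each part to 0..255.
    static final Pattern LOOSE = Pattern.compile("([0-9]+)(\\.[0-9]+){3}");
    // One octet: 250-255, 200-249, 100-199, or 0-99 (no leading zeros).
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
    static final Pattern STRICT = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    // matches() requires the whole string to match, not just a substring.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```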
An alternative to the isLiteralIP function above would be a function that returns an InetAddress from a String but throws something like "HostNameResolutionRequired", or simply returns null, if the string is not a literal (that is, it never attempts any network communication under any circumstances).
The two are obviously equivalent, since each can be written in terms of the other ("if (!isLiteral(...)) throw ... else return InetAddress.getByName(...);" and "try {...} catch (...) { return false; } return true;"). Incidentally, I need both.
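The two directions of that equivalence can be sketched as follows. The exception class is hypothetical, named after the one suggested above, and the literal check is a placeholder that recognises only strict dotted-quad IPv4; a fuller check would be substituted in practice. Because literalAddress() only hands literals to getByName, the Javadoc statement that a literal is merely format-checked means no lookup occurs. Newer JDKs (22 and later) add `InetAddress.ofLiteral(String)`, which parses a literal without any lookup and throws IllegalArgumentException otherwise; it is worth checking your JDK's Javadoc before rolling your own.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

public class LiteralFactory {

    // Hypothetical exception, named after the one suggested in the post.
    static class HostNameResolutionRequiredException extends Exception {
        HostNameResolutionRequiredException(String host) {
            super(host + " is not an IP literal; resolving it would need a lookup");
        }
    }

    // Placeholder literal check (strict dotted quad only); substitute a fuller one.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean isLiteral(String s) {
        return s != null && IPV4.matcher(s).matches();
    }

    // Factory built on the check: getByName() only ever sees literals here.
    static InetAddress literalAddress(String s) throws HostNameResolutionRequiredException {
        if (!isLiteral(s)) throw new HostNameResolutionRequiredException(s);
        try {
            return InetAddress.getByName(s); // literal: format check only, no lookup
        } catch (UnknownHostException e) {
            throw new IllegalStateException("unexpected for a checked literal: " + s, e);
        }
    }

    // The reverse direction: a check built on the factory.
    static boolean isLiteralViaFactory(String s) {
        try {
            literalAddress(s);
        } catch (HostNameResolutionRequiredException e) {
            return false;
        }
        return true;
    }
}
```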
# 5
> I'm aware that it's possible by:
> 1. Reading the RFCs and creating appropriate regexs
> 2. Creating an InetAddress and comparing with
> getHostname.
>
> However, #1 isn't exactly scalable with potential
> future addressing methods (Please, I don't want this
> to become a discussion about the chances of those
> arising, so let's settle for "never say never"), and
> #2 would either generate a look-up or a reverse
> look-up, which is particularly undesirable when
> processing large lists of addresses for bookkeeping
> purposes only.
>
> My question is, hence, is there a scalable way to
> identify literal IP addresses without generating a
> lookup?
As stated, no.
You have to allow some way for an algorithm to exist. Your requirements eliminate all possibilities.
If you relax your requirements, you could use a regex and require that the value come from a configuration file. (Although if it were me, I would question the assumption that the code will survive long enough without needing any changes, while a new IP addressing scheme comes into existence and is used extensively.)
# 6
It's debatable to what extent IPv6 addressing will actually come into general use, but I would have thought 128 bits plus NAT plus IPv6's other provisions will be enough for quite a while. I don't expect to live to see IPv7.

# 7
> You have to allow some way for an algorithm to exist. Your requirements eliminate all possibilities.
The algorithm does exist, sort of. Java's implementation of InetAddress.getByName must decide whether the input string is a literal address in order to know whether to perform a lookup. It does this by analyzing the string, since DNS servers don't do this work for you and, last I checked, the Win32 API doesn't offer this service either (I don't know about other OSes, though).
It's also perfectly scalable: as long as I keep my runtime up to date, I get up-to-date parsing.
The only problem is that InetAddress.getByName doesn't make that result available to me, the end-developer (woo, I just coined a term). It uses it right away and gives me no way to peek at it, or at least to prevent the lookup if it deems one necessary.
What I was wondering is whether there's some alternate function in the Java API that uses exactly the same code to detect literal addresses but gives me the result instead of using it right away. Alternatively, I'm wondering whether some combination of calls includes at least one function that uses said code, lets me determine the result, but doesn't permit any lookup to occur.
If not, I may have to request it for Java 7.
> Although if it were me I would question the assumption that the code would last long enough in such a state that it requires no changes and yet a new IP addressing strategy comes into existence and is used extensively.
Look at it this way: my code currently supports IPv6 fully without my writing a single character specifically for it. That's because Java automatically determines whether an address is IPv4 or IPv6 (or a hostname), and supports sockets for both with the same code (using the generic InetAddress, which covers both).
For the whole connection process, converting addresses to strings, getting addresses from running sockets, and pretty much everything else except literal-IP detection, just updating the JRE (or, at most, recompiling the exact same code if binary compatibility breaks) would give me full support for any new IP versions or addressing methods.
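The generic-InetAddress point can be illustrated briefly: code holds an InetAddress and only inspects the concrete subclass (Inet4Address or Inet6Address) when it actually cares about the family, leaving room for whatever a future JRE might add. This is a sketch with an illustrative name; like the other examples it should only be given literals, because getByName looks up anything else.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressFamily {
    // Reports which concrete subclass getByName() produced for a literal.
    // Literal input only: for a literal, getByName() does no lookup.
    static String family(String literal) {
        try {
            InetAddress a = InetAddress.getByName(literal);
            if (a instanceof Inet4Address) return "IPv4";
            if (a instanceof Inet6Address) return "IPv6";
            return "other"; // room for a subclass a future JRE might add
        } catch (UnknownHostException e) {
            return "invalid";
        }
    }
}
```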
