How To Check Whether an External URL Exists with WordPress’ HTTP API

September 18, 2012 - 6 minutes read

As an avid WordPress developer (as I hopefully am), you might run into a scenario where you have to check whether an external URL exists, meaning whether it responds with a 404 status code or with a 200, which in most cases means the page exists. For instance, before attempting to download a file, you might want to make sure it actually exists and that the target URL does not lead to a 404 page.

Another use case is verifying the URL a user submits with a new comment. This can actually help if your site is regularly targeted by spammers (happens to the best of us). This was my case.

Obviously, that URL does not exist, unless you go ahead and buy that domain right now!

Step-by-Step Walk-through

So here’s a short description, step by step, of how this thing works:

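Quick note: When dealing with HTTP requests in WordPress you should always, always use WordPress’ built-in WP_HTTP API. This is more than a best practice: the API picks an HTTP transport that is available on the server and gives you a consistent response format, with failures reported as WP_Error objects.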

  1. We use the pre_comment_author_url filter that executes when a user submits a new comment. The URL is passed as a variable ($comment_author_url).
  2. We make sure we’re not on the admin side of WordPress, so that an administrator who decides, against all common sense, to assign a broken URL to a comment is still able to do so. We also make sure the URL field is not empty before continuing.
  3. At this point, we initiate an HTTP HEAD request to the target URL. The reason I chose a HEAD request over a regular GET request is simply that a GET request also returns the HTML body, which we don’t need in our case. Theoretically speaking, the response headers should give us enough information to determine whether the URL exists or not.
  4. We define an array, $accepted_status_codes, which stores the HTTP status codes that we consider acceptable.
  5. Now we make sure the HTTP request was successful and that the response code matches one of the status codes we defined earlier. If all is positive, it means that the URL exists, or at least responded with one of the status codes we accepted, so now we can return the URL.
  6. If the request has failed, or if we got a status code that didn’t match, whether it is a 404 or 500, we return an empty string, since we don’t want to show a URL that does not exist.
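
To make steps 4 to 6 concrete, here is a minimal sketch of the accept-or-reject decision pulled out into a plain PHP function, so it can be tried without a WordPress install. The function name maor_url_passes_check and its parameters are hypothetical and not part of WordPress; in the real filter the first two arguments would come from is_wp_error() and wp_remote_retrieve_response_code().

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): decides whether the result
 * of a HEAD request counts as "the URL exists".
 *
 * @param bool       $request_failed True if the HTTP request itself failed.
 * @param int|string $status_code    Status code of the response, if any.
 * @param int[]      $accepted       Status codes we treat as "exists".
 * @return bool
 */
function maor_url_passes_check( $request_failed, $status_code, array $accepted = array( 200, 301, 302 ) ) {
	// Step 6, first half: a failed request never passes.
	if ( $request_failed ) {
		return false;
	}

	// Steps 4 and 5: the code must be one of the accepted ones.
	// Casting to int lets a numeric string such as '301' match as well.
	return in_array( (int) $status_code, $accepted, true );
}
```

With the default list, a 200 or a 301 passes, while a 404, a 500, or a failed request does not, which is when the filter returns an empty string.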

The Actual Snippet

Here’s what makes it all possible.

function maor_verify_comment_author_url( $comment_author_url ) {
	/* Make sure the visitor actually filled in the URL field, and that we're not on the dashboard. */
	if ( ! is_admin() && '' != $comment_author_url ) {
		/* Using a HEAD request, we'll be able to know if the URL actually exists.
		 * The reason we're not using a GET request is that it might take (much) longer.
		 * To make sure the user doesn't wait too long, we limit the overall request duration to 5 seconds. */
		$response = wp_remote_head( $comment_author_url, array( 'timeout' => 5 ) );

		/* We'll match these status codes against the HTTP response. */
		$accepted_status_codes = array( 200, 301, 302 );

		/* If no error occurred and the status code matches one of the above, go on... */
		if ( ! is_wp_error( $response ) && in_array( wp_remote_retrieve_response_code( $response ), $accepted_status_codes ) ) {
			/* Target URL exists. Let's return the (working) URL. */
			return $comment_author_url;
		}

		/* If we have reached this point, it means that either the HEAD request didn't work or that the URL
		 * doesn't exist. This is a fallback so we don't show the malformed URL. */
		return '';
	}

	/* On the dashboard, or with an empty URL field, pass the value through untouched. */
	return $comment_author_url;
}

/* We're using a low priority (11, so we run after the default priority of 10) because WordPress is
 * running esc_url using this filter, so running the test afterwards gives us the advantage since the URL is clean now. */
add_filter( 'pre_comment_author_url', 'maor_verify_comment_author_url', 11 );

Please note that this code is not bulletproof and will not work in all scenarios. For example, some servers are configured so that a request for a nonexistent URL gets a 301 response that redirects to the site’s homepage. I’ve seen that happen. So, if you’re interested in this snippet, feel free to use it, but it’s your own responsibility to test the code on your WordPress install.
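
One possible way to narrow down the redirect-to-homepage case is to look at where a 301 or 302 points. The following is only a sketch under stated assumptions: maor_is_homepage_redirect is a hypothetical helper, not a WordPress function, and it assumes you already have the value of the Location response header as a string (for example from wp_remote_retrieve_header( $response, 'location' )). It flags a redirect only when a URL with a non-root path is sent to the root of the same host.

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): guesses whether a redirect
 * sends a deep link back to the front page of the same site.
 *
 * @param string $requested_url The URL the commenter submitted.
 * @param string $location      Value of the Location response header.
 * @return bool True if a non-root path was redirected to the root of the same host.
 */
function maor_is_homepage_redirect( $requested_url, $location ) {
	// No Location header means there is nothing to judge.
	if ( '' === $location ) {
		return false;
	}

	$requested = parse_url( $requested_url );
	$target    = parse_url( $location );

	// If either URL cannot be parsed, make no claim.
	if ( ! is_array( $requested ) || ! is_array( $target ) ) {
		return false;
	}

	$requested_path = isset( $requested['path'] ) ? trim( $requested['path'], '/' ) : '';
	$target_path    = isset( $target['path'] ) ? trim( $target['path'], '/' ) : '';

	// A relative Location header has no host; treat it as the same host.
	$requested_host = isset( $requested['host'] ) ? strtolower( $requested['host'] ) : '';
	$target_host    = isset( $target['host'] ) ? strtolower( $target['host'] ) : $requested_host;

	return '' !== $requested_path && '' === $target_path && $requested_host === $target_host;
}
```

This check is deliberately conservative: a redirect from example.com to www.example.com, or from one page to another page, is not flagged, and URLs that differ only in their query string are not flagged either. Whether a flagged redirect should actually be rejected is a judgment call for your own site.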

I have tested this code on my development server and it seems to be working just fine. If for some reason it doesn’t work for you, or works differently than expected, please let me know in the comments. Also, if you have any questions about this piece of code, lay them on me!

Happy coding!