Plugging the link leaks, part 1 - reclaim links you are throwing away

March 15, 2013

In London hundreds of SEOs have gathered for LinkLove, and as it is a day of sharing tips on getting more links, we thought we would join in.

As the easy self-publishing or submission tactics fall by the wayside, link building has become a far more creative, and time-consuming, process.

But at SEOptimise, as well as building links through content, we also regularly boost clients’ link profiles without typing a word. There’s no asking for links, nor risking the wrath of Google’s anti-spam team.

This is link reclamation – fixing existing links that point to broken or inefficiently redirected pages on your site.

As Ian Lurie pointed out in one of his excellent webinars last year, before worrying about various creative methods of generating links, “get the #@@!#@$ easy links” first. And link reclamation is just that – it might take a couple of hours to complete, but can be a boost for any campaign.


What you’ll need for your link reclamation project:

  • Backlink data from Open Site Explorer, Majestic or Ahrefs
  • Screaming Frog
  • Excel, or another spreadsheet application
  • Access to the site’s Google Webmaster Tools account

Finding your broken links

Now, the quick-win version of this process is simply to put together a list of all the broken URLs on your site that have external links pointing to them, ready to put 301 redirects in place. There’s nothing wrong with doing this, and it will certainly give you the boost of reclaiming your lost authority, but sometimes we need to know where all the broken links are.
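Once you have that list, the redirects themselves usually only take a couple of lines of server config. Here’s a minimal sketch for an Apache .htaccess file – the paths are entirely hypothetical stand-ins for your own broken and destination URLs:

```apache
# Hypothetical examples - map each broken URL to its live equivalent
Redirect 301 /old-page/ http://www.example.com/new-page/
Redirect 301 /2011/widget-review.html http://www.example.com/reviews/widget/
```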

This is so we can see which broken links we should redirect, and which we want to attempt to have fixed on the source URL. Plus, as an agency, it can be advantageous to be able to report all the links we have reclaimed.

So, let’s put together as comprehensive a list of broken links pointing to our target site as possible.

Backlink data

Our first port of call is backlink data. Go to the tool of your choice and look up your site. We’re using Open Site Explorer in the examples here, but Majestic and Ahrefs both provide perfectly good data for this too. Within the inbound links tab, select links from “only external” pages and either “pages on this sub-domain” or “pages on this root domain”, depending on the scope of your project.

Grabbing the backlink data from Open Site Explorer

There’s a whole range of metrics we could use to investigate, but to keep things moving, delete all the columns except for URL, anchor text, page authority, domain authority, followable and target URL.

Doing this allows us to analyse our broken links by PA or DA, and see which are no-followed, helping us decide which links to 301, and which to reach out to have fixed to the correct URL. If you are not using OSE, then Majestic SEO and Ahrefs have their own importance metrics.

Now to find our broken links. Copy the entries in the target URL column, and paste them into a new spreadsheet. Use the remove duplicates feature within the data tab, and save as a .csv or .txt file.
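If you prefer to script this step, Excel’s Remove Duplicates behaviour is easy to reproduce. A small Python sketch – the filenames are hypothetical stand-ins for your own export:

```python
import csv

def dedupe_preserving_order(urls):
    """Remove duplicate URLs while keeping first-seen order,
    mirroring Excel's Remove Duplicates behaviour."""
    seen = set()
    return [u for u in urls if not (u in seen or seen.add(u))]

if __name__ == "__main__":
    # "target_urls.csv" is a hypothetical filename - use your own export,
    # with the Target URL column in the first position
    with open("target_urls.csv", newline="") as f:
        urls = [row[0].strip() for row in csv.reader(f) if row]
    with open("target_urls_deduped.txt", "w") as f:
        f.write("\n".join(dedupe_preserving_order(urls)))
```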

Removing duplicate URLs in Excel

Fire up Screaming Frog, and select ‘List’ from the mode menu. Choose your file of URLs, and start crawling. Once the crawl is complete, select the Response Codes tab and filter to ‘Client Error (4XX)’. You now have a complete list of URLs that external sites are linking to which don’t exist on your server.
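If you don’t have Screaming Frog to hand, the same 4XX check can be scripted. A rough Python sketch using only the standard library; the `check` parameter is there so the filter can be exercised without touching the network:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def status_of(url, timeout=10):
    """Return the HTTP status code for a URL, or None if unreachable."""
    try:
        req = Request(url, method="HEAD",
                      headers={"User-Agent": "link-reclaim-check/1.0"})
        return urlopen(req, timeout=timeout).getcode()
    except HTTPError as e:
        return e.code   # 4xx/5xx responses raise, but carry their code
    except URLError:
        return None     # DNS failure, timeout, refused connection, etc.

def broken(urls, check=status_of):
    """Keep only URLs returning client errors, mirroring Screaming
    Frog's 'Client Error (4XX)' filter."""
    return [(u, code) for u in urls
            if (code := check(u)) is not None and 400 <= code < 500]
```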

No URLs on the list? Congratulations! You have no broken links to fix, and can crack on with working on ways to generate fresh links. If, like most sites we’ve worked with, you have URLs here, export the list.

Exporting all pages resulting in a 404 error from Screaming Frog

Finding 302 redirects

Still in Screaming Frog, filter to ‘Redirection (3xx)’, and order the results by the ‘Status Code’ column. Are there any 302 redirects in there? If so, export this list, open it in Excel and turn the data into a table (Ctrl+T is the shortcut). Filter by Status Code to find the 302s, and copy the data. Open your exported list of URLs resulting in 404 errors, and paste your 302 data into the spreadsheet. You now have a complete list of linked-to pages we want to fix.

Getting clever

It’s time to prune data again. Delete or hide every column until you are left with just the Address and Status Code columns. Once ready, select all the 404/302 data and copy. Go back to your spreadsheet with OSE data. You need to paste in the two columns, either to the right of the OSE data, or in a new sheet (however you prefer to work).

Now for the (relatively) clever bit. Add a column to the right of your OSE data, and call it ‘status code’, then turn all the OSE data into a table. Now we are going to use a VLOOKUP function in the new ‘status code’ column to have Excel tell us which of our OSE links match the 404 errors we found in Screaming Frog.

The formula we used is =IFERROR(VLOOKUP(F:F,I:J,2,FALSE),""), with F:F specifying the Target URL column in the OSE data, and I:J the Address and Status Code columns respectively in the Screaming Frog data. (A big hat-tip to Joe and Tamsin for patiently helping me with Excel formulas!)
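For anyone scripting the process instead, that VLOOKUP is just a dictionary lookup. A minimal Python sketch of the same matching step, with hypothetical example data:

```python
def vlookup(targets, lookup_table, default=""):
    """Map each target URL to its status code, like
    =IFERROR(VLOOKUP(F:F, I:J, 2, FALSE), "") in Excel.
    lookup_table is a sequence of (Address, Status Code) pairs,
    as exported from Screaming Frog."""
    table = dict(lookup_table)
    return [table.get(t, default) for t in targets]
```

Anything not matched comes back as an empty string, just as the IFERROR wrapper leaves unmatched rows blank.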

Alternatively use the Insert Function wizard in the Formulas tab to work through the process, though you will have to add the IFERROR part afterwards.

Our ‘status code’ column should now contain the code from the Screaming Frog data each time one of our external links points to a URL that returns a 404 or 302 code. Simply filter the status code column by 404 and 400 to give you a complete list of broken URLs. You can then reorder this list by PA, DA or by which are followed.

You may also wish to add a ‘date fixed’ column, so you can record when the redirect or edit is in place, and the link starts passing its sweet, sweet authority to your target site.

You can also filter by 302, and instantly have a list of redirects to be changed to 301s, and all the links that suddenly pass all their potential link authority to show your client or boss. Not bad for a few minutes’ work!

Two sources are better than one

So are we done? Not quite; many SEOs work on the premise that using more than one data source is prudent.

Once you have done this process, it’s very quick to repeat it with an alternative source; in our example we might now use Ahrefs. Once we have all the 404s/302s from Ahrefs in a new tab in our spreadsheet, we can create a third tab to combine them with the 404s from OSE, using the remove duplicates tool once again.

Of course, the sources cannot share quality metrics – just URL, anchor text and target URL. However, the advantage of using multiple sources to find a greater number of broken links to fix is worthwhile, and we can still filter on individual sheets.
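That combining step can be sketched in Python too. Here the duplicate key is the (source URL, target URL) pair – an assumption on my part, since two tools may report slightly different anchor text for the same link:

```python
def combine_sources(*sheets):
    """Merge broken-link rows from several backlink tools, keeping only
    the fields every source shares (URL, anchor text, target URL) and
    dropping duplicate links - the spreadsheet 'third tab' step."""
    seen = set()
    combined = []
    for sheet in sheets:
        for url, anchor, target in sheet:
            key = (url, target)   # same link counted once across tools
            if key not in seen:
                seen.add(key)
                combined.append((url, anchor, target))
    return combined
```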

Google Webmaster Tools crawl errors

To use every available source of external links leading to 404 errors, we need to use Google Webmaster Tools’ ‘Crawl Errors’ report (found under Health in the left-hand menu).

Alas, this is where things become a little more frustrating. As no doubt many of you know, it is impossible to cleanly download a list of each 404 URL and the links pointing to it, despite the information being available on screen. Plus, GWT is not always as up-to-date as we would like. So, we have to use a workaround.

What you can download from GWT is all the broken URLs Google has found on your site. So, our first step is to download this list as a .csv file by selecting Health, then Crawl Errors in the left-hand navigation.

Finding external links resulting in 404 errors in Google Webmaster Tools

Select the ‘Not found’ links, and hit the download button. This file can then be imported using Screaming Frog’s list mode, and all the reported broken URLs checked. Any URLs that are now returning 200 or 301 status codes should be removed from your list, and marked as ‘fixed’ within GWT.

We now have a smaller, more accurate list of the broken URLs on our site. Create a new tab in the spreadsheet with the broken links we found in our backlink tool, and create headings for URL, target URL and status code. Unfortunately, there’s now some manual work involved; how much depends on how many 404 errors GWT is reporting.

  • Select each error in turn within GWT
  • Choose the ‘Linked from’ tab
  • Copy all the external URLs pointing to that URL
  • Paste these URLs in the URL column in your spreadsheet
  • Add the target URL you have just been checking to the target URL column

As you can see, if you have a lot of reported external links, this can quickly become quite a pain. One helpful shortcut I have found is the Link Clump extension for Chrome. This allows you to create keyboard and mouse action shortcuts for opening or copying multiple links. I set one for copying all URLs selected to the clipboard. This makes it relatively quick to grab all the URLs for each reported error and paste them into my spreadsheet.

There’s plenty of other great extensions/add-ons that can help with this, such as Scraper for Chrome and Multi Links for Firefox. Please suggest any favourites you have in the comments below!

After a bit of leg-work, you will now have a list of all the source links and their target URLs. The final stage is to ensure that these external URLs still exist, and still link to our site. Doing this is a two-stage process, both stages using the same VLOOKUP method we used earlier.

Copy all the source URLs and paste them into a new spreadsheet, then save as a .csv file. Now go back to Screaming Frog, upload the list and crawl all the URLs. First, go to Response Codes and filter for any redirects. If there are any, export the list.

Open this list and copy the redirect destination URLs, then add these to your master list of URLs from GWT. Next use the same VLOOKUP methodology to remove any URLs that result in a 301 or 302 – we don’t want them in our external link list as they no longer exist, but do want the redirect targets, in case our links are there!
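In script form, this swap is a straightforward mapping. A Python sketch, assuming you’ve built a dictionary of original-to-destination URLs from the Screaming Frog redirects export:

```python
def resolve_redirects(source_urls, redirect_map):
    """Swap any redirected (301/302) source URL for its destination, so
    the master list only contains pages that still exist. redirect_map
    maps {original URL: redirect destination}; URLs without a redirect
    pass through unchanged."""
    return [redirect_map.get(u, u) for u in source_urls]
```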

Now go back to Screaming Frog and filter for any client errors (400/404s). If there are any, again export then use the VLOOKUP method to remove them from our list of external links from GWT.

The second step is to check they are still linking to you. Copy the edited column of URLs reported by GWT, and save to (yep, yet another) .csv file. Upload in Screaming Frog and go to the Configuration menu and select Custom to add a bespoke filter. Enter your domain, with or without subdomain depending on your project, and set to ‘does not contain’.

Creating a custom filter in Screaming Frog

Crawl your URL list, then head to the Custom tab and filter to your bespoke filter. This then shows you all the URLs that no longer point to you. Export, copy into your main spreadsheet and VLOOKUP one last time and delete these links. You’ll need to add some form of marker text in a second column so you can see which ones to delete, or use the Status column.
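The same “does the page still mention us?” test can be scripted as a simple substring search over each page’s HTML. A rough Python sketch – fetching the pages is left out so the logic stays testable, and the domain name is a hypothetical example:

```python
import re

def still_links_to(html, domain):
    """True if the page's HTML mentions our domain - the inverse of
    Screaming Frog's 'does not contain' custom filter."""
    return re.search(re.escape(domain), html, re.IGNORECASE) is not None

def lost_links(pages, domain):
    """pages maps {source URL: fetched HTML}; returns the source URLs
    that no longer mention our domain at all."""
    return [url for url, html in pages.items()
            if not still_links_to(html, domain)]
```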

Side note: You may wish to keep a record of these to try and get your site back on them if still relevant – it may be they simply removed the link to you because it was a broken page. Being able to write to the site saying, “You used to link to us and we’d love to be featured once again”, is a great reason to contact these sites.

Your final list

So, after a lot of editing, you have a list of broken external links reported by Google Webmaster Tools, plus the page they are linking to. Add these to your master list (the URLs from OSE and Ahrefs), de-duplicate and you have your final list of links to reclaim.

Using the individual sheets for each source you can check each link for importance, deciding which ones to try and have corrected, and which you will simply put a 301 redirect in place for. Of course, as we have recently learned, 301 redirects may well pass all of their authority, but many still prefer to have clean links wherever possible (as previous studies have suggested some authority is lost).

So that’s it. It might seem a little complex or time-consuming at first, but the process only takes a couple of hours, or less if Webmaster Tools hasn’t reported too many errors. The rewards vary of course, but if you have an older domain, or one that went through a site migration without SEO assistance, there can be many broken links. We’ve found several hundred reclaimable links for clients this way – well worth having for any site.

To make things a little easier (as this is a long post to follow!), we’ve put together a basic version you can access and copy for your own projects.

There’s plenty more that can be done of course. Another good use of time is finding the sites that linked to you at one point, but no longer do so, as excellently laid out here by Ethan Lloyd at Seer Interactive, and we’ll be bringing you more as well. Happy reclaiming!

By Charlie Williams

14 thoughts on “Plugging the link leaks, part 1 – reclaim links you are throwing away”

  1. Dom says:

    Nice tips Charlie. I always go through GWT’s crawling issues first as I know they are solid backlinks. Would you also class searching by brand, telephone and address for missing/incorrect links as link reclamation as well? Can’t wait for part 2.

  2. Ben says:

    Nice read 🙂

    It makes sense to start any new campaign this way – it’s something I like to do myself, and I find there are sometimes some really quick wins. In one instance we had a client with a previous domain that hadn’t been redirected to the new .com, so there were a good few opportunities to be found there!
    I would also like to add that I use the Check My Links extension for Chrome.


  3. Thanks Dom, appreciate your feedback!

    Yeah, I’d class searching for broken links via brand, address and telephone references as reclamation. There’s a few advanced search operators that can quickly yield results this way.

    I guess technically looking for where you are listed but there’s no link falls under non-citing references, but it fits perfectly in with the link reclamation work here. It’s all about finding places where you are already on the page, but not getting the link authority you should be.

    Part 2 is being worked on, shouldn’t be *too* long away…

  4. Hi Ben, thanks for reading – I appreciate it!

    That’s a perfect example of finding links you’ve already got, but all the authority has leaked away. And good shout on Check my links – great tool for checking when on an individual page for broken links (see for those that haven’t got it)

  5. Jeff says:

    Thanks for this great tutorial! These are really nice tips that will come in handy for many people.

  6. Joel says:

    Good tips on reclaiming links. It’s hard enough claiming any good links, so reclaiming those that you already have is much easier than going out to find new ones.

  7. Virgil S. says:

    Thanks for the post which can only help, I had not thought of approaching the problem this way. I do have some older domains which have undoubtedly a whole lot of broken links, so it’s time to get to work on them.

  8. Greg says:

    Never thought of doing this before. It has already helped me reclaim quite a few links.
    Now let’s get started on every other domain I’ve got.

  9. Pingback: What I Have Read This Month – March 2013 | Mocco
  10. Catherine says:

    I really appreciate how clearly this process was explained. Definitely time for some spring cleaning.

  11. Hi Catherine, thanks for your kind comments. If you have any questions about the process when working on it feel free to hit me up on Twitter (@pagesauce) or G+ to ask them.

  12. Ian says:

    Great article, it’s something I don’t often think about… it’s one of those things I put on the backburner and I know I shouldn’t.

  13. amazing – very concise and great info. Thanks for doing this!
