Thank you so much Josh Truly! – James Gianoutsos
This week I worked with a company that sells skincare and beauty treatments. They have built a massively popular blog within their industry, so it was a shock when they woke up one morning to an 80% drop in traffic.
Nicki is the founder and CEO of Future Drem, a WordPress blog covering health and beauty, skin care and nutrition that receives tens of thousands of unique visits every month. It has also been featured in The New York Times, SHAPE, Redbook, Fitness and many other top publications.
Nicki had a huge shock when she woke up this week. She had received emails from Google Webmaster Tools saying that her site was producing a lot of errors, specifically 403 Forbidden errors. When she and her tech admin Jim took a look at their Webmaster Tools, they found an 80% drop in search engine traffic.
So I got to work finding the cause, and fast. I started by going through everything that could have caused these errors.
First off, what is a 403 (Forbidden) error?
Very simply, the server that the website is hosted on is refusing to 'serve' the request to the user, or in this case to search engine robots like Googlebot and Bingbot. So when Google asked to see a fresh copy of the website for its web index, the server refused.
As Google's own documentation puts it: "The server is refusing the request. If you see that Googlebot received this status code when trying to crawl valid pages of your site (you can see this on the Crawl Errors page under Health in Google Webmaster Tools), it's possible that your server or host is blocking Googlebot's access."
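A quick way to check this yourself is to request a page while identifying as Googlebot and look only at the status code. The sketch below runs against a throwaway local server so it is self-contained; the server, port and paths are assumptions for the demo, so point the curl line at your own domain to test a live site:

```shell
# Start a throwaway local web server for the demo (assumes python3 and curl are installed)
python3 -m http.server 8765 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
sleep 1

# Request the homepage while identifying as Googlebot; print only the HTTP status code
code=$(curl -s -o /dev/null -w '%{http_code}' \
  -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
  http://127.0.0.1:8765/)
echo "$code"   # 200 from the demo server; a blocked site would show 403

kill "$SRV"
```

If the same request returns 403 with the Googlebot User-Agent but 200 with a normal browser string, something on the server is singling the bot out.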
So if this is what it is, what causes it? Well, like most things online, there can be many different causes, so we had to check each one quickly to find the real issue.
1) Plugin Conflict
One of the most common causes of 403 errors that I have seen is plugins related to security or caching. Sometimes they have settings that block robots from accessing the site. Although this prevents spam and some types of hacking attacks, it can also block the good robots, such as search engine robots.
So my first port of call was to disable all security and caching plugins and try to get Google to recrawl the site. However, this had no effect.
2) Coding Errors
Sometimes errors in your website's code can cause these 403 errors. For example, DOCTYPE errors can mess up not only your site but also how robots see your site. To see if the site had any major errors, I put the URL into http://validator.w3.org. It's likely to pick up a few minor errors, but keep a look out for any major ones.
After putting the site through the validator I found a few minor errors but nothing to be concerned about.
3) Is the site itself blocking Googlebot?
One other major thing to check is the robots.txt file. You can normally find it by typing yourwebsite.com/robots.txt. It should say something like "allow all robots". This was FutureDrem's robots.txt file, and it looked very normal:
Robots allowed: All robots
It's saying allow all robots, and disallow any robots from looking in the listed directories. This is pretty standard and stops automated software from trying to log into your site's admin panel.
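For reference, a typical "allow everything except the admin areas" WordPress robots.txt looks something like this sketch (not FutureDrem's actual file):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

Keep in mind that Disallow only asks well-behaved crawlers to stay out of those directories. A robots.txt file cannot produce a 403, which is why a clean robots.txt does not rule out server-side blocking.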
4) Check .htaccess and the server's firewall
The next step was to check the .htaccess file for changes and errors, check the new server's firewall and any security settings, and check the permissions on your files and folders (folders 755, files 644).
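Those permission numbers (755 for folders, 644 for files) can be applied in one pass with find. This sketch practices on a scratch directory whose paths are made up for the demo; only point it at your real web root once you are comfortable with it:

```shell
# Build a scratch directory tree to practice on (paths are made up for the demo)
mkdir -p permdemo/wp-content
touch permdemo/index.php permdemo/wp-content/style.css

# Folders get 755 (owner rwx, everyone else r-x); files get 644 (owner rw, everyone else r)
find permdemo -type d -exec chmod 755 {} +
find permdemo -type f -exec chmod 644 {} +

# Show the resulting octal modes (GNU stat; on macOS/BSD use stat -f '%Lp' instead)
stat -c '%a %n' permdemo permdemo/index.php
```

Running find with `-exec ... {} +` handles nested folders in one go, which is safer than a blanket recursive chmod 777 that some guides suggest.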
5) Resubmit and cross-check

From my research, I found something that can fix these errors if there is no major underlying issue: submitting the site to Google's addurl (link). This can kick-start the process of Google indexing the site.
Also add Bing Webmaster Tools. This isn't really a fix, but it gives you a bit more information: you can see whether Googlebot alone is being blocked or whether all search engine robots are.
Do a reverse look-up on the IP address. If you are on shared hosting, you will be able to see if other sites on your server are having similar problems, which would point to a clear server issue.
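The reverse look-up itself is a one-liner with dig. Here 8.8.8.8 (Google's public DNS server) is used purely as a stable demo target; substitute your own server's IP address, which you can get from your hosting panel or with `dig yourwebsite.com +short`:

```shell
# Reverse (PTR) look-up: which hostname does this IP address map back to?
# 8.8.8.8 is only a demo target; use your own server's IP here.
dig -x 8.8.8.8 +short
```

A plain PTR record usually just names the machine. To see the other sites sharing a shared-hosting IP (and whether they are throwing 403s too), use one of the reverse-IP domain check services.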
None of these had any effect on the index status of the site.
6) Looking for any changes
Google gave us the date the errors started to appear, and that was a huge clue: a few days before the 403 errors appeared, the site had been upgraded to a new VPS server.
So in this case we found something very strange: a stray .htaccess file in a root folder above the public_html folder on the server:

BrowserMatchNoCase "alexabot" badbots
# BrowserMatchNoCase "iphone" badbots
BrowserMatchNoCase "googlebot" badbots
BrowserMatchNoCase "bingbot" badbots
Allow from all
Deny from env=badbots

These Apache directives tag any visitor whose user agent matches "googlebot" or "bingbot" as a bad bot and then deny tagged requests, so Googlebot was being served a 403 before it ever reached the site or its real robots.txt file. We quickly corrected the file, the errors started to disappear and traffic returned to the site. We were unsure whether the file had been left behind by the previous user of the server or placed there maliciously, so we recommended a full security audit just to be sure.
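If you do want to keep blocking abusive scrapers with this mechanism, the fix is to remove the search engine lines and keep the rest. Here is a minimal corrected sketch (the "alexabot" pattern is just the example from the file above; note that Deny/Allow only behave predictably with an explicit Order directive on Apache 2.2):

```
# Tag only genuinely unwanted user agents, never search engine crawlers
BrowserMatchNoCase "alexabot" badbots

# Apache 2.2-style access control: let everyone in except tagged agents
Order Allow,Deny
Allow from all
Deny from env=badbots
```

On Apache 2.4 and later, the equivalent is a RequireAll block containing `Require all granted` and `Require not env badbots`.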
If you are having issues in Google Webmaster Tools or have seen an increase in 403 errors, give me a call and see if I can help you get your site flowing with traffic again.