Robots.txt block URL starting with +

Question

How can I block URLs like the following in robots.txt so that Googlebot stops indexing them?

http://www.example.com/+rt6s4ayv1e/d112587/ia0g64491218q

My website was hacked which is now recovered but the hacker indexed 5000 URLs in Google and now I get error 404 on randomly generated links like the one above, all starting with /+.

Is there a quicker way than manually removing these URLs in Google Webmaster Tools?

Can robots.txt be used to block URLs starting with a + sign?

There is nothing special about + (plus) in the URL-path, it is just a character like any other. — w3dk, 2 hours ago

w3dk · Answer 1 · 2016-11-22 13:08:03Z

"My website was hacked which is now recovered but the hacker indexed 5000 URLs in Google and now I get error 404"

A 404 is probably preferable to blocking with robots.txt if you want these URLs dropped from the search engines (i.e. Google). If you block crawling, the URLs could still remain indexed, because the crawler never gets to see the 404. (Note that robots.txt primarily blocks crawling, not indexing.)

If you want to "speed up" the de-indexing of these URLs then you could perhaps serve a "410 Gone" instead of the usual "404 Not Found". You could do something like the following with mod_rewrite (Apache) in your root .htaccess file:

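RewriteEngine On
# Serve "410 Gone" for any URL-path that starts with a literal +
# (the + is escaped because it is otherwise a regex quantifier)
RewriteRule ^\+ - [G]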
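To sanity-check which requests a pattern like this would catch, here is a minimal Python sketch. It assumes Python's `re` module behaves like mod_rewrite's regex engine for this simple pattern, and the helper name `would_return_gone` is made up for illustration. It also mirrors the fact that, in a root .htaccess file, RewriteRule matches against the URL-path without its leading slash.

```python
import re

# Assumption: Python's `re` stands in for the regex engine used by mod_rewrite.
# The + must be escaped to match a literal plus sign.
PATTERN = re.compile(r"^\+")


def would_return_gone(url_path: str) -> bool:
    """Hypothetical helper: True if the rule would answer 410 for this path."""
    # In a root .htaccess file, RewriteRule sees the path minus the leading slash.
    relative = url_path[1:] if url_path.startswith("/") else url_path
    return PATTERN.search(relative) is not None


if __name__ == "__main__":
    for path in ("/+rt6s4ayv1e/d112587/ia0g64491218q", "/index.html", "/a+b"):
        print(path, would_return_gone(path))
```

Only paths whose first character after the slash is + are caught; a + elsewhere in the path is left alone.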

Sven · Answer 2 · 2016-11-22 13:02:30Z

User-Agent: *  
Disallow: /+

should do what you want. It tells robots not to request any URL whose path starts with a +.
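One way to try out such a rule before deploying it is Python's standard `urllib.robotparser`. This is only a sketch: Python's parser is not Googlebot's, so it shows how a simple prefix-matching client reads the rule, not how Google will behave. The helper name `is_allowed` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /+
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Hypothetical helper: would this parser let user_agent fetch url?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/+rt6s4ayv1e/d112587/ia0g64491218q",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(url))
```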

davidbl · Answer 3 · 2016-11-22 14:00:37Z

If you really want to use robots.txt, this would be a simple answer to your question. I have also included a link to where you can read the robots.txt specifications.

User-agent: *
Disallow: /+

Read about robots.txt specs

Another alternative might be to use a rewrite rule in .htaccess (if you use Apache) to catch these URLs and either return a more suitable HTTP status code to Google or simply redirect the traffic to some other page.

There is no need for the * (asterisk) at the end of the URL-path. It should be removed for greatest spider-compatibility. robots.txt is already prefix matching, so /+* is the same as /+ for bots that support wildcards, while for bots that don't support wildcards /+* will not match at all. – w3dk 2 hours ago

You are right, I just wrote that based on his question about Googlebot. I have edited it for better compatibility with multiple bots. – davidbl 1 hour ago
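The difference between /+ and /+* described in the comments can be sketched in Python. Both matcher functions below are simplified illustrations written for this thread, not any real crawler's implementation: one treats the rule as a literal prefix, the other additionally understands * (any run of characters) and a trailing $ (end of path).

```python
import re


def matches_prefix_only(rule: str, path: str) -> bool:
    """Bot without wildcard support: the rule is a literal path prefix."""
    return path.startswith(rule)


def matches_with_wildcards(rule: str, path: str) -> bool:
    """Bot with wildcard support: '*' matches anything, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    path = "/+rt6s4ayv1e/d112587/ia0g64491218q"
    for rule in ("/+", "/+*"):
        print(rule, matches_prefix_only(rule, path), matches_with_wildcards(rule, path))
```

Under these assumptions, /+ matches in both models, while /+* only matches for the wildcard-aware one, which is why the plain prefix is the safer choice.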

migrated from serverfault.com 3 hours ago
