
Kh.Abdul · Technical SEO · 5 min read

robots.txt vs noindex: What Is the Difference and When to Use Each

Robots.txt and noindex can both keep your pages out of Google, but they work at completely different stages. Choosing the wrong one is one of the most common technical SEO mistakes.

Robots.txt and noindex are the two main tools for controlling what Google can and cannot include in its search results. They look similar on the surface because both can stop a page from appearing in Google. But they work at completely different stages of how Google processes your site.

Using the wrong one is one of the most common and damaging technical SEO mistakes. In 2018, a well-known SEO agency accidentally added Disallow: / to a client’s robots.txt during a site migration. The entire site vanished from Google overnight. It took three weeks to recover because the client had not noticed until traffic had already dropped.

This guide explains exactly how each directive works, when to use which, and the specific scenario where combining them can backfire.


The core difference

| | robots.txt | noindex |
|---|---|---|
| What it controls | Whether Google can crawl the URL | Whether Google can index the URL |
| Where it lives | /robots.txt file at your root domain | meta tag in head or HTTP response header |
| When Google reads it | Before visiting the page | After visiting and reading the page |
| Can pass link equity | No. Blocked pages cannot pass PageRank | Yes. Google crawls the page and can follow its links |

How robots.txt works

Your robots.txt file sits at yourdomain.com/robots.txt. It tells search engine crawlers which URLs they are allowed to visit before they ever request the page.

User-agent: Googlebot
Disallow: /admin/
Disallow: /checkout/
Allow: /

When Googlebot sees a Disallow rule matching a URL, it does not visit that URL at all. It has no idea what content is on the page.
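To see the rules above in action, here is a minimal sketch using Python's standard-library urllib.robotparser. The example.com URLs are placeholders. Python's parser applies the first matching rule rather than Google's most-specific-match logic, so treat it as an approximation of Googlebot's behaviour, not a replica.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/
Disallow: /checkout/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder domain used only for illustration.
for url in ("https://example.com/admin/users", "https://example.com/blog/post"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if allowed else "blocked")
```

The check happens purely on the URL path. Nothing about the page's content is involved, which is exactly why a blocked page's content stays unknown to the crawler.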

The critical implication: If Google has already indexed a URL from a previous crawl before you added the rule, blocking it in robots.txt will not remove it from Google’s index. The page stays indexed. Google just cannot recrawl it to update or remove it.

Robots.txt controls the crawler. It does not control the index.


How noindex works

A noindex directive tells Google: you can visit this page, but do not include it in search results.

It lives either in the page’s HTML head:

<meta name="robots" content="noindex">

Or in the HTTP response header:

X-Robots-Tag: noindex
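The header form is the one to reach for with non-HTML files such as PDFs, which have no head to hold a meta tag. As a rough sketch of how a server might attach it, here is a minimal WSGI app in Python. The app function and the NOINDEX_PREFIXES paths are hypothetical, and on a real site you would more likely set the header in your web server or framework configuration.

```python
# Hypothetical path prefixes whose responses should not be indexed.
NOINDEX_PREFIXES = ("/thank-you", "/downloads/")


def app(environ, start_response):
    """Minimal WSGI app that adds X-Robots-Tag: noindex on selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Crawlers can still fetch the URL; the header asks them not to index it.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"ok\n"]
```

To try it locally, you could serve app with wsgiref.simple_server.make_server from the standard library and inspect the response headers.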

Google visits the page, reads and processes its content, finds the noindex tag, and then excludes the page from its index. The page is crawled but not shown in search results.

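The critical implication: Google still uses the crawl to discover links on the page. A noindexed page can still pass PageRank through its outbound links. A robots.txt-blocked page cannot. Google representatives have said that a page left noindexed for a long time may eventually be crawled less often and have its links dropped, so treat this as a benefit that can fade over time rather than a permanent guarantee.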
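If you want to check from the outside whether a response carries noindex in either place, something like the following Python sketch could work. The RobotsMetaParser class and has_noindex function are hypothetical names, the parsing is simplified (it does not handle user-agent-prefixed header values such as "googlebot: noindex"), and it works on strings you supply rather than fetching anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_tokens = {token.strip().lower() for token in header_value.split(",")}

    return "noindex" in parser.directives or "noindex" in header_tokens
```

Note that this only works because the page can be fetched and read, which is the whole point of noindex: the crawler has to see the page to see the instruction.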


The dangerous combination to avoid

This is where most people go wrong.

Do not put a noindex tag on a page that is also blocked by robots.txt.

If robots.txt blocks a URL, Google cannot read the page at all, which means it cannot read the noindex tag either. The result: the page may stay indexed indefinitely because Google cannot see the instruction to remove it.

John Mueller from Google has confirmed this directly in multiple Google Search Central hangouts. He described it as one of the most frequent errors he sees on large sites during migrations.
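The conflict is easy to detect mechanically. The Python sketch below combines a robots.txt check with a simplified search for a robots meta tag. The diagnose function and its return labels are hypothetical, the regular expression assumes the name attribute appears before content, and Python's robots.txt matching is only an approximation of Google's.

```python
import re
from urllib.robotparser import RobotFileParser

# Simplified pattern: assumes name="robots" comes before content="..." in the tag.
META_NOINDEX = re.compile(
    r"<meta[^>]+name=[\"']robots[\"'][^>]+content=[\"'][^\"']*noindex",
    re.IGNORECASE,
)


def diagnose(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> str:
    """Classify a URL by how robots.txt and a noindex meta tag interact."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(agent, url)
    has_noindex = bool(META_NOINDEX.search(html))

    if not crawlable and has_noindex:
        # The noindex exists, but the crawler is not allowed to read it.
        return "conflict"
    if not crawlable:
        return "blocked"
    if has_noindex:
        return "noindex"
    return "indexable"
```

A "conflict" result is the case to fix first: the noindex on that page has no effect until the Disallow rule is lifted.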

If you want to completely remove a page from Google’s index:

  1. Remove the Disallow rule that blocks it from robots.txt so Googlebot can visit the page
  2. Add noindex to the page
  3. Wait for Google to recrawl and process the noindex tag
  4. After you confirm the page is deindexed in Search Console, optionally re-add the robots.txt block if you also want to save crawl budget
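The order of those steps matters more than anything else, so here is one way to write it down as a small Python helper. The next_removal_step function and its three inputs are hypothetical. You would fill them in by hand from your robots.txt, your page source, and Search Console.

```python
def next_removal_step(blocked_by_robots: bool, has_noindex: bool, still_indexed: bool) -> str:
    """Return the next action in the removal sequence described above."""
    if still_indexed:
        if blocked_by_robots:
            # Step 1: Google must be able to crawl the page to see a noindex.
            return "Remove the robots.txt block so Googlebot can visit the page"
        if not has_noindex:
            # Step 2
            return "Add noindex to the page"
        # Step 3
        return "Wait for Google to recrawl and process the noindex"
    if blocked_by_robots:
        return "Done: the page is deindexed and blocked from crawling"
    # Step 4 is optional and only makes sense after deindexing is confirmed.
    return "Deindexed: optionally re-add the robots.txt block to save crawl budget"


# A page that is blocked, already carries noindex, and is still in the index:
print(next_removal_step(blocked_by_robots=True, has_noindex=True, still_indexed=True))
```

The first branch is the one people skip: as long as the page is still indexed, unblocking comes before anything else.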

Google Search Console reports this case as “Indexed, though blocked by robots.txt”. You can find affected URLs in the Page indexing report and check individual pages with the URL Inspection tool. This is a common issue on large sites where robots.txt rules were added after pages were already indexed.


When to use robots.txt

Use robots.txt to save crawl budget and prevent Google from spending time on pages that have no SEO value and should never be indexed:

  • Admin and login pages such as /admin/, /login/, /wp-admin/
  • Internal search result pages such as /search?q=
  • Staging or development environments
  • User account pages
  • Duplicate content generated by URL parameters when you cannot use canonical tags
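  • Large volumes of files that generate crawl noise. Be careful with JavaScript and CSS files: block them only if they are not needed to render your pages, because Google needs those resources to render and understand a page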

The key rule: only use robots.txt on pages that are genuinely not useful to index and that you do not need Google to read for link discovery purposes.


When to use noindex

Use noindex when Google can crawl the page but you do not want it to appear in search results:

  • Thank-you and confirmation pages
  • Thin or duplicate content pages that have internal links worth following
  • Tag and category archive pages on blogs that add no unique value
  • Paginated pages beyond page 1 in some cases
  • Privacy policy and legal pages (optional, some prefer to index these)
  • Out-of-stock product pages with no alternative content

The key rule: use noindex when crawlability is fine but indexability is the problem.


Quick reference guide

| Scenario | Use |
|---|---|
| Admin panel you never want Google to touch | robots.txt |
| Thank-you page after a form submission | noindex |
| Staging site | robots.txt on the whole domain |
| Thin tag page with useful outbound links | noindex, not robots.txt |
| URL parameter duplicate such as ?sort=price | robots.txt or canonical tag |
| Out-of-stock product page | noindex or improve content |
| Page indexed in error, needs removing now | Allow in robots.txt, then add noindex |
| JavaScript or CSS resources not needed to render pages | robots.txt |

How to check which directive is affecting a page

Use the URL Inspection tool in Google Search Console:

  1. Enter the URL you want to check
  2. Look at the Page indexing section (labelled Coverage in older versions of Search Console). It will tell you if the page is Indexed, Blocked by robots.txt, Excluded by noindex, or another status
  3. Click “View crawled page” to see what Google actually saw on its last visit

If the status says “Blocked by robots.txt” but the page still appears in search results, that is the dangerous combination described above.

For a broader look at crawl-related issues in Search Console, see how to fix crawl errors in Google Search Console.


Frequently asked questions

Does robots.txt remove a page from Google search results?

Not reliably. Robots.txt prevents Google from crawling a URL, but if that URL was already indexed before the rule was added, the page can remain in search results indefinitely. Google cannot recrawl the page to process any removal signals. To reliably remove a page from results, use noindex (which requires the page to be crawlable). For urgent cases, use the Removals tool in Search Console, but note that it only hides the URL temporarily, so you still need noindex or to delete the page for a permanent fix.

Can I use both robots.txt and noindex on the same page?

You can, but be careful. If robots.txt blocks the page, Google cannot read the noindex tag, making the noindex instruction invisible to Googlebot. If you want noindex to take effect, the page must be accessible to Googlebot. Add noindex first, confirm the page is deindexed, then optionally re-add the robots.txt rule to save crawl budget.

Does blocking a page in robots.txt save crawl budget?

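Generally yes. Crawl budget is the number of URLs Googlebot can and wants to crawl on your site in a given period. It is not a fixed daily quota, and it depends on how much crawling your server can handle and how much demand Google sees for your content. Blocking low-value pages in robots.txt frees up that budget for pages you actually want indexed. On large sites with many thousands of pages, this can meaningfully improve how quickly new and updated content gets crawled. On small sites, Google can usually crawl everything anyway, so the gain is minor.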

What is the difference between noindex and canonical tags?

A canonical tag tells Google: this page is a duplicate of another URL, so treat that other URL as the original. A noindex tag tells Google: do not include this page in search results at all. Canonical tags are for managing duplicates while keeping pages crawlable and potentially visible. Noindex is for pages you want completely excluded from search results.

Will noindex pages still appear in Google if they have many backlinks?

Google will generally respect a noindex directive even on pages with many backlinks, but it may take multiple recrawls before the page is fully removed from the index. The more frequently a page is crawled, driven by its authority and links, the faster it will be deindexed after you add noindex.

How do I know if my robots.txt is blocking important pages?

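Go to Google Search Console, then Indexing, then Pages, and look for pages with the "Blocked by robots.txt" status. Then check whether any of those pages should actually be indexed. You can also open the robots.txt report under Search Console Settings to see the robots.txt file Google last fetched (it replaced the older robots.txt tester), and use the URL Inspection tool to check whether a specific URL is blocked.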
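If you keep a list of URLs that must stay crawlable, a small pre-deploy check can catch a bad rule before Google does. The sketch below uses Python's standard urllib.robotparser. The blocked_urls function and the example.com URLs are hypothetical, and Python's rule matching is simpler than Google's, so confirm any finding in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the URLs from `urls` that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Placeholder list of pages that should always be crawlable.
IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/products/",
    "https://example.com/blog/",
]

# A robots.txt that blocks the whole site, as in a migration gone wrong.
MIGRATION_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

for url in blocked_urls(MIGRATION_ROBOTS_TXT, IMPORTANT_URLS):
    print("WARNING: important URL is blocked:", url)
```

Running a check like this against the robots.txt you are about to deploy is a cheap guard against the site-wide Disallow: / mistake described at the start of this guide.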
