How to correctly set up robots.txt on a WordPress website
File header rules:
A robots.txt file begins with a User-agent: line, which specifies the search engine spider that the rules below it apply to. To target Google's search spider, enter
User-agent: Googlebot
To apply the rules to all search engines, enter
User-agent: *
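To see how a spider picks its group of rules, here is a minimal sketch using Python's standard urllib.robotparser module. The two groups and the /private/ path are made up for illustration and are not part of the WordPress setup described in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for every other spider.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if the given spider may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


for agent in ("Googlebot", "Bingbot"):
    print(agent, allowed(agent, "/index.html"))
```

Googlebot is matched by its own group and is only kept out of /private/, while any other spider falls back to the * group.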
Disallow rules:
Disallow: /abc blocks every URL whose path begins with /abc, which includes abc.php, abc.html and all files under the abc folder.
Disallow: /abc/ blocks only the files under the abc folder; abc.php and abc.html are not restricted.
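The difference between the two forms comes down to simple prefix matching. The sketch below is only an illustration of that idea (the function name is_disallowed is hypothetical and ignores wildcards):

```python
def is_disallowed(rule_path: str, url_path: str) -> bool:
    """Return True if url_path falls under a Disallow rule (plain prefix match)."""
    return url_path.startswith(rule_path)


# Compare the two rule forms against the same set of paths.
for rule in ("/abc", "/abc/"):
    for path in ("/abc.php", "/abc.html", "/abc/page.html"):
        print(f"Disallow: {rule:6} {path:16} blocked={is_disallowed(rule, path)}")
```

With /abc all three paths are blocked; with /abc/ only /abc/page.html is.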
Allow rules:
Allow rules use the same path syntax as Disallow rules, but they permit access instead of prohibiting it.
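A common use of Allow is to open up one file inside a folder that is otherwise blocked. The sketch below uses Python's standard urllib.robotparser module; the admin-ajax.php exception is an assumption chosen for illustration and is not part of the rules recommended in this article.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block /wp-admin/ but allow one file inside it.
RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

ajax_ok = parser.can_fetch("*", "/wp-admin/admin-ajax.php")
options_ok = parser.can_fetch("*", "/wp-admin/options.php")
print("admin-ajax.php allowed:", ajax_ok)
print("options.php allowed:", options_ok)
```

Python's parser applies the first matching rule in file order, whereas the major search engines generally apply the most specific (longest) matching rule. Putting the Allow line first gives the same result under both behaviours.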
Sitemap rules:
The Sitemap rule tells search engines where the sitemap is located:
Sitemap: http://yourdomain/sitemap.xml
Here yourdomain is your site's domain name and sitemap.xml is your website's sitemap file.
To keep search engines away from WordPress system files:
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
To avoid having duplicate content indexed:
Disallow: /feed
Disallow: /articles/*/feed
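The * in the second rule is a wildcard that stands for any run of characters. The sketch below shows one way such a pattern could be evaluated; rule_matches is a hypothetical helper, and the handling of a trailing $ (end-of-URL anchor) is an extra that the rules above do not use.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern against a URL path.

    '*' stands for any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern only has to match the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


print(rule_matches("/articles/*/feed", "/articles/hello-world/feed"))
print(rule_matches("/articles/*/feed", "/articles/hello-world/"))
print(rule_matches("/feed", "/feedback"))
```

Because matching is by prefix, Disallow: /feed also covers any other path that starts with /feed, such as /feedback.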
The complete robots.txt looks like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /articles/*/feed
Sitemap: http://yourdomain/sitemap.xml
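Before uploading the file, it can be useful to check a few paths against it. Below is a rough sketch using Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The test paths are invented examples. As far as I know this parser treats * inside a path literally, so the /articles/*/feed line is not exercised here.

```python
from urllib.robotparser import RobotFileParser

# The file from this article, with the placeholder domain left as-is.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /articles/*/feed
Sitemap: http://yourdomain/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths to check; True means the spider may fetch the path.
paths = [
    "/",
    "/hello-world/",
    "/feed",
    "/wp-admin/options.php",
    "/wp-content/plugins/some-plugin/readme.txt",
    "/wp-content/uploads/photo.jpg",
]
results = {path: parser.can_fetch("*", path) for path in paths}
sitemaps = parser.site_maps()

for path, ok in results.items():
    print(f"{path:45} {'allowed' if ok else 'blocked'}")
print("Sitemaps:", sitemaps)
```

Ordinary pages and uploads stay reachable, while the admin area, plugin files and the feed are blocked.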