Search engines are answer machines. They exist to discover, understand, and organize the internet’s content in order to offer the most relevant results to the questions searchers are asking. To show up in search results, your content first needs to be visible to search engines; several processes are involved in discovering your website, and one of them is crawling. It’s arguably the most important piece of the SEO puzzle: if your site can’t be found, there’s no way you’ll ever show up in the SERPs (Search Engine Results Pages).
First, how do search engines work?
Search engines work through three primary functions:
1- Crawling:
Scour the Internet for content, looking over the code/content for each URL they find.
2- Indexing:
Store and organize the content found during the crawling process. Once a page is in the index, it’s in the running to be displayed as a result of relevant queries.
3- Ranking:
Provide the pieces of content that will best answer a searcher’s query, which means that results are ordered by most relevant to least relevant.
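To make these three functions concrete, here is a toy sketch in Python. The page URLs, their text, and the simple word-count scoring are all invented for illustration; real search engines use vastly more data and far more sophisticated ranking signals.

```python
# A toy model of the crawl -> index -> rank pipeline.
# All URLs and page text below are invented for illustration.

from collections import defaultdict

# "Crawling": pretend we fetched these pages and extracted their text.
crawled_pages = {
    "https://example.com/": "home page about coffee beans and brewing",
    "https://example.com/grinders": "a guide to coffee grinders and burr settings",
    "https://example.com/contact": "contact us for wholesale orders",
}

# "Indexing": build an inverted index mapping each word to the URLs that contain it.
inverted_index = defaultdict(set)
for url, text in crawled_pages.items():
    for word in text.lower().split():
        inverted_index[word].add(url)

# "Ranking": for a query, order matching pages by how many query words they contain.
def rank(query):
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in inverted_index.get(word, set()):
            scores[url] += 1
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank("coffee grinders"))
# [('https://example.com/grinders', 2), ('https://example.com/', 1)]
```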
Crawling:
Crawling is the discovery process in which search engines send out a team of robots (known as crawlers or spiders) to find new and updated content.
Content can vary — it could be a webpage, an image, a video, a PDF, etc. — but regardless of the format, content is discovered by links.
What does crawling actually mean?
Googlebot starts out by fetching a few web pages and then following the links on those web pages to find new URLs. By hopping along this path of links, the crawler is able to find new content and add it to Google’s index, called Caffeine (a massive database of discovered URLs), to be retrieved later when a searcher is seeking information that the content on that URL is a good match for.
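A very simplified version of that link-hopping process can be sketched with Python’s standard library. The seed URL and page limit are placeholders; a real crawler like Googlebot also respects robots.txt, renders pages, handles errors gracefully, and operates at a completely different scale.

```python
# Minimal link-discovery sketch: fetch a page, collect its links, repeat.
# The seed URL is a placeholder; this ignores robots.txt, politeness, etc.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=5):
    frontier = [seed]   # URLs waiting to be fetched
    seen = set()        # URLs already discovered
    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue
        parser = LinkCollector()
        parser.feed(html)
        # Resolve relative links and add newly discovered URLs to the frontier.
        frontier.extend(urljoin(url, link) for link in parser.links)
    return seen

print(crawl("https://example.com/"))
```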
Can search engines find your pages by crawling?
As you’ve just learned, making sure your site gets crawled and indexed is a prerequisite to showing up in the SERPs. If you already have a website, it might be a good idea to start off by seeing how many of your pages are in the index.
This will yield some great insights into whether Google is crawling and finding all the pages you want it to, and none that you don’t.
One way to check your indexed pages is “site:yourdomain.com”, an advanced search operator. Head to Google and type “site:yourdomain.com” into the search bar. This will return the results Google has in its index for the site specified.
For more accurate results, use the Index Coverage report in Google Search Console. You can sign up for a free Google Search Console account if you don’t currently have one. With this tool, you can submit sitemaps for your site and monitor how many submitted pages have actually been added to Google’s index, among other things.
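If you don’t yet have a sitemap to submit, one can be generated with a short script. Below is a minimal sketch using Python’s standard library; the URL list is a placeholder, and the output follows the sitemaps.org XML format.

```python
# Minimal sitemap.xml generator; the URL list below is a placeholder.

import xml.etree.ElementTree as ET

pages = [
    "https://yourdomain.com/",
    "https://yourdomain.com/about",
    "https://yourdomain.com/blog/first-post",
]

# Root element uses the sitemaps.org namespace.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

Upload the resulting sitemap.xml to your site (commonly at the root, e.g. yourdomain.com/sitemap.xml) and submit that URL in Search Console.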
If you’re not showing up anywhere in the search results, there are a few possible reasons why crawlers may not be reaching your pages:
- Your site is brand new and hasn’t been crawled yet.
- The site isn’t linked to any external websites.
- The website navigation makes it hard for a robot to crawl it effectively.
- The website contains some basic code called crawler directives that is blocking search engines (one way to check this is sketched after this list).
- The site has been penalized by Google for spammy tactics.
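The crawler-directive case in particular is easy to check yourself. The sketch below uses Python’s built-in robots.txt parser; the domain and path are placeholders for your own URLs.

```python
# Check whether a robots.txt directive blocks a given crawler from a URL.
# The domain and path are placeholders for illustration.

from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://yourdomain.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt file

for user_agent in ("Googlebot", "*"):
    allowed = parser.can_fetch(user_agent, "https://yourdomain.com/some-page/")
    print(f"{user_agent}: {'allowed' if allowed else 'blocked'}")
```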
Tell search engines how to crawl your site:
If you used Google Search Console or the “site:domain.com” advanced search operator and found that some of your important pages are missing from the index and/or some of your unimportant pages have been mistakenly indexed, there are some optimizations you can implement to better direct Googlebot how to crawl web content. Telling search engines how to crawl your site can give you better control of what ends up in the index.
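Besides robots.txt, individual pages can carry a robots meta tag or an X-Robots-Tag response header. Here is a rough sketch, with a placeholder URL, of how you might spot a stray noindex on a page you actually want in the index.

```python
# Look for noindex signals on a page: the robots meta tag and the
# X-Robots-Tag HTTP header. The URL below is a placeholder.

import urllib.request
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append(attrs.get("content") or "")

url = "https://yourdomain.com/important-page/"
response = urllib.request.urlopen(url, timeout=10)

header = response.headers.get("X-Robots-Tag")
if header:
    print("X-Robots-Tag header:", header)

finder = RobotsMetaFinder()
finder.feed(response.read().decode("utf-8", "ignore"))
print("robots meta tags:", finder.directives or "none found")
```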
Can crawlers find all your important content?
Sometimes a search engine will be able to find parts of your site by crawling, but other pages or sections might be obscured for one reason or another.
It’s important to make sure that search engines are able to discover all the content you want indexed, not just your homepage.
Is your content hidden behind login forms?
If you require users to log in, fill out forms, or answer surveys before accessing certain content, search engines won’t see those protected pages. A crawler is definitely not going to log in.
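One rough way to see what a crawler sees is to request a gated URL with no cookies or credentials, roughly as a crawler would. This is only a sketch with a placeholder URL; a redirect to a login page or a 401/403 status is a strong hint that the content is invisible to search engines.

```python
# Fetch a URL with no cookies or credentials, roughly as a crawler would,
# and report where the request ends up. The URL is a placeholder.

import urllib.request
import urllib.error

url = "https://yourdomain.com/members-only/guide/"
try:
    response = urllib.request.urlopen(url, timeout=10)
    print("status:", response.status)
    print("final URL:", response.geturl())  # differs from `url` if redirected, e.g. to a login page
except urllib.error.HTTPError as error:
    print("blocked with status:", error.code)  # e.g. 401 or 403 for protected content
```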
Are you relying on search forms?
Robots cannot use search forms. Some people believe that if they place a search box on their site, search engines will be able to find everything their visitors search for, but crawlers cannot type queries into that box.
Is text hidden within non-text content?
Non-text media forms (images, video, GIFs, etc.) should not be used to display text that you wish to be indexed. While search engines are getting better at recognizing images, there’s no guarantee they will be able to read and understand them just yet. It’s always best to add text within the <HTML> markup of your webpage.
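A quick sanity check is to confirm that the copy you care about actually appears in the page’s HTML source rather than only inside an image. The URL and phrase below are placeholders.

```python
# Confirm that important copy exists in the raw HTML, not only inside images.
# The URL and phrase are placeholders.

import urllib.request

url = "https://yourdomain.com/pricing/"
phrase = "30-day free trial"

html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
if phrase.lower() in html.lower():
    print("Found in the HTML source: crawlers can read it.")
else:
    print("Not in the HTML source: it may only exist inside an image.")
```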
Can search engines follow your site navigation?
Just as a crawler needs to discover your site via links from other sites, it needs a path of links on your own site to guide it from page to page.
If you’ve got a page you want search engines to find but it isn’t linked to from any other pages, it’s as good as invisible. Many sites make the critical mistake of structuring their navigation in ways that are inaccessible to search engines, hindering their ability to get listed in search results.
Common navigation mistakes that can keep crawlers from seeing all of your site:
- Having a mobile navigation that shows different results than your desktop navigation
- Any type of navigation where the menu items are not in the HTML, such as JavaScript-powered navigations. Google has gotten much better at crawling and understanding JavaScript, but it’s still not a perfect process. The more surefire way to ensure something gets found, understood, and indexed by Google is to put it in the HTML (a quick way to check this is sketched at the end of this section).
- Personalization, or showing unique navigation to a specific type of visitor versus others, could appear to be cloaking to a search engine crawler.
- Forgetting to link to a primary page on your website through your navigation — remember, links are the paths crawlers follow to new pages!
This is why it’s essential that your website has clear navigation and helpful URL folder structures.
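As a rough check of the JavaScript-navigation point above, you can list the anchor links present in the initial HTML response, before any JavaScript runs, which approximates what a crawler can follow without rendering the page. The URL is a placeholder, and this is only a sketch, not how Googlebot actually processes pages.

```python
# List the links present in the raw HTML of a page, before any JavaScript runs.
# Menu items injected purely by JavaScript will not appear here. Placeholder URL.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class AnchorLister(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

url = "https://yourdomain.com/"
html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")

lister = AnchorLister(url)
lister.feed(html)
for link in sorted(set(lister.links)):
    print(link)
```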