There’s a lot of material on how to use specific tools to figure out what keywords your competitor deems important. There are two problems with most of these articles:
- They assume that the competition is actually SEO-savvy
- They are rarely scalable
In this article, I’ll show an easy way to get massive amounts of raw data to analyze by taking advantage of two free tools, Xenu and SEO Tools for Excel, as well as a paid solution, SEMRush.
Step 1: Gathering Your Competition’s URLs
The first step is deciding whether you want to crawl your competitor’s whole site or focus on a specific segment, such as a single category or the URLs that rank. While gathering all the data might seem more attractive, it can bring in a lot of dirty data: pages that have no SEO value or are not relevant to the niches you and your competitor are competing in.
After exploring your competitor’s website for a few minutes, you’ll have a much better idea of which sections of the site are worth gathering data on. There are two ways to get this information:
- Gather all ranking URLs using SEMRush, segment them by keyword group or category directory, and filter out the URLs you are not interested in.
- Crawl specific sections of the website and export a valid text URL report using Xenu’s Link Sleuth.
The advantage of the latter is that you can also gather the title and meta description of each page in a single pass. It also gives you a better idea of your competition’s overall approach to content optimization. If you care about how competitors rank for specific keywords, you should do both a backlink audit of the competition and an on-site audit of the specific pages.
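For the first method, segmenting ranking URLs by category directory is easy to script once the URLs are in a list. The sketch below is a hypothetical helper, not a SEMRush feature: it assumes the first path segment of each URL is its "category directory" and keeps only the directories you name.

```python
from collections import defaultdict
from urllib.parse import urlparse

def segment_by_directory(urls):
    """Group URLs by first path segment, e.g. /shoes/; root-level pages go under "/"."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        # /shoes/ and /shoes/item both count as the /shoes/ directory;
        # a single segment without a trailing slash (e.g. /about.html) is root-level.
        if len(segments) > 1 or (segments and path.endswith("/")):
            key = f"/{segments[0]}/"
        else:
            key = "/"
        groups[key].append(url)
    return dict(groups)

def keep_directories(urls, wanted):
    """Return only the URLs that fall in the wanted directories, in the order given."""
    groups = segment_by_directory(urls)
    return [url for directory in wanted for url in groups.get(directory, [])]

urls = [
    "https://example.com/shoes/running-shoe-x",
    "https://example.com/shoes/",
    "https://example.com/blog/how-to-run",
    "https://example.com/about.html",
]
print(keep_directories(urls, ["/shoes/"]))
```

Grouping first and filtering second lets you eyeball the directory counts before deciding what counts as "dirty" data for your niche.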
Crawling Your Competitors
If you want a more in-depth view of a competitor’s onsite keyword strategy, Xenu (or Screaming Frog, if you’ve got the money) is your best friend. As I don’t have much experience with Screaming Frog, I’ll focus on Xenu.
The most important thing to do before starting your Xenu scan is to set up some basic options. Most computers can easily handle 100 parallel threads, so I tend to leave it at 100 because I want data fast. Some people claim that the ideal number is 70, but I’ve seen no data suggesting a significant difference in the number of URLs crawled.
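Xenu is a GUI tool, so there is nothing to script in its options dialog, but the "parallel threads" setting is easy to picture in code. This is a minimal Python sketch, not Xenu's implementation: `fetch_all` and `fetch_status` are hypothetical names, `max_workers=100` mirrors the thread count above, and the 10-second timeout is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or an error string if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:   # 4xx/5xx responses raise HTTPError
        return exc.code
    except OSError as exc:     # DNS failures, refused connections, timeouts
        return f"error: {exc}"

def fetch_all(urls, fetch=fetch_status, max_workers=100):
    """Call fetch(url) for every URL on up to max_workers threads; returns {url: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(fetch, urls)))
```

More workers mostly help while requests are waiting on the network; past a point, the target server's response time, not your thread count, limits how fast URLs come back.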
I’ll also limit the maximum depth depending on the type of site I’m focusing on. For corporate sites, I usually limit it to a depth of around 5. This tends to capture the most important pages and gives a basic picture of the site’s architecture. For publisher sites or sites built mostly on user-generated content, deeper crawls are useful, but a holistic picture is much harder to get. Even with a limited depth, the output can end up quite large, depending on the site’s architecture.
If you only need a specific section of the site, you can focus the crawl on it: either start the crawl from that subcategory or change the rules that control which pages are treated as internal and which as external.
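A depth limit and a section rule together decide what ends up in the crawl. The sketch below is not how Xenu works internally; it is a generic breadth-first crawl over a hypothetical `get_links(url)` function, where depth counts clicks from the start page (Xenu's own level numbering may differ) and `in_scope` plays the role of the internal/external rule.

```python
from collections import deque

def crawl(start, get_links, max_depth=5, in_scope=lambda url: True):
    """Breadth-first crawl from start, following only in-scope links up to max_depth clicks away.

    get_links(url) must return the URLs linked from that page. Returns {url: depth}.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # record pages at the limit, but don't follow their links
        for link in get_links(url):
            if link not in depths and in_scope(link):
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# A tiny hypothetical site as an adjacency list.
site = {
    "/": ["/shoes/", "/blog/"],
    "/shoes/": ["/shoes/a", "/shoes/b"],
    "/shoes/a": ["/shoes/a/reviews", "/"],
    "/blog/": ["/blog/post-1"],
}
whole_site = crawl("/", lambda u: site.get(u, []), max_depth=2)
shoes_only = crawl("/shoes/", lambda u: site.get(u, []),
                   in_scope=lambda u: u.startswith("/shoes/"))
```

Starting from `/shoes/` with a prefix rule is the scripted equivalent of both options above: a different starting point and a narrower definition of "internal".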
As for report settings, I prefer to export the entire crawl and clean out the non-text URLs in EditPad, because this also gives me the title and meta description of each page. You can use the Valid Text URL report instead if you don’t mind spending extra time in Excel, but that report won’t include the titles and meta descriptions.
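If you export the entire crawl, the cleanup can also be scripted instead of done by hand. This sketch assumes a tab-separated export whose header row includes columns named Address, Type, Title and Description; those column names are an assumption, so check them against your own file.

```python
import csv

def text_pages(lines, keep=("Address", "Title", "Description")):
    """Keep only text/html rows from a tab-separated crawl export.

    ASSUMPTION: the first line is a header row with a "Type" column and the
    columns listed in `keep`; adjust the names to match your export.
    """
    # QUOTE_NONE: treat quote characters inside titles as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {col: row.get(col) or "" for col in keep}
        for row in reader
        if (row.get("Type") or "").startswith("text/html")
    ]

# Usage (file name and encoding are placeholders):
# with open("xenu-export.txt", encoding="utf-8", newline="") as f:
#     pages = text_pages(f)
```

Matching on the start of the Type value keeps entries such as `text/html; charset=utf-8` while dropping images, scripts and stylesheets.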
Step 2: Scrape to Your Heart’s Content
If you don’t already have it, you’ll need to get SEO Tools. You’ll also need to brush up on your XPath if you really want a detailed idea of how well your competition is writing their copy. For the uninitiated, XPath is a query language for selecting nodes in XML documents, and it can be used the same way on HTML pages. So, if you’re interested in seeing what your competition uses as the first header on the page, you could use the following XPath with the SEO Tools scraper:
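As a rough stand-in outside Excel, here is how a "first header on the page" query looks with Python's standard library. This is not the SEO Tools scraper: `first_match_text` is a hypothetical helper, ElementTree supports only a subset of XPath (the XPath `//h1` is written as the relative path `.//h1`), and it needs well-formed markup, so messy real-world HTML would usually call for a more forgiving parser such as lxml.

```python
import xml.etree.ElementTree as ET

def first_match_text(markup, path):
    """Return the text of the first element matching `path`, or None if nothing matches."""
    root = ET.fromstring(markup)
    element = root.find(path)  # find() returns only the first match
    if element is None:
        return None
    # itertext() also picks up text inside nested tags such as <em>.
    return " ".join("".join(element.itertext()).split())

page = (
    "<html><head><title>Blue Widgets | Example</title></head>"
    "<body><h1>Buy <em>Blue</em> Widgets</h1><h2>Why blue?</h2></body></html>"
)
print(first_match_text(page, ".//h1"))  # Buy Blue Widgets
```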
Luckily, for most basic scraping, SEO Tools has a bunch of presets for headers, titles, meta descriptions, etc. These alone are good enough for basic competitor research, since most smart competitors will have keyword-rich titles, headers and meta descriptions. Where it gets really interesting is when you start analyzing their actual content.
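To illustrate what those presets collect, here is a minimal sketch using Python's built-in `html.parser`, which, unlike the XPath example, tolerates ordinary non-XML HTML. The class and field names are hypothetical; it gathers the title, the meta description and every h1 to h3 heading from a page you have already downloaded.

```python
from html.parser import HTMLParser

class OnPageParser(HTMLParser):
    """Collect <title>, meta description and h1-h3 text from an HTML page."""
    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.data = {"title": "", "meta_description": "", "h1": [], "h2": [], "h3": []}
        self._current = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.data["meta_description"] = (attrs.get("content") or "").strip()
        elif tag == "title" or tag in self.HEADINGS:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.data["title"] = text
            else:
                self.data[tag].append(text)
            self._current = None

def on_page_elements(html):
    parser = OnPageParser()
    parser.feed(html)
    parser.close()
    return parser.data
```

Text inside nested tags such as `<em>` is kept because only the closing tag of the element being collected ends the capture.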
Making Macros for Content
There are two limitations when using the XPath scraper in Excel. First, it will fail if your query contains double quotes, so be extra careful and use single quotes inside the XPath instead. Second, it always returns a single element. For example, if I tried to use this query to scrape my article on SEJ:
I would get only the first paragraph as a result. This is the case even if I use a more advanced query to select multiple paragraphs, such as this one, which still returns only the second paragraph:
//*/div/p[position() >= 1 and position() < 21]
If you’re serious about scraping the content for mass analysis, you could always create a macro to automate the process:
- Count the occurrences of <p> using IsFoundOnPage.
- Create a loop that runs the appropriate content XPath query once per paragraph, incrementing the numeric position index on each pass, and posts each result to a separate sheet.
- Exit the loop at the first error.
- Concatenate all successfully retrieved values and paste the result as a value into the appropriate row on the main sheet.
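The macro steps above can be sketched outside Excel as well. This Python version is only an illustration: it uses the standard library's ElementTree instead of SEO Tools, a hypothetical `scrape_single` helper stands in for a scraper that returns one element per query, and counting `<p>` tags with `findall` replaces IsFoundOnPage. The container path is an assumption; point it at whatever element actually wraps the article body.

```python
import xml.etree.ElementTree as ET

def scrape_single(root, path):
    """Mimic a one-result scraper: text of the first match, or None (the 'error' case)."""
    element = root.find(path)
    return None if element is None else " ".join("".join(element.itertext()).split())

def scrape_all_paragraphs(markup, container):
    root = ET.fromstring(markup)
    # Step 1: count the <p> elements so the loop has an upper bound.
    count = len(root.findall(".//p"))
    paragraphs = []
    # Step 2: query one numbered paragraph at a time: p[1], p[2], ...
    for i in range(1, count + 1):
        text = scrape_single(root, f"{container}/p[{i}]")
        if text is None:
            break  # Step 3: stop at the first error.
        paragraphs.append(text)
    # Step 4: concatenate everything retrieved into one value for the main sheet.
    return " ".join(paragraphs)

page = (
    "<html><body><div class='nav'><p>Menu</p></div>"
    "<div class='post'><p>First.</p><p>Second.</p></div><p>Footer</p></body></html>"
)
print(scrape_all_paragraphs(page, ".//div[@class='post']"))  # First. Second.
```

Because the page-wide count includes navigation and footer paragraphs, the loop usually runs past the end of the article; exiting on the first miss is what keeps that harmless.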
Optional Step: Check PPC Ad Text
Last but not least, you can also review your competitor’s PPC ad copy. Depending on the site, this can be a real chore to go through, because ad copy automation is very common on sites with user-generated content. Your best bet is to go through a few pages manually before deciding whether to export the data.
PPC ad copy can be the most useful data of all for competitor keyword research, because it shows which keywords your competitor is willing to spend money on. In principle, they should only be bidding on keywords that convert and cost less than they bring in.
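The "cost less than they bring in" logic can be made concrete with a break-even cost per click: if a click converts with probability r and each conversion is worth v in profit, the average click is worth r × v, so any CPC below that is profitable in expectation. A minimal Python sketch with made-up numbers:

```python
def break_even_cpc(conversion_rate, value_per_conversion):
    """Highest cost per click at which a keyword still pays for itself on average."""
    return conversion_rate * value_per_conversion

def worth_bidding(cpc, conversion_rate, value_per_conversion):
    return cpc < break_even_cpc(conversion_rate, value_per_conversion)

# Hypothetical numbers: 2% of clicks convert, each conversion is worth $50 in profit.
print(break_even_cpc(0.02, 50))       # 1.0
print(worth_bidding(0.80, 0.02, 50))  # True
print(worth_bidding(1.20, 0.02, 50))  # False
```

You won't know a competitor's conversion rate, but the formula explains why long-running ads are a decent signal: a keyword that stays in their account is probably clearing this bar.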
Step 3: Analyze the Data
There are numerous ways to analyze the data, but gathering the raw data is usually the most time-consuming part of judging your competition’s content and keyword strategy. Now you have an easy, step-by-step guide that makes the process much faster and more organized.
After all this is done, you should end up with rows of data for each URL that include the title, meta description, H1, H2 and H3 tags, and even the whole body content.
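As one example of what to do with those rows, here is a minimal Python sketch that measures how consistently a competitor places a keyword in each on-page element. The row layout and field names are hypothetical (use whatever columns your sheet has), and plain substring matching is a deliberate simplification: it ignores stemming and word boundaries.

```python
def keyword_coverage(rows, keyword, fields=("title", "meta_description", "h1", "body")):
    """Share of pages whose field contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    total = len(rows)
    coverage = {}
    for field in fields:
        hits = sum(1 for row in rows if keyword in str(row.get(field, "")).lower())
        coverage[field] = hits / total if total else 0.0
    return coverage

rows = [
    {"url": "/a", "title": "Blue Widgets | Example", "meta_description": "Buy blue widgets online.",
     "h1": "Blue Widgets", "body": "Our blue widgets ship free."},
    {"url": "/b", "title": "Red Widgets | Example", "meta_description": "Red ones.",
     "h1": "Widgets", "body": "We also sell blue widgets."},
]
print(keyword_coverage(rows, "blue widgets"))
# {'title': 0.5, 'meta_description': 0.5, 'h1': 0.5, 'body': 1.0}
```

A keyword that shows up in the body everywhere but in titles only half the time is a hint about which pages the competitor actually optimized for it.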
This post was originally written by Philip Tomlinson.