There are two routes to creating a content inventory: one with paid tools, and one without.
Route 1: with paid tools
- Open up your crawling tool, choose your settings, and set it crawling. On a big site this can take a long time to complete. If you choose a paid tool like Screaming Frog, SEMrush, or Moz, look at the tutorials/help content before you get started so you configure it correctly and get the most from it. (There’s a step-by-step process for doing this with Screaming Frog in the next section.)
- Check your spreadsheet for duplication and delete/filter content that you do not want to audit, for example image files.
In Google Sheets go to 'Data' then 'Remove duplicates' to dedupe quickly and easily.
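If you'd rather dedupe with a formula, the UNIQUE function does the same job without touching your raw data. A minimal sketch, assuming your crawled URLs are in column A of a tab called 'Crawl' (the tab name is just an example):

=UNIQUE('Crawl'!A2:A)

Paste this into the first cell of a new tab to get a deduplicated list, leaving the original crawl intact.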
- Once your crawl and tidying up is done, upload your list of URLs to URL Profiler, choose your settings, and set it crawling.
Take a good look at all the features and make sure you understand the settings before you get going with URL Profiler. A few quick things to remember: check the Readability and Google Analytics boxes, and get a year's worth of data from Google Analytics.
- Open the spreadsheet that URL Profiler has created and tidy it up. There are likely to be a few columns that you’re not interested in. It can also be really useful to go through and assign each page a content type (news story, blog post, product page, etc.) at this stage.
- Copy your columns over to the content inventory template one by one, adding new columns if you need to.
- That’s it, you have your content inventory.
How to get a list of all live pages on a website using Screaming Frog
- Open up Screaming Frog and paste the domain you would like to audit in the top box and click ‘Start’.
- Once the crawl is complete (the time varies depending on the size of the website), click ‘Export’ and save it as a CSV.
- Upload the CSV file to Google Drive and open it with Google Sheets.
Filters are extremely useful for finding the pages and content that might be relevant to your project. You might also use filters to check for duplicate content; in this case, we need them to find the live pages on your site.
- To set up a filter in the spreadsheet, highlight the top row of headings and click ‘Data’, then ‘Create a filter’.
- Next, filter everything down to just the live pages that a customer might visit, whether through Google or by navigating to them on the site. Click the filter drop-down for the ‘Content Type’ column and click ‘Clear’ under ‘Filter by values’. Then select any value containing ‘text’ to keep just the static pages on the site and click OK. If you want to audit PDFs and other files, make sure you leave those checked.
- Do the same in the ‘Status Code’ column: deselect any status codes other than ‘200’ and click OK. A 200 means the page loaded successfully. A 301 status code means the page has been permanently redirected to another URL and passes link equity; a 302 is a temporary redirect.
- Some of the pages on the site might have been canonicalised, which means they have been marked as duplicates of another page, so Google indexes the canonical version instead. If this is the case, clear those from your filtered view too: click the filter on the ‘Indexability’ column and deselect the ‘Non-indexable’ option.
- If you want a separate tab containing just the live static pages, highlight the whole filtered view, then copy and paste it into a new tab (or use a formula, as sketched below). Then you can add or remove columns to start the content auditing process.
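If you prefer a formula to copying and pasting, the FILTER function can pull the live HTML pages into a new tab in one step. A rough sketch, assuming the Screaming Frog export sits in a tab called ‘Crawl’ with the URL in column A, Content Type in column B, Status Code in column C, and Indexability in column E; your export’s column order may well differ:

=FILTER('Crawl'!A2:E, 'Crawl'!C2:C=200, ISNUMBER(SEARCH("text", 'Crawl'!B2:B)), 'Crawl'!E2:E="Indexable")

Each condition mirrors one of the manual filters above: a 200 status code, a ‘text’ content type, and indexable pages only.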
Route 2: without paid tools
- Download lists of pages from your two free data sources, for example, your CMS and Google Search Console.
- Combine your two spreadsheets and remove all the duplicate pages. At this stage, you should also delete content that you do not want to audit, like image files, from your inventory.
In Google Sheets go to 'Data' then 'Remove duplicates' to dedupe quickly and easily.
- Now you can start pulling in data about each page using scraping. The scraping works by looking up HTML elements, which works well for things like headings, title tags, meta descriptions, publication dates, authors, etc. There are examples in the template spreadsheet to get you started, and a few more are sketched after the links below. These two articles explain how to do this for Google Sheets and Excel respectively. If the examples in the template don’t work, it might be because these elements have different names in your site’s HTML; follow the tutorials to fix them. It’s also good to remember that it can take some time for all the data to load, especially if you have a lot of pages.
Use Google Sheets' ImportXML function to scrape and visualize data, Geckoboard
Learn how to use the ImportXML feature to populate your Google Sheets content inventory.
Scraping Data from Website to Excel, Octoparse
Learn how to scrape data to populate your Excel content inventory.
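To give a flavour of what the tutorials cover, here are the sorts of IMPORTXML formulas you might use in Google Sheets, assuming each page’s full URL is in cell A2 (the XPath may need tweaking to match your site’s HTML):
- =IMPORTXML(A2, "//title") pulls in the title tag
- =IMPORTXML(A2, "//h1") pulls in the main heading
- =IMPORTXML(A2, "//meta[@name='description']/@content") pulls in the meta description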
- Go to Google Analytics and create a report with all the metrics you want to analyse. Put things like pageviews, entrances, etc. in the ‘Metrics’ tab and ‘Page’ as the dimension drilldown.
- Download a spreadsheet with a year’s worth of data from Google Analytics. Google Analytics limits you to downloading 5,000 lines of data at a time, which isn’t enough for most audits. This page has some tips for how to get around the restriction.
- Copy it column by column into the ‘Analytics’ tab in your inventory spreadsheet. Copy your domain without the trailing slash (for example https://lapope.com or https://www.bbc.com) all the way down the ‘Domain’ column.
- Use VLOOKUP to match the data from Google Analytics to your URLs. There are examples in the template for the basic metrics, like pageviews, and these will work automatically if you copy the data into the relevant columns. You can work from these and use the tutorial linked below to match the rest; there’s also a sketch of the formula after the link.
Learn how to use the VLOOKUP formula to match data from different sheets.
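As an illustration, a formula along these lines matches pageviews to each URL, assuming the Google Analytics data sits in a tab called ‘Analytics’ with the full URL in column A and pageviews in column B, and the inventory row’s URL in cell A2 (your template’s tab names and columns may differ):

=IFERROR(VLOOKUP(A2, Analytics!$A:$B, 2, FALSE), 0)

FALSE forces an exact match on the URL, and IFERROR shows 0 for pages with no recorded traffic instead of an error.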
- Give everything a quick check and tidy up. It can be really useful to go through and assign each page a content type (news story, blog post, product page, etc.) at this stage too.
- That’s it, you have your content inventory.
Example/template inventory
This toolkit includes an example/template inventory. You can make a copy and use it as the basis for your own audit.
The example includes formulas and/or conditional formatting to do the following (a few formulas along these lines are sketched after the list):
- match Analytics and URL Profiler data via VLOOKUP
- scrape headings, titles, and meta descriptions via IMPORTXML (be sure to delete this if you’re not going to use it – it will slow your spreadsheet down)
- calculate page depth
- pull out page path and primary section
- show if title tags or meta descriptions are too short/long
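To illustrate the last three, formulas like these would do similar jobs, assuming the page path is in A2, the full URL is in B2, your domain is in C2, and the title tag is in D2; the template’s own formulas and cell references will differ:
- page depth, by counting the slashes in the path: =LEN(A2)-LEN(SUBSTITUTE(A2,"/",""))
- page path, by stripping the domain from the URL: =SUBSTITUTE(B2,C2,"")
- flag title tags over roughly 60 characters (a common cut-off in search results): =IF(LEN(D2)>60,"Too long","OK")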
Before you move on...
Prepare your content inventory.
(If you haven't made a copy yet, the inventory is linked on the content audit toolkit homepage.)