There are two routes for creating a content inventory: one with paid tools and one without.
Route 1: with paid tools
- Open up your crawling tool, choose your settings, and set it crawling. On a big site this can take a long time to complete. If you use a paid tool like Screaming Frog, SEMrush, or Moz, read the tutorials and help content before you start so you configure it correctly and get the most from it. (There’s a step-by-step process for doing this with Screaming Frog in the next section.)
- Check your spreadsheet for duplication and delete or filter out content that you do not want to audit, for example, image files.
- Once your crawl and tidying up is done, upload your list of URLs to URL Profiler, choose your settings, and set it crawling.
- Open the spreadsheet that URL Profiler has created and tidy it up; there are likely to be a few columns that you’re not interested in. It can also be really useful to assign each page a content type (news story, blog post, product page, etc.) at this stage.
- Copy your columns over to the content inventory template one by one, adding new columns if you need to.
- That’s it, you have your content inventory.
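The dedupe-and-filter step above can be sketched in a few lines of Python, for anyone who prefers scripting to manual spreadsheet tidying. The URL list and the set of filtered extensions here are illustrative assumptions, not part of any tool's output.

```python
# A minimal sketch of deduplicating a crawl export and dropping
# non-page assets (images, stylesheets, scripts) before auditing.
IGNORED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".css", ".js")

def tidy_url_list(urls):
    """Drop duplicates (preserving order) and non-page asset URLs."""
    seen = set()
    cleaned = []
    for url in urls:
        key = url.rstrip("/").lower()   # treat /about and /about/ as the same page
        if key in seen:
            continue
        seen.add(key)
        if key.endswith(IGNORED_EXTENSIONS):
            continue
        cleaned.append(url)
    return cleaned

urls = [
    "https://example.com/about",
    "https://example.com/about/",    # duplicate (trailing slash)
    "https://example.com/logo.png",  # asset, not a page
    "https://example.com/blog/post-1",
]
print(tidy_url_list(urls))
# ['https://example.com/about', 'https://example.com/blog/post-1']
```

The order-preserving loop (rather than a plain `set()`) keeps the inventory in the same order as the crawl export, which makes spot-checking against the original file easier.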
How to get a list of all live pages on a website using Screaming Frog
- Open up Screaming Frog and paste the domain you would like to audit in the top box and click ‘Start’.
- Once the audit is complete (the time varies depending on the size of the website), click ‘Export’ and save it as a CSV.
- Upload the CSV file to Google Drive and open it with Google Sheets.
- To set up a filter in the spreadsheet, highlight the top row of headings and click ‘data’, then ‘create a filter’.
- Next, filter the full crawl down to just the live pages that a customer might visit, either through Google or by navigating on the site. Click the filter drop-down for the ‘content type’ column and click ‘Clear’ under ‘filter by values’. Then select any value containing ‘text’ to choose just the static pages on the site and click OK. If you want to audit PDFs and other files, make sure you leave those checked.
- Do the same in the status code column: deselect any status codes other than ‘200’ and click OK. A 301 status code means the page has been permanently redirected to another URL; a 302 is similar, but marks the redirect as temporary. Neither belongs in a list of live pages.
- Some of the pages on the site might have been canonicalised, which means they have been flagged as duplicates of a preferred (canonical) page and will not be indexed by Google. If so, clear those from your filtered view too: click the filter on the ‘indexability’ column and deselect the ‘Non-indexable’ option.
- If you want a separate tab with the live static pages on, simply highlight the whole filtered view spreadsheet and copy and paste it into a new tab. Then you can add or remove columns to start the content auditing process.
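The three spreadsheet filters above (content type, status code, indexability) amount to one combined condition. A rough Python equivalent, assuming each crawl row has been read into a dict keyed by the Screaming Frog column names used above:

```python
# A sketch of the same filtering done in the spreadsheet: keep only
# pages a visitor could actually land on.
def live_static_pages(rows):
    """Keep rows that are HTML, returned a 200, and are indexable."""
    return [
        row for row in rows
        if "text" in row["content type"]     # static pages, not images/files
        and row["status code"] == 200        # live, not redirected or broken
        and row["indexability"] == "Indexable"  # not canonicalised elsewhere
    ]

rows = [
    {"address": "/home", "content type": "text/html", "status code": 200, "indexability": "Indexable"},
    {"address": "/old", "content type": "text/html", "status code": 301, "indexability": "Indexable"},
    {"address": "/img.png", "content type": "image/png", "status code": 200, "indexability": "Indexable"},
    {"address": "/dupe", "content type": "text/html", "status code": 200, "indexability": "Non-indexable"},
]
print([r["address"] for r in live_static_pages(rows)])
# ['/home']
```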
Route 2: without paid tools
- Download lists of pages from your two free data sources, for example, your CMS and Google Search Console.
- Combine your two spreadsheets and remove all the duplicate pages. At this stage, you should also delete content that you do not want to audit, like images, from your inventory.
- Now you can start pulling in data about each page using scraping. Scraping works by looking up HTML elements, which works well for things like headings, title tags, meta descriptions, publication dates, authors, etc. There are examples in the template spreadsheet to get you started, and these two articles explain how to do this in Google Sheets and Excel respectively. If the examples in the template don’t work, it might be because these elements have different names in your site’s HTML; follow the tutorials to fix them. Bear in mind that it can take some time for all the data to load, especially if you have a lot of pages.
- Go to Google Analytics and create a report with all the metrics you want to analyse. Put things like pageviews, entrances, etc. in the ‘Metrics’ tab and ‘Page’ as the dimension drilldown.
- Download a spreadsheet with a year’s worth of data from Google Analytics. Google Analytics limits you to downloading 5,000 lines of data at a time, which isn’t enough for most audits. This page has some tips for how to get around the restriction.
- Copy it column by column into the ‘Analytics’ tab in your inventory spreadsheet. Copy your domain without the trailing slash (for example https://lapope.com or https://www.bbc.com) all the way down the ‘Domain’ column.
- Use VLOOKUP to match the data from Google Analytics to URLs. There are examples in the template for the basic metrics, like pageviews. These will work automatically if you copy the data into the relevant columns. You can work from these and use the tutorial linked below to match the rest.
- Give everything a quick check and tidy up. It can also be really useful to assign each page a content type (news story, blog post, product page, etc.) at this stage.
- That’s it, you have your content inventory.
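The element-based scraping described in the steps above can also be done outside a spreadsheet. This sketch uses Python's standard-library `html.parser` to pull a title tag, first heading, and meta description out of a page; the HTML string is a stand-in for a fetched page, since in practice you would download each URL first.

```python
# A minimal sketch of element-based scraping: the same idea as the
# template's IMPORTXML formulas, but in plain Python.
from html.parser import HTMLParser

class PageScraper(HTMLParser):
    """Collect the <title>, the first <h1>, and the meta description."""
    def __init__(self):
        super().__init__()
        self.data = {"title": "", "h1": "", "meta description": ""}
        self._current = None  # tag whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1") and not self.data.get(tag):
            self._current = tag
        elif tag == "meta" and attrs.get("name") == "description":
            self.data["meta description"] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.data[self._current] += data

html = """<html><head><title>About us</title>
<meta name="description" content="Who we are and what we do.">
</head><body><h1>About</h1></body></html>"""

scraper = PageScraper()
scraper.feed(html)
print(scraper.data)
# {'title': 'About us', 'h1': 'About', 'meta description': 'Who we are and what we do.'}
```

As with the spreadsheet formulas, this assumes the page uses standard element names; if your site marks up titles or descriptions differently, the lookups need adjusting to match.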
This toolkit includes an example/template inventory. You can make a copy and use it as the basis for your own audit.
The example includes formulas and/or conditional formatting to:
- match Analytics and URL Profiler data via VLOOKUP
- scrape headings, titles, and meta descriptions via IMPORTXML (be sure to delete this if you’re not going to use it – it will slow your spreadsheet down)
- calculate page depth
- pull out page path and primary section
- show if title tags or meta descriptions are too short/long
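For readers who prefer scripting, here are rough Python equivalents of the last three template formulas: page depth, page path/primary section, and title-length checks. The length thresholds are common SEO rules of thumb, not values taken from the template itself, and the URL-keyed dict at the end shows the VLOOKUP-style match as a plain dictionary lookup.

```python
# Rough Python equivalents of the template's formulas (illustrative
# thresholds, not the template's own values).
from urllib.parse import urlparse

def page_depth(url):
    """Number of path segments: https://example.com/a/b -> 2."""
    path = urlparse(url).path.strip("/")
    return len(path.split("/")) if path else 0

def primary_section(url):
    """First path segment, or '' for the homepage."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else ""

def title_verdict(title, low=30, high=60):
    """Flag title tags outside a typical character-length range."""
    if len(title) < low:
        return "too short"
    if len(title) > high:
        return "too long"
    return "ok"

url = "https://example.com/blog/2023/my-post"
print(page_depth(url), primary_section(url))  # 3 blog
print(title_verdict("Home"))                  # too short

# VLOOKUP equivalent: match analytics data to a URL via a dict.
pageviews = {"https://example.com/blog/2023/my-post": 120}
print(pageviews.get(url, 0))                  # 120
```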