Chris Unitt

Arts Analytics: indexed pages

Generally speaking, bigger websites are better. A large website means more pages to be picked up by search engines and presented in organic search results, hopefully resulting in more traffic.

Of course, if care isn’t taken to produce and organise content effectively, larger websites can become unwieldy, sprawling and directionless, filled with with poor quality, stale or duplicate content. That’s not so good.

I’ve been wondering – how big do arts organisations’ websites tend to be? Without access to their databases or content management systems, it can be hard to tell. However, it’s pretty easy to find out how many pages Google thinks a website has. I’ll explain how below.

Findings

So, of the arts organisations that I’m looking at in this series, who has the most pages indexed by search engines? Let’s have a look.

(Data captured on 6 February 2013).

What does this tell us?

Not a great deal, to be honest. Not at an overall level at least, because I think there’s something wrong here. It’s nice to have the context this data gives us but I think something’s going on to create those massive outliers. I’ll get on to that.

I’d have liked to have been able to prove a hypothesis that larger venues such as the Barbican and Southbank Centre would be near the top of the list, given the number of productions they have every year (they’re up there but not topping the list). Or maybe organisations that have digital/new media teams that produce large amounts of content might perform well. Or perhaps those who know how to deal with old/archived information properly (rather than deleting entirely from their site – a capital crime in my eyes).

What if there’s something skewing this data then? It might be worth taking a closer look at the pages that are being indexed for those larger sites to see if the numbers are being inflated somehow. Well, at a quick glance I can see that:

So maybe there’s something to that theory. Bear in mind that if your site is, in the eyes of the search engines, substantially made up of duplicate or poor quality content then Google, et al may well decide that they should send less traffic to your website. Maybe they’re doing this already.

As you can see, context is good but this kind of information’s really more useful when applied at the individual level. If my organisation was at either the high or low end of that table, or my number of indexed pages seemed off then I might want to do some investigating.

Why you should check your indexed pages

One of the first things you do as part of a search engine optimisation audit is to find out how many of the pages that make up a given website are being listed (or indexed) by search engines.

In either case, you’ll want to remedy the situation to prevent being penalised by the search engines.

How to check your indexed pages for yourself

Google have a very handy thing called Webmaster Tools which is easy to set up and gives you all sorts of ways to check the health of your website. Under ‘Sitemaps’ you’ll be able to compare the number of URLs in your website’s sitemap with the number that have been indexed by Google.

For instance, for my website you can see in the screenshot below that 1,192 pages are included in the sitemap and 1,107 have been indexed. I’m also told that I have 91 warnings. I know that those all relate to URLs that I’m hiding from search engines on purpose (it’s a WordPress thing), so that’s all healthy.

By the way, you can and should hook up your Webmaster Tools account to your Google Analytics account. It only takes a second and is worthwhile doing.

How to check someone else’s indexed pages

I don’t have access to information from Webmaster Tools for all of these sites, so how did I get my data?

I’ve used two methods. To find the number of pages indexed by Google I just Googled ‘site:URL’ for each organisation and looked at how many results were returned. For instance, try clicking site:bsolive.com. At the time of writing, the figure is ‘about 634,000’.

In the fourth column in the table above I’ve also included data from a company called Majestic SEO. They’ve built a search engine which crawls the web and builds up an index of pages and the links between them. They then make this information available to others.

The data from Majestic SEO doesn’t seem to match up with Google’s all that often here but I wonder if that might point to the kind of issues I outlined above. Or maybe it’s a difference between indexed pages and URLs on a site (where URLs may include images, etc). Either way, I’ve included the data because it was easy enough to do. I’ll be using other data from Majestic SEO in later posts.

Recommendations

The problem with Royal & Derngate

You may have noticed that, according to my research, Royal & Derngate in Northampton only have two pages indexed by Google. A visit to their site shows that they have plenty more pages than that, so there must surely be some mistake. Unfortunately not.

Royal & Derngate are blocking search engines from indexing their site.

This is a very bad thing. They’re using a robots.txt file (http://www.royalandderngate.co.uk/robots.txt)to tell search engines not to visit the site. You can find out more about how this works at robotstxt.org. Here’s what it looks like:

Now this must’ve been done by mistake. There are reasons for hiding websites from search engines but I can’t see how any of them would apply here. Mistakes happen and that’s fair enough, but I’m going to stick my neck out and assume a few things here:

Thing is, who does know what a normal level of organic search traffic looks like for a theatre? Well, me, but then I have access to lots of theatres’ stats. My point is, it’s not info that’s shared around very much.

Just to underline what an important issue this is, I’ve just checked some stats for some clients and found that 47-58% of their online sales come via organic search. I really should be charging someone for flagging this up.

Instead, Royal & Derngate: please make fixing this a priority. UPDATE: I’m pleased to say the folks there have been in touch to say they’re fixing this (and I did drop them an email about it to let them know).

This post is part of a series called Arts Analytics where I’m using digital metrics to see what a group of arts organisations are doing online. If that’s your sort of thing then sign up for the free Arts Analytics newsletter for more of the same.

If this was useful then I’d be really grateful if you shared it with others.