Site Speed analysis in Google Analytics

The Site Speed report in Google Analytics was announced about a month ago, so how about doing some analysis now that we have gathered some data?

First off, some background. Load times are still only collected for Internet Explorer and Chrome, and the data is sampled to the tune of about 3% of pageviews for IE and 9% for Chrome. I don’t have a problem with sampling at all, but since the metric of interest is average page load time it would be nice to know a little bit about how the data is distributed. Does it vary a lot? What’s the typical range of values? Do some outliers massively skew the data? At this point we can’t say, since we just get the average value…
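To make the outlier worry concrete, here is a minimal sketch in Python. The load times are made-up numbers, not data from the report; the point is only that one very slow sample can pull the average far away from what a typical visitor sees, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical page load samples in seconds (not real report data).
# The last value is a single very slow page load.
load_times = [1.8, 2.1, 2.3, 2.4, 2.6, 2.9, 3.1, 45.0]

avg = mean(load_times)
mid = median(load_times)

# The mean lands well above every "normal" sample, the median does not.
print(f"mean = {avg:.2f}s, median = {mid:.2f}s")
```

With only the average available in the report, there is no way to tell these two situations apart.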

Anyhow. Reducing page load times across the board may be a worthy goal in and of itself, but can we get some specific answers from the data? Specifically, can the data tell us that slow page load = bad and fast page load = good?

Fortunately we can make use of the fact that data is collected on all types of visits, including single pageview visits, aka bounces! So one interesting question might be:

Is bounce rate correlated with page load times?

Here is how I approached the question.

I created a Flat Table (cool new feature btw) custom report with Page and Landing Page as dimensions and added a bunch of metrics. See here.

(Why not just “Page” as in the default Site Speed report? Because with bounce rate I am concerned with Landing Pages and I want to try to include data only if the Page is a Landing Page. For non-bounce traffic there will be some cases where the page was viewed again during the same visit, which is what I don’t want, but I can at least make sure that the visit started on the same page. Not perfect, but better than just looking at Pages.)

Next, I created an advanced segment for bounced traffic. There is already a default segment for non-bouncing visits, so I applied both segments to the custom report. Finally, I wanted to get rid of data with too small a sample size, so I applied an advanced filter to only show rows with more than 10 samples. I ended up with this nice report:

(Click image for larger version)
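The same sample-size filter can also be applied after export, which is handy if you want to try different thresholds. This is a rough sketch in Python: the column names and rows below are assumptions made up for illustration, so adjust them to whatever your own export contains.

```python
import csv
import io

# Hypothetical export; column names and values are invented for illustration.
SAMPLE_CSV = """Landing Page,Segment,Page Load Sample,Avg. Page Load Time (sec)
/home,Bounced,42,6.1
/home,Non-bounce,57,3.4
/about,Bounced,4,9.8
/pricing,Non-bounce,11,2.7
"""

MIN_SAMPLES = 10  # keep rows with MORE than 10 samples, as in the report filter


def filter_rows(csv_text, min_samples=MIN_SAMPLES):
    """Return only the rows whose sample count exceeds min_samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["Page Load Sample"]) > min_samples]


rows = filter_rows(SAMPLE_CSV)
for row in rows:
    print(row["Landing Page"], row["Segment"], row["Avg. Page Load Time (sec)"])
```

In this made-up data the /about row is dropped because it has only 4 samples.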

The only thing missing is the visualization… Can we say something about page load times in aggregate for both segments? I don’t have a nice way to visualize this data in Google Analytics, so I just exported the report as a CSV and loaded it into R. And my best friend there is the boxplot.
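If you don't have R at hand, the numbers behind a boxplot are easy to compute yourself. The sketch below uses Python's standard library to get a five-number summary per segment. The values are hypothetical average load times per landing page, not the data from my report, and the quartile method is an assumption that will not match R's boxplot hinges exactly in every case.

```python
from statistics import quantiles


def five_number_summary(values):
    """Min, quartiles and max: the numbers a boxplot draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


# Hypothetical average page load times (seconds) per landing page, by segment.
segments = {
    "bounced": [3.2, 4.1, 5.0, 6.3, 7.8, 9.4, 12.5],
    "non_bounce": [1.9, 2.4, 2.8, 3.1, 3.6, 4.2, 5.5],
}

summaries = {name: five_number_summary(vals) for name, vals in segments.items()}
for name, s in summaries.items():
    print(name, s)
```

Comparing the medians and the spread between q1 and q3 side by side gives roughly the same picture as two boxplots next to each other.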

This is just one data set for one site over a short period of time, but what do you think about the validity of the process? This is observational data, of course, and not an experiment, but those segments sure look different.