User-defined functions in BigQuery

In an attempt to sharpen my SQL and BigQuery skills a bit I set out to calculate the Doubling Rate ("Verdopplungsrate" in German) of cases that the data team at Der Spiegel uses in their reporting of the coronavirus pandemic. They define the doubling rate as the number of days it takes for the number of cases to double, assuming that the current rate stays the same. What would you base the current rate on? Two days, a week? Der Spiegel decided to use a 5 day range: just short enough to be reflective of current conditions and long enough to smooth out daily variations. Here is their chart:

Time to build! The Johns Hopkins covid-19 dataset is accessible directly in Bigquery, courtesy of Google.

Here is my approach:

  1. Getting the number of cases for each country from 5 days ago is done with the built-in LAG window function.

  2. To find a common rate over a 5-day period we have to pay attention to the fact that this is a compound rate, so you can't just divide (current_value - previous_value) / 5. Instead you have to base it on the compound annual growth rate. Once you calculate the rate, you use it to calculate the time periods required to reach the future value, which is 2 * current_value. If these look like financial functions to you, you'd be right! These functions are built into Google Sheets as RRI and PDURATION, but not to my knowledge in BigQuery. Fortunately we can use user-defined functions (UDF) to create these ourselves in Javascript since the math is pretty straightforward.

Here is what I came up with, which you should be able to plug in to the BigQuery UI:

CREATE TEMPORARY FUNCTION doubling_rate(start INT64, finish INT64, periods INT64)
  RETURNS FLOAT64
  LANGUAGE js AS """
    function irr(start, finish, periods) {
      const exponent = (1/periods)
      return Math.pow((finish/start),exponent) - 1
    }
    function pduration(start, finish, rate) {
      const numerator = Math.log10(finish/start)
      const denominator = Math.log10(1 + rate)
      return (numerator/denominator)
    }
    const internal_rate = irr(start, finish, periods);
    return pduration(finish, finish * 2, internal_rate);
""";

WITH
  db AS (
  SELECT
    country_region,
    date,
    sum(confirmed) as confirmed,
  FROM
    `bigquery-public-data.covid19_jhu_csse.summary`
  GROUP BY
    country_region,
    date
  ORDER BY
    date DESC)
SELECT
  *,
  ROUND(doubling_rate(previous_5,confirmed,5),1) as doubling_rate
FROM
(SELECT
  *,
  LAG(confirmed,5) OVER(ORDER BY country_region,date) As previous_5,
FROM
  db)
WHERE date = "2020-04-05"
ORDER by confirmed desc

To learn more about user-defined functions, see the offical docs and Felipe has a great tutorial.