Userdefined functions in BigQuery
BigquerySQLIn an attempt to sharpen my SQL and BigQuery skills a bit I set out to calculate the Doubling Rate ("Verdopplungsrate" in German) of cases that the data team at Der Spiegel uses in their reporting of the coronavirus pandemic. They define the doubling rate as the number of days it takes for the number of cases to double, assuming that the current rate stays the same. What would you base the current rate on? Two days, a week? Der Spiegel decided to use a 5 day range: just short enough to be reflective of current conditions and long enough to smooth out daily variations. Here is their chart:
Time to build! The Johns Hopkins covid19 dataset is accessible directly in Bigquery, courtesy of Google.
Here is my approach:

Getting the number of cases for each country from 5 days ago is done with the builtin LAG window function.

To find a common rate over a 5day period we have to pay attention to the fact that this is a compound rate, so you can't just divide
(current_value  previous_value) / 5
. Instead you have to base it on the compound annual growth rate. Once you calculate the rate, you use it to calculate the time periods required to reach the future value, which is 2 * current_value. If these look like financial functions to you, you'd be right! These functions are built into Google Sheets as RRI and PDURATION, but not to my knowledge in BigQuery. Fortunately we can use userdefined functions (UDF) to create these ourselves in Javascript since the math is pretty straightforward.
Here is what I came up with, which you should be able to plug in to the BigQuery UI:
CREATE TEMPORARY FUNCTION doubling_rate(start INT64, finish INT64, periods INT64)
RETURNS FLOAT64
LANGUAGE js AS """
function irr(start, finish, periods) {
const exponent = (1/periods)
return Math.pow((finish/start),exponent)  1
}
function pduration(start, finish, rate) {
const numerator = Math.log10(finish/start)
const denominator = Math.log10(1 + rate)
return (numerator/denominator)
}
const internal_rate = irr(start, finish, periods);
return pduration(finish, finish * 2, internal_rate);
""";
WITH
db AS (
SELECT
country_region,
date,
sum(confirmed) as confirmed,
FROM
`bigquerypublicdata.covid19_jhu_csse.summary`
GROUP BY
country_region,
date
ORDER BY
date DESC)
SELECT
*,
ROUND(doubling_rate(previous_5,confirmed,5),1) as doubling_rate
FROM
(SELECT
*,
LAG(confirmed,5) OVER(ORDER BY country_region,date) As previous_5,
FROM
db)
WHERE date = "20200405"
ORDER by confirmed desc
To learn more about userdefined functions, see the offical docs and Felipe has a great tutorial.