As a frequent R user in school, I was used to applying time series packages directly for decomposition whenever I needed to remove seasonality. In my current job I have done several time series analyses with BigQuery, and I found that building the step-by-step “recipe” myself helped me understand the mechanism at a deeper level. In this post I will share some experience on how to manually perform time series decomposition with the query language, for business analysis purposes[1].
Before coding, you need to first understand what seasonality is and why we need to remove it for certain analyses. Let’s think about a simple case:
- An ice cream vendor increased the price per ice cream by 3% in May
- Revenue from a 3-month window Feb – Apr before the price change was 1000 €
- Revenue from a 3-month window Jun – Aug after the price change was 3000 €
- Can he conclude that the 3% price change tripled the revenue?
The answer is no, because sales of ice cream are seasonally higher in the summer months than in any other month, and those “extra sales” that repeat every summer are what we call “seasonality”. To evaluate how much the 3% price change contributed to the revenue growth, you need to remove this “foam” (seasonality) from the time series to obtain a relatively smooth curve, and then compare the 3-month revenue before and after.
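To make this concrete with made-up numbers: suppose summer seasonality adds 1800 € on top of the baseline in Jun – Aug, while Feb – Apr gets no seasonal lift. After removing that lift, the comparison becomes 1000 € before versus 1200 € after, a 20% increase rather than a tripling, and that 20% is a far more honest measure of what the price change (and other non-seasonal factors) actually did.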
Secondly, you need to understand a simple formula for a time series, from an econometric perspective.
- A time series Yt, which can be sales, temperature, population, etc. during a certain period with daily/weekly/monthly/quarterly/yearly frequency, is composed of 3 elements:
- Trend Tt
- Seasonality St
- Noise / residual / remainder component Rt
- Yt = Tt + St + Rt
- The purpose of de-seasonalization is to find and remove the seasonal component St from the time series Yt
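As a concrete instance, taken from the decomposition we will complete in Step 4: the observation for 2001-08-15 is Yt = 1046.2, which splits into a trend Tt ≈ 991.96, a seasonal component St ≈ +26.47 and a residual Rt ≈ +27.77, and indeed 991.96 + 26.47 + 27.77 = 1046.20.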
The de-seasonalization example will use the BigQuery public data. To access the project, this guide will help you set it up. The classic decomposition method will be applied, and the output will be presented and visualized in a connected Google Sheet.
STEP 1. DATA SHORTLIST
The monthly time series new_residential_construction is used as the example. The table lives in the dataset census_bureau_construction of the project bigquery-public-data, and the earliest period dates back to 1959-01-15. To simplify the analysis, we shortlist the dimensions and the date range by:
- Choosing periods from year 2000 on, period_date > "2000-01-01"
- Reporting at “Total Units” level, datatype_id = 1
- Using the unadjusted data, is_adjusted = 0
- Selecting the category as “Housing Units Under Construction”, category_id = 6
- Constraining the geographic dimension to “United States” (ignoring the regional dimensions, e.g. South, West, Northeast, etc.) to see the full picture, geo_id = 1
The query result is the time series that we will use for decomposition.
```sql
SELECT
  period_date,
  category_desc,
  datatype_desc,
  geo_desc,
  is_adjusted,
  value
FROM `bigquery-public-data.census_bureau_construction.new_residential_construction`
WHERE period_date > "2000-01-01"
  AND datatype_id = 1
  AND is_adjusted = 0
  AND category_id = 6
  AND geo_id = 1
```
| period_date | category_desc | datatype_desc | geo_desc | is_adjusted | value |
|---|---|---|---|---|---|
| 2000-01-15 | Housing Units Under Construction | Total Units | United States | 0 | 946.6 |
| 2000-02-15 | Housing Units Under Construction | Total Units | United States | 0 | 948.4 |
| 2000-03-15 | Housing Units Under Construction | Total Units | United States | 0 | 952.5 |
| 2000-04-15 | Housing Units Under Construction | Total Units | United States | 0 | 979.5 |
| 2000-05-15 | Housing Units Under Construction | Total Units | United States | 0 | 994.6 |
STEP 2. TIME SERIES SETUP AND MOVING AVERAGE
Before finding the seasonality St, 3 columns are added to assist the calculation:
- t: a running number as an alias for the period, from the earliest to the most recent
- t_month: the month number of the corresponding date. We need the month number because the seasonal component repeats in the identical month number every year
- value_12_ma: the moving average of the observations (column: value) with a 12-month window. In this case, the 12-month window is chosen as the 11 preceding months (the 11 rows before) + the current month (the current row).
Note that a moving average with a window of m observations is usually called an m-MA, or a moving average of order m. If m is an even number, as in our case (monthly data, MA of order 12; or quarterly data, MA of order 4), the trend component is based on yet another moving average of the moving average (introduced later), to make the even-order moving average symmetric[2]. If m is an odd number (for example, weekly data, MA of order 7), the 7-MA can be treated directly as the trend component.
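To see why the extra smoothing in Step 3 makes the even-order average symmetric, write out the weights (a quick derivation, using the trailing windows as defined in this post). Averaging value_12_ma at rows t and t − 1 gives:

2×12-MA at t = (1/24)·y_{t−12} + (1/12)·(y_{t−11} + … + y_{t−1}) + (1/24)·y_t

The two end observations, y_{t−12} and y_t, are the same calendar month one year apart and each carry half weight, so every calendar month contributes a total weight of exactly 1/12.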
```sql
SELECT
  ROW_NUMBER() OVER (ORDER BY period_date ASC) AS t,
  period_date,
  SUBSTRING(CAST(period_date AS STRING), 6, 2) AS t_month,
  value,
  CASE
    WHEN ROW_NUMBER() OVER (ORDER BY period_date ASC) < 12 THEN NULL
    ELSE
      AVG(value) OVER (
        ORDER BY period_date ASC
        ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
      )
  END AS value_12_ma
FROM `bigquery-public-data.census_bureau_construction.new_residential_construction`
WHERE period_date > "2000-01-01"
  AND datatype_id = 1
  AND is_adjusted = 0
  AND category_id = 6
  AND geo_id = 1
```
The CASE expression in the query above calculates the moving average with a 12-month window. It is worth pointing out the window clause:
```sql
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
```
The expression assigns to a row the moving average computed from its own value and the 11 values before it. Nevertheless, the position of the window is flexible. For example, if you want to assign to a row the moving average of its own value, the 5 values before and the 6 values after, you can change the expression into:
```sql
ROWS BETWEEN 5 PRECEDING AND 6 FOLLOWING
```
However, wherever the window is positioned, we need to pay attention to the rows whose window does not yet gather as many observations as designed. In our example, we added a condition to null out the moving averages whose effective window is shorter than 12 months:
```sql
WHEN ROW_NUMBER() OVER (ORDER BY period_date ASC) < 12 THEN NULL
```
This sets the first 11 values in the moving average column to NULL, because at those rows the window does not gather as many values as the 12-month design requires. For example, without this condition, the moving average of the third row would be the mean of only the first 3 values, a much shorter 3-month window.
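The ROW_NUMBER guard works here because our window only looks backwards. For a window with a FOLLOWING part, a more general guard counts the rows actually inside the window; a sketch (reusing the alias value_12_ma and the 12-row threshold from the example above):

```sql
CASE
  -- NULL out any row whose window holds fewer than 12 observations,
  -- regardless of where the window sits relative to the current row
  WHEN COUNT(value) OVER (
         ORDER BY period_date ASC
         ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
       ) < 12 THEN NULL
  ELSE AVG(value) OVER (
         ORDER BY period_date ASC
         ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
       )
END AS value_12_ma
```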
With this step done, we have set up the time series for decomposition.
| t | period_date | t_month | value | value_12_ma |
|---|---|---|---|---|
| 1 | 2000-01-15 | 01 | 946.6 | null |
| 2 | 2000-02-15 | 02 | 948.4 | null |
| 3 | 2000-03-15 | 03 | 952.5 | null |
| 4 | 2000-04-15 | 04 | 979.5 | null |
| 5 | 2000-05-15 | 05 | 994.6 | null |
| 6 | 2000-06-15 | 06 | 1006.8 | null |
| 7 | 2000-07-15 | 07 | 1015.2 | null |
| 8 | 2000-08-15 | 08 | 1009.9 | null |
| 9 | 2000-09-15 | 09 | 998.9 | null |
| 10 | 2000-10-15 | 10 | 997.3 | null |
| 11 | 2000-11-15 | 11 | 980.2 | null |
| 12 | 2000-12-15 | 12 | 933.8 | 980.3083333333333 |
| 13 | 2001-01-15 | 01 | 940.8 | 979.825 |
| 14 | 2001-02-15 | 02 | 939.1 | 979.05 |
| 15 | 2001-03-15 | 03 | 967.0 | 980.2583333333333 |
| 16 | 2001-04-15 | 04 | 996.2 | 981.65 |
| 17 | 2001-05-15 | 05 | 1027.6 | 984.4 |
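As a quick sanity check, the first non-NULL moving average, 980.3083 at t = 12, is indeed the mean of the first 12 observations: their sum is 11763.7, and 11763.7 ÷ 12 = 980.3083.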
STEP 3. CALCULATE THE TREND COMPONENT
As mentioned above, if the order of the moving average is an odd number (e.g. weekly data with a 7-day window), the m-MA can be treated directly as the trend component. However, when dealing with monthly data the order of the moving average is 12, so we need to calculate one more moving average, of order 2, on top of the 12-MA to obtain the trend component, which is notated as 2 × 12-MA.
```sql
SELECT
  *,
  CASE
    WHEN t < 13 THEN NULL
    ELSE
      AVG(value_12_ma) OVER (
        ORDER BY period_date ASC
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
      )
  END AS trend_value
FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY period_date ASC) AS t,
    period_date,
    SUBSTRING(CAST(period_date AS STRING), 6, 2) AS t_month,
    value,
    CASE
      WHEN ROW_NUMBER() OVER (ORDER BY period_date ASC) < 12 THEN NULL
      ELSE
        AVG(value) OVER (
          ORDER BY period_date ASC
          ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
        )
    END AS value_12_ma
  FROM `bigquery-public-data.census_bureau_construction.new_residential_construction`
  WHERE period_date > "2000-01-01"
    AND datatype_id = 1
    AND is_adjusted = 0
    AND category_id = 6
    AND geo_id = 1
)
```
The outer CASE expression calculates the moving average of order 2 on the already-calculated moving average of order 12, and this newly generated value is the trend value of the time series. The table below shows the result.
| t | period_date | t_month | value | value_12_ma | trend_value |
|---|---|---|---|---|---|
| 1 | 2000-01-15 | 01 | 946.6 | null | null |
| 2 | 2000-02-15 | 02 | 948.4 | null | null |
| 3 | 2000-03-15 | 03 | 952.5 | null | null |
| 4 | 2000-04-15 | 04 | 979.5 | null | null |
| 5 | 2000-05-15 | 05 | 994.6 | null | null |
| 6 | 2000-06-15 | 06 | 1006.8 | null | null |
| 7 | 2000-07-15 | 07 | 1015.2 | null | null |
| 8 | 2000-08-15 | 08 | 1009.9 | null | null |
| 9 | 2000-09-15 | 09 | 998.9 | null | null |
| 10 | 2000-10-15 | 10 | 997.3 | null | null |
| 11 | 2000-11-15 | 11 | 980.2 | null | null |
| 12 | 2000-12-15 | 12 | 933.8 | 980.3083333333333 | null |
| 13 | 2001-01-15 | 01 | 940.8 | 979.825 | 980.0666666666667 |
| 14 | 2001-02-15 | 02 | 939.1 | 979.05 | 979.4375 |
| 15 | 2001-03-15 | 03 | 967.0 | 980.2583333333333 | 979.6541666666667 |
| 16 | 2001-04-15 | 04 | 996.2 | 981.65 | 980.9541666666667 |
| 17 | 2001-05-15 | 05 | 1027.6 | 984.4 | 983.025 |
| 18 | 2001-06-15 | 06 | 1038.3 | 987.025 | 985.7125 |
| 19 | 2001-07-15 | 07 | 1056.3 | 990.4499999999999 | 988.7375 |
| 20 | 2001-08-15 | 08 | 1046.2 | 993.475 | 991.9625 |
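Again as a sanity check, the first non-NULL trend value, 980.0667 at t = 13, is the mean of the two neighbouring 12-MA values: (980.3083 + 979.8250) ÷ 2 = 980.0667.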
STEP 4. CALCULATE THE SEASONAL COMPONENT
The seasonal component is the mean, across the repeating periods, of the difference between the observed value and the trend component. For example, if you have a monthly time series with a length of 3 years (36 observations), the seasonal component of January is the same in all 3 years and equals:
S_Jan = [(Value_year1,Jan − Trend_year1,Jan) + (Value_year2,Jan − Trend_year2,Jan) + (Value_year3,Jan − Trend_year3,Jan)] ÷ 3
The first CASE expression in the following query (alias: seasonality_value) executes the logic above to obtain the seasonal component.
The second CASE expression (alias: residual_value) calculates the residual component, which equals:
Value Yt − Trend Tt − Seasonality St = Residual Rt
The third CASE expression (alias: value_deseasonality) calculates the de-seasonalized time series:
Value Yt − Seasonality St = Trend Tt + Residual Rt = the time series without the seasonal component
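To preview these formulas with concrete numbers from the final table below, take t = 13 (2001-01-15): value = 940.8 and trend_value = 980.0667; the mean of value − trend_value over all January rows works out to −22.2554, so residual_value = 940.8 − 980.0667 − (−22.2554) = −17.0113 and value_deseasonality = 940.8 − (−22.2554) = 963.0554.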
```sql
SELECT
  *,
  CASE
    WHEN t < 13 THEN NULL
    ELSE AVG(value - trend_value) OVER (PARTITION BY t_month)
  END AS seasonality_value,
  CASE
    WHEN t < 13 THEN NULL
    ELSE value - trend_value - (AVG(value - trend_value) OVER (PARTITION BY t_month))
  END AS residual_value,
  CASE
    WHEN t < 13 THEN NULL
    ELSE value - (AVG(value - trend_value) OVER (PARTITION BY t_month))
  END AS value_deseasonality
FROM (
  SELECT
    *,
    CASE
      WHEN t < 13 THEN NULL
      ELSE
        AVG(value_12_ma) OVER (
          ORDER BY period_date ASC
          ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        )
    END AS trend_value
  FROM (
    SELECT
      ROW_NUMBER() OVER (ORDER BY period_date ASC) AS t,
      period_date,
      SUBSTRING(CAST(period_date AS STRING), 6, 2) AS t_month,
      value,
      CASE
        WHEN ROW_NUMBER() OVER (ORDER BY period_date ASC) < 12 THEN NULL
        ELSE
          AVG(value) OVER (
            ORDER BY period_date ASC
            ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
          )
      END AS value_12_ma
    FROM `bigquery-public-data.census_bureau_construction.new_residential_construction`
    WHERE period_date > "2000-01-01"
      AND datatype_id = 1
      AND is_adjusted = 0
      AND category_id = 6
      AND geo_id = 1
  )
)
ORDER BY period_date ASC
```
Now we are ready to connect Google Sheets to this query and use the final table for visualization.
| t | period_date | t_month | value | value_12_ma | trend_value | seasonality_value | residual_value | value_deseasonality |
|---|---|---|---|---|---|---|---|---|
| 1 | 2000-01-15 | 01 | 946.6 | null | null | null | null | null |
| 2 | 2000-02-15 | 02 | 948.4 | null | null | null | null | null |
| 3 | 2000-03-15 | 03 | 952.5 | null | null | null | null | null |
| 4 | 2000-04-15 | 04 | 979.5 | null | null | null | null | null |
| 5 | 2000-05-15 | 05 | 994.6 | null | null | null | null | null |
| 6 | 2000-06-15 | 06 | 1006.8 | null | null | null | null | null |
| 7 | 2000-07-15 | 07 | 1015.2 | null | null | null | null | null |
| 8 | 2000-08-15 | 08 | 1009.9 | null | null | null | null | null |
| 9 | 2000-09-15 | 09 | 998.9 | null | null | null | null | null |
| 10 | 2000-10-15 | 10 | 997.3 | null | null | null | null | null |
| 11 | 2000-11-15 | 11 | 980.2 | null | null | null | null | null |
| 12 | 2000-12-15 | 12 | 933.8 | 980.3083333333333 | null | null | null | null |
| 13 | 2001-01-15 | 01 | 940.8 | 979.825 | 980.0666666666667 | -22.255416666666704 | -17.01125000000006 | 963.0554166666667 |
| 14 | 2001-02-15 | 02 | 939.1 | 979.05 | 979.4375 | -26.131359649122864 | -14.206140350877114 | 965.2313596491229 |
| 15 | 2001-03-15 | 03 | 967.0 | 980.2583333333333 | 979.6541666666667 | -17.865789473684284 | 5.211622807017587 | 984.8657894736842 |
| 16 | 2001-04-15 | 04 | 996.2 | 981.65 | 980.9541666666667 | -2.6839912280702114 | 17.929824561403606 | 998.8839912280703 |
| 17 | 2001-05-15 | 05 | 1027.6 | 984.4 | 983.025 | 11.862280701754313 | 32.71271929824562 | 1015.7377192982456 |
| 18 | 2001-06-15 | 06 | 1038.3 | 987.025 | 985.7125 | 19.524780701754334 | 33.06271929824564 | 1018.7752192982456 |
| 19 | 2001-07-15 | 07 | 1056.3 | 990.4499999999999 | 988.7375 | 30.926535087719245 | 36.635964912280755 | 1025.3734649122807 |
| 20 | 2001-08-15 | 08 | 1046.2 | 993.475 | 991.9625 | 26.467543859649112 | 27.769956140350956 | 1019.7324561403509 |
| 21 | 2001-09-15 | 09 | 1043.3 | 997.175 | 995.325 | 26.886403508771878 | 21.08859649122803 | 1016.4135964912281 |
| 22 | 2001-10-15 | 10 | 1035.6 | 1000.3666666666667 | 998.7708333333334 | 20.942105263157842 | 15.887061403508696 | 1014.6578947368421 |
| 23 | 2001-11-15 | 11 | 1018.4 | 1003.55 | 1001.9583333333334 | 11.036622807017492 | 5.405043859649114 | 1007.3633771929825 |
| 24 | 2001-12-15 | 12 | 959.4 | 1005.6833333333333 | 1004.6166666666667 | -21.10504385964916 | -24.111622807017536 | 980.5050438596492 |
| 25 | 2002-01-15 | 01 | 957.3 | 1007.0583333333333 | 1006.3708333333333 | -22.255416666666704 | -26.81541666666662 | 979.5554166666667 |
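Note that seasonality_value is, by construction, identical for all rows sharing the same t_month: every January row carries −22.2554 (see t = 13 and t = 25 above). If you want to verify that the components add back up to the original series, a minimal sanity check might look like the sketch below, assuming the full query above has been saved as a view named decomposition (a hypothetical name):

```sql
-- Verify Yt = Tt + St + Rt row by row; `diff` should be ~0
-- up to floating-point error for every row with a trend value.
SELECT
  period_date,
  value,
  trend_value + seasonality_value + residual_value AS reconstructed,
  value - (trend_value + seasonality_value + residual_value) AS diff
FROM decomposition
WHERE trend_value IS NOT NULL
ORDER BY period_date ASC
```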
STEP 5. VISUALIZATION
The picture below illustrates the original time series, the trend, and the time series without the seasonal component:
- Housing Units Under Construction: value
- Trend: trend_value
- Housing Units Under Construction (de-seasonalized): value_deseasonality
Here you can see that the adjusted time series (without seasonality), the blue curve, is much smoother than the original time series, the red one.

The picture below demonstrates the seasonal component and the residual that we separated from the original time series, where you can clearly observe the seasonal peak.
- Seasonality: seasonality_value
- Residual: residual_value

Hope you have a fun and smooth de-seasonalization process!
Notes & References
[1] The analysis is for business analysis purposes only. Therefore, I try to avoid jargon and extra analysis commonly used in econometrics, such as white noise, lags, stationarity, etc., although time series analysis has a strict academic structure.
[2] Hyndman, R.J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). OTexts. Chapter 6.3, Classical decomposition. https://otexts.com/fpp2/
Other reference: Verbeek, M. (2017). A Guide to Modern Econometrics (5th ed.). Wiley Global Education.