Okun’s Law, step by step: A replicable roadmap

1. How to test a “Law” that keeps changing its mind

This is the second instalment in a short series on the relationship between unemployment and economic growth: Okun’s Law. The first post introduced the idea and its two main specifications; this one is the “methodology post” that shows, step by step, how we actually try to verify the relationship in data without getting fooled by statistics that look confident but aren’t. The third and final instalment will be the fun part: empirical verification using data from the countries of the former Yugoslavia, where the relationship is unlikely to behave politely for long.

A methodological post risks becoming a sewer tour of econometrics. Let’s not. Think of this pipeline as a set of guardrails. Each test or model is introduced for a reason: either because macro time series are slippery objects, or because the Okun relationship is easily distorted by structural change, measurement choices, and plain old noise. The workflow is sequential on purpose: each stage answers a question whose answer determines what you’re allowed to do next.

We also run a companion track for panel methods. Not because panels are “better”, but because later we will also analyse single countries, and it helps to keep the logic consistent across both worlds. Panel methods can add power when time series are short, but they introduce new risks, especially cross-sectional dependence, so they need their own diagnostic discipline.

2. First, don’t trip over stationarity

Okun’s Law is about co-movement: output and unemployment moving together in predictable ways. But macroeconomic series often “wander”: they drift, trend, break, and sometimes pretend to revert only after you’ve published your paper. If we regress wandering things on wandering things, we can manufacture impressive statistics out of nonsense. The first job, then, is to establish the time-series properties of the variables we plan to use.

That is why the workflow begins with unit root testing and stationarity checks. The purpose is not to win a philosophical debate about whether GDP “really” has a unit root. The purpose is narrower and more practical: to decide whether we should work in differences, levels, or in an error-correction framework that allows both short-run dynamics and long-run relationships.

In the pipeline, this stage is not treated as a single test but as triangulation: different tests have different null hypotheses and different sensitivities. That’s useful, because macro data are rarely kind enough to satisfy the ideal conditions of any single test.
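To make the triangulation concrete, here is a minimal sketch in Python/statsmodels of running two tests with opposite nulls (ADF: null of a unit root; KPSS: null of stationarity) on the same series. The file name `okun_data.csv` and the column `unemployment` are placeholders rather than the study’s actual data, and this is an illustration of the logic, not the study’s own code.

```python
# Triangulation sketch: two unit root tests with opposite null hypotheses.
# ADF:  H0 = unit root;  KPSS: H0 = stationarity.
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

# Hypothetical data file and column name.
df = pd.read_csv("okun_data.csv", parse_dates=["date"], index_col="date")
y = df["unemployment"].dropna()

adf_stat, adf_p, *_ = adfuller(y, autolag="AIC")               # reject H0 -> looks stationary
kpss_stat, kpss_p, *_ = kpss(y, regression="c", nlags="auto")  # reject H0 -> looks non-stationary

print(f"ADF  p-value: {adf_p:.3f}")
print(f"KPSS p-value: {kpss_p:.3f}")

# If the two tests agree, proceed; if they disagree, treat the series as
# borderline and let the break-robust tests (next section) adjudicate.
```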

3. Second, assume the world breaks, and test for it

In macroeconomics, “structural break” is a polite phrase for “history happened”. Large events (crises, regime shifts, re-definitions of labour-market institutions) can change the relationship we are trying to estimate. If breaks are present and ignored, many standard unit root and cointegration tests lose power or become misleading. That is why the pipeline explicitly treats breaks as first-class citizens.

The methodology therefore adds break-robust unit root testing: tests that allow for one or more breaks in levels or trends, rather than assuming the data-generating process has been the same since the dawn of quarterly national accounts. The practical meaning is simple: if the unemployment series jumps because the labour market reorganises, or output trends shift because the growth model changes, we should not interpret that as “evidence” of non-stationarity or of a missing long-run relationship. We should interpret it as a break, then test accordingly.
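A break-robust counterpart, sketched in the same spirit: the Zivot–Andrews test allows a single endogenously dated break in the intercept and/or trend. `y` is the hypothetical series from the previous snippet; again, an illustration rather than the study’s implementation.

```python
# Break-robust unit root sketch: Zivot-Andrews with break in intercept and trend.
from statsmodels.tsa.stattools import zivot_andrews

# H0: unit root with drift (no break); HA: trend-stationary with one break.
za_stat, za_p, za_crit, _, break_idx = zivot_andrews(y, regression="ct")
print(f"Zivot-Andrews stat: {za_stat:.2f}, p-value: {za_p:.3f}, "
      f"estimated break at observation {break_idx}")
```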

This logic also matters downstream. In cointegration work, the study explicitly notes that “large events such as financial crises, pandemics, and policy regime shifts can induce breaks” that “render standard cointegration methods unreliable.” That warning is not decorative. It is the reason the workflow includes structural-break cointegration alongside “standard” cointegration.

4. Third, decide whether you are chasing short-run moves or long-run gravity

Okun’s Law can be told as a short-run story (changes in unemployment responding to growth) or as a gap story (labour-market slack responding to output slack). Methodologically, that choice maps to different econometric objects.

The workflow keeps both in view by making cointegration the hinge. If variables are integrated and cointegrated, then there is a long-run equilibrium relationship, call it “gravity”, and deviations from it are expected to correct over time. That is a very different world from one in which the relationship exists only in short-run changes, without any equilibrium pull.

So the pipeline asks: do unemployment and output (or their gap counterparts) share a long-run relationship, or are we confined to short-run dynamics? The answer determines whether we proceed with an error-correction representation and long-run coefficients, or remain in differenced models.

5. Fourth, test cointegration like you don’t trust yourself

Cointegration testing is the central fork in the road. It is also where researchers most easily cherry-pick, because there are many tests and they can disagree. The pipeline deals with this not by pretending there is one perfect test, but by using a coherent set of methods that speak to different weaknesses.

The document groups cointegration methods into four families: (i) residual-based single-equation methods, (ii) multivariate system methods, (iii) ARDL bounds testing, and (iv) combined testing approaches. The point is not to hoard methods; it is to build robustness by approaching the same question from multiple angles.

5.1 Residual-based cointegration

The residual-based approach (in the spirit of Engle and Granger) is intuitive: estimate the long-run relationship and test whether the residual behaves like a stationary error rather than a drifting series. It is appealing because it maps cleanly onto the economic story: output and unemployment may deviate in the short run, but the deviation shouldn’t explode forever if there is an equilibrium link.

A light “template” representation is:

y_t = \alpha + \beta x_t + \varepsilon_t

If \(\varepsilon_t\) is stationary, \(y_t\) and \(x_t\) are cointegrated; the relationship is not merely contemporaneous correlation dressed up as theory. The methodology uses this logic while recognising its limits, particularly when the long-run relationship itself may shift.
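A minimal sketch of that logic with statsmodels’ Engle–Granger style test: regress one series on the other and test the residual for a unit root. The column names `unemployment` and `log_gdp` are hypothetical.

```python
# Residual-based (Engle-Granger style) cointegration sketch.
from statsmodels.tsa.stattools import coint

u = df["unemployment"].dropna()
g = df["log_gdp"].reindex(u.index)

eg_stat, eg_p, eg_crit = coint(u, g, trend="c")  # H0: no cointegration
print(f"Engle-Granger stat: {eg_stat:.2f}, p-value: {eg_p:.3f}")
```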

5.2 Cointegration with regime shifts

Because breaks can invalidate standard residual-based tests, the pipeline also includes residual-based tests that explicitly allow regime shifts in the cointegrating relationship. That is not an econometric flourish. It is a recognition that Okun’s Law can “hold” in one labour-market regime and look weaker (or different) in another, without the data becoming meaningless.

5.3 ARDL bounds testing: Cointegration without over-promising

The ARDL approach is introduced for a pragmatic reason: it can test for a long-run relationship even when regressors are a mix of I(0) and I(1) (though none may be I(2)), and it performs well in small samples, conditions that are common in macro work, especially with annual data. The post highlights ARDL as “flexible and suitable for small samples” and emphasises that it “allows for different lag lengths for the dependent and independent variables.”

Just as importantly, ARDL comes with a built-in interpretation that economists actually like: you can read off both short-run effects and long-run effects, and you can interpret adjustment back to equilibrium in an error-correction form. That error-correction representation is also where the workflow starts to feel like an economic narrative rather than a statistics exercise.

A light template version looks like:

\Delta y_t = \alpha + \sum_{i=1}^{p}\phi_i \Delta y_{t-i} + \sum_{j=0}^{q}\theta_j \Delta x_{t-j} + \lambda (y_{t-1} - \beta x_{t-1}) + \varepsilon_t

The key economic object here is \(\lambda\): the adjustment term. If it is negative and significant, deviations from the long-run relationship tend to be corrected. That is the econometric counterpart of an economist saying: “Okun’s Law is not just a short-run wiggle; it has a pull.”

The pipeline also explains how the bounds test operationalises the decision: it tests the joint significance of lagged level terms, and the inference depends on whether the F-statistic sits below the lower bound (no cointegration), above the upper bound (cointegration), or in the uncomfortable middle (inconclusive). That is a disciplined way of acknowledging uncertainty rather than hiding it behind a single p-value.
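For illustration, here is a minimal ARDL/bounds-testing sketch, assuming a recent statsmodels (0.13 or later) that provides `ardl_select_order`, `UECM`, and the `bounds_test` method. Lag orders, column names, and the deterministic case are placeholders, not the study’s specification.

```python
# ARDL -> error-correction (UECM) -> bounds test, as a sketch.
from statsmodels.tsa.ardl import ardl_select_order, UECM

# Select lag orders by information criterion (maximum lags are illustrative).
sel = ardl_select_order(df["unemployment"], maxlag=4,
                        exog=df[["log_gdp"]], maxorder=4,
                        trend="c", ic="aic")

# Re-cast the selected ARDL as an unrestricted error-correction model:
# the coefficient on the lagged level of the dependent variable is lambda.
uecm_res = UECM.from_ardl(sel.model).fit()
print(uecm_res.summary())

# Bounds test for a level relationship (case 3: unrestricted constant, no trend).
# Compare the F-statistic with the lower/upper bounds; in between = inconclusive.
print(uecm_res.bounds_test(case=3))
```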

5.4 “Don’t bet the paper on one test”: Combined approaches

The methodology also uses combined testing logic, explicitly valuing complementarity. The reasoning is straightforward: if different tests based on different assumptions converge on the same conclusion, we can be more confident; if they diverge, we learn something about fragility and specification dependence rather than pretending the data are “wrong.”

6. Fifth, once you have a model, act like it can fail

A credible methodology does not stop at estimation. It forces the model to earn its keep.

That is why the pipeline emphasises diagnostic checking and stability testing. In macro time series, the biggest threat is not that a coefficient is “insignificant”. The threat is that the model is misspecified: a lag structure too short or too long, residual autocorrelation that invalidates inference, heteroskedasticity that inflates confidence, or parameter instability that turns an average effect into a misleading fiction.

Here the study explicitly endorses a general-to-specific approach as “particularly valuable” because theory rarely pins down the correct lag structure in macro time series. Start with a sufficiently general ARDL/NARDL specification, test down by removing insignificant lags only if diagnostics remain acceptable, and use stability checks (including CUSUM-type tests) as a veto. The point is not to worship diagnostics. The point is to prevent a neat story from being built on a broken engine.

This is also where the workflow earns its replicability. A “pipeline” is replicable if it tells you what to do when things go wrong. Here, the rule is: reductions are allowed only if they do not worsen fit or violate diagnostics. That is the difference between a workflow and a collection of results.
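A minimal sketch of the diagnostic gate. For clarity it is shown on a simple first-difference Okun regression estimated by OLS; in practice the same checks (serial correlation, heteroskedasticity, CUSUM-type stability) would be applied to whichever ARDL/NARDL specification survives testing down. Column names remain hypothetical.

```python
# Diagnostic gate sketch: serial correlation, heteroskedasticity, stability.
import statsmodels.api as sm
from statsmodels.stats.diagnostic import (acorr_breusch_godfrey,
                                          het_breuschpagan,
                                          breaks_cusumolsresid)

diffs = df[["unemployment", "log_gdp"]].diff().dropna()
X = sm.add_constant(diffs["log_gdp"])
ols_res = sm.OLS(diffs["unemployment"], X).fit()

bg_stat, bg_p, _, _ = acorr_breusch_godfrey(ols_res, nlags=4)  # H0: no serial correlation
bp_stat, bp_p, _, _ = het_breuschpagan(ols_res.resid, X)       # H0: homoskedastic errors
cusum_stat, cusum_p, _ = breaks_cusumolsresid(ols_res.resid)   # H0: stable parameters

print(f"Breusch-Godfrey p={bg_p:.3f}, Breusch-Pagan p={bp_p:.3f}, CUSUM p={cusum_p:.3f}")
# Pipeline rule: drop insignificant lags only if these diagnostics stay acceptable.
```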

7. Sixth, allow asymmetry, because labour markets do

Okun’s Law is often narrated as symmetric: growth up, unemployment down; growth down, unemployment up. But labour markets can be lopsided. Hiring can be slower than firing; unemployment may rise quickly in downturns and fall slowly in expansions; or the response can depend on where you are relative to capacity.

That is why the methodology introduces the nonlinear ARDL (NARDL) framework, which allows “asymmetric adjustment.” Economically, the goal is not to complicate the story but to test whether the mapping from output movements to unemployment movements depends on the sign of the movement.

The study implements this by decomposing the explanatory variable into positive and negative partial sums. In template form:

x_t^+ = \sum_{j=1}^{t}\max(\Delta x_j,0), \quad x_t^- = \sum_{j=1}^{t}\min(\Delta x_j,0)

The NARDL model then includes both \(x_t^+\) and \(x_t^-\), and tests whether their long-run and short-run effects differ. The methodological justification is clean: if the economy’s expansions and contractions transmit differently into unemployment, a symmetric model averages the two and risks concluding that “Okun is weak” when, in fact, Okun is merely asymmetric.
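The decomposition itself is only a few lines of code. A minimal sketch, assuming `log_gdp` is the hypothetical output series used above:

```python
# Partial-sum decomposition for NARDL: accumulate positive and negative changes separately.
dx = df["log_gdp"].diff().fillna(0.0)

x_plus = dx.clip(lower=0).cumsum()    # sum of max(dx_j, 0): cumulative expansions
x_minus = dx.clip(upper=0).cumsum()   # sum of min(dx_j, 0): cumulative contractions

# Both partial sums then enter the ARDL machinery as separate regressors, e.g.
#   exog = df.assign(x_plus=x_plus, x_minus=x_minus)[["x_plus", "x_minus"]]
# so long-run and short-run coefficients can differ by the sign of the movement.
```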

This asymmetry logic is also kept compatible with the general-to-specific discipline: start general, test down, insist on diagnostics. In other words, asymmetry is treated as a hypothesis to be earned, not as a decorative feature.

8. Seventh, panels: a companion track, not a religion

The pipeline adds panel tools because macro datasets can be short in time, especially annual data, and pooling information across countries can increase statistical power. But panels also change the nature of the problem. Countries are not independent draws from a jar. They share shocks, trade links, financial cycles, and policy regimes. If we pretend otherwise, panel inference becomes overconfident.

So the workflow treats panel methods as a companion track: useful, but only when accompanied by diagnostics for cross-sectional dependence and heterogeneity.

8.1 Panel cointegration as an “is there a long run?” check

The study explains why panel cointegration matters in the Okun context: it helps assess whether the relationship holds in the long run for a set of units, and it can be particularly valuable when time series are short. It then deploys three major panel cointegration tests (Pedroni, Kao, and Westerlund), explicitly because they provide “complementary perspectives” and differ in their assumptions about heterogeneity and partial cointegration.

The careful part is the interpretation of alternatives. Kao and Pedroni’s within-dimension statistics effectively assume cointegration for all units under the alternative, while Pedroni’s between-dimension approach and Westerlund can accommodate partial cointegration, only a subset of countries sharing a long-run path. That distinction is not technical trivia. In the Okun setting, it corresponds to a plausible economic reality: some labour markets may share a common long-run unemployment–output relationship, others may not, and forcing a single “panel truth” can erase that heterogeneity.

The study also flags sample-size realities: in short panels, power varies, and Westerlund tests often outperform in small samples with structural changes. Again: not a claim of perfection, but a reasoned choice given the data constraints.

8.2 Cross-sectional dependence: the panel’s favourite lie

If there is one panel pitfall the methodology refuses to ignore, it is cross-sectional dependence. The bibliography supporting this track includes Pesaran’s diagnostic tests, plus refinements and related contributions (including bias-corrected approaches and high-dimensional enhancements). The point is not to run every test under the sun. The point is to test the central assumption that makes panel inference legitimate.

The workflow therefore integrates dependence diagnostics as a gatekeeper: detect dependence, then adjust the modelling strategy accordingly rather than treating panel results as automatically “robust”.
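Pesaran’s CD statistic is simple enough to sketch directly from pairwise correlations: for a balanced panel it scales the sum of all pairwise correlations by sqrt(2T/(N(N-1))) and is approximately standard normal under weak cross-sectional dependence. The residual matrix in the usage comment is hypothetical; dedicated panel packages would normally handle this.

```python
# Pesaran (2004) CD test sketch for a balanced panel.
import numpy as np
import pandas as pd

def pesaran_cd(panel: pd.DataFrame) -> float:
    """CD = sqrt(2T / (N(N-1))) * sum of pairwise correlations over i < j.

    `panel` is a wide DataFrame: one column per country, rows indexed by time.
    """
    T, N = panel.shape
    rho = panel.corr().to_numpy()
    upper = rho[np.triu_indices(N, k=1)]   # pairwise correlations, i < j
    return float(np.sqrt(2.0 * T / (N * (N - 1))) * upper.sum())

# Hypothetical usage: residuals from country-by-country Okun regressions,
# pivoted to wide format (rows = years, columns = countries):
#   wide = resid_long.pivot(index="year", columns="country", values="resid")
#   print(pesaran_cd(wide))   # approx. N(0,1) under the null of no dependence
```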

9. Eighth, causality, handled with care

Okun’s Law is sometimes casually described as output “causing” unemployment. Methodologically, that is a stronger claim than estimating a relationship. The pipeline therefore treats causality as a separate module rather than as an implied conclusion. Panel causality methods are included precisely because heterogeneity and cross-sectional dependence complicate standard Granger testing in panels; the bibliography reflects that concern.

The economic purpose of the causality stage is interpretive discipline. If the evidence supports directional predictability from output to unemployment (or the reverse), that can sharpen the narrative; if not, it prevents the blog series from smuggling causality into correlation.
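Panel causality tests of the kind cited above are not available in statsmodels, so the sketch below illustrates only the country-by-country building block: a standard Granger test of whether output growth helps predict changes in unemployment. Column names are hypothetical, and the panel-robust versions would replace this in the pipeline proper.

```python
# Single-country Granger causality sketch: does lagged output growth predict
# changes in unemployment? (Second column is tested as a predictor of the first.)
from statsmodels.tsa.stattools import grangercausalitytests

dd = df[["unemployment", "log_gdp"]].diff().dropna()
gc = grangercausalitytests(dd, maxlag=4, verbose=False)

for lag, (tests, _) in gc.items():
    print(f"lag {lag}: ssr F-test p-value = {tests['ssr_ftest'][1]:.3f}")
```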

10. Ninth, implementation is secondary, replicability is the prize

The study notes implementation in standard tools (R/Stata) only in passing; the more important point is that the workflow is reproducible because it is consequential. It is not “run these tests because people do.” It is “run these tests because the output determines what you are allowed to estimate next, and how you are allowed to interpret it.”

That is the central methodological promise of this series: we are not offering a single regression as proof of a “law”. We are offering a pipeline that can survive a hostile audit, by you, by a reviewer, or by the data itself.

11. What comes next: From methods to evidence, then former Yugoslavia

This post has been about guardrails: stationarity checks to avoid spurious inference; break-aware testing to respect history; cointegration and ARDL/NARDL frameworks to separate short-run dynamics from long-run gravity; diagnostics and stability checks to prevent fragile stories; and a panel companion track that treats dependence and heterogeneity as central, not optional.

The next post is not another methodology post, because you’ve had enough. The next post is the application: we will empirically verify Okun’s Law using data from the countries of the former Yugoslavia, with this pipeline as our map and our constraint. If Okun’s Law behaves, we will say so. If it doesn’t, we will not blame the economy for refusing to obey our regression.

References

Bai, J., & Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica, 66(1), 47–78. DOI: https://doi.org/10.2307/2998540

Bayer, C., & Hanck, C. (2013). Combining non-cointegration tests. Journal of Time Series Analysis, 34(1), 83–95. DOI: https://doi.org/10.1111/j.1467-9892.2012.00814.x

Ditzen, J., Karavias, Y., & Westerlund, J. (2024). Multiple structural breaks in interactive effects panel data models. Journal of Applied Econometrics, 40(1), 74–88. DOI: https://doi.org/10.1002/jae.3097

Engle, R. F., & Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2), 251–276. DOI: https://doi.org/10.2307/1913236

Fan, J., Liao, Y., & Yao, J. (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica, 83, 1497–1541. DOI: https://doi.org/10.3982/ECTA12749

Gregory, A. W., & Hansen, B. E. (1996). Residual-based tests for cointegration in models with regime shifts. Journal of Econometrics, 70(1), 99–126. DOI: https://doi.org/10.1016/0304-4076(69)41685-7

Hadri, K. (2000). Testing for stationarity in heterogeneous panel data. The Econometrics Journal, 3(2), 148–161. DOI: https://doi.org/10.1111/1368-423X.00043

Hodrick, R. J., & Prescott, E. C. (1997). Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit, and Banking, 29(1), 1–16. DOI: https://doi.org/10.2307/2953682

Im, K. S., Pesaran, M. H., & Shin, Y. (2003). Testing for unit roots in heterogeneous panels. Journal of Econometrics, 115(1), 53–74. DOI: https://doi.org/10.1016/S0304-4076(03)00092-7

Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59(6), 1551–1580. DOI: https://doi.org/10.2307/2938278

Johansen, S. (1995). Likelihood-based inference in cointegrated vector autoregressive models. Oxford University Press. DOI: https://doi.org/10.1093/0198774508.001.0001

Juodis, A., Karavias, Y., & Sarafidis, V. (2021). A homogeneous approach to testing for Granger non-causality in heterogeneous panels. Empirical Economics, 60, 93–112. DOI: https://doi.org/10.1007/s00181-020-01970-9

Kapetanios, G., Shin, Y., & Snell, A. (2003). Testing for a unit root in the nonlinear STAR framework. Journal of Econometrics, 112(2), 359–379. DOI: https://doi.org/10.1016/S0304-4076(02)00202-6

Kao, C. (1999). Spurious regression and residual-based tests for cointegration in panel data. Journal of Econometrics, 90, 1–44. DOI: https://doi.org/10.1016/S0304-4076(98)00023-2

Levin, A., Lin, C. F., & Chu, C. S. J. (2002). Unit root tests in panel data: Asymptotic and finite-sample properties. Journal of Econometrics, 108(1), 1–24. DOI: https://doi.org/10.1016/S0304-4076(01)00098-7

Maddala, G. S., & Wu, S. (1999). A comparative study of unit root tests with panel data and a new simple test. Oxford Bulletin of Economics and Statistics, 61(S1), 631–652. DOI: https://doi.org/10.1111/1468-0084.0610s1631

Pedroni, P. (1999). Critical values for cointegration tests in heterogeneous panels with multiple regressors. Oxford Bulletin of Economics and Statistics, 61, 653–670.

Pedroni, P. (2004). Panel cointegration: Asymptotic and finite sample properties of pooled time series tests with an application to the PPP hypothesis. Econometric Theory, 20, 597–625. DOI: https://doi.org/10.1017/S0266466604203073

Pesaran, M. H. (2004). General diagnostic tests for cross-section dependence in panels. CESifo Working Paper Series No. 1229.

Pesaran, M. H. (2007). A simple panel unit root test in the presence of cross-section dependence. Journal of Applied Econometrics, 22(2), 265–312. DOI: https://doi.org/10.1002/jae.951

Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews, 34(6–10), 1089–1117. DOI: https://doi.org/10.1080/07474938.2014.956623

Pesaran, M. H. (2021). General diagnostic tests for cross-sectional dependence in panels. Empirical Economics, 60, 13–50. DOI: https://doi.org/10.1007/s00181-020-01875-7

Pesaran, M. H., Shin, Y., & Smith, R. J. (2001). Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics, 16(3), 289–326. DOI: https://doi.org/10.1002/jae.616

Pesaran, M. H., & Yamagata, T. (2008). Testing slope homogeneity in large panels. Journal of Econometrics, 142(1), 50–93. DOI: https://doi.org/10.1016/j.jeconom.2007.05.010

Persyn, D., & Westerlund, J. (2008). Error correction based cointegration tests for panel data. The Stata Journal, 8(2), 232–241. DOI: https://doi.org/10.1177/1536867X0800800

Shin, Y., Yu, B., & Greenwood-Nimmo, M. (2014). Modelling asymmetric cointegration and dynamic multipliers in a nonlinear ARDL framework. In Festschrift in Honor of Peter Schmidt (pp. 281–314). Springer. DOI: https://doi.org/10.1007/978-1-4899-8008-3_9

Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748. DOI: https://doi.org/10.1111/j.1468-0084.2007.00477.x

Zivot, E., & Andrews, D. W. K. (1992). Further evidence on the great crash, the oil-price shock, and the unit-root hypothesis. Journal of Business & Economic Statistics, 10(3), 251–270. DOI: https://doi.org/10.1080/07350015.1992.10509904
