
Okun’s Law, Balkan edition: A methodology guide for Serbia (without the pain) – Part II


From “is this data even usable?” to “does growth lead jobs?”, one careful step at a time.

1. From charts to checks: when intuition stops being enough

Graphical inspection is a useful first pass, but it can also be a trap: Serbia’s recent economic history is full of shocks and discontinuities, and those features can fool a simple regression into “finding” relationships that are really just shared turbulence. That is why the study moves from pictures to diagnostics, first asking whether the series behave in ways that allow meaningful inference, then asking whether the relationship is long-run or merely episodic, and only then deciding what kind of model is justified. The pipeline is explicit: stationarity first, cointegration second, dynamic models third, and causality last.

2. First, don’t trip over stationarity: What kind of series are we dealing with?

The study begins the formal results with unit root testing because everything downstream depends on it. If a series wanders without a stable mean (nonstationary), regressions can look “significant” even when the underlying relationship is spurious. Conversely, if the series is stationary, short-run models can be run directly without extra machinery.
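
To see the danger concretely, here is a quick simulation of that trap (my illustration, not the study's code): two independent random walks, regressed on each other, routinely produce "significant" coefficients.

```python
# A minimal sketch (not from the study): regress two unrelated random walks
# on each other and watch OLS report a "significant" slope anyway.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 29  # roughly the study's annual sample size

y = np.cumsum(rng.normal(size=n))  # random walk 1 (think: log GDP)
x = np.cumsum(rng.normal(size=n))  # random walk 2 (think: unemployment)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"slope t-stat: {res.tvalues[1]:.2f}, p-value: {res.pvalues[1]:.3f}")
# Across repeated draws, |t| > 2 appears far more often than the nominal 5%.
```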

Here the study applies two familiar "first-generation" tests, DF-GLS and KPSS, explicitly as complementary lenses, because they reverse the burden of proof: DF-GLS takes a unit root as its null hypothesis, while KPSS takes stationarity as its null. When both broadly agree, the analyst can proceed with more confidence; when they diverge, the model choice needs extra care.
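
In code, the pair of tests might look like the sketch below, using the arch package; the file name and column labels are hypothetical placeholders, not the study's actual data handling.

```python
# A minimal sketch of the complementary unit-root checks; "serbia_annual.csv"
# and the column names are hypothetical stand-ins for the study's data.
import pandas as pd
from arch.unitroot import DFGLS, KPSS

df = pd.read_csv("serbia_annual.csv", index_col=0)  # hypothetical file
lgdp, ur = df["LGDP"], df["UR"]

dfgls = DFGLS(lgdp, trend="ct")  # H0: unit root (nonstationary)
kpss = KPSS(lgdp, trend="ct")    # H0: stationary around a trend

print(dfgls.summary())
print(kpss.summary())
# Read jointly: DF-GLS fails to reject and KPSS rejects -> treat as I(1);
#               DF-GLS rejects and KPSS fails to reject -> treat as I(0).
```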

What do they say in Serbia’s case? In levels, log GDP (LGDP) and the unemployment rate (UR) are characterised as nonstationary and best treated as I(1) processes, meaning they are likely stationary only after differencing. In contrast, the output gap and unemployment gap (constructed via HP filtering in this study) appear stationary in levels, i.e., I(0). The unemployment gap, in particular, is described as showing strong stationarity evidence under both tests.
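
Constructing the gap series is mechanically simple once the levels exist; a minimal sketch with statsmodels' HP filter follows, continuing with the lgdp and ur series from the previous sketch. The smoothing parameter lamb=100 is a common convention for annual data, not necessarily the study's exact choice.

```python
# A minimal sketch of gap construction via the HP filter (statsmodels);
# lamb=100 for annual data is an assumption, not the study's stated value.
from statsmodels.tsa.filters.hp_filter import hpfilter

gap_y, trend_y = hpfilter(lgdp, lamb=100)  # output gap = cyclical component
gap_u, trend_u = hpfilter(ur, lamb=100)    # unemployment gap, same treatment
# By construction the gaps are deviations from a smooth trend, which is why
# they can test as I(0) while the levels test as I(1).
```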

That split matters economically, not just statistically. It suggests two different “worlds” inside the same dataset. In the levels world (LGDP and UR), long-run drift dominates; trends and breaks can overwhelm cyclical co-movement. In the gaps world (output gap and unemployment gap), the series are already defined as deviations from trend and therefore behave more like cyclical objects. That creates an immediate modelling implication: the study can treat gap models as short-run relationships among stationary variables, while level models need either differencing or a credible long-run framework (cointegration) before the coefficients mean what we want them to mean.

The study also motivates break-aware unit-root tests (Zivot-Andrews, Clemente-Montañés-Reyes, Lee-Strazicich, and Kapetanios-Shin-Snell) explicitly because Serbia's "turbulent economic history" makes it plausible that shocks altered the time-series properties of GDP and unemployment. In other words: if breaks are real, ignoring them can make a stationary series look nonstationary, or the other way around.
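
Of the four, Zivot-Andrews is the one with a standard statsmodels implementation; a minimal sketch, again using the lgdp series from earlier:

```python
# A minimal sketch of a break-aware unit-root test: Zivot-Andrews with an
# endogenous break in both intercept and trend (regression="ct").
from statsmodels.tsa.stattools import zivot_andrews

stat, pval, crit, lags, break_idx = zivot_andrews(lgdp, regression="ct")
print(f"ZA stat: {stat:.2f}, p-value: {pval:.3f}, "
      f"estimated break at observation {break_idx}")
# H0: unit root with no break. Rejection says the series looks stationary
# once a single endogenous break is allowed for.
```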

3. Then, don’t assume a long run: Do output and unemployment actually cointegrate?

Once the stationarity picture is drawn, the next question is whether GDP and unemployment share a stable long-run relationship despite short-run noise; that is the practical meaning of cointegration in this context. The study's approach is deliberately plural: it begins with Engle-Granger as a baseline, adds Gregory-Hansen to allow a break in the cointegrating relationship, and uses Johansen's system approach, followed by Bayer-Hanck as a combined, more conservative check. The point is not to drown readers in tests; it is to reduce the risk that one fragile method dictates the story.
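
For the two methods with standard Python implementations, a baseline pass might look like the sketch below (Gregory-Hansen and Bayer-Hanck need specialised code); the series are the ones loaded earlier.

```python
# A minimal sketch of the Engle-Granger and Johansen baselines (statsmodels).
import numpy as np
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Engle-Granger: H0 is no cointegration between the two levels series.
eg_stat, eg_pval, eg_crit = coint(lgdp, ur, trend="c")
print(f"Engle-Granger: stat {eg_stat:.2f}, p-value {eg_pval:.3f}")

# Johansen trace test: H0 for the first statistic is rank 0 (no vector).
data = np.column_stack([lgdp, ur])
joh = coint_johansen(data, det_order=0, k_ar_diff=1)
print("trace statistics:     ", joh.lr1)
print("trace critical values:", joh.cvt)  # columns: 90%, 95%, 99%
```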

The study’s cointegration results are, bluntly, not a clean victory parade.

Engle-Granger finds no cointegration in either direction across specifications, which points toward the absence of a stable long-run relationship under the assumption of no breaks.

Gregory-Hansen, which allows for a structural break, offers at best weak support, with one case described as “close” to significance under a regime-trend shift specification. This is presented as suggestive rather than definitive, and the broader message is that cointegration detection is sensitive to both specification and how structural change is treated.

Johansen is presented as potentially more robust here because it treats the relationship as a system rather than a single equation and can incorporate deterministic components and diagnostics more flexibly. In the study’s synthesis, Johansen appears to indicate cointegration in some specifications, while the other approaches are more reluctant.

Then comes the Bayer-Hanck “meta” test, designed to combine information from multiple cointegration tests and therefore to be, in spirit, harder to impress. It delivers the clearest veto: no evidence to reject the null of no cointegration across the four model variants, both for first-difference models and gap-based models, with or without deterministic trend. The study stresses that the combined statistic falls short of the critical thresholds, and it interprets this as a conservative judgement that earlier “positive” findings from single tests may be spurious or sample-specific.
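
For intuition, the Bayer-Hanck statistic is a Fisher-type aggregation of the component tests' p-values; the toy sketch below uses placeholder p-values, and in the real procedure the statistic is compared with Bayer and Hanck's own tabulated critical values, not a plain chi-square, because the component tests share the same data.

```python
# A toy sketch of the Fisher combination behind Bayer-Hanck (2013).
# The p-values are hypothetical placeholders, not the study's results.
import math

pvals = {"Engle-Granger": 0.40, "Johansen": 0.08,
         "Boswijk": 0.25, "Banerjee": 0.30}

fisher_stat = -2 * sum(math.log(p) for p in pvals.values())
print(f"Bayer-Hanck combined statistic: {fisher_stat:.2f}")
# No cointegration is concluded when the statistic falls short of the
# tabulated critical value, the pattern the study reports here.
```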

The study is candid about why this is messy. The sample is small, only about 29 annual observations for GDP in the stated window, so asymptotic tests can misbehave: over-rejecting or under-rejecting depending on specification. And structural breaks (1999 is explicitly modelled via a dummy in later frameworks) complicate the notion of a single stable long-run equilibrium.

The practical implication is not “cointegration never exists.” It is: in this dataset, under this time span, the evidence for a stable long-run Okun relationship is not robust enough to treat as the backbone of the empirical story. That conclusion matters because it disciplines the modelling: it pushes the analysis toward short-run dynamics rather than error-correction narratives that assume a stable long-run anchor.

4. A small equation, a big decision: What do we estimate once cointegration looks shaky?

Here is the quiet heart of the methodology: you do not choose ARDL/VECM/ECM because they are fashionable; you choose them because the data earn them.

The study’s logic is explicit. Where cointegration is not confirmed robustly, especially after Bayer-Hanck, the “first-difference models” become “econometrically more appropriate” because they do not presume a stable long-run relationship. Economically, the study links this to the realities of a transition economy where rigidities, structural unemployment, and repeated shocks can disrupt long-term stability between output and unemployment.

A minimalist template of the kind of error-correction form that would be warranted only under cointegration looks like this (in spirit, not as the study’s main focus):

$$\Delta y_t = \alpha + \lambda\,(y_{t-1} - \beta x_{t-1}) + \sum_i \phi_i\, \Delta y_{t-i} + \sum_j \theta_j\, \Delta x_{t-j} + u_t$$

The key object is the term in parentheses: the “long-run gap” from equilibrium. If cointegration is weak or unstable, that term becomes a strong claim on thin evidence.

This is why, in the study’s interpretation, both the difference model and the gap model should be specified without an error-correction term when the broader evidence does not support long-run equilibrium.

5. ARDL as a disciplined compromise: Dynamic relationships without pretending to see forever

The ARDL framework is introduced as a pragmatic tool for modelling both short- and long-run dynamics in small samples and with mixed integration orders. The study emphasises ARDL's suitability in such contexts and then extends it to NARDL to allow asymmetric effects, that is, whether the labour market reacts differently to positive and negative output changes. That asymmetry motivation is not cosmetic; it is tied to Serbia-specific considerations mentioned in the study, like segmentation and downward wage rigidity.

When the study turns to the Serbia ARDL exercise, it is explicit about scope: four annual models for 1995–2023, two in first differences (LGDP and UR as dependent variables in separate models) and two in gaps (output gap and unemployment gap), all with a 1999 dummy (D1999) to account for the Kosovo war and NATO intervention, and lag selection guided by SIC.
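
As a sketch of what one of those four models might look like in statsmodels, take the first-difference unemployment equation; the variable construction, the maximum lag of 2, and the integer-year index are my assumptions, not the study's code.

```python
# A minimal sketch of one ARDL model: dUR on dLGDP with a fixed (unlagged)
# 1999 dummy and lag orders chosen by SIC (= BIC in statsmodels).
import pandas as pd
from statsmodels.tsa.ardl import ardl_select_order

d_ur = ur.diff().dropna()
d_lgdp = lgdp.diff().dropna()
d1999 = pd.Series((d_ur.index == 1999).astype(float),  # assumes year index
                  index=d_ur.index, name="D1999")

sel = ardl_select_order(
    endog=d_ur, maxlag=2,                      # small maxlag: ~29 observations
    exog=d_lgdp.to_frame("dLGDP"), maxorder=2,
    fixed=d1999.to_frame(),
    trend="c", ic="bic",
)
res = sel.model.fit()
print(res.summary())
```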

The economics of this step is straightforward: if the long run is unreliable, you focus on how shocks transmit over time, with lags, persistence, and possibly break effects, rather than insisting on a single timeless coefficient. The D1999 dummy plays an important role in that narrative: it is an explicit acknowledgement that Serbia’s macro-labour relationship likely has “event-time” features that a smooth model would otherwise misread.

6. Causality last: Does growth “lead” unemployment, or is the story weaker than that?

After stationarity and cointegration diagnostics, and after dynamic modelling frameworks are considered, the study runs Granger causality tests, using the Toda-Yamamoto approach. The purpose here is not philosophical causality; it is predictive precedence: do past values of one variable help forecast the other? The study motivates this step explicitly, noting that while Okun’s Law is often interpreted as output affecting unemployment, reverse causality is plausible, especially where labour market conditions influence investment, productivity, or social stability.
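
Mechanically, Toda-Yamamoto means fitting a levels VAR with p + d_max lags and then Wald-testing only the first p lags of the candidate cause. The sketch below implements that equation by equation; p = 1 and d_max = 1 (matching the I(1) findings) are assumptions for illustration.

```python
# A minimal sketch of a Toda-Yamamoto causality test: estimate with
# p + d_max lags, but restrict only the first p lags of x in the y equation.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.tsatools import lagmat

def toda_yamamoto(y, x, p, d_max=1):
    """Wald test that the first p lags of x do not help predict y."""
    k = p + d_max
    data = pd.DataFrame({"y": y, "x": x}).dropna()
    ylags = lagmat(data["y"], maxlag=k, trim="both")
    xlags = lagmat(data["x"], maxlag=k, trim="both")
    rhs = sm.add_constant(np.hstack([ylags, xlags]))
    lhs = data["y"].iloc[k:].to_numpy()
    res = sm.OLS(lhs, rhs).fit()
    # Regressor order: const | k lags of y | k lags of x.
    R = np.zeros((p, rhs.shape[1]))
    for i in range(p):
        R[i, 1 + k + i] = 1.0  # zero restriction on x_{t-1} .. x_{t-p}
    return res.wald_test(R, scalar=True)

print("LGDP -> UR:", toda_yamamoto(ur, lgdp, p=1))
print("UR -> LGDP:", toda_yamamoto(lgdp, ur, p=1))
```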

The punchline is striking in its consistency. For the gap models, there is no evidence of causality in either direction: in levels, p-values for unemployment gap → output gap and output gap → unemployment gap are reported as far from conventional significance; the same holds for first differences. The absence persists even after including the D1999 dummy. The study interprets this uniform insignificance as an absence of predictive power either way, GDP does not forecast unemployment in the short run, and unemployment does not forecast GDP, at least in this framework and sample.

The broader synthesis goes further: the study states that the Toda-Yamamoto tests offer no evidence of unidirectional or bidirectional causality between GDP and unemployment for the period, and that this is consistent with the Bayer-Hanck finding of no cointegration, i.e., no short-run predictive link and no stable long-run anchor. It explicitly notes the small sample size and the possibility that structural breaks can suppress detectable causal links, even with D1999 included. But it also argues that the null results are robust across configurations, increasing confidence in the finding.

That leads to a hard-nosed modelling implication: frameworks that “posit a causal or cointegrating link” (error-correction and cointegration-based approaches) are “not empirically justified” here, and both difference and gap versions should focus on short-run interactions without error correction.

And it leads to a policy implication that is as uncomfortable as it is useful: if neither direction has predictive precedence in the tested sense, then growth-first policy narratives ("stimulate GDP and unemployment will fall") may have limited effect unless paired with structural reforms. The study interprets the decoupling as potentially reflecting deeper labour-market features such as informality, rigidities, or mismatch.

7. What Part III will do next

Part II has been deliberately narrow: it follows the study from unit-root results through cointegration, dynamic modelling justification, and causality, without jumping into the big policy wrap-up. Part III will do that integration: it will pull together the graphical evidence and the formal results into a single interpretation of what Serbia’s Okun relationship does, and, more importantly, what it fails to do, then translate that into labour-market, macroeconomic, and financial-policy implications consistent with the study’s framing.

If Part II feels like a checklist, that is the point. In a dataset this short, and in an economy with this many breaks, the biggest mistake is not statistical. It is storytelling: pretending the data are offering a clean law when they are, at best, offering a conditional tendency that comes and goes with the regime.
