Stylized Facts

Empirical insights into former Yugoslav economies


Croatia’s Okun Law goes to court – Part II


Unit roots, breaks, cointegration, asymmetry and causality: Croatia’s quarterly data give testimony, not a confession.

1. From pictures to proof: Why the checklist matters

Part I let the charts do what charts do best: reveal co-movement, highlight awkward lags, and remind us that a few dramatic quarters can bend a “law” into a rumour. Part II is where the report stops admiring the scenery and starts checking the map. The logic is sequential and unforgiving: before estimating Okun’s Law in Croatia’s quarterly data, the report insists on verifying that the series behave in ways that make regression meaningful, that major disruptions are treated as structural features rather than statistical annoyances, and that any long-run relationship (if it exists) is tested rather than assumed.

The report’s pipeline reflects a basic rule of applied macroeconomics: you can’t interpret coefficients with confidence if the underlying variables are drifting, breaking, or pretending to be stable when they’re not. So it proceeds in the standard order. First: unit roots and stationarity. Then: structural breaks. Next: cointegration, where the question becomes “is there a long-run tether?” Only after that does it move to dynamic models, ARDL and nonlinear ARDL, where short-run adjustments and long-run equilibria can be disentangled. Finally, it asks a more practical question: who predicts whom? That is the role of Granger causality tests, in both a full-sample (static) and a time-varying framework.

The payoff is not a single magic coefficient. It is a disciplined interpretation of what Croatia’s quarterly output and unemployment data are willing to support, especially across a period marked by large shocks and shifting regimes.

2. First, don’t trip over stationarity: What the unit-root results imply

The unit-root section is the report’s attempt to stop the analysis from committing the oldest sin in macroeconometrics: mistaking shared trends for relationships. It begins with first-generation unit-root tests, DF-GLS (Elliott–Rothenberg–Stock) and KPSS, applied to the two headline variables and their transformed versions: log GDP per capita (LGDP), the unemployment rate (UR), their first differences (DLGDP and DUR), and the HP-filtered cyclical measures (output gap and unemployment gap).

The initial message is clear and conventional. In the report’s summary of the DF-GLS and KPSS results, LGDP and UR in levels behave as non-stationary series, while their first differences are stationary. Put in plain English, output (in log levels) and unemployment (in levels) wander in ways that make “levels-on-levels” regressions potentially misleading unless a long-run equilibrium structure exists. But once differenced, the series behave like stable short-run movements. The report therefore treats both LGDP and UR as integrated of order one, I(1), and treats DLGDP and DUR as I(0).
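To make the I(1) versus I(0) distinction concrete, here is a minimal sketch of a KPSS-style statistic applied to a simulated random walk and to its first difference. It is a simplification, not the report's procedure: the data are simulated, the function name is made up for illustration, and the long-run variance is estimated without any lag correction (the full KPSS test uses one). KPSS reverses the usual null, so large values speak against stationarity.

```python
import random


def kpss_level_stat(series):
    """Simplified KPSS level-stationarity statistic (no lag correction)."""
    n = len(series)
    mean = sum(series) / n
    resid = [v - mean for v in series]
    # Partial sums of the demeaned series
    partial, running = [], 0.0
    for r in resid:
        running += r
        partial.append(running)
    variance = sum(r * r for r in resid) / n
    return sum(s * s for s in partial) / (n * n * variance)


# Simulated data (an assumption for illustration, not Croatia's series)
random.seed(42)
level, total = [], 0.0
for _ in range(200):
    total += random.gauss(0.0, 1.0)
    level.append(total)          # random walk: behaves like an I(1) level
diff = [level[t] - level[t - 1] for t in range(1, len(level))]  # I(0)

stat_level = kpss_level_stat(level)
stat_diff = kpss_level_stat(diff)
# The published 5% critical value for the level-stationarity case is about 0.463
print("level:", round(stat_level, 3), "difference:", round(stat_diff, 3))
```

The statistic for the level series should come out far larger than the one for the differenced series, which is the same pattern the report describes for LGDP and UR against DLGDP and DUR.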

The gaps behave differently, and that matters. Because output and unemployment gaps are constructed as deviations from trend using the HP filter, the report expects them to be more cycle-like than drift-like. The unit-root results support that expectation: output gap and unemployment gap are classified as stationary in level form, making them more natural candidates for models that aim to explain cyclical slack rather than long-run drift. That finding also foreshadows a theme that will repeat in the cointegration and ARDL sections: the gap specification tends to deliver cleaner empirical behaviour in this dataset.
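For readers who want to see how a gap is built, the sketch below implements the HP filter directly from its definition: the trend solves (I + λD'D)τ = y, where D takes second differences, and the gap is y minus τ. The smoothing value λ = 1600 is the usual quarterly convention and is assumed here, since the text does not state the report's setting. The input series is invented for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def hp_filter(y, lamb=1600.0):
    """Return (trend, cycle) from the Hodrick-Prescott filter."""
    n = len(y)
    # Build A = I + lamb * D'D, with D the second-difference operator
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weights = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lamb * weights[p] * weights[q]
    trend = solve(a, list(y))
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle


# Hypothetical log-output values, for illustration only
log_gdp = [10.00, 10.02, 10.03, 10.01, 9.98, 10.00,
           10.03, 10.05, 10.06, 10.04, 10.07, 10.09]
trend, cycle = hp_filter(log_gdp)
print([round(c, 4) for c in cycle])
```

By construction the cycle sums to zero over the sample, which is one reason gap series tend to look mean-reverting in unit-root tests.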

The report also inserts two cautions at this stage, and both are economically meaningful. First, unit-root tests can be underpowered even with quarterly data: Croatia’s sample is larger than an annual sample, but it is still finite, and inference can be conservative. Second, and more importantly, structural breaks can distort unit-root conclusions. If the economy passes through major shocks, such as the 2008–2009 crisis and the 2020 pandemic, tests that assume a stable data-generating process can misclassify persistence as drift or stability as noise. The report therefore treats “DF-GLS says I(1)” as a starting point, not a verdict.

3. Breaks are not bad manners: Unit roots with structural change

Having raised the problem of breaks, the report then does the obvious thing: it tests for stationarity while allowing for breaks. It uses the Zivot–Andrews (ZA) test for a single endogenous break and the Clemente–Montañés–Reyes (CMR) test for two endogenous breaks, with both additive outlier (AO) and innovative outlier (IO) variants. This is not technical ornamentation. It is an explicit recognition that Croatia’s macro history includes episodes that are better thought of as regime shifts than as “large residuals.”
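The idea of an endogenous break date can be illustrated with a much simpler device than ZA or CMR. The sketch below searches every admissible date for a single shift in the mean and keeps the one with the smallest sum of squared residuals. This is a swapped-in criterion for illustration: the actual ZA test selects the break that minimises the unit-root t-statistic. The series and the function name are hypothetical.

```python
def best_level_break(series, trim=0.15):
    """Return (break_index, ssr) for the single mean shift with the lowest SSR.

    break_index is the first observation of the second regime. The trim
    fraction keeps candidate breaks away from the sample ends.
    """
    n = len(series)
    lo = max(2, int(n * trim))
    hi = min(n - 2, int(n * (1 - trim)))
    best = None
    for k in range(lo, hi + 1):
        left, right = series[:k], series[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        ssr = (sum((v - mean_left) ** 2 for v in left)
               + sum((v - mean_right) ** 2 for v in right))
        if best is None or ssr < best[1]:
            best = (k, ssr)
    return best


# Hypothetical series: small alternating noise plus a level shift at t = 12
series = [(0.1 if t % 2 else -0.1) + (1.0 if t >= 12 else 0.0)
          for t in range(20)]
break_index, ssr = best_level_break(series)
print("estimated break at observation", break_index)
```

The same grid-search logic, applied twice and with AO or IO dynamics around each break, is the intuition behind the two-break CMR variants.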

The ZA results, as the report summarises them, largely reinforce the earlier message for the level variables: LGDP and UR remain non-stationary in levels even when a break is permitted, while DLGDP and DUR reject the unit-root null across break specifications. In other words, allowing for one break does not magically turn the level variables into well-behaved stationary series. But the break dates identified for the differenced series cluster in the expected neighbourhood: the report notes that breakpoints are typically located around crisis periods spanning 2008–2020, which aligns with the idea that shocks alter the short-run dynamics of change even if they do not eliminate long-run persistence in levels.

For the gap variables, the break-aware results are more comforting. The report states that, under ZA, output gap and unemployment gap reject the unit-root null in all break specifications, indicating that these cyclical measures remain stationary even when structural change is allowed. That matters because the gap model is often used precisely to talk about slack in turbulent times; having gap variables that behave as stationary cycles makes the later modelling choices less fragile.

The CMR tests add another layer. The report emphasises that CMR’s two-break framework can capture both sudden breaks (AO) and gradual shifts (IO), which makes it well-suited to a period with both a long crisis/recovery arc and a sudden pandemic shock. Here, the report’s interpretation becomes more nuanced: it states that CMR results corroborate ZA findings and also show that, under certain break structures, level LGDP and UR can become stationary once two breakpoints are allowed, with breaks corresponding to episodes such as 2005Q1, 2008Q4, 2015Q3, 2019Q4, and 2020Q1 (the report lists these as breakpoints appearing in different AO/IO settings). The economic translation is straightforward: Croatia’s “persistence” in levels may partly reflect the fact that the economy’s trend and labour-market regime were not stable across the full period. In practical terms, if you try to estimate one Okun relationship across all quarters without acknowledging regime shifts, you risk averaging across different worlds.

The report’s own “discussion” conclusion from the full battery is pointed: stationarity and structural stability appear more plausible when Okun’s Law is modelled using gap specifications rather than level specifications, especially in macro data prone to shocks. If you insist on level models, the report implies, you need to take breaks seriously, or your inference will not be taken seriously either.

4. Cointegration: Does Croatia have a long-run tether, and for which specification?

Cointegration is where the report switches from “how do the series behave?” to “do they move together in the long run?” This is not a philosophical question. It determines whether it is legitimate to talk about equilibrium relationships and error-correction dynamics, or whether the relationship is better treated as short-run association.

The report uses multiple cointegration approaches: Engle–Granger, Gregory–Hansen (allowing a structural break in the cointegrating relationship), Johansen’s system method, and the Bayer–Hanck meta framework. The intent is not to drown the reader in tests; it is to reduce the risk that a single method’s limitations drive the conclusion.

Engle–Granger delivers the first clear split between specifications. For LGDP and UR in levels, the report states there is no evidence of cointegration under both constant-only and trend-inclusive settings: the test statistics do not cross critical thresholds in either direction. Economically, this means the report does not find a stable long-run equilibrium binding output levels and unemployment levels over the entire sample in this bivariate framework.

For the gap variables, the tone changes. The report describes strong evidence of cointegration between output gap and unemployment gap, with statistics that decisively reject the null of no cointegration. The conclusion is intuitive: cyclical slack measures are constructed to represent deviations from trend, and the report finds that those deviations appear tied together by a mean-reverting relationship.
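The Engle–Granger logic is a two-step affair: estimate the static relationship by OLS, then ask whether the residuals are stationary. A minimal sketch on simulated data follows. The series, coefficients and function names are assumptions for illustration, and the residual test is the plain Dickey–Fuller regression without lags or a constant.

```python
import random


def ols_simple(x, y):
    """Intercept and slope from a bivariate OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def df_tstat(e):
    """t-statistic on rho in: delta e_t = rho * e_{t-1} + v_t."""
    lag = e[:-1]
    delta = [e[t] - e[t - 1] for t in range(1, len(e))]
    sll = sum(v * v for v in lag)
    rho = sum(a * b for a, b in zip(lag, delta)) / sll
    resid = [d - rho * a for a, d in zip(lag, delta)]
    s2 = sum(r * r for r in resid) / (len(delta) - 1)
    return rho / (s2 / sll) ** 0.5


# Simulated pair that is cointegrated by construction (illustrative only)
random.seed(1)
x, level = [], 0.0
for _ in range(200):
    level += random.gauss(0.0, 1.0)
    x.append(level)
y = [2.0 - 0.5 * xi + random.gauss(0.0, 0.5) for xi in x]

# Step 1: static regression. Step 2: unit-root test on its residuals.
intercept, slope = ols_simple(x, y)
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
t_stat = df_tstat(resid)
print("slope:", round(slope, 3), "residual DF t-stat:", round(t_stat, 2))
```

One caveat matters in practice: because the residuals come from an estimated regression, the t-statistic has to be compared with Engle–Granger (MacKinnon-type) critical values, which are more demanding than the ordinary Dickey–Fuller ones.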

Gregory–Hansen adds a break-aware lens. The report notes a somewhat unexpected detail in the level case: only under one statistic (Zt) with LGDP as the dependent variable does it find cointegration with a structural break in trend around 2019Q2, and the report itself remarks that this break date is not the most obvious candidate based on the time-series plots. For the gap variables, the Gregory–Hansen results are more supportive: the report states that stronger support for cointegration emerges across specifications, with identified breaks clustering around 2006–2007 in that test’s single-break framework. The report also offers a plausible reconciliation: different procedures allow different numbers of breaks, and the CMR unit-root tests identify breakpoints around crisis periods such as 2008Q4 and 2020Q1 for gap series under certain models. The underlying message is not “the break date is exactly X.” It is “the relationship is not structurally stable across the entire sample, and different break structures highlight different turning points.”

Johansen largely aligns with Engle–Granger for the levels pair: across deterministic-trend models, the report states that trace and maximum eigenvalue tests fail to reject no cointegration for LGDP and UR; the cointegration rank remains at zero. The report interprets this as implying that the first-difference specification is justified for those variables: you should not impose a long-run VECM structure if Johansen finds no such equilibrium.

Then the report complicates the story, usefully, through Bayer–Hanck. It explains that in constant-only settings, evidence of cointegration between LGDP and UR is mixed and generally weak: in one direction it fails to pass combined-test thresholds, while in the reverse direction there is marginal evidence. For the gap variables, Bayer–Hanck provides strong confirmation of cointegration. But the report adds a crucial nuance: when a deterministic trend is included in Bayer–Hanck tests, the broader inferential framework suggests the earlier “no cointegration” conclusion for levels deserves revision in light of pooled evidence. This does not magically overturn Engle–Granger and Johansen. It tells you that the long-run question for levels is sensitive to specification choices and that the evidence is more supportive in some combined frameworks than in others.
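The pooling step in Bayer–Hanck is a Fisher-type combination of the p-values from the individual cointegration tests. A minimal sketch of that formula is shown here, with made-up p-values. The function name is an assumption, and the combined statistic must be compared with the simulated critical values tabulated for the Bayer–Hanck test, not with a chi-squared table, because the underlying tests are not independent.

```python
import math


def fisher_combination(p_values):
    """Fisher-type statistic: -2 times the sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


# Hypothetical p-values from two underlying tests (not the report's numbers)
weak_evidence = fisher_combination([0.30, 0.25])
strong_evidence = fisher_combination([0.01, 0.02])
print(round(weak_evidence, 2), round(strong_evidence, 2))
```

Two individually marginal p-values can add up to a convincing combined statistic, which is one way pooled evidence can point in a different direction from each test read in isolation.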

The report’s “discussion of cointegration results” therefore lands in a coherent place. For the first-difference formulation, it treats the absence of robust cointegration evidence as a reason to focus on short-run dynamics rather than long-run equilibrium claims. For the gap formulation, it treats the evidence as strong enough to justify error-correction style modelling and to explore nonlinearities with less fear of spurious regression.

5. ARDL: Dynamic modelling that admits reality has lags (and dummies)

Cointegration evidence informs the modelling choice, but it does not replace it. The report then estimates a series of ARDL models for both the level/difference framework and the gap framework, using bounds testing and error-correction interpretations to decide when long-run relationships are credible.
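For orientation, a generic error-correction form of a bivariate ARDL(p, q) with unemployment as the dependent variable can be written as follows. The notation is mine, not the report's: u is the unemployment measure, y the output measure, α the adjustment (ECM) coefficient and θ the long-run coefficient. Trend and dummy terms are left out for brevity.

```latex
\Delta u_t = c + \alpha \,\bigl(u_{t-1} - \theta\, y_{t-1}\bigr)
  + \sum_{i=1}^{p-1} \gamma_i \,\Delta u_{t-i}
  + \sum_{j=0}^{q-1} \beta_j \,\Delta y_{t-j}
  + \varepsilon_t
```

In this form the bounds F-test asks whether the lagged level terms belong in the equation at all, the bounds t-test asks whether α is significantly negative, and a value of α between -1 and 0 implies gradual, monotonic return to the long-run relationship.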

In the LGDP–UR block, the report estimates multiple ARDL specifications with variations in deterministic components and dummy variables. The key message is asymmetric across dependent variables. When LGDP is the dependent variable, the report states that bounds tests do not support cointegration in any of the four specifications: F- and t-statistics fall below lower bounds. It also notes a temptation: these models fit extremely well in-sample (adjusted R² exceeding 0.96), but without cointegration such fit is not evidence of meaningful long-run structure. In Economist terms: an impressive-looking regression can still be politely meaningless if the underlying relationship is not anchored.

When UR is the dependent variable, the report finds stronger support. In three out of four unemployment-dependent ARDL models, bounds tests exceed upper critical values, implying a cointegrating relationship, except for one specification (an ARDL(12,1) with constant, trend, and dummies) where bounds tests fall short. The report then selects a preferred specification based on Schwarz criterion and diagnostics, noting that an ARDL(11,1) with constant and trend performs best on diagnostics even if it is not the absolute SIC-minimiser.

The economic content appears in the error-correction term. For the preferred unemployment-dependent level model, the report gives an ECM coefficient of -0.298 (p < .01). Interpreted as the report does: roughly 30% of any deviation from the long-run equilibrium between unemployment and GDP is corrected within one quarter. That is a meaningful speed of adjustment in quarterly terms, fast enough to matter for policy horizons, but not instant. It implies inertia: the labour market adjusts, but not at the speed of a press release.
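The arithmetic behind "roughly 30% per quarter" is easy to extend into a decay path and a half-life. The sketch below assumes simple geometric adjustment, so it ignores the short-run lag terms of the ARDL and is a back-of-the-envelope reading, not the report's calculation. The function names are illustrative.

```python
import math


def remaining_gap(ecm_coef, quarters):
    """Share of an initial disequilibrium still present after some quarters."""
    return (1.0 + ecm_coef) ** quarters


def half_life(ecm_coef):
    """Quarters needed to close half of an initial disequilibrium."""
    return math.log(0.5) / math.log(1.0 + ecm_coef)


coef = -0.298  # ECM coefficient quoted in the text
for q in (1, 2, 4, 8):
    print(q, "quarters:", round(remaining_gap(coef, q), 3))
print("half-life in quarters:", round(half_life(coef), 2))
```

On this reading about half of a shock to the equilibrium is worked off in roughly two quarters, and most of it within two years.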

The report also situates these findings within a broader caution: modelling GDP as dependent variable tends to produce unstable or unreliable long-run structure, while unemployment-as-dependent is more consistent with Okun’s conceptual direction and empirically better supported in this framework.

6. Gap-ARDL: Stronger long-run structure, but diagnostics still decide who gets invited back

The report’s gap-ARDL exercise is extensive: it estimates models where output gap is dependent and where unemployment gap is dependent, under four variants each (constant; constant+dummy; constant+trend; constant+trend+dummy). It uses bounds tests to assess cointegration and then reads the adjustment coefficients as speeds of convergence.

The bounds-test results are strikingly supportive when output gap is the dependent variable. The report states that for all four variants, F-statistics exceed upper bounds and t-statistics are strongly negative (ranging roughly from -4.67 to -5.98), confirming a long-run relationship. On paper, this is the sort of result that makes a modeller cheerful.

But the report refuses to stop at “on paper”. It looks at diagnostics, and this is where it becomes candid. It reports that the output-gap-dependent models perform poorly on key diagnostics: residuals fail normality tests and functional-form tests, and coefficient stability cannot be confirmed by CUSUM and CUSUMSQ. In short, the models that appear “most cointegrated” are not necessarily the models you should trust. The report therefore chooses not to report these models in detail, arguing that their diagnostic failures make them unreliable even if their bounds tests are supportive. This is applied work behaving responsibly: it refuses to confuse cointegration evidence with model adequacy.

When unemployment gap is the dependent variable, the report describes the evidence as mixed in a specific way. F-statistics support cointegration across variants (in the report’s summary, they range roughly from 8.03 to 13.10 and exceed upper bounds), but t-statistics are less definitive, with several variants falling into inconclusive territory. That means the precondition for an unambiguous VECM-style interpretation is weaker. Yet the report still finds that adjustment coefficients (error-correction terms) are consistently negative and significant across models, suggesting meaningful convergence even when the bounds-test evidence is not perfectly symmetric across statistics.
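The "inconclusive territory" mentioned above comes from the three-zone decision rule of the bounds test, sketched here. The critical values in the example are assumptions: they are the 5% values I recall from the Pesaran–Shin–Smith tables for one regressor with an unrestricted intercept, and may not match the bounds used for each of the report's variants. The t-value of -3.0 is made up. Only the F-value of 8.03 is taken from the text.

```python
def bounds_decision(stat, lower, upper, kind="F"):
    """Classify a bounds-test statistic against I(0) and I(1) critical bounds.

    lower is the I(0) bound and upper the I(1) bound. For the t-test both
    bounds are negative and the upper bound is the more negative one.
    """
    if kind == "F":
        if stat > upper:
            return "cointegration"
        if stat < lower:
            return "no cointegration"
        return "inconclusive"
    if kind == "t":
        if stat < upper:
            return "cointegration"
        if stat > lower:
            return "no cointegration"
        return "inconclusive"
    raise ValueError("kind must be 'F' or 't'")


# Assumed 5% bounds (see the note above); -3.0 is a hypothetical t-statistic
print(bounds_decision(8.03, lower=4.94, upper=5.73, kind="F"))
print(bounds_decision(-3.0, lower=-2.86, upper=-3.22, kind="t"))
```

An F-statistic above its upper bound combined with a t-statistic between its bounds is exactly the mixed pattern described for the unemployment-gap models.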

The adjustment speeds themselves tell a story the report leans into. When output gap is dependent, adjustment is fast, around -0.43 to -0.45, implying 43–45% correction per quarter. When unemployment gap is dependent, adjustment is slower, around -0.15 to -0.17, implying 15–17% correction per quarter. The report interprets this as labour-market inertia: output can adjust more flexibly than unemployment, which is constrained by institutional and behavioural rigidities.

The report ultimately selects one gap-ARDL model for detailed analysis: an ARDL(5,1) with unemployment gap as the dependent variable and constant only. In this model, the reported ECM is -0.154 (p < .01), again pointing to slow correction of unemployment slack. The long-run relationship is negative and significant, with a reported long-run coefficient around -0.747 (p = .03) for the effect of output gap on unemployment gap. In words: stronger cyclical output conditions narrow cyclical unemployment slack in the long run, Okun’s Law in its most policy-friendly formulation.

The short-run coefficient on contemporaneous output gap growth is negative but not statistically significant in this selected model, which the report interprets as evidence that unemployment gap responds sluggishly in the short run. It is the long run, via error correction and the long-run coefficient, where the relationship becomes more dependable.

7. NARDL: Asymmetry shows up where politics lives (short-run pain, slower gain)

After linear ARDL comes nonlinear ARDL (NARDL), where the report allows positive and negative output movements to have different effects. This is not just an econometric flourish. It speaks to a lived policy asymmetry: recessions can destroy jobs quickly; expansions do not always rebuild them at the same speed.
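Mechanically, NARDL gets its asymmetry from splitting the regressor into cumulative sums of its positive and of its negative changes, which then enter the model as separate variables. A minimal sketch of that decomposition, with an invented series and an illustrative function name:

```python
def partial_sums(x):
    """Split a series into cumulative positive and negative changes.

    Returns (pos, neg) such that x[t] == x[0] + pos[t] + neg[t] for every t.
    """
    pos, neg = [0.0], [0.0]
    for prev, cur in zip(x, x[1:]):
        change = cur - prev
        pos.append(pos[-1] + max(change, 0.0))
        neg.append(neg[-1] + min(change, 0.0))
    return pos, neg


# Hypothetical output series, for illustration only
output = [1.0, 3.0, 2.0, 5.0, 4.0]
pos, neg = partial_sums(output)
print(pos)  # [0.0, 2.0, 2.0, 5.0, 5.0]
print(neg)  # [0.0, 0.0, -1.0, -1.0, -2.0]
```

If the coefficients on the two partial sums are statistically equal, the model collapses back to the linear ARDL, which is what the asymmetry tests check.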

The report is blunt about what does not work. NARDL models with economic growth as the dependent variable fail to provide evidence of cointegration in most specifications, show instability, violate key residual assumptions, and have extremely weak explanatory power (adjusted R² in the report’s summary ranges roughly from 0.02 to 0.14). The report therefore does not report these models. In an Economist voice: the data did not consent to being modelled that way.

The models that do carry weight are those with unemployment-side dependence.

In the first-difference unemployment formulation (DUR as dependent, with GDP decomposed into positive and negative changes), the report finds strong evidence of cointegration under asymmetric dynamics. It reports an F-bounds statistic of 11.35, exceeding the 5% upper bound of 5.73, and a t-bounds statistic around -3.10 against a 5% critical value of -3.22. As quoted, the t-statistic falls just short of that critical value in absolute terms, so the case for a long-run relationship in this asymmetric setup rests mainly on the F-bounds result. The adjustment coefficient is negative and significant (-0.112, p < .01). The report then emphasises the asymmetry: long-run asymmetry is not supported, but short-run asymmetry is robust, with the short-run asymmetry test strongly significant (F = 14.11, p < .01). A second model that includes a trend replicates the broad picture: cointegration confirmed; adjustment coefficient negative (-0.106, p < .01); short-run asymmetry significant; long-run asymmetry absent.

The report interprets this as consistent with a “jobless growth” style dynamic: the labour market reacts more strongly, at least in the short run, to negative output movements than to positive ones. In other words, contractions raise unemployment faster than expansions reduce it, and the asymmetry appears in transitory adjustments rather than in the long-run equilibrium.

Then the report turns to what it treats as the more stable and substantively informative space: the gap-based NARDL models with unemployment gap as dependent and output gap decomposed into positive and negative components. Here, the report finds decisive cointegration (for example, it reports F = 17.90 exceeding the 1% upper bound and a t-statistic around -6.75 far beyond critical values), and a strong negative adjustment coefficient (around -0.227, p < .01). But the asymmetry story flips: in the gap formulation, the report finds no evidence of asymmetry in either the short run or the long run (short-run asymmetry test not significant; long-run asymmetry essentially absent). The trend term is also insignificant in the trend-augmented version, reinforcing the idea that the cyclical output gap explains the unemployment gap without needing a drift story.

The report’s comparative conclusion across the four NARDL models is crisp: cointegration is present throughout, asymmetry appears only in the short-run dynamics of the first-difference models, and gap models are symmetric and empirically stronger, with higher adjusted R² (about 0.55 in the report’s summary) and better substantive clarity. The diagnostic issues that remain, heteroskedasticity and non-normality, are flagged, but the report argues these are less damaging than autocorrelation or functional-form errors, which are largely absent in the preferred models.

8. Causality: Who leads whom, and when?

Finally, the report asks the question that matters for forecasting and policy sequencing: does output lead unemployment, or does unemployment lead output?

It begins with static Granger causality using the Toda–Yamamoto approach, applied both to levels and to cyclical gaps. The results are unambiguous in direction. In the level pair, UR does not Granger-cause LGDP (the report gives a p-value of .92), while LGDP Granger-causes UR strongly (Chi-squared 18.71, p < .01). In the gap pair, the pattern repeats: unemployment gap does not Granger-cause output gap (p-value .69), while output gap Granger-causes unemployment gap strongly (Chi-squared 22.71, p < .01). This is the predictive form of Okun’s Law, and the report treats it as fully consistent with the conventional view: output dynamics lead labour-market dynamics, not the reverse.
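The Toda–Yamamoto trick is to fit the VAR in levels with p + d_max lags but to test only the first p lags of the candidate cause, which keeps the Wald statistic asymptotically chi-squared even when the series are integrated. The sketch below does this for one equation with p = 1 and d_max = 1 on simulated data in which output leads unemployment by construction. All names, lag choices and data are assumptions for illustration, not the report's settings.

```python
import random


def invert(mat):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(mat)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ty_wald(target, cause, p=1, d_max=1):
    """Wald statistic for H0: the first p lags of `cause` have zero
    coefficients in an equation for `target` with p + d_max lags of both."""
    k = p + d_max
    rows, ys = [], []
    for t in range(k, len(target)):
        row = [1.0]
        row += [target[t - i] for i in range(1, k + 1)]
        row += [cause[t - i] for i in range(1, k + 1)]
        rows.append(row)
        ys.append(target[t])
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(m)) for i in range(m)]
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, ys)]
    s2 = sum(e * e for e in resid) / (len(ys) - m)
    idx = [1 + k + i for i in range(p)]          # first p lags of `cause`
    b_r = [beta[i] for i in idx]
    v_inv = invert([[s2 * inv[i][j] for j in idx] for i in idx])
    return sum(b_r[i] * v_inv[i][j] * b_r[j]
               for i in range(p) for j in range(p))


# Simulated data: output (y) is a random walk that leads unemployment (u)
random.seed(7)
y, u = [0.0, 0.0], [0.0, 0.0]
for t in range(2, 200):
    y.append(y[-1] + random.gauss(0.0, 1.0))
    u.append(0.8 * u[-1] - 0.5 * (y[t - 1] - y[t - 2]) + random.gauss(0.0, 0.3))

wald_y_to_u = ty_wald(u, y)
wald_u_to_y = ty_wald(y, u)
# With p = 1 the 5% chi-squared critical value is about 3.84
print(round(wald_y_to_u, 2), round(wald_u_to_y, 2))
```

The extra d_max lags are estimated but deliberately left out of the restriction, which is the whole point of the procedure.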

But the report does not stop at full-sample tests. Because Croatia’s period includes major disruptions, it then applies time-varying Granger causality tests (labelled as LA-VAR time-varying causality with forward/rolling/recursive Wald statistics and bootstrap critical values). For LGDP → UR, the report states that test statistics exceed high-percentile critical values, rejecting no-causality robustly; it notes that the forward expanding Wald statistic exceeds critical values in subperiods such as post-GFC recovery and post-pandemic adjustment. For UR → LGDP, the report finds weaker evidence: the forward statistic does not cross critical values, while rolling and recursive statistics sometimes do, leading the report to characterise reverse causality as absent or weak and method-dependent, more a hint than a headline.
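The three time-varying statistics differ only in which sub-samples they feed to the Wald test. The sketch below generates those windows under my reading of the forward/rolling/recursive labels: forward expanding from the first observation, fixed-width rolling, and recursive evolving, where the statistic at each end date is the maximum over all admissible start dates. Window sizes and function names are hypothetical, and `sum` stands in for a real Wald statistic purely so the example runs.

```python
def forward_windows(n, min_obs):
    """Windows [0, end) that always start at the first observation."""
    return [(0, end) for end in range(min_obs, n + 1)]


def rolling_windows(n, width):
    """Fixed-width windows [end - width, end) that slide through the sample."""
    return [(end - width, end) for end in range(width, n + 1)]


def recursive_windows(n, min_obs):
    """For each end point, every admissible start point."""
    return {end: [(start, end) for start in range(0, end - min_obs + 1)]
            for end in range(min_obs, n + 1)}


def recursive_sup(series, min_obs, stat):
    """Largest value of `stat` over all sub-windows sharing an end point."""
    return {end: max(stat(series[s:e]) for s, e in windows)
            for end, windows in recursive_windows(len(series), min_obs).items()}


print(forward_windows(5, 3))
print(rolling_windows(5, 3))
print(recursive_sup([1, -2, 3, 4], 2, sum))
```

Because the recursive version searches over many more sub-samples, it reacts more readily to short episodes, which fits the report's remark that it is more sensitive but less stable in small samples.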

In the gap framework, time-varying tests again support Output gap → Unemployment gap robustly across statistics, while the reverse direction is borderline: the report notes a Max Wald forward statistic near the edge of significance (it reports 15.168 compared with a 99th percentile threshold of 15.346), suggesting at most weak feedback from unemployment slack to output slack. The visual discussion underscores the asymmetry: output-to-unemployment causality appears persistently elevated, while reverse-direction statistics remain mostly below critical thresholds.

The report also offers a caution that is more than a technical disclaimer: with around 100 quarterly observations, asymptotic properties may be imperfect, and bootstrap replications are limited. It suggests the forward expanding window test is the most stable and interpretable in small samples, while rolling and recursive versions can be more sensitive but less stable. In policy terms, that means one should treat “reverse causality” hints as fragile, while treating output-to-unemployment predictability as the sturdier result.

9. Bridge to Part III: What this evidence sets up for policy

By the end of Part II, the report has done more than “run tests.” It has clarified which versions of Okun’s Law Croatia’s quarterly data can sustain.

First, the data behave in ways that justify treating levels cautiously and gaps more comfortably: LGDP and UR are persistently non-stationary in levels under standard tests, while gaps behave as stationary cyclical measures. Second, the cointegration story is specification-dependent: levels offer weak or mixed evidence depending on method and deterministic components, while gaps show strong long-run linkage. Third, ARDL and especially gap-based models yield meaningful error-correction dynamics: output adjusts faster than unemployment slack, consistent with labour-market inertia. Fourth, nonlinear modelling adds an important political economy nuance: asymmetry is more evident in short-run difference dynamics than in the gap framework, suggesting that recessions may do sharper short-run labour-market damage than booms repair in immediate terms. Finally, causality, static and time-varying, largely supports a unidirectional predictive structure from output (or output gap) to unemployment (or unemployment gap).

Part III will take this full stack of evidence and translate it into the “so what?” that readers actually want: what this implies for Croatia’s labour-market adjustment, macro stabilisation trade-offs, and the policy frame in a country that is both an EU member and, since 2023, a euro-area member, where credibility, constraints, and the design of countercyclical policy matter as much as the coefficient itself.


Director of the Wellington-based company My Statistical Consultant Ltd. Retired Associate Professor in Statistics. Has a PhD in Statistics and over 45 years’ experience as a university professor, international researcher and government consultant.