XLF/XLY Financials vs Consumer Discretionary Strategy

Row-Level Dual-Model with Correlation, Gold, and Regional Bank Signals, 3-Day Hold

Author
Affiliation

Rusty Conover

Query.Farm

Published

April 16, 2026

Show code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

import sys
sys.path.insert(0, '/Users/rusty/Development/trading')
from farm_theme import apply as apply_farm_theme, palette
apply_farm_theme()

df = pd.read_csv('strategy_data.csv', parse_dates=['dt'])
df = df.sort_values('dt').reset_index(drop=True)

capital = 10000
ret_col = 'daily_ret_unscaled'
df['cum_pnl'] = (df[ret_col] * capital).cumsum()
df['drawdown'] = df['cum_pnl'] - df['cum_pnl'].cummax()
df['year'] = df['dt'].dt.year

xlf = pd.read_csv('XLF.csv', parse_dates=['Date'])
xly = pd.read_csv('XLY.csv', parse_dates=['Date'])
prices = xlf[['Date','close']].rename(columns={'close':'xlf_close'}).merge(
    xly[['Date','close']].rename(columns={'close':'xly_close'}), on='Date')
prices = prices.sort_values('Date').reset_index(drop=True)
prices['spread_ratio'] = prices['xlf_close'] / prices['xly_close']
prices = prices[prices['Date'] >= '2020-01-01']

Executive Summary

This document presents a systematic pairs trading strategy on XLF (Financial Select Sector SPDR) vs XLY (Consumer Discretionary Select Sector SPDR). The strategy predicts 3-day forward returns of the XLF-XLY spread using correlation regime delta, gold returns, and regional bank returns as signals.

This is the highest-PnL strategy in the portfolio and is profitable in every calendar year tested.

NoteKey Metrics (2020–2026)
Metric Value
Sharpe Ratio 4.65
Ann. Return 175.7%
Total P&L $17,572 on $10K
Direction Accuracy 62.5%
Max Drawdown -$1,432
Years Profitable 7 / 7
Post-10bps Sharpe 3.93

1. Strategy Overview

1.1 Economic Rationale

Financials (XLF) and consumer discretionary (XLY) are both pro-cyclical sectors but they respond to different macro drivers. XLF tracks rate expectations, credit spreads, and bank earnings; XLY tracks consumer spending and is dominated by Amazon and Tesla. They diverge based on:

  1. Correlation regime shifts: corr_delta is the 5-day change in 20-day correlation between XLF and XLY returns. Sharp shifts in their normal co-movement typically precede sector rotation. When correlation breaks down, one sector is moving on its own (idiosyncratic) drivers, and the spread becomes predictable.

  2. Gold returns: Flight-to-safety hits XLF (rate-sensitive financials, falling yields compress NIM) harder than XLY (consumer staples-adjacent durables). Gold daily moves are a leading indicator for the spread direction.

  3. Regional banks (KRE): KRE leads XLF on credit-cycle news (KRE has more loan exposure than the diversified XLF). Movements in KRE amplify or dampen the same forces hitting XLF, and provide an early read on financial stress.

  4. 3-day holding period: Sector rotation through macro signals takes 2-3 days to fully transmit. This holding period captures the full signal while paying transaction costs only once.

1.2 Features (4 inputs)

Feature Rationale
spread XLF - XLY daily log return
corr_delta 20d corr(XLF, XLY) minus the same lagged 5d – regime transition
gld_ret Gold daily return – flight-to-safety / rate proxy
kre_ret Regional banks daily return – credit cycle leader

1.3 Position Sizing and Holding

Enter when the model predicts a 3-day spread move exceeding 0.8%. Hold for 3 trading days. Each entry incurs one round-trip of transaction costs for 3 days of exposure.

2. Performance Analysis

2.1 P&L and Spread

Show code
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 9), sharex=True,
                                     gridspec_kw={'height_ratios': [2, 1.5, 1.5]})

ax1.plot(df['dt'], df['cum_pnl'], color='#1565C0', linewidth=1.5)
ax1.fill_between(df['dt'], 0, df['cum_pnl'], alpha=0.1, color='#1565C0')
ax1.axhline(y=0, color='gray', linewidth=0.5, linestyle='--')
ax1.set_ylabel('Cumulative P&L ($)')
ax1.set_title('Cumulative P&L ($10K Capital)')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))

ax2.plot(prices['Date'], prices['xlf_close'], color='#1565C0', linewidth=1, label='XLF')
ax2.plot(prices['Date'], prices['xly_close'], color='#E65100', linewidth=1, label='XLY')
ax2.set_ylabel('Price ($)')
ax2.set_title('XLF and XLY Prices')
ax2.legend(loc='upper left', fontsize=9)

ax3.plot(prices['Date'], prices['spread_ratio'], color='#2E7D32', linewidth=1)
ax3.axhline(y=prices['spread_ratio'].mean(), color='gray', linewidth=0.5, linestyle='--',
            label=f'Mean: {prices["spread_ratio"].mean():.3f}')
ax3.set_ylabel('XLF / XLY')
ax3.set_title('Spread Ratio')
ax3.legend(loc='upper left', fontsize=9)

for ax in [ax1, ax2, ax3]:
    for yr in range(df['dt'].dt.year.min(), df['dt'].dt.year.max() + 2):
        ax.axvline(x=pd.Timestamp(f'{yr}-01-01'), color='gray', linewidth=0.3, linestyle=':')

ax3.set_xlim(df['dt'].min(), df['dt'].max())
plt.show()
Figure 1: Cumulative P&L (top), XLF and XLY prices (middle), and spread ratio (bottom)

2.2 Drawdown

Show code
fig, ax = plt.subplots(figsize=(10, 4), constrained_layout=True)
ax.fill_between(df['dt'], df['drawdown'], 0, color='#E53935', alpha=0.4)
ax.set_ylabel('Drawdown ($)')
ax.set_title(f'Drawdown — Max: ${df["drawdown"].min():,.0f}')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))
ax.set_xlim(df['dt'].min(), df['dt'].max())
plt.show()
Figure 2: Underwater equity curve

2.3 Yearly Performance

Show code
df2020 = df[df['dt'] >= '2020-01-01']

yearly = df2020.groupby('year').agg(
    traded=('active', 'sum'),
    pnl=(ret_col, lambda x: (x * capital).sum()),
    ret_mean=(ret_col, lambda x: x[x != 0].mean() if (x != 0).any() else 0),
    ret_std=(ret_col, lambda x: x[x != 0].std() if (x != 0).sum() > 1 else 1),
).reset_index()
yearly['sharpe'] = yearly['ret_mean'] / yearly['ret_std'] * np.sqrt(252)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

colors = ['#E53935' if p < 0 else '#43A047' for p in yearly['pnl']]
ax1.bar(yearly['year'], yearly['pnl'], color=colors, alpha=0.7)
ax1.axhline(y=0, color='gray', linewidth=0.5)
ax1.set_title('Yearly P&L')
ax1.set_ylabel('P&L ($)')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))

colors_s = ['#E53935' if s < 0 else '#43A047' for s in yearly['sharpe']]
ax2.bar(yearly['year'], yearly['sharpe'], color=colors_s, alpha=0.7)
ax2.axhline(y=0, color='gray', linewidth=0.5)
ax2.axhline(y=1, color='green', linewidth=0.5, linestyle='--', alpha=0.5)
ax2.set_title('Yearly Sharpe Ratio')
ax2.set_ylabel('Sharpe')

plt.show()
Figure 3: Yearly P&L and Sharpe ratios – profitable in every year tested

2.4 Monthly Returns Heatmap

Show code
df2020 = df[df['dt'] >= '2020-01-01'].copy()
df2020['month'] = df2020['dt'].dt.month
df2020['yr'] = df2020['dt'].dt.year
monthly = df2020.groupby(['yr', 'month']).agg(pnl=(ret_col, lambda x: (x * capital).sum())).reset_index()
pivot = monthly.pivot(index='yr', columns='month', values='pnl').fillna(0)

fig, ax = plt.subplots(figsize=(10, 4), constrained_layout=True)
im = ax.imshow(pivot.values, cmap='RdYlGn', aspect='auto', vmin=-1500, vmax=1500)
ax.set_xticks(range(12))
ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
ax.set_yticks(range(len(pivot.index)))
ax.set_yticklabels(pivot.index)
ax.set_title('Monthly P&L Heatmap')

for i in range(len(pivot.index)):
    for j in range(12):
        val = pivot.values[i, j]
        if abs(val) > 10:
            color = 'white' if abs(val) > 700 else 'black'
            ax.text(j, i, f'${val:.0f}', ha='center', va='center', fontsize=8, color=color)

plt.colorbar(im, ax=ax, label='P&L ($)', shrink=0.8)
plt.show()
Figure 4: Monthly P&L heatmap (2020–2026)

3. Risk Analysis

3.1 Return Distribution

Show code
traded = df2020[df2020['active'] == 1]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

rets = traded[ret_col] * 100
ax1.hist(rets, bins=50, color='#1565C0', alpha=0.7, edgecolor='white', linewidth=0.3)
ax1.axvline(x=rets.mean(), color='red', linewidth=1, linestyle='--', label=f'Mean: {rets.mean():.3f}%')
ax1.axvline(x=0, color='gray', linewidth=0.5)
ax1.set_title('Return Distribution (3-day holds)')
ax1.set_xlabel('Return (%)')
ax1.legend()

from scipy import stats
stats.probplot(rets.dropna(), dist="norm", plot=ax2)
ax2.set_title('Q-Q Plot vs Normal')
ax2.get_lines()[0].set_markerfacecolor('#1565C0')
ax2.get_lines()[0].set_markersize(3)

plt.show()
Figure 5: Return distribution (3-day holds)

3.2 Rolling Metrics

Show code
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6), constrained_layout=True, sharex=True)

roll_mean = df2020[ret_col].rolling(63).apply(lambda x: x[x!=0].mean() if (x!=0).any() else 0)
roll_std = df2020[ret_col].rolling(63).apply(lambda x: x[x!=0].std() if (x!=0).sum() > 5 else np.nan)
rolling_sharpe = roll_mean / roll_std * np.sqrt(252)

ax1.plot(df2020['dt'], rolling_sharpe, color='#43A047', linewidth=1)
ax1.axhline(y=0, color='gray', linewidth=0.5, linestyle='--')
ax1.axhline(y=1, color='green', linewidth=0.5, linestyle='--', alpha=0.5)
ax1.set_title('Rolling 63-day Sharpe Ratio')
ax1.set_ylabel('Sharpe')
ax1.set_ylim(-10, 20)

df2020_copy = df2020.copy()
df2020_copy['correct'] = (df2020_copy['active'] == 1) & (np.sign(df2020_copy['pred']) == np.sign(df2020_copy['spread_ret']))
rolling_acc = df2020_copy['correct'].rolling(63).mean() * 100
ax2.plot(df2020['dt'], rolling_acc, color='#FF8F00', linewidth=1)
ax2.axhline(y=50, color='gray', linewidth=0.5, linestyle='--')
ax2.set_title('Rolling 63-day Direction Accuracy')
ax2.set_ylabel('Accuracy (%)')
ax2.set_xlim(df2020['dt'].min(), df2020['dt'].max())

plt.show()
Figure 6: Rolling Sharpe and accuracy

4. Detailed Statistics

Show code
traded = df2020[df2020['active'] == 1]
total_pnl = (df2020[ret_col] * capital).sum()
sharpe = traded[ret_col].mean() / traded[ret_col].std() * np.sqrt(252)
downside = traded.loc[traded[ret_col] < 0, ret_col]
sortino = traded[ret_col].mean() / np.sqrt((downside**2).mean()) * np.sqrt(252)
max_dd = df2020['drawdown'].min()
wins = traded[traded[ret_col] > 0][ret_col]
losses = traded[traded[ret_col] < 0][ret_col]

stats_dict = {
    'Period': f'{df2020["dt"].min().strftime("%Y-%m-%d")} to {df2020["dt"].max().strftime("%Y-%m-%d")}',
    'Traded Periods': len(traded),
    'Total P&L': f'${total_pnl:,.0f}',
    'Sharpe Ratio': f'{sharpe:.2f}',
    'Sortino Ratio': f'{sortino:.2f}',
    'Max Drawdown': f'${max_dd:,.0f}',
    'MAR Ratio': f'{traded[ret_col].mean() * 252 / abs(max_dd / capital):.2f}',
    'Direction Accuracy': f'{(np.sign(traded["pred"]) == np.sign(traded["spread_ret"])).mean()*100:.1f}%',
    'Win/Loss Ratio': f'{abs(wins.mean()/losses.mean()):.2f}',
    'Holding Period': '3 trading days',
    'p/n Ratio': '0.02 (4 dims / 199 samples)',
}

pd.DataFrame(list(stats_dict.items()), columns=['Metric', 'Value']).style.hide(axis='index')
Table 1
Metric Value
Period 2020-01-08 to 2026-04-15
Traded Periods 272
Total P&L $17,572
Sharpe Ratio 4.65
Sortino Ratio 5.33
Max Drawdown $-1,432
MAR Ratio 11.37
Direction Accuracy 62.5%
Win/Loss Ratio 1.33
Holding Period 3 trading days
p/n Ratio 0.02 (4 dims / 199 samples)
Show code
yearly_data = []
for yr in sorted(df2020['year'].unique()):
    ydf = df2020[df2020['year'] == yr]
    yt = ydf[ydf['active'] == 1]
    if len(yt) == 0:
        continue
    pnl = (ydf[ret_col] * capital).sum()
    s = yt[ret_col].mean() / yt[ret_col].std() * np.sqrt(252) if yt[ret_col].std() > 0 else 0
    ds = yt.loc[yt[ret_col] < 0, ret_col]
    so = yt[ret_col].mean() / np.sqrt((ds**2).mean()) * np.sqrt(252) if len(ds) > 0 else 0
    acc = (np.sign(yt['pred']) == np.sign(yt['spread_ret'])).mean() * 100
    yearly_data.append({
        'Year': yr, 'Traded': len(yt), 'Sat Out': len(ydf) - len(yt),
        'Accuracy': f'{acc:.1f}%', 'P&L': f'${pnl:,.0f}',
        'Sharpe': f'{s:.2f}', 'Sortino': f'{so:.2f}'
    })

pd.DataFrame(yearly_data).style.hide(axis='index')
Table 2
Year Traded Sat Out Accuracy P&L Sharpe Sortino
2020 47 65 63.8% $5,145 5.41 6.60
2021 45 65 73.3% $4,007 8.56 9.85
2022 63 44 61.9% $5,067 5.22 5.84
2023 51 61 58.8% $1,434 2.55 2.99
2024 30 81 50.0% $542 1.62 1.72
2025 29 77 58.6% $488 1.84 1.73
2026 7 28 85.7% $889 13.09 10.87

5. Strategy Construction

5.1 Model Architecture

5.2 Why 3-Day Holding Works

Sector rotation via macro signals is fundamentally slower than within-sector mean-reversion:

  • Correlation regime shifts propagate through positioning and rebalancing flows over multiple days
  • Gold/safety signals affect rate-sensitive financials with a 1-2 day lag through the yield curve
  • Regional bank moves spread to diversified XLF over 1-3 days as the credit story develops

Daily trading on this pair has weak signal (Sharpe 0.69 gross, negative net of costs); the 3-day hold improves to 4.65 because:

  1. The signal is stronger over 3 days (62.5% accuracy vs ~51% daily)
  2. Transaction costs are paid once for 3 days of exposure
  3. Daily noise washes out, leaving the macro signal

5.3 Model Code

class Aggregate:
    @staticmethod
    def finalize(table, params):
        if table.num_rows < 2:
            return None
        data = table.to_pandas().values.astype(np.float64)
        n, nc = data.shape
        seed = int(params.get('seed', 42))
        conf_thresh = params.get('conf', 0.60)
        min_move = params.get('min_move', 0.008)
        fc = int(params.get('fwd_col', nc - 1))  # last col = target
        hold = int(params.get('hold', 3))

        if n < 10 + hold:
            return None

        X = data[:-(hold), :fc]   # features
        y_ret = data[hold:, fc]   # 3-day forward spread return

        if np.any(np.isnan(X)) or np.any(np.isnan(y_ret)):
            return 0.0

        y_dir = (y_ret > 0).astype(int)
        last = data[-1:, :fc]

        from sklearn.linear_model import LogisticRegression, Ridge
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        if len(set(y_dir)) < 2:
            return 0.0

        clf = make_pipeline(
            StandardScaler(),
            LogisticRegression(C=0.1, max_iter=1000, random_state=seed)
        )
        clf.fit(X, y_dir)
        prob_up = clf.predict_proba(last)[0][1]

        reg = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
        reg.fit(X, y_ret)
        pred_mag = abs(float(reg.predict(last)[0]))

        if pred_mag < min_move:
            return 0.0

        if prob_up > conf_thresh:
            return pred_mag
        elif prob_up < (1.0 - conf_thresh):
            return -pred_mag
        else:
            return 0.0

6. Limitations and Risks

  1. Best years are 2020-2022 ($14,219 of $17,572 total). Strategy made progressively less in 2023-2025 ($542 in 2024). Recent regime may have arbitraged out the easiest signal.

  2. 272 trades over 6.3 years (~43/year). With 3-day holds, this is about 14 independent entries per year. Statistical confidence on 62.5% accuracy with 272 trades has a 95% CI of roughly 56-68%.

  3. Sharpe of 4.65 is suspiciously high. Selected via grid search over holding periods and min_move thresholds. True out-of-sample Sharpe is likely lower.

  4. Sector composition risk: XLF and XLY composition drifts over time. XLY is now ~35% Amazon+Tesla; if those names decouple from broader consumer discretionary, the spread dynamics shift.

  5. Seed sensitivity: Zero – deterministic.

7. Reproducibility

bash scripts/run_backtest.sh
bash tests/test_backtest.sh

Parameters

Parameter Value
Training window 200 days
Confidence threshold 0.60
Min predicted move 0.008 (0.8% over 3 days)
Holding period 3 trading days
Position sizing Binary (100%)
Gates None
LogReg C 0.1
Ridge alpha 1.0

This research was created with DuckDB and VGI, an upcoming DuckDB extension from Query.Farm that allows custom aggregate functions to be written in any language with an Apache Arrow implementation.