📅 December 5, 2025 · ⏱️ 8 min read

How to Calculate Expected Goals (xG) in Football

Gol Sinyali

Editor

Introduction

Learning how to calculate xG is essential for anyone serious about football analytics. While professional systems use complex machine learning models, understanding the xG calculation formula and key factors helps analysts make better predictions and evaluations. This guide explains the xG calculation process, from basic methods to advanced models.

Basic xG Calculation Methodology

Core Concept

Expected Goals Formula (Simplified):

xG = Probability(Goal | Shot Characteristics)

Translation: What is the probability this shot results in a goal, given its characteristics?

Key Input Variables

1. Distance from Goal

  • 6 yards: ~70% goal probability (xG: 0.7)
  • 12 yards: ~20-30% (xG: 0.2-0.3)
  • Penalty kick (taken from the spot 12 yards out): ~79% (xG: 0.79), usually modelled separately from open-play shots
  • 25+ yards: ~2-5% (xG: 0.02-0.05)

2. Angle to Goal (offset from the centre line of the goal)

  • Central (0-15°): Highest xG
  • Moderate angle (15-30°): Medium xG
  • Wide angle (30-45°+): Low xG

3. Body Part Used

  • Foot: Standard xG
  • Header: -0.05 to -0.10 modifier (headers are harder)
  • Other (chest, etc.): Rare, very low xG

4. Shot Type

  • Open play: Standard
  • Counter-attack: +0.05 to +0.10 (less defensive organization)
  • Set piece: Varies (free kick ~0.05, penalty ~0.79)

5. Assist Type

  • Through ball: +0.10 to +0.15 (1-on-1 situations)
  • Cross: -0.05 to -0.10 (usually headers)
  • Cut-back: +0.05 to +0.10 (better shooting angles)
  • Individual creation: Standard

Manual xG Calculation Example

Shot Scenario:

Distance: 12 yards
Angle: 20° (fairly central)
Body Part: Right foot
Situation: Through ball, 1-on-1 with goalkeeper
Defenders: Only goalkeeper to beat

Step-by-Step Calculation:

Step 1: Base xG from Distance

12 yards → Base xG: 0.25

Step 2: Angle Adjustment

20° central angle → Multiplier: 1.2
Adjusted xG: 0.25 × 1.2 = 0.30

Step 3: Situation Adjustment

Through ball (1-on-1) → +0.15 bonus
Adjusted xG: 0.30 + 0.15 = 0.45

Step 4: Body Part Adjustment

Foot (not header) → No penalty
Final xG: 0.45

Result: xG = 0.45 (45% chance of goal)
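
The four steps above can be sketched as a small function. This is only an illustration of the arithmetic in the worked example: the function name `manual_xg` and its parameters are hypothetical, and the adjustment values are the article's rough figures rather than fitted coefficients.

```python
def manual_xg(base_xg, angle_multiplier=1.0, situation_bonus=0.0, body_part_penalty=0.0):
    """Combine the illustrative adjustments from the worked example.

    base_xg:           starting value read off from shot distance
    angle_multiplier:  multiplicative angle adjustment
    situation_bonus:   additive bonus (e.g. through ball, counter-attack)
    body_part_penalty: amount subtracted for headers
    """
    xg = base_xg * angle_multiplier + situation_bonus - body_part_penalty
    # A probability must stay between 0 and 1
    return min(max(xg, 0.0), 1.0)


# The shot scenario from the example: 12 yards, 20 degrees, through ball, foot
example_xg = manual_xg(base_xg=0.25, angle_multiplier=1.2, situation_bonus=0.15)
print(f"Manual xG: {example_xg:.2f}")

# The same chance taken as a header, using a 0.10 penalty
header_xg = manual_xg(0.25, 1.2, 0.15, body_part_penalty=0.10)
print(f"Header xG: {header_xg:.2f}")
```

Clamping the result matters once several bonuses stack up, since additive adjustments can otherwise push the value above 1.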

Machine Learning xG Models

How Professional Systems Calculate xG

Data Requirements:

Training Data:

  • Minimum: 10,000+ historical shots
  • Ideal: 100,000+ shots from multiple seasons
  • Variables: 15-50 features per shot

Machine Learning Algorithm: The most common choices are Logistic Regression and gradient boosting (for example, XGBoost).

Feature Engineering

Input Features (Variables):

Basic Features:

  1. Shot distance (continuous)
  2. Shot angle (continuous)
  3. Body part (categorical: foot/header/other)
  4. Assist type (categorical: cross/through ball/cutback/none)
  5. Game situation (categorical: open play/counter/set piece)

Advanced Features:

  6. Defender pressure (0-5 scale)
  7. Goalkeeper position (x, y coordinates)
  8. Shot speed (if tracking data available)
  9. Previous action (dribble/pass/rebound)
  10. Game state (score difference, time remaining)

Spatial Features:

  11. X coordinate (distance from goal line)
  12. Y coordinate (lateral position)
  13. Distance to nearest defender
  14. Number of defenders in shot path

Contextual Features:

  15. Team strength (Elo rating or similar)
  16. Player quality (career goals, xG overperformance)
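
Categorical features such as body part and assist type need to be converted to numbers before a model can use them. A minimal sketch with pandas one-hot encoding follows; the three rows of data and the DataFrame name `raw_shots` are made up for illustration.

```python
import pandas as pd

# Hypothetical raw shot records with two categorical columns
raw_shots = pd.DataFrame({
    "distance": [12.0, 25.0, 8.0],
    "angle": [20.0, 10.0, 35.0],
    "body_part": ["foot", "foot", "header"],
    "assist_type": ["through ball", "none", "cross"],
})

# One 0/1 column per category value; numeric columns pass through unchanged
encoded = pd.get_dummies(
    raw_shots, columns=["body_part", "assist_type"], dtype=int
)
print(encoded.columns.tolist())
```

A column such as `body_part_header` then plays the same role as a hand-built `is_header` flag.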

Sample Python xG Model

Logistic Regression Model:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load historical shot data
shots = pd.read_csv('shots_data.csv')

# Features
X = shots[['distance', 'angle', 'is_header',
           'is_through_ball', 'is_cross',
           'defenders_in_cone']]

# Target (1 = goal, 0 = no goal)
y = shots['is_goal']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

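# Train model (a higher max_iter helps the solver converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict xG for a new shot, keeping the column names used in training
new_shot = pd.DataFrame(
    [[12, 20, 0, 1, 0, 1]],
    columns=X.columns
)
# distance=12, angle=20, not header, through ball, not cross, 1 defender

xG = model.predict_proba(new_shot)[0][1]
print(f'xG: {xG:.2f}')  # e.g., 0.42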

XGBoost Model (Often More Accurate):

from xgboost import XGBClassifier

# Train XGBoost
xgb_model = XGBClassifier(
    max_depth=6,
    learning_rate=0.1,
    n_estimators=100
)

xgb_model.fit(X_train, y_train)

# Predict
xG_xgb = xgb_model.predict_proba(new_shot)[0][1]
print(f'xG (XGBoost): {xG_xgb:.3f}')  # e.g., 0.456

Step-by-Step: Building Your Own xG Model

Step 1: Collect Data

Data Sources:

  • StatsBomb Open Data: Free, high-quality event data
  • Wyscout: Academic use available
  • Scraping: Understat.com (legal gray area, check ToS)

Minimum Data:

  • 5,000+ shots
  • Shot location (x, y coordinates)
  • Shot outcome (goal/no goal)
  • Body part, assist type
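
Event data usually arrives as nested JSON rather than a flat table. The sketch below flattens shot events into the columns listed above. It assumes events shaped like StatsBomb's open-data JSON (`type.name`, `location`, `shot.outcome.name`, `shot.body_part.name`); check those field names against the provider's data specification. `sample_events` is a made-up three-event stand-in and `events_to_shots` is a hypothetical helper.

```python
import pandas as pd

# Made-up events mimicking the assumed JSON layout
sample_events = [
    {"type": {"name": "Pass"}, "location": [60.0, 40.0]},
    {"type": {"name": "Shot"}, "location": [108.0, 40.0],
     "shot": {"outcome": {"name": "Goal"}, "body_part": {"name": "Right Foot"}}},
    {"type": {"name": "Shot"}, "location": [100.0, 30.0],
     "shot": {"outcome": {"name": "Saved"}, "body_part": {"name": "Head"}}},
]


def events_to_shots(events):
    """Keep only shot events and flatten them into one row per shot."""
    rows = []
    for event in events:
        if event["type"]["name"] != "Shot":
            continue
        x, y = event["location"][:2]
        rows.append({
            "x": x,
            "y": y,
            "is_goal": int(event["shot"]["outcome"]["name"] == "Goal"),
            "is_header": int(event["shot"]["body_part"]["name"] == "Head"),
        })
    return pd.DataFrame(rows)


shots_table = events_to_shots(sample_events)
print(shots_table)
```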

Step 2: Feature Engineering

Calculate Distance:

import numpy as np

def calculate_distance(x, y, goal_x=120, goal_y=40):
    """
    Calculate distance from shot to goal center.
    Assuming pitch: 120 yards long, 80 yards wide
    """
    distance = np.sqrt((goal_x - x)**2 + (goal_y - y)**2)
    return distance

# NumPy operates on whole columns, so no row-by-row apply is needed
shots['distance'] = calculate_distance(shots['x'], shots['y'])

Calculate Angle:

def calculate_angle(x, y, goal_x=120, goal_y_min=36, goal_y_max=44):
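    """
    Calculate the opening angle to goal (in degrees): the angle between
    the lines from the shot location to each post. Larger means more of
    the goal is visible.
    Goal posts at y=36 and y=44 (8 yards wide)
    """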
    angle_left = np.arctan2(goal_y_min - y, goal_x - x)
    angle_right = np.arctan2(goal_y_max - y, goal_x - x)
    angle = abs(angle_right - angle_left)
    return np.degrees(angle)

# Works on whole columns as well
shots['angle'] = calculate_angle(shots['x'], shots['y'])
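
Before training, it is worth checking the two feature functions against positions where the answer is known. The sketch below restates them so it runs on its own and evaluates them at the penalty spot, which sits at (108, 40) on the assumed 120 × 80 yard pitch. The wide position is a hypothetical point chosen for contrast.

```python
import numpy as np


def calculate_distance(x, y, goal_x=120, goal_y=40):
    """Straight-line distance from the shot to the centre of the goal."""
    return np.sqrt((goal_x - x) ** 2 + (goal_y - y) ** 2)


def calculate_angle(x, y, goal_x=120, goal_y_min=36, goal_y_max=44):
    """Angle in degrees between the lines from the shot to each post."""
    angle_left = np.arctan2(goal_y_min - y, goal_x - x)
    angle_right = np.arctan2(goal_y_max - y, goal_x - x)
    return np.degrees(np.abs(angle_right - angle_left))


penalty_spot = (108, 40)   # 12 yards out, dead centre
wide_position = (108, 70)  # same depth, 30 yards off-centre (hypothetical)

print(calculate_distance(*penalty_spot))  # 12.0
print(calculate_angle(*penalty_spot))     # about 36.87
print(calculate_angle(*wide_position))    # far smaller: little of the goal is visible
```

Note that this angle is the opening between the posts, so a larger value is better. That differs from an off-centre angle, where 0° is the best case.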

Step 3: Train Model

Use scikit-learn:

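from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Prepare data
features = ['distance', 'angle', 'is_header', 'is_through_ball']
X = shots[features]
y = shots['is_goal']

# Hold out 20% of shots so validation uses data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Random Forest on the training portion only
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)

rf_model.fit(X_train, y_train)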

Step 4: Calibrate and Validate

Calibration: Check if predicted probabilities match actual frequencies.

from sklearn.calibration import calibration_curve

# Get predictions
y_pred_proba = rf_model.predict_proba(X_test)[:, 1]

# Calibration curve
fraction_of_positives, mean_predicted_value = calibration_curve(
    y_test, y_pred_proba, n_bins=10
)

# Plot
import matplotlib.pyplot as plt
plt.plot(mean_predicted_value, fraction_of_positives, marker='o')
plt.plot([0, 1], [0, 1], linestyle='--')  # Perfect calibration
plt.xlabel('Mean Predicted Probability')
plt.ylabel('Fraction of Positives')
plt.title('Calibration Curve')
plt.show()

Validation Metrics:

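  • Accuracy: Not a useful metric here. Only roughly 1 in 10 shots is scored, so a model that always predicts "no goal" is already about 90% accurate
  • Log Loss: Lower is better. Compare against the baseline of predicting the overall conversion rate for every shot (about 0.33 when 10% of shots are goals)
  • Brier Score: Lower is better. The same baseline scores about 0.09, so a useful model should come in below that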
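
The metrics above only mean something relative to a baseline. The sketch below uses synthetic shots, not real data, to compare a hypothetical well-calibrated model with a constant prediction equal to the overall conversion rate. The beta distribution is simply a convenient way to generate per-shot probabilities that average about 10%.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(42)

# Synthetic per-shot scoring probabilities averaging about 0.10
true_prob = rng.beta(1, 9, size=5000)
outcomes = rng.binomial(1, true_prob)

# Hypothetical "model": predicts the true probability of each shot
model_pred = true_prob
# Baseline: predicts the same overall conversion rate for every shot
baseline_pred = np.full(outcomes.shape, outcomes.mean())

print(f"Log loss  model: {log_loss(outcomes, model_pred):.3f}  "
      f"baseline: {log_loss(outcomes, baseline_pred):.3f}")
print(f"Brier     model: {brier_score_loss(outcomes, model_pred):.3f}  "
      f"baseline: {brier_score_loss(outcomes, baseline_pred):.3f}")

# Accuracy of always predicting "no goal"
print(f"Always-no-goal accuracy: {1 - outcomes.mean():.3f}")
```

A model that cannot beat the constant baseline on log loss and Brier score has learned nothing useful about shot quality, however high its accuracy looks.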

Step 5: Make Predictions

# New shot
new_shot_data = {
    'distance': 14,
    'angle': 25,
    'is_header': 0,
    'is_through_ball': 1
}

new_shot_df = pd.DataFrame([new_shot_data])
xG_prediction = rf_model.predict_proba(new_shot_df)[0][1]

print(f'Predicted xG: {xG_prediction:.3f}')
# Example output: Predicted xG: 0.387 (the exact value depends on the training data)

Advanced xG Calculations

1. Tracking Data Integration

With Player/Ball Tracking:

  • Defender positions (x, y for each player)
  • Goalkeeper positioning
  • Ball velocity

Enhanced Formula:

xG = f(distance, angle, defenders_in_cone,
        goalkeeper_position, ball_speed, shot_technique)

Example:

Standard Shot: 12 yards, central → xG: 0.30
With Tracking Data:
- 2 defenders blocking (in shooting cone) → xG reduced from 0.30 to 0.25
- Goalkeeper off-line (poor positioning) → Multiplier: 1.15
- Adjusted xG: 0.25 × 1.15 ≈ 0.29
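
One way to turn tracking positions into a `defenders_in_cone` feature is to count the players inside the triangle formed by the shot location and the two posts. The sketch below is a plain geometric test on the assumed 120 × 80 yard pitch; the function names and defender positions are illustrative, and real providers may define the cone differently.

```python
def _cross(origin, a, b):
    """Z-component of the cross product of vectors origin->a and origin->b."""
    return (a[0] - origin[0]) * (b[1] - origin[1]) - (a[1] - origin[1]) * (b[0] - origin[0])


def in_shooting_cone(shot, player, post_low=(120, 36), post_high=(120, 44)):
    """True if the player is inside the triangle shot / post_low / post_high."""
    d1 = _cross(shot, post_low, player)
    d2 = _cross(post_low, post_high, player)
    d3 = _cross(post_high, shot, player)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on an edge) when all three signs agree
    return not (has_negative and has_positive)


def defenders_in_cone(shot, defenders):
    """Count the defenders standing between the shooter and the goal mouth."""
    return sum(in_shooting_cone(shot, d) for d in defenders)


shot_location = (108, 40)  # 12 yards out, central
defender_positions = [(114, 40), (112, 30), (118, 42)]  # hypothetical positions
print(defenders_in_cone(shot_location, defender_positions))  # 2
```

The defender at (112, 30) is close to the shooter but well to the side, so only the other two count as blocking.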

2. Player-Specific xG

Adjust for Player Quality:

Player xG = Base xG × Player Finishing Modifier

Example:

Base xG: 0.40
Haaland (elite finisher, +15% modifier): 0.40 × 1.15 = 0.46 xG
Average player: 0.40 xG
Poor finisher (-10% modifier): 0.40 × 0.90 = 0.36 xG

Note: Most public xG models don't adjust for player quality to remain objective.
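
The modifier arithmetic above can be sketched as a one-line function. The name `player_adjusted_xg` is hypothetical, and the modifiers are the illustrative values from the example, not measured finishing skill.

```python
def player_adjusted_xg(base_xg, finishing_modifier):
    """Scale a base xG by a finishing modifier (+0.15 means +15%)."""
    adjusted = base_xg * (1 + finishing_modifier)
    # Keep the result a valid probability
    return min(max(adjusted, 0.0), 1.0)


print(f"Elite finisher: {player_adjusted_xg(0.40, 0.15):.2f}")   # 0.46
print(f"Average player: {player_adjusted_xg(0.40, 0.00):.2f}")   # 0.40
print(f"Poor finisher:  {player_adjusted_xg(0.40, -0.10):.2f}")  # 0.36
```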

3. Post-Shot xG (PSxG)

Calculation:

PSxG = f(shot_trajectory, shot_speed, goalkeeper_position)

Process:

  1. Track ball trajectory after shot
  2. Determine target location (e.g., top corner)
  3. Calculate save probability based on:
    • Shot power
    • Goalkeeper reaction time
    • Goalkeeper positioning

Example:

Pre-shot xG: 0.30 (12 yards, central)
Shot goes top corner → PSxG: 0.75 (much harder to save)
Goalkeeper saved it → Exceptional save!

Common Calculation Errors

Error 1: Not Accounting for Defenders

Wrong Model: Only uses distance and angle.

Better Model: Includes number of defenders in shooting cone.

Impact:

Shot: 12 yards, 0° angle
Wrong xG: 0.40 (ignores defenders)
Correct xG: 0.25 (3 defenders blocking)

Error 2: Small Training Dataset

Problem: Training on only 1,000 shots → Overfitting

Solution: Minimum 5,000 shots, ideally 20,000+

Error 3: Not Validating Calibration

Problem: Model predicts well but probabilities are miscalibrated.

Example:

  • Model says 0.50 xG
  • In reality, only 35% of such shots score
  • Issue: Overconfident predictions

Solution: Use calibration curves and recalibrate if needed.
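
One common way to recalibrate is isotonic regression, which learns a monotonic mapping from raw predictions to observed scoring rates. The sketch below uses synthetic data and a deliberately overconfident hypothetical model whose predictions are 1.4 times the true probability, so that a predicted 0.50 corresponds to roughly 36% in reality, close to the example above.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

# Synthetic shots: true scoring probability and simulated outcomes
true_prob = rng.uniform(0.02, 0.5, size=20000)
outcomes = rng.binomial(1, true_prob)

# Hypothetical overconfident model
raw_pred = np.clip(true_prob * 1.4, 0, 1)

# Fit the calibrator on one half, evaluate on the other half
fit_idx = np.arange(0, 10000)
eval_idx = np.arange(10000, 20000)

calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_pred[fit_idx], outcomes[fit_idx])
recalibrated = calibrator.predict(raw_pred[eval_idx])

brier_before = brier_score_loss(outcomes[eval_idx], raw_pred[eval_idx])
brier_after = brier_score_loss(outcomes[eval_idx], recalibrated)
print(f"Brier before: {brier_before:.4f}  after: {brier_after:.4f}")
```

With a real model, the calibrator should be fitted on held-out shots that were not used for training. scikit-learn's `CalibratedClassifierCV` wraps the same idea around a classifier.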

Conclusion

Calculating xG requires understanding shot characteristics, historical data, and machine learning models. While basic manual calculations provide rough estimates, professional xG models use thousands of shots and advanced algorithms for accuracy.

Key Takeaways:

  1. Distance and angle are the most important factors
  2. Machine learning (Logistic Regression, XGBoost) powers modern xG
  3. Feature engineering (defenders, assist type) improves accuracy
  4. Calibration ensures predictions match reality
  5. Advanced models incorporate tracking data and, in some cases, player quality

Golden Rule: Start simple (distance + angle), then add complexity as data and skills improve.

Frequently Asked Questions

Can I calculate xG without machine learning?

Yes. Build a lookup table of historical conversion rates by distance and angle. This is less accurate than an ML model but gives reasonable estimates. Online xG calculators exist for this purpose.
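
A lookup table of this kind can be sketched with NumPy binning and a pandas group-by. The shot history here is synthetic, generated from a made-up formula in which scoring chance falls with distance and off-centre angle. The bin edges and the names `lookup` and `lookup_xg` are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_shots = 20000

# Synthetic history: distance in yards, angle in degrees off-centre
history = pd.DataFrame({
    "distance": rng.uniform(2, 35, n_shots),
    "angle": rng.uniform(0, 45, n_shots),
})
goal_prob = 0.6 * np.exp(-history["distance"] / 10) * (1 - history["angle"] / 90)
history["is_goal"] = rng.binomial(1, goal_prob.to_numpy())

DISTANCE_EDGES = [6, 12, 18, 25]  # bins: <6, 6-12, 12-18, 18-25, 25+
ANGLE_EDGES = [15, 30]            # bins: <15, 15-30, 30+

history["distance_bin"] = np.digitize(history["distance"], DISTANCE_EDGES)
history["angle_bin"] = np.digitize(history["angle"], ANGLE_EDGES)

# Conversion rate per (distance bin, angle bin) cell
lookup = history.groupby(["distance_bin", "angle_bin"])["is_goal"].mean().unstack()


def lookup_xg(distance, angle):
    """Return the historical conversion rate of the matching cell."""
    d_bin = int(np.digitize(distance, DISTANCE_EDGES))
    a_bin = int(np.digitize(angle, ANGLE_EDGES))
    return float(lookup.loc[d_bin, a_bin])


print(lookup.round(2))
print(f"xG for a 10-yard shot, 10 degrees off-centre: {lookup_xg(10, 10):.2f}")
```

Coarser bins give more stable rates per cell, while finer bins capture more detail but need far more shots.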

What's the minimum data needed to build an xG model?

A basic model needs at least 5,000 shots, and reliable results need 20,000 or more. More data gives a better model, especially for rare situations (headers from 25 yards, etc.).

How accurate are professional xG models?

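Judge them with probabilistic metrics such as Brier Score and log loss rather than percent accuracy. Only roughly 1 in 10 shots is scored, so always predicting "no goal" is already about 90% accurate, and predicting the overall conversion rate for every shot gives a Brier Score of about 0.09. Good models score below that baseline. No model is perfect due to football's inherent randomness.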

Which machine learning algorithm is best for xG?

Logistic Regression (simple, interpretable) and XGBoost (often the most accurate) are the usual choices. Neural networks can work but often overfit with limited data. Start with Logistic Regression.

Where can I get shot data to train my model?

StatsBomb Open Data (free, high quality), Wyscout (academic access), FBref (manual collection), or scraping Understat (check ToS first).

