How to Calculate Expected Goals (xG) in Football
Gol Sinyali
Editor

Introduction
Learning how to calculate xG is essential for anyone serious about football analytics. While professional systems use complex machine learning models, understanding the xG calculation formula and key factors helps analysts make better predictions and evaluations. This guide explains the xG calculation process, from basic methods to advanced models.
Basic xG Calculation Methodology
Core Concept
Expected Goals Formula (Simplified):
xG = Probability(Goal | Shot Characteristics)
Translation: What is the probability this shot results in a goal, given its characteristics?
Key Input Variables
1. Distance from Goal
- 6 yards: ~70% goal probability (xG: 0.7)
- 12 yards: ~20-30% (xG: 0.2-0.3)
- Penalty kick (12 yards, taken from the penalty spot with only the goalkeeper to beat): ~79% (xG: 0.79)
- 25+ yards: ~2-5% (xG: 0.02-0.05)
2. Angle to Goal
- Central (0-15°): Highest xG
- Moderate angle (15-30°): Medium xG
- Wide angle (30-45°+): Low xG
3. Body Part Used
- Foot: Standard xG
- Header: -0.05 to -0.10 modifier (headers are harder)
- Other (chest, etc.): Rare, very low xG
4. Shot Type
- Open play: Standard
- Counter-attack: +0.05 to +0.10 (less defensive organization)
- Set piece: Varies (free kick ~0.05, penalty ~0.79)
5. Assist Type
- Through ball: +0.10 to +0.15 (1-on-1 situations)
- Cross: -0.05 to -0.10 (usually headers)
- Cut-back: +0.05 to +0.10 (better shooting angles)
- Individual creation: Standard
Manual xG Calculation Example
Shot Scenario:
Distance: 12 yards
Angle: 20° (fairly central)
Body Part: Right foot
Situation: Through ball, 1-on-1 with goalkeeper
Defenders: Only goalkeeper to beat
Step-by-Step Calculation:
Step 1: Base xG from Distance
12 yards → Base xG: 0.25
Step 2: Angle Adjustment
20° central angle → Multiplier: 1.2
Adjusted xG: 0.25 × 1.2 = 0.30
Step 3: Situation Adjustment
Through ball (1-on-1) → +0.15 bonus
Adjusted xG: 0.30 + 0.15 = 0.45
Step 4: Body Part Adjustment
Foot (not header) → No penalty
Final xG: 0.45
Result: xG = 0.45 (45% chance of goal)
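The four steps above can be written as a small function. This is only a sketch of the rule-of-thumb arithmetic in this example: the function name and parameter names are made up for illustration, and the multiplier and bonus values are the hand-picked ones from the worked example, not fitted parameters.

```python
def manual_xg(base_xg, angle_multiplier=1.0, situation_bonus=0.0, body_part_penalty=0.0):
    """Combine the hand-picked adjustments from the worked example.

    All inputs are illustrative rule-of-thumb values, not fitted parameters.
    """
    xg = base_xg * angle_multiplier   # Step 2: scale the distance-based value by the angle
    xg += situation_bonus             # Step 3: add the situation bonus (e.g. through ball)
    xg -= body_part_penalty           # Step 4: subtract a header penalty, if any
    return min(max(xg, 0.0), 1.0)     # clamp so the result stays a valid probability


# The shot from the example: 12 yards, 20 degrees, through ball, struck with the foot
shot_xg = manual_xg(base_xg=0.25, angle_multiplier=1.2, situation_bonus=0.15)
print(f"xG: {shot_xg:.2f}")  # prints "xG: 0.45"
```

The clamp at the end matters once several bonuses stack up: without it, additive adjustments can push a close-range chance above 1.0.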
Machine Learning xG Models
How Professional Systems Calculate xG
Data Requirements:
Training Data:
- Minimum: 10,000+ historical shots
- Ideal: 100,000+ shots from multiple seasons
- Variables: 15-50 features per shot
Machine Learning Algorithm: The most common choices are Logistic Regression and gradient-boosted trees such as XGBoost.
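For the logistic regression case, the general formula xG = Probability(Goal | Shot Characteristics) takes a specific shape. The notation below is an assumption for illustration: x_1 ... x_n are the shot features (distance, angle, and so on) and the beta terms are coefficients learned from historical shots.

```latex
\[
\mathrm{xG}(\mathbf{x}) = P(\text{goal} \mid \mathbf{x})
= \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)\right)}
\]
```

Gradient-boosted models such as XGBoost replace the linear sum inside the exponential with a sum of decision-tree outputs, which lets them pick up interactions between features.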
Feature Engineering
Input Features (Variables):
Basic Features:
- Shot distance (continuous)
- Shot angle (continuous)
- Body part (categorical: foot/header/other)
- Assist type (categorical: cross/through ball/cutback/none)
- Game situation (categorical: open play/counter/set piece)
Advanced Features:
- Defender pressure (0-5 scale)
- Goalkeeper position (x, y coordinates)
- Shot speed (if tracking data available)
- Previous action (dribble/pass/rebound)
- Game state (score difference, time remaining)
Spatial Features:
- X coordinate (distance from goal line)
- Y coordinate (lateral position)
- Distance to nearest defender
- Number of defenders in shot path
Contextual Features:
- Team strength (Elo rating or similar)
- Player quality (career goals, xG overperformance)
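Categorical features such as body part and assist type have to be turned into numbers before a model can use them. A minimal sketch of that encoding follows. The input keys `body_part` and `assist_type` and the helper name `encode_shot` are hypothetical, while the output flags match the `is_header`, `is_through_ball`, and `is_cross` columns used in the sample model.

```python
def encode_shot(shot):
    """Turn categorical shot attributes into 0/1 indicator features.

    The input keys 'body_part' and 'assist_type' are hypothetical names.
    """
    return {
        "distance": shot["distance"],
        "angle": shot["angle"],
        "is_header": int(shot["body_part"] == "header"),
        "is_through_ball": int(shot["assist_type"] == "through ball"),
        "is_cross": int(shot["assist_type"] == "cross"),
    }


raw_shot = {"distance": 12, "angle": 20, "body_part": "foot", "assist_type": "through ball"}
encoded = encode_shot(raw_shot)
print(encoded)
```

Foot shots and shots without an assist act as the reference case here: all of their flags are 0.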
Sample Python xG Model
Logistic Regression Model:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load historical shot data
shots = pd.read_csv('shots_data.csv')

# Features
feature_cols = ['distance', 'angle', 'is_header',
                'is_through_ball', 'is_cross',
                'defenders_in_cone']
X = shots[feature_cols]

# Target (1 = goal, 0 = no goal)
y = shots['is_goal']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model (a higher max_iter helps the solver converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict xG for new shot
# distance=12, angle=20, not header, through ball, not cross, 1 defender
new_shot = pd.DataFrame([[12, 20, 0, 1, 0, 1]], columns=feature_cols)
xG = model.predict_proba(new_shot)[0][1]
print(f'xG: {xG:.2f}')  # e.g., 0.42
XGBoost Model (Often More Accurate):
from xgboost import XGBClassifier

# Train XGBoost
xgb_model = XGBClassifier(
    max_depth=6,
    learning_rate=0.1,
    n_estimators=100
)
xgb_model.fit(X_train, y_train)

# Predict
xG_xgb = xgb_model.predict_proba(new_shot)[0][1]
print(f'xG (XGBoost): {xG_xgb:.3f}')  # e.g., 0.456
Step-by-Step: Building Your Own xG Model
Step 1: Collect Data
Data Sources:
- StatsBomb Open Data: Free, high-quality event data
- Wyscout: Academic use available
- Scraping: Understat.com (legal gray area, check ToS)
Minimum Data:
- 5,000+ shots
- Shot location (x, y coordinates)
- Shot outcome (goal/no goal)
- Body part, assist type
Step 2: Feature Engineering
Calculate Distance:
import numpy as np

def calculate_distance(x, y, goal_x=120, goal_y=40):
    """
    Calculate distance from shot to goal center.
    Assuming pitch: 120 yards long, 80 yards wide
    """
    distance = np.sqrt((goal_x - x)**2 + (goal_y - y)**2)
    return distance

shots['distance'] = shots.apply(
    lambda row: calculate_distance(row['x'], row['y']),
    axis=1
)
Calculate Angle:
def calculate_angle(x, y, goal_x=120, goal_y_min=36, goal_y_max=44):
    """
    Calculate angle to goal (in degrees).
    Goal posts at y=36 and y=44 (8 yards wide)
    """
    angle_left = np.arctan2(goal_y_min - y, goal_x - x)
    angle_right = np.arctan2(goal_y_max - y, goal_x - x)
    angle = abs(angle_right - angle_left)
    return np.degrees(angle)

shots['angle'] = shots.apply(
    lambda row: calculate_angle(row['x'], row['y']),
    axis=1
)
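A quick sanity check of both features on known pitch locations can catch coordinate mix-ups early. The sketch below re-implements the same geometry with the standard library only. The function names `shot_distance` and `shot_angle` are illustrative, and the 120 x 80 pitch with posts at y=36 and y=44 is the same assumption as above. Note that this angle is the opening between the two posts as seen from the ball, so larger is better, unlike the off-center angle used in the manual example.

```python
import math


def shot_distance(x, y, goal_x=120.0, goal_y=40.0):
    """Straight-line distance from the shot location to the goal center."""
    return math.hypot(goal_x - x, goal_y - y)


def shot_angle(x, y, goal_x=120.0, post_low=36.0, post_high=44.0):
    """Opening angle (degrees) between the two posts as seen from the shot."""
    a_low = math.atan2(post_low - y, goal_x - x)
    a_high = math.atan2(post_high - y, goal_x - x)
    return math.degrees(abs(a_high - a_low))


# Two test locations: the penalty spot, and the same depth but 20 yards wide
locations = {"penalty spot": (108, 40), "wide position": (108, 20)}
for label, (x, y) in locations.items():
    print(label, round(shot_distance(x, y), 2), round(shot_angle(x, y), 2))
```

The penalty spot comes out at 12 yards with an opening angle of about 36.9 degrees, and the wide position gives a longer distance and a much narrower angle, which is the behavior the model relies on.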
Step 3: Train Model
Use scikit-learn:
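from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Prepare data
features = ['distance', 'angle', 'is_header', 'is_through_ball']
X = shots[features]
y = shots['is_goal']

# Hold out a test set so Step 4 evaluates on shots the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Random Forest on the training portion only
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)
rf_model.fit(X_train, y_train)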
Step 4: Calibrate and Validate
Calibration: Check if predicted probabilities match actual frequencies.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Get predictions
y_pred_proba = rf_model.predict_proba(X_test)[:, 1]

# Calibration curve
fraction_of_positives, mean_predicted_value = calibration_curve(
    y_test, y_pred_proba, n_bins=10
)

# Plot
plt.plot(mean_predicted_value, fraction_of_positives, marker='o')
plt.plot([0, 1], [0, 1], linestyle='--')  # Perfect calibration
plt.xlabel('Mean Predicted Probability')
plt.ylabel('Fraction of Positives')
plt.title('Calibration Curve')
plt.show()
Validation Metrics:
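- Accuracy: a poor metric for xG. Roughly one shot in ten is scored, so a model that always predicts "no goal" is already about 90% accurate.
- Log Loss: lower is better. Compare against a baseline that gives every shot the overall conversion rate (about 0.33 at a 10% conversion rate).
- Brier Score: lower is better. The same baseline scores about 0.09, so a useful model should come in below that.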
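Log loss and Brier score are easy to compute by hand, which also makes it simple to compare a model against a constant baseline. The sketch below uses only the standard library. The ten outcomes and predicted values are made-up toy numbers for illustration, not real match data, and the baseline simply predicts the overall conversion rate for every shot.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy held-out set (made up): 10 shots, 1 goal
outcomes = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
model_xg = [0.05, 0.10, 0.03, 0.45, 0.08, 0.02, 0.20, 0.04, 0.06, 0.07]

# Baseline: every shot gets the overall conversion rate
base_rate = sum(outcomes) / len(outcomes)
baseline = [base_rate] * len(outcomes)

print(f"model    log loss {log_loss(outcomes, model_xg):.3f}  Brier {brier_score(outcomes, model_xg):.3f}")
print(f"baseline log loss {log_loss(outcomes, baseline):.3f}  Brier {brier_score(outcomes, baseline):.3f}")
```

A model only adds information if it beats the baseline on both numbers when evaluated on held-out shots.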
Step 5: Make Predictions
# New shot
new_shot_data = {
    'distance': 14,
    'angle': 25,
    'is_header': 0,
    'is_through_ball': 1
}
new_shot_df = pd.DataFrame([new_shot_data])
xG_prediction = rf_model.predict_proba(new_shot_df)[0][1]
print(f'Predicted xG: {xG_prediction:.3f}')
# Example output (the exact value depends on the training data): Predicted xG: 0.387
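Per-shot predictions are usually added up to describe a whole match. The sketch below shows that aggregation with made-up xG values. The probability of scoring at least once assumes the shots are independent, which is a simplification (rebounds, for example, are not independent of the shot before them).

```python
# Made-up per-shot xG values for one team in one match
shot_xgs = [0.45, 0.08, 0.12, 0.03, 0.30]

# Total xG: the expected number of goals from these chances
team_xg = sum(shot_xgs)

# Probability of scoring at least once, assuming independent shots
p_no_goal = 1.0
for xg in shot_xgs:
    p_no_goal *= (1.0 - xg)
p_at_least_one = 1.0 - p_no_goal

print(f"Team xG: {team_xg:.2f}")
print(f"P(at least one goal): {p_at_least_one:.3f}")
```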
Advanced xG Calculations
1. Tracking Data Integration
With Player/Ball Tracking:
- Defender positions (x, y for each player)
- Goalkeeper positioning
- Ball velocity
Enhanced Formula:
xG = f(distance, angle, defenders_in_cone,
goalkeeper_position, ball_speed, shot_technique)
Example:
Standard Shot: 12 yards, central → xG: 0.30
With Tracking Data:
- 2 defenders blocking (in shooting cone)
- Goalkeeper off-line (poor positioning)
- Adjusted xG: 0.25 (after the reduction for the two defenders) × 1.15 (goalkeeper out of position) ≈ 0.29
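One way to derive a defenders_in_cone feature from tracking data is to count the defenders standing inside the triangle formed by the ball and the two posts. The sketch below is one possible implementation under stated assumptions: the function names are hypothetical, the pitch uses the same 120 x 80 coordinates with posts at y=36 and y=44, and the player positions are made up.

```python
def _cross(o, a, b):
    """2D cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_shooting_cone(shot, player, goal_x=120.0, post_low=36.0, post_high=44.0):
    """True if the player is inside the triangle shot -> near post -> far post."""
    a, b, c = shot, (goal_x, post_low), (goal_x, post_high)
    d1, d2, d3 = _cross(a, b, player), _cross(b, c, player), _cross(c, a, player)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on the edge) when all three signs agree
    return not (has_neg and has_pos)


def defenders_in_cone(shot, defenders):
    """Count the defenders located inside the shooting cone."""
    return sum(in_shooting_cone(shot, d) for d in defenders)


shot = (108.0, 40.0)                                        # penalty spot
defenders = [(112.0, 40.0), (115.0, 41.5), (110.0, 30.0)]   # made-up positions
print(defenders_in_cone(shot, defenders))                   # prints 2
```

This treats players as points. A fuller version might give each defender a body width or weight defenders by how close they are to the ball.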
2. Player-Specific xG
Adjust for Player Quality:
Player xG = Base xG × Player Finishing Modifier
Example:
Base xG: 0.40
Haaland (elite finisher, +15% modifier): 0.40 × 1.15 = 0.46 xG
Average player: 0.40 xG
Poor finisher (-10% modifier): 0.40 × 0.90 = 0.36 xG
Note: Most public xG models deliberately leave out player quality, so the number describes the quality of the chance rather than the player who took it.
3. Post-Shot xG (PSxG)
Calculation:
PSxG = f(shot_trajectory, shot_speed, goalkeeper_position)
Process:
- Track ball trajectory after shot
- Determine target location (e.g., top corner)
- Calculate save probability based on:
  - Shot power
  - Goalkeeper reaction time
  - Goalkeeper positioning
Example:
Pre-shot xG: 0.30 (12 yards, central)
Shot goes top corner → PSxG: 0.75 (much harder to save)
Goalkeeper saved it → Exceptional save!
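A common way to use PSxG is to add it up over all shots on target a goalkeeper faced and compare the total with the goals actually conceded. The sketch below shows that bookkeeping with made-up values. The dictionary keys and the name `goals_prevented` are illustrative, not taken from a specific data provider.

```python
# Made-up shots on target faced by one goalkeeper
shots_on_target = [
    {"psxg": 0.75, "goal": 0},  # top-corner shot, saved
    {"psxg": 0.10, "goal": 0},
    {"psxg": 0.40, "goal": 1},
    {"psxg": 0.05, "goal": 0},
]

psxg_faced = sum(s["psxg"] for s in shots_on_target)
goals_conceded = sum(s["goal"] for s in shots_on_target)

# Positive means the goalkeeper conceded fewer goals than the shots "deserved"
goals_prevented = psxg_faced - goals_conceded
print(f"PSxG faced: {psxg_faced:.2f}, conceded: {goals_conceded}, prevented: {goals_prevented:+.2f}")
```

Over a handful of shots this number is mostly noise. It becomes meaningful only across many matches.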
Common Calculation Errors
Error 1: Not Accounting for Defenders
Wrong Model: Only uses distance and angle.
Better Model: Includes number of defenders in shooting cone.
Impact:
Shot: 12 yards, 0° angle
Wrong xG: 0.40 (ignores defenders)
Correct xG: 0.25 (3 defenders blocking)
Error 2: Small Training Dataset
Problem: Training on only 1,000 shots → Overfitting
Solution: Minimum 5,000 shots, ideally 20,000+
Error 3: Not Validating Calibration
Problem: Model predicts well but probabilities are miscalibrated.
Example:
- Model says 0.50 xG
- In reality, only 35% of such shots score
- Issue: Overconfident predictions
Solution: Use calibration curves and recalibrate if needed.
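One simple recalibration technique is histogram binning: group held-out predictions into probability bins and replace each prediction with the goal rate actually observed in its bin. The sketch below is a minimal pure-Python version with hypothetical function names. The validation data is made up to mirror the example above (shots rated 0.50 that score 35% of the time).

```python
def fit_bin_calibrator(y_prob, y_true, n_bins=10):
    """Observed goal rate per probability bin (None where a bin has no shots)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(y_prob, y_true):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[idx] += y
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def recalibrate(p, bin_rates):
    """Map a raw prediction to the observed rate of its bin, if the bin has data."""
    idx = min(int(p * len(bin_rates)), len(bin_rates) - 1)
    rate = bin_rates[idx]
    return p if rate is None else rate


# Made-up held-out data: 20 shots rated 0.50, of which 7 were scored
val_probs = [0.50] * 20
val_goals = [1] * 7 + [0] * 13

bin_rates = fit_bin_calibrator(val_probs, val_goals)
print(recalibrate(0.52, bin_rates))  # prints 0.35
```

The bins must be fitted on shots that were not used to train the model, and each bin needs enough shots for its observed rate to be stable.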
Conclusion
Calculating xG requires understanding shot characteristics, historical data, and machine learning models. While basic manual calculations provide rough estimates, professional xG models use thousands of shots and advanced algorithms for accuracy.
Key Takeaways:
- Distance and angle are the most important factors
- Machine learning (Logistic Regression, XGBoost) powers modern xG
- Feature engineering (defenders, assist type) improves accuracy
- Calibration ensures predictions match reality
- Advanced models incorporate tracking data and player quality
Golden Rule: Start simple (distance + angle), then add complexity as data and skills improve.
Frequently Asked Questions
Can I calculate xG without machine learning?
Yes. Use lookup tables based on distance and angle from historical data. Less accurate than ML models but provides reasonable estimates. Online xG calculators exist for this purpose.
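A lookup table of this kind can be built by grouping historical shots into distance and angle zones and storing the share of goals in each zone. The sketch below is a minimal version under stated assumptions: the zone sizes (6 yards, 15 degrees) and function names are arbitrary choices for illustration, the angle is the off-center angle, and the eight historical shots are made-up toy data.

```python
from collections import defaultdict


def zone_key(distance, angle, dist_step=6, angle_step=15):
    """Assign a shot to a (distance band, angle band) zone."""
    return (int(distance // dist_step), int(angle // angle_step))


def build_lookup(shots):
    """Share of goals per zone from (distance, angle, is_goal) tuples."""
    goals = defaultdict(int)
    totals = defaultdict(int)
    for distance, angle, is_goal in shots:
        key = zone_key(distance, angle)
        totals[key] += 1
        goals[key] += is_goal
    return {key: goals[key] / totals[key] for key in totals}


# Made-up history: (distance in yards, off-center angle in degrees, 1 = goal)
history = [
    (8, 5, 1), (10, 10, 0), (9, 12, 1), (11, 3, 0),
    (26, 5, 0), (28, 10, 0), (25, 2, 0), (27, 8, 1),
]
table = build_lookup(history)

# Look up a new shot from 10 yards at 8 degrees; None means no data for that zone
print(table.get(zone_key(10, 8)))  # prints 0.5
```

With real data, each zone needs a reasonable number of shots before its rate can be trusted, which is why coarse zones tend to work better than fine ones on small datasets.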
What's the minimum data needed to build an xG model?
At least 5,000 shots for a basic model, and 20,000+ for reliable results. More data means a better model, especially for rare situations (headers from 25 yards, etc.).
How accurate are professional xG models?
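Judge a model by its Brier score and log loss against a baseline that gives every shot the overall conversion rate (about 0.09 and 0.33 respectively at a 10% conversion rate), not by accuracy: predicting "no goal" for every shot is already about 90% accurate. No model is perfect due to football's inherent randomness. Even a perfectly calibrated model cannot say which individual shots will go in.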
Which machine learning algorithm is best for xG?
Logistic Regression (simple, interpretable) or XGBoost (often the most accurate when enough data is available). Neural networks can work but often overfit with limited data. Start with Logistic Regression.
Where can I get shot data to train my model?
StatsBomb Open Data (free, high quality), Wyscout (academic access), FBref (manual collection), or scraping Understat (check ToS first).


