Sports scheduling meets business analytics: why scheduling Major League Baseball is really hard

Mike Trick of Carnegie Mellon University came to the Industrial and Systems Engineering department at UW-Madison to give a colloquium entitled “Sports scheduling meets business analytics.”

How hard is it to schedule 162-game seasons for the 30 MLB teams? It’s really, really hard.

Mike Trick stepped us through what makes for a “good” schedule. Schedules obey many constraints, some of which include:

  • Half of each team’s games are home, half are away.
  • Teams cannot have more than three series away or home in a row.
  • Teams cannot have three home weekends in a row.
  • Teams in the same division play six series: two early on, two in the middle of the season, and two late, with one home and one away each time.
  • Teams play all other teams in at least two series.
  • Schedules should have a good flow, with about one week home followed by one week away.
  • Teams that fly from the west coast to the east coast have a day off in between series.
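
To make a couple of these rules concrete, here is a minimal sketch of a checker for one team's home/away pattern. It rests on assumptions that are mine, not Mike's: the season is written as a string of `H` (home series) and `A` (away series), balance is counted per series rather than per game, and the "three series" rule is read as a limit on consecutive series. The function names are hypothetical.

```python
from itertools import groupby


def longest_run(pattern: str) -> int:
    """Length of the longest block of identical consecutive symbols."""
    return max((len(list(group)) for _, group in groupby(pattern)), default=0)


def check_pattern(pattern: str, max_run: int = 3) -> list[str]:
    """Return the rules violated by one team's series pattern.

    `pattern` is a string of 'H' (home series) and 'A' (away series).
    This representation is an assumption made for illustration only.
    """
    violations = []
    if pattern.count("H") != pattern.count("A"):
        violations.append("home/away balance")
    if longest_run(pattern) > max_run:
        violations.append("too many consecutive home or away series")
    return violations


print(check_pattern("HHAAHAHA"))  # balanced, longest run is two
print(check_pattern("HHHHAAAA"))  # balanced, but a run of four
```

A real schedule couples all 30 patterns together (every home series needs an opponent on the road), which is where the difficulty comes from. Checking one team in isolation is the easy part.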

Teams can make additional scheduling requests. Every team, for example, asks for a home game on Father’s Day, but only half of the teams can host in any given year. Mike addresses this by ensuring that no team is away on Father’s Day more than two years in a row.
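
The arithmetic behind the Father's Day rule is simple: 30 teams play 15 games that day, so only 15 teams can be at home. A minimal sketch of the multi-year check follows, assuming a team's Father's Day history is stored as a string with one `H` or `A` per year. The function name and the representation are hypothetical.

```python
def away_streak_ok(history: str, max_away_streak: int = 2) -> bool:
    """True if the team is never away more than `max_away_streak` years in a row.

    `history` holds one character per year: 'H' for a home Father's Day
    game and 'A' for an away one (an assumed encoding).
    """
    streak = 0
    for year in history:
        streak = streak + 1 if year == "A" else 0
        if streak > max_away_streak:
            return False
    return True


teams = 30
home_slots = teams // 2  # each game has exactly one home team
print(home_slots)
print(away_streak_ok("AAHAAH"))
print(away_streak_ok("HAAAH"))
```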

Mike illustrated how hard it is to create a feasible schedule from scratch. You cannot complete a feasible schedule by trying something intuitive, like scheduling the weekends first and filling out the rest of the schedule later. This leads to infeasible schedules 99% of the time. One of the challenges is that integer programming algorithms do not quickly recognize that a partial schedule is infeasible and instead branch and bound for a long while.

It is equally hard to change a small piece of a feasible schedule to meet a new requirement and end up with another feasible schedule. For example, let’s say the pope decides to visit the United States and wants to use a baseball stadium on a day scheduled for a game. You cannot simply swap that game out with another. Freeing up the stadium on that one day leads to a ripple of changes across the entire schedule, because moving that one game affects the visiting team’s schedule and leads to violations of the constraints above (e.g., half of each team’s games are at home). This led to Mike’s development of a large neighborhood search algorithm that efficiently reschedules large parts of the schedule (say, a month) during the schedule generation process.
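
The general shape of large neighborhood search is to free up one chunk of the current solution, re-optimize that chunk while everything else stays fixed, and repeat. The toy sketch below applies that idea to a single team's home/away pattern and repairs each window by brute-force enumeration. It is an illustration under my own assumptions (the penalty function, the window width, and all names are hypothetical), not Mike's algorithm, which works on the full league schedule with an integer programming solver.

```python
import random
from itertools import groupby, product


def cost(pattern):
    """Penalty: series beyond three in any run, plus home/away imbalance."""
    long_runs = sum(max(0, len(list(g)) - 3) for _, g in groupby(pattern))
    imbalance = abs(pattern.count("H") - pattern.count("A"))
    return long_runs + imbalance


def repair_window(pattern, start, width):
    """Re-optimize one window exhaustively while the rest stays fixed."""
    best = pattern
    for fill in product("HA", repeat=width):
        candidate = pattern[:start] + list(fill) + pattern[start + width:]
        if cost(candidate) < cost(best):
            best = candidate
    return best


def large_neighborhood_search(pattern, width=6, iterations=50, seed=0):
    """Repeatedly pick a random window and repair it; never accepts a worse pattern."""
    rng = random.Random(seed)
    current = list(pattern)
    for _ in range(iterations):
        if cost(current) == 0:
            break
        start = rng.randrange(len(current) - width + 1)
        current = repair_window(current, start, width)
    return "".join(current)


start = "H" * 12 + "A" * 12  # deliberately bad: two runs of twelve
result = large_neighborhood_search(start)
print(start, cost(start))
print(result, cost(result))
```

The window plays the role of the "month" in the talk: large enough that the repair step has room to move, small enough that it can be solved quickly.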

Mike found that how he structured his integer programming models made a big difference. He did not use the standard approach to defining variables. Instead he borrowed an idea from branch and price and embedded more structure in the variables, which introduced many more variables but let commercial integer programming solvers handle the problem more efficiently. The result was a model with 6 million variables in which he could embed his objectives, such as travel costs.
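
One way to picture "more structure in the variables" is a formulation in which each variable stands for an entire feasible pattern rather than a single game, so some rules are satisfied by construction. The post does not say what Mike's variables actually encode, so the sketch below is only an assumed illustration: it enumerates balanced home/away patterns with no run longer than three and shows how quickly the number of candidate variables grows with the horizon. The function name is hypothetical.

```python
from itertools import groupby, product


def feasible_patterns(n_slots, max_run=3):
    """All balanced 'H'/'A' patterns with no run longer than `max_run`."""
    patterns = []
    for bits in product("HA", repeat=n_slots):
        balanced = bits.count("H") * 2 == n_slots
        short_runs = all(len(list(g)) <= max_run for _, g in groupby(bits))
        if balanced and short_runs:
            patterns.append("".join(bits))
    return patterns


# Each pattern would become one 0/1 variable per team in a
# pattern-based model (an assumption made for illustration).
for n in (4, 8, 12):
    print(n, len(feasible_patterns(n)))
```

The trade-off is the one described above: far more variables, but each one already respects the run and balance rules, so the solver no longer has to discover them through branching.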

Mike noted that most real-world problems have no natural objective function. He judges MLB schedules on travel distance and “flow,” where flow reflects the goal of alternating home and away weeks. The objective reflects the distance teams travel. He cannot require each team to travel the same amount. Seattle travels a minimum of 48,000 miles per season no matter the schedule because Seattle is far away from most cities. Requiring every other team to travel 48,000 miles in the season leads to schedules where teams often travel from coast to coast between adjacent series just to match Seattle’s distance. That is bad.
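
To see why coast-to-coast hops between adjacent series inflate travel, here is a small sketch that compares two orderings of the same road trip using great-circle (haversine) distance. The coordinates are approximate, the four-city trip is made up for illustration, and great-circle miles are only a stand-in for however the real model measures travel.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate (latitude, longitude) in degrees.
CITIES = {
    "Seattle": (47.61, -122.33),
    "Los Angeles": (34.05, -118.24),
    "New York": (40.71, -74.01),
    "Boston": (42.36, -71.06),
}

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(a, b):
    """Haversine distance in miles between two cities in CITIES."""
    lat1, lon1 = map(radians, CITIES[a])
    lat2, lon2 = map(radians, CITIES[b])
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def trip_miles(stops):
    """Total distance over consecutive stops."""
    return sum(great_circle_miles(a, b) for a, b in zip(stops, stops[1:]))


# Hypothetical trips: same cities, different order.
grouped = ["Seattle", "Los Angeles", "New York", "Boston", "Seattle"]
zigzag = ["Seattle", "New York", "Los Angeles", "Boston", "Seattle"]
print(round(trip_miles(grouped)), round(trip_miles(zigzag)))
```

The zigzag ordering crosses the country on every leg, so it covers far more miles than the grouped one even though both visit the same ballparks.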

Mike ultimately included revenue in his objective, where revenue reflects attendance. He used linear regression to model attendance. He acknowledged that this is a weakness, because attendance does not equal profit. For example, teams can sell out afternoon games when they discount ticket prices. Children come and do not purchase beer at the stadiums, which ultimately fills the stands but does not generate the most revenue.
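
For readers who have not fit a regression by hand, the sketch below shows ordinary least squares with a single predictor. The post does not say which predictors Mike used, so the weekend indicator is my assumption, and the attendance numbers are made up purely to exercise the code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data for illustration only: 0 = weekday game, 1 = weekend game.
is_weekend = [0, 0, 0, 1, 1, 1]
attendance = [21000, 19000, 20000, 31000, 29000, 30000]

intercept, slope = fit_line(is_weekend, attendance)
print(intercept, slope)
```

With a 0/1 predictor the intercept is the average weekday attendance and the slope is the average weekend bump. A fitted model like this can put a value on each game slot in the objective, with the caveat from the talk that attendance is not profit.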

Mike summarized the keys to his success, which included:

  1. Computing power improved over time.
  2. Commercial solvers improved.
  3. He solved the right problem.
  4. He structured the problem in an effective way.
  5. He identified a way to get quick solutions for part of the schedule (useful when something came up and a game had to change).
  6. He developed a large neighborhood search algorithm that efficiently retools large parts of the schedule.

Three years ago I wrote a blog post about Mike Trick’s keynote talk on Major League Baseball (MLB) scheduling at the German Operations Research Conference (blog post here). That post contains some background information.

 

