April 14, 2026 • 12 min read • School Schedules Database

School Calendar Data for Demand Forecasting: A Practical Guide

School schedules are one of the strongest and most underused signals in demand forecasting. This guide shows you how to integrate district-level school calendar data into your models — with Python examples you can run today.

Why School Calendars Matter for Demand Models

If you're building any kind of demand forecasting model for a business that's affected by family behavior — theme parks, hotels, restaurants, retail, transportation, urgent care clinics — you probably already account for holidays and weekends. But most models treat school breaks as a binary "it's spring break somewhere" flag, if they account for them at all.

The reality is more nuanced. On any given day during spring break season, somewhere between 2 million and 17 million students are out of school. The specific number depends on which districts are on break that day. And which districts are on break determines where the demand shows up — a hotel in Orlando cares about different districts than a ski resort in Colorado.

District-level school calendar data lets you build features that capture this nuance. Instead of a binary "spring break" flag, you can compute the actual number of students on break within a given radius of your location on each day. That's a much stronger signal.

Getting Set Up

The School Schedules Database API returns JSON or CSV. Here's how to pull data in Python:

            Python
import io

import pandas as pd
import requests

API_KEY = "ssd_live_your_key_here"
BASE = "https://api.hazeydata.ai/ssd/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Pull all Florida school days for March 2026
resp = requests.get(
    f"{BASE}/days",
    headers=headers,
    params={
        "state": "FL",
        "start_date": "2026-03-01",
        "end_date": "2026-03-31",
        "format": "csv",
    },
    timeout=30,
)
resp.raise_for_status()  # stop here on an auth or parameter error

# Parse the CSV payload into one row per district per day
df = pd.read_csv(io.StringIO(resp.text))
print(df.shape)  # (districts × days, columns)
        

You now have a DataFrame with one row per district per day, including the is_in_session value, day_type, break_name, enrollment, and confidence for each.

Feature Engineering: Five Useful Features

Here are five features you can derive from school calendar data, ranked by how much predictive power they typically add to demand models:

1. Enrollment-Weighted "Students on Break" (National or Regional)

The single most useful feature. For each day, sum the enrollment of all districts where is_in_session < 0.5. This gives you a continuous signal of how many students are available to travel.

            Python
def students_on_break(df, date, states=None):
    """Total students not in session on a given date."""
    day = df[df["date"] == date]
    if states:
        day = day[day["state"].isin(states)]
    out = day[day["is_in_session"] < 0.5]
    return out["enrollment"].sum()

# National (requires a df pulled without the "state" filter)
students_on_break(df, "2026-03-23")
# Regional (FL + GA + AL, relevant for Orlando; df must include all three states)
students_on_break(df, "2026-03-23", ["FL", "GA", "AL"])
        

2. Break Type Indicator

Not all breaks are equal. Spring break generates travel. A teacher workday usually doesn't. Snow days are last-minute. Create categorical features for the dominant break type:

            Python
def dominant_break_type(df, date):
    """What break type accounts for the most students out?"""
    out = df[(df["date"] == date) & (df["is_in_session"] < 0.5)]
    if out.empty:
        return "in_session"
    return (
        out.groupby("break_name")["enrollment"]
        .sum()
        .idxmax()
    )
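
To use the dominant break type as a model input, you will usually want it for every date at once and then one-hot encoded. The following is a minimal sketch, assuming the same column names as above. The toy rows and the break names in them ("Spring Break", "Teacher Workday") are made up for illustration and may not match the labels the API returns.

```python
import pandas as pd

def dominant_break_by_day(df):
    """Dominant break_name per date by enrollment; 'in_session' when no district is out."""
    out = df[df["is_in_session"] < 0.5]
    totals = out.groupby(["date", "break_name"], as_index=False)["enrollment"].sum()
    dominant = (
        totals.sort_values("enrollment", ascending=False)
        .drop_duplicates("date")  # keeps the largest break per date
        .set_index("date")["break_name"]
    )
    all_dates = sorted(df["date"].unique())
    return dominant.reindex(all_dates, fill_value="in_session")

# Toy data for illustration only (not real API output)
toy = pd.DataFrame({
    "date": ["2026-03-23", "2026-03-23", "2026-03-24", "2026-03-24"],
    "break_name": ["Spring Break", "Teacher Workday", None, None],
    "is_in_session": [0.0, 0.0, 1.0, 1.0],
    "enrollment": [100_000, 5_000, 100_000, 5_000],
})
break_type = dominant_break_by_day(toy)

# One indicator column per break type, ready to join onto a daily feature table
break_dummies = pd.get_dummies(break_type, prefix="break")
```

Tree-based models can often take the categorical column directly; the one-hot form is mainly useful for linear models.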
        

3. Days Into / Until Break

Demand doesn't spike on the first day of break — it spikes the day before (travel day) and drops on the last day (return travel). Create lead/lag features:

            Python
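# For a specific district, compute days-until-next-break
def days_until_break(district_df):
    """Return a Series: days until the next break starts (0 on a start day).

    Expects one district's rows, one per calendar day, sorted by date.
    If weekend rows have is_in_session == 0, each weekend counts as a break.
    """
    on_break = district_df["is_in_session"] < 0.5
    # A break starts where on_break flips from False to True
    starts = on_break & ~on_break.shift(1, fill_value=False)
    # Row position doubles as a day counter because rows are daily
    pos = pd.Series(range(len(district_df)), index=district_df.index, dtype="float")
    # Position of the next start at or after each row; NaN after the last break
    next_start = pos.where(starts).bfill()
    return next_start - pos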
        

4. Break Density (Rolling Window)

Instead of a point-in-time measure, compute the 7-day rolling average of the share of students on break. This smooths out single-day holidays and captures the sustained impact of week-long breaks:

            Python
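# Daily national "break density" as a 7-day rolling average
# (df must cover all states; a half day counts as half a student out)
weighted_out = df["enrollment"] * (1 - df["is_in_session"])
daily = (
    weighted_out.groupby(df["date"]).sum()
    / df["enrollment"].groupby(df["date"]).sum()
)
# Sort by date so the window always covers consecutive days
break_density = daily.sort_index().rolling(7).mean()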
        

5. Year-Over-Year Break Shift

With three years of data, you can detect districts that shifted their break timing. This is valuable for models that were trained on historical data — if a district moved spring break a week earlier, your historical pattern breaks.
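
One possible way to compute this shift is sketched below. It assumes the frame carries `district_id` and `school_year` columns and that spring break rows are labeled "Spring Break" in `break_name`; all three are hypothetical names chosen for illustration, so adjust them to whatever the data actually uses.

```python
import pandas as pd

def yoy_break_shift(df, prev_year, this_year, break_name="Spring Break"):
    """Days the named break's start moved between two school years, per district.

    Negative values mean the break starts earlier in the calendar year.
    `district_id`, `school_year`, and the default break label are assumptions.
    """
    rows = df[(df["break_name"] == break_name) & (df["is_in_session"] < 0.5)]
    starts = (
        rows.assign(date=pd.to_datetime(rows["date"]))
        .groupby(["district_id", "school_year"])["date"]
        .min()                      # first day of the break
        .unstack("school_year")     # one column per school year
    )
    # Compare day-of-year positions; a leap year in between adds a one-day offset
    shift = starts[this_year].dt.dayofyear - starts[prev_year].dt.dayofyear
    # Districts missing either year are dropped rather than guessed
    return shift.dropna().astype(int)
```

Because weekdays drift by a day or two from one year to the next, a shift of -1 or -2 usually means "same week as last year"; values near ±7 are the ones that indicate a real schedule change.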

Feature	Type	Best For
`students_on_break`	Continuous (millions)	National/regional demand models, theme parks, airlines
`dominant_break_type`	Categorical	Distinguishing spring break (travel) from workdays (local)
`days_until_break`	Integer	Anticipatory demand (booking, travel preparation)
`break_density_7d`	Continuous (0–1)	Smoothed signal for weekly/monthly models
`yoy_break_shift`	Integer (days)	Detecting schedule changes that break historical patterns

Confidence Filtering

Not all data points in the SSD have equal quality. Each district-day record carries a confidence score from 0.0 to 1.0. For demand forecasting, you'll want to set a threshold that balances coverage with accuracy:

            Python
# High-confidence only (directly extracted from calendars)
hq = df[df["confidence"] >= 0.8]

# Broader coverage (includes state-pattern imputation)
broad = df[df["confidence"] >= 0.5]

# For enrollment-weighted features, high-confidence data
# already covers 85%+ of total enrollment because large
# districts have the best-quality data.
        

Practical tip: For most demand forecasting use cases, a confidence threshold of 0.6 or above gives you excellent coverage (90%+ of enrollment) while filtering out the least reliable data points. The districts with low confidence tend to be small and rural — important for completeness, but their enrollment impact on aggregate features is minimal.
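
Rather than taking a threshold on faith, you can measure how much enrollment each candidate threshold keeps in your own pull. This is a small sketch that assumes only the `confidence` and `enrollment` columns described earlier; the helper name is made up for this example.

```python
import pandas as pd

def enrollment_coverage(df, thresholds=(0.5, 0.6, 0.8)):
    """Share of total enrollment retained at each confidence threshold."""
    total = df["enrollment"].sum()
    return pd.Series(
        {
            t: df.loc[df["confidence"] >= t, "enrollment"].sum() / total
            for t in thresholds
        },
        name="enrollment_coverage",
    )
```

Calling `enrollment_coverage(df)` on a single day's rows returns one share per threshold, which makes it easy to see where coverage starts to drop off for the states you care about.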

Putting It Together: A Complete Example

Here's a simplified but complete example of building a school-calendar-enhanced demand model for a hotel near Walt Disney World:

            Python
import io

import pandas as pd
import requests

API_KEY = "ssd_live_your_key_here"
BASE = "https://api.hazeydata.ai/ssd/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Pull full-year school calendar data
resp = requests.get(
    f"{BASE}/days",
    headers=headers,
    params={
        "start_date": "2025-08-01",
        "end_date": "2026-07-31",
        "confidence_min": "0.6",
        "format": "csv",
    },
    timeout=120,
)
resp.raise_for_status()  # stop here on an auth or parameter error
school = pd.read_csv(io.StringIO(resp.text), parse_dates=["date"])

# 2. Build daily "students on break" features
on_break = school["is_in_session"] < 0.5
school["enroll_out"] = school["enrollment"].where(on_break, 0)
school["fl_enroll_out"] = school["enroll_out"].where(school["state"] == "FL", 0)
daily_features = (
    school.groupby("date")
    .agg(
        students_on_break=("enroll_out", "sum"),
        fl_students_on_break=("fl_enroll_out", "sum"),
        total_enrollment=("enrollment", "sum"),
    )
    .sort_index()
)
daily_features["pct_on_break"] = (
    daily_features["students_on_break"] / daily_features["total_enrollment"]
)

# 3. Add rolling features on the full daily series, before merging,
#    so gaps in the hotel data cannot distort the 7-day window
daily_features["break_density_7d"] = (
    daily_features["pct_on_break"].rolling(7).mean()
)

# 4. Merge with your hotel occupancy data (one row per date)
hotel = pd.read_csv("hotel_occupancy.csv", parse_dates=["date"])
model_df = hotel.merge(daily_features.reset_index(), on="date")

# 5. Derive calendar features if your hotel file doesn't already have them
model_df["day_of_week"] = model_df["date"].dt.dayofweek
model_df["month"] = model_df["date"].dt.month
model_df["is_weekend"] = model_df["day_of_week"] >= 5

# 6. Train your model with school calendar features
features = [
    "day_of_week", "month", "is_weekend",
    "students_on_break",       # national signal
    "fl_students_on_break",    # local signal
    "break_density_7d",        # smoothed trend
]
# ... your model training code here
        

Use Case Patterns

Different industries benefit from different feature combinations:

Theme parks and attractions: Use students_on_break with both national and state-level granularity. The national number captures out-of-state visitors; the local state number captures the base of local/regional guests who drive rather than fly. Weight nearby states more heavily.
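
For the theme park pattern, one simple way to weight nearby states more heavily is a per-state multiplier applied before summing. The sketch below assumes the `state`, `date`, `enrollment`, and `is_in_session` columns used earlier. The weights themselves are placeholders for an Orlando-area attraction, not recommended values; in practice you would fit or tune them against your own attendance data.

```python
import pandas as pd

# Hypothetical weights; every state not listed gets DEFAULT_WEIGHT
STATE_WEIGHTS = {"FL": 1.0, "GA": 0.6, "AL": 0.4}
DEFAULT_WEIGHT = 0.1

def weighted_students_on_break(df, weights=STATE_WEIGHTS, default=DEFAULT_WEIGHT):
    """Daily enrollment on break, with each state's contribution scaled by a weight."""
    out = df[df["is_in_session"] < 0.5]
    w = out["state"].map(weights).fillna(default)
    return (out["enrollment"] * w).groupby(out["date"]).sum()
```

The result is a Series indexed by date that can sit alongside the unweighted national count, so the model can learn how much each one matters.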

Hotels and short-term rentals: Focus on break_density_7d rather than daily counts. Hotel demand responds to week-long breaks more than single days. Combine with your own booking lead-time data — guests from distant districts book further ahead than locals.

Retail: Back-to-school is the big one. The SSD's first_day field for each district tells you exactly when school starts, which drives back-to-school shopping. Use district-level data for stores within commuting distance; use state-level aggregates for e-commerce.
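
A back-to-school feature can be sketched as "enrollment of districts whose first day falls within the next few weeks." The example assumes a per-district table with `first_day` and `enrollment` columns; the exact shape in which the API exposes `first_day` is not shown in this guide, and the 21-day window is an arbitrary starting point.

```python
import pandas as pd

def back_to_school_enrollment(districts, dates, window_days=21):
    """For each date, total enrollment of districts starting school
    within the next `window_days` days (inclusive of the first day itself)."""
    first_day = pd.to_datetime(districts["first_day"])
    result = {}
    for d in pd.to_datetime(pd.Index(dates)):
        days_to_start = (first_day - d).dt.days
        in_window = days_to_start.between(0, window_days)
        result[d] = districts.loc[in_window, "enrollment"].sum()
    return pd.Series(result, name="back_to_school_enrollment")
```

For a single store, pass only the districts within commuting distance; for e-commerce, pass a whole state or the full table.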

Transportation: School start and end times create daily traffic patterns, but school breaks create weekly patterns. Use is_in_session as a feature in route-level demand models. The 0.5 value for half days is especially useful — a half day generates a midday traffic spike that a full day off doesn't.
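
To expose half days to a route-level model as their own signal, you could compute the share of enrollment on a half day for each date. This sketch assumes half days are encoded as exactly 0.5 in `is_in_session`, as described above, and uses a small tolerance in case the value has been through floating-point arithmetic.

```python
import pandas as pd

def half_day_share(df):
    """Per date, the share of enrollment that is on a half day."""
    is_half = (df["is_in_session"] - 0.5).abs() < 1e-9
    half_enrollment = df["enrollment"].where(is_half, 0).groupby(df["date"]).sum()
    total_enrollment = df["enrollment"].groupby(df["date"]).sum()
    return half_enrollment / total_enrollment
```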

Tips for Production

Cache aggressively. School calendar data doesn't change frequently. Pull the full dataset monthly and cache locally. Use the API for real-time spot checks, not bulk queries in your training pipeline.
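
A local file cache with an age check is enough for most pipelines. The helper below is a sketch, not part of any SSD client library: `fetch` is any function you supply that returns a DataFrame (for example, one wrapping the `requests.get` call shown earlier), and the 30-day default mirrors the monthly refresh suggested above.

```python
import time
from pathlib import Path

import pandas as pd

def load_school_calendar(cache_path, fetch, max_age_days=30):
    """Return cached data if it is fresh; otherwise call fetch() and rewrite the cache."""
    path = Path(cache_path)
    if path.exists():
        age_days = (time.time() - path.stat().st_mtime) / 86400
        if age_days < max_age_days:
            return pd.read_csv(path)
    df = fetch()  # only hits the API when the cache is missing or stale
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df
```

Training jobs then read from disk on every run, and the API is called at most once per refresh interval.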

Use enrollment weighting. A district with 200,000 students going on break is not the same signal as a district with 2,000. Always weight by enrollment in aggregate features.

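Account for half days. The is_in_session field is a decimal, not a boolean. A half day (0.5) still puts families on the road by early afternoon. For demand models, treat anything below 0.75 as "available to travel." Note that the is_in_session < 0.5 filter used in the examples above counts a half day as in session, so raise the threshold to 0.75 if you want half days included.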

Join with geography. Every district has an NCES ID and state code. Join with NCES boundary data to compute distance-weighted features: "total enrollment on break within 200 miles of this location."
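
A distance-based feature might look like the sketch below. It assumes you have already joined a centroid latitude and longitude onto each row as `lat` and `lon` columns (hypothetical names; the calendar data itself is described here only as carrying an NCES ID and state code) and uses the haversine great-circle formula, which is accurate enough at this scale.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; inputs in degrees, scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def enrollment_on_break_within(df, lat, lon, radius_miles=200):
    """Daily enrollment on break for districts whose centroid is within the radius."""
    dist = haversine_miles(lat, lon, df["lat"].to_numpy(), df["lon"].to_numpy())
    on_break = (df["is_in_session"] < 0.5).to_numpy()
    nearby_out = df[(dist <= radius_miles) & on_break]
    return nearby_out.groupby("date")["enrollment"].sum()
```

A hard radius is the simplest choice; a smooth decay (for example, dividing enrollment by distance) is a natural next step if drive-time effects matter for your location.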

Start Building

Full API access. All districts. Three school years. Python-friendly JSON and CSV.
