Most project teams do not have a shortage of schedule data. They have a shortage of timely signals. A planner updates the file, a project manager looks at the critical path, and everyone hopes the finish date is still safe. The problem is that hope is not a control system.
That is where a daily Monte Carlo check becomes surprisingly practical. Instead of waiting for a weekly review meeting, you can let Python read the latest schedule export, run a simulation, and tell you when confidence drops or when the P90 finish date moves beyond your target. In plain English, that gives you an early warning before schedule risk becomes a visible delay.
The good news is that this does not need to feel like a data science project. With a clean CSV export and a simple Jupyter Notebook workflow, you can build something realistic, useful, and approachable even if you are a project manager rather than a full-time developer.
Why automate schedule risk checks at all?
A one-time Monte Carlo analysis is useful. A repeatable one is much more valuable.
Projects change constantly. Logic changes. Durations shift. New tasks appear. Procurement slips. Testing expands. If your simulation runs once a month, it quickly becomes a historical artifact rather than a live management tool.
A daily or frequent risk check gives you three practical benefits:
- Consistency. The same logic runs every time.
- Speed. The team finds out about deteriorating confidence early.
- Focus. Attention is only needed when a threshold is crossed.
This is especially useful on busy projects where nobody wants another dashboard to watch all day. Instead of asking the team to inspect probability curves manually, the notebook can highlight when schedule risk moves outside your agreed comfort zone.
What the workflow looks like in real life
A simple workflow usually looks like this:
- Your planning tool exports the latest schedule to a CSV file.
- A Jupyter Notebook reads that file.
- The notebook validates the task data and predecessor logic.
- It runs a Monte Carlo simulation using optimistic, most likely, and pessimistic durations.
- It calculates outputs such as confidence of meeting the target date, plus P50, P80, and P90 finish dates.
- It shows the result in a readable summary and chart.
That is it. No giant platform. No heavy software stack. Just a reliable way to turn schedule uncertainty into a timely signal.
A practical CSV format
If you are starting from scratch, keep the file structure simple. A useful format looks like this:
task_id,task_name,optimistic_days,most_likely_days,pessimistic_days,predecessors
A,Requirements,3,5,8,
B,Design,4,6,10,A
C,Procurement,5,8,15,A
D,Build Feature 1,6,10,16,B
E,Build Feature 2,5,9,14,B
F,Environment Setup,3,5,9,C
G,System Test,4,7,12,D|E|F
H,Deploy,2,3,5,G
A few design choices make this format easy to work with:
- optimistic_days, most_likely_days, and pessimistic_days give the three-point estimate.
- predecessors uses a pipe character such as D|E|F rather than commas, so the CSV stays clean.
- A blank predecessor cell means the task can start immediately.
- Extra columns are fine. The notebook can ignore what it does not need.
This is not the only valid schedule format, but it is a practical one for a first notebook-based workflow.
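As a quick sanity check, the predecessor column in this format can be parsed with plain pandas. The sketch below uses a trimmed, inline copy of the example CSV rather than a real export:

```python
import io
import pandas as pd

# Inline copy of a few rows in the CSV format described above (illustrative only).
csv_text = """task_id,task_name,optimistic_days,most_likely_days,pessimistic_days,predecessors
A,Requirements,3,5,8,
B,Design,4,6,10,A
G,System Test,4,7,12,D|E|F
"""

df = pd.read_csv(io.StringIO(csv_text))
df["predecessors"] = df["predecessors"].fillna("")

# Split the pipe-separated predecessor field into a list per task.
preds = {
    row.task_id: [p for p in row.predecessors.split("|") if p]
    for row in df.itertuples()
}
print(preds["A"])  # []
print(preds["G"])  # ['D', 'E', 'F']
```

The blank cell for task A becomes an empty list, and the pipe-separated cell for task G becomes a clean list of task IDs, which is exactly what the dependency logic later needs.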
Example Jupyter Notebook code
The code below is written for Jupyter Notebook, not Streamlit. It reads a schedule CSV, validates the structure, runs a dependency-aware Monte Carlo simulation, calculates confidence metrics, and plots the finish date distribution.
import os
import numpy as np
import pandas as pd
import plotly.express as px
from collections import defaultdict, deque
# ----------------------------
# Configuration
# ----------------------------
CSV_FILE = "Project_Example_1.csv"
PROJECT_START_DATE = pd.Timestamp("2025-05-01")
TARGET_FINISH_DATE = pd.Timestamp("2025-06-15")
ITERATIONS = 5000
CONFIDENCE_THRESHOLD = 0.70
# ----------------------------
# Utility functions
# ----------------------------
def parse_predecessors(value):
    if pd.isna(value) or str(value).strip() == "":
        return []
    return [item.strip() for item in str(value).split("|") if item.strip()]

def validate_schedule(df):
    required_cols = [
        "task_id",
        "task_name",
        "optimistic_days",
        "most_likely_days",
        "pessimistic_days",
        "predecessors",
    ]
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if df["task_id"].duplicated().any():
        dupes = df.loc[df["task_id"].duplicated(), "task_id"].tolist()
        raise ValueError(f"Duplicate task_id values found: {dupes}")
    for col in ["optimistic_days", "most_likely_days", "pessimistic_days"]:
        if (df[col] <= 0).any():
            bad_tasks = df.loc[df[col] <= 0, "task_id"].tolist()
            raise ValueError(f"Non-positive values found in {col} for tasks: {bad_tasks}")
    invalid_range = df[
        (df["optimistic_days"] > df["most_likely_days"]) |
        (df["most_likely_days"] > df["pessimistic_days"])
    ]
    if not invalid_range.empty:
        bad_tasks = invalid_range["task_id"].tolist()
        raise ValueError(
            "Three-point estimates must satisfy optimistic <= most_likely <= pessimistic. "
            f"Problem tasks: {bad_tasks}"
        )
    task_ids = set(df["task_id"])
    for _, row in df.iterrows():
        preds = parse_predecessors(row["predecessors"])
        missing_preds = [p for p in preds if p not in task_ids]
        if missing_preds:
            raise ValueError(
                f"Task {row['task_id']} references missing predecessors: {missing_preds}"
            )

def build_dependency_structures(df):
    predecessors_map = {}
    successors_map = defaultdict(list)
    indegree = {task_id: 0 for task_id in df["task_id"]}
    for _, row in df.iterrows():
        task_id = row["task_id"]
        preds = parse_predecessors(row["predecessors"])
        predecessors_map[task_id] = preds
        indegree[task_id] = len(preds)
        for pred in preds:
            successors_map[pred].append(task_id)
    return predecessors_map, successors_map, indegree

def topological_sort(task_ids, successors_map, indegree):
    indegree_copy = indegree.copy()
    queue = deque([task_id for task_id in task_ids if indegree_copy[task_id] == 0])
    ordered = []
    while queue:
        current = queue.popleft()
        ordered.append(current)
        for successor in successors_map[current]:
            indegree_copy[successor] -= 1
            if indegree_copy[successor] == 0:
                queue.append(successor)
    if len(ordered) != len(task_ids):
        raise ValueError("Dependency cycle detected in schedule.")
    return ordered

def load_schedule(file_path):
    df = pd.read_csv(file_path)
    df["predecessors"] = df["predecessors"].fillna("")
    validate_schedule(df)
    predecessors_map, successors_map, indegree = build_dependency_structures(df)
    topo_order = topological_sort(df["task_id"].tolist(), successors_map, indegree)
    return df, predecessors_map, topo_order

def run_single_simulation(df, predecessors_map, topo_order):
    sampled_durations = {}
    early_finish = {}
    for _, row in df.iterrows():
        task_id = row["task_id"]
        sampled_durations[task_id] = np.random.triangular(
            row["optimistic_days"],
            row["most_likely_days"],
            row["pessimistic_days"],
        )
    for task_id in topo_order:
        preds = predecessors_map[task_id]
        early_start = max((early_finish[p] for p in preds), default=0.0)
        early_finish[task_id] = early_start + sampled_durations[task_id]
    project_finish_days = max(early_finish.values())
    return project_finish_days

def run_monte_carlo(df, predecessors_map, topo_order, iterations):
    finish_days = np.array([
        run_single_simulation(df, predecessors_map, topo_order)
        for _ in range(iterations)
    ])
    target_days = (TARGET_FINISH_DATE - PROJECT_START_DATE).days
    confidence = np.mean(finish_days <= target_days)
    p50_days = np.percentile(finish_days, 50)
    p80_days = np.percentile(finish_days, 80)
    p90_days = np.percentile(finish_days, 90)
    results = {
        "confidence": float(confidence),
        "p50_days": float(p50_days),
        "p80_days": float(p80_days),
        "p90_days": float(p90_days),
        "p50_date": PROJECT_START_DATE + pd.to_timedelta(np.ceil(p50_days), unit="D"),
        "p80_date": PROJECT_START_DATE + pd.to_timedelta(np.ceil(p80_days), unit="D"),
        "p90_date": PROJECT_START_DATE + pd.to_timedelta(np.ceil(p90_days), unit="D"),
        "mean_finish_days": float(np.mean(finish_days)),
        "finish_days_all": finish_days,
    }
    return results
# ----------------------------
# Load and run
# ----------------------------
df, predecessors_map, topo_order = load_schedule(CSV_FILE)
results = run_monte_carlo(df, predecessors_map, topo_order, ITERATIONS)
# ----------------------------
# Summary output
# ----------------------------
print("--- Daily Schedule Risk Summary ---")
print(f"Schedule file: {CSV_FILE}")
print(f"Iterations: {ITERATIONS}")
print(f"Target finish date: {TARGET_FINISH_DATE.date()}")
print(f"Confidence of meeting target: {results['confidence']:.1%}")
print(f"P50 finish date: {results['p50_date'].date()}")
print(f"P80 finish date: {results['p80_date'].date()}")
print(f"P90 finish date: {results['p90_date'].date()}")
print(f"Mean finish duration: {results['mean_finish_days']:.1f} days")
print("-----------------------------------")
if results["confidence"] < CONFIDENCE_THRESHOLD or results["p90_date"] > TARGET_FINISH_DATE:
    print("\nALERT: Schedule risk threshold breached.")
else:
    print("\nNo alert triggered. Schedule remains within thresholds.")
# ----------------------------
# Build chart data
# ----------------------------
finish_dates = PROJECT_START_DATE + pd.to_timedelta(np.ceil(results["finish_days_all"]), unit="D")
hist_df = pd.DataFrame({"finish_date": finish_dates})
p50_date = results["p50_date"]
p80_date = results["p80_date"]
p90_date = results["p90_date"]
# ----------------------------
# Plot histogram
# ----------------------------
fig = px.histogram(
    hist_df,
    x="finish_date",
    nbins=30,
    title="Simulated Project Finish Date Distribution",
)
fig.add_vline(x=TARGET_FINISH_DATE, line_dash="dash", line_color="red")
fig.add_vline(x=p50_date, line_dash="dot", line_color="blue")
fig.add_vline(x=p80_date, line_dash="dot", line_color="orange")
fig.add_vline(x=p90_date, line_dash="dot", line_color="green")
fig.add_annotation(x=TARGET_FINISH_DATE, y=0, text="Target", showarrow=True, arrowhead=1)
fig.add_annotation(x=p50_date, y=0, text="P50", showarrow=True, arrowhead=1)
fig.add_annotation(x=p80_date, y=0, text="P80", showarrow=True, arrowhead=1)
fig.add_annotation(x=p90_date, y=0, text="P90", showarrow=True, arrowhead=1)
fig.show()

How this notebook works without becoming complicated
The notebook is deliberately simple.
It does not try to replace your planning system. It turns the latest exported schedule into a quick risk check. That distinction matters. In most teams, the easiest automation to adopt is the one that fits beside the current process rather than forcing everyone to change tools.
A few practical features make the notebook useful:
- It validates the input structure before simulation begins.
- It catches missing predecessors, duplicate task IDs, invalid estimate ranges, and circular logic.
- It respects task dependencies rather than simply summing all durations.
- It calculates confidence-based finish dates that are easy to explain in a project meeting.
That is usually enough to create a useful early-warning view without overengineering the solution.
What the results actually mean
The notebook produces outputs like these:
- Confidence of meeting target. This is the percentage of simulation runs that finish on or before your target date.
- P50 finish date. The project finishes on or before this date in roughly 50 percent of simulation runs.
- P80 finish date. This is a more conservative planning date.
- P90 finish date. This is a high-confidence date, often used when teams want stronger schedule protection.
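These outputs map directly onto np.percentile and a simple comparison, as the notebook code above shows. A toy example with made-up durations:

```python
import numpy as np

# Toy sample of simulated finish durations in days (made-up values for illustration).
finish_days = np.array([40, 42, 44, 46, 48, 50, 52, 54, 56, 58])

# P50 and P90 come straight from np.percentile (linear interpolation by default).
p50 = np.percentile(finish_days, 50)
p90 = np.percentile(finish_days, 90)

# Confidence of meeting a 50-day target: fraction of runs at or under the target.
confidence = np.mean(finish_days <= 50)

print(p50, confidence)  # 49.0 0.6
```

With this sample, P50 sits at 49 days, P90 near 56 days, and 6 of 10 runs meet a 50-day target, so confidence is 60 percent.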
For example, if the notebook reports:
- confidence = 62%
- P50 = 10 June
- P80 = 16 June
- P90 = 20 June
and your target finish is 15 June, the message is clear:
- the target is still possible,
- but confidence is weakening,
- and a high-confidence outcome is now beyond the target.
That is exactly the sort of signal a project manager can act on early.
Why the histogram helps
A table of numbers is useful, but a chart usually makes the message easier to understand.
The histogram shows the spread of simulated finish dates. A target date line shows where the commitment sits. P50, P80, and P90 lines show how the distribution compares with different confidence levels.
This helps answer questions like:
- Is the target near the center of the distribution or at the optimistic edge?
- How much spread is there between likely and conservative outcomes?
- Is the distribution tight or wide?
- Has the risk profile become more uncertain over time?
That is usually much easier to explain in a review meeting than a single deterministic finish date.
A simple enhancement: save history
A single notebook run is useful. A history of runs is much more useful.
Why? Because trends matter.
If confidence drops from 82 percent to 79 percent, that may not look serious. If it drops from 82 percent to 79 percent to 74 percent to 68 percent across several days, the pattern is much more important.
You can save each notebook result to a CSV file like this:
history_file = "risk_history.csv"
history_row = pd.DataFrame([{
    "run_timestamp": pd.Timestamp.now(),
    "schedule_file": CSV_FILE,
    "confidence": results["confidence"],
    "p50_date": results["p50_date"].date(),
    "p80_date": results["p80_date"].date(),
    "p90_date": results["p90_date"].date(),
    "mean_finish_days": results["mean_finish_days"],
}])
file_exists = os.path.exists(history_file)
history_row.to_csv(history_file, mode="a", header=not file_exists, index=False)
print(f"Saved run history to {history_file}")
print(f"Saved run history to {history_file}")
Now you can track how schedule confidence changes over time.
That helps answer questions like:
- When did confidence start to deteriorate?
- Was the change gradual or sudden?
- Did a replan improve the forecast?
- Are we drifting week by week even when nobody says the project is delayed?
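One way to read that history back is to look at the day-over-day change in confidence. A minimal sketch, using made-up history values in place of a real risk_history.csv:

```python
import pandas as pd

# Made-up history rows, mirroring the columns saved to risk_history.csv.
history = pd.DataFrame({
    "run_timestamp": pd.to_datetime(
        ["2025-05-01", "2025-05-02", "2025-05-03", "2025-05-04"]
    ),
    "confidence": [0.82, 0.79, 0.74, 0.68],
})

# Day-over-day change in confidence.
history["delta"] = history["confidence"].diff()

# Flag a sustained decline: three consecutive negative deltas.
declining = (history["delta"].tail(3) < 0).all()
print(f"Sustained decline: {declining}")
```

Any single drop could be noise, but three negative deltas in a row is exactly the kind of trend the history file exists to surface.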
A note on working days versus calendar days
This example uses plain day counts. That keeps the notebook easy to follow, but real projects often need working-day calendars.
That matters because:
- weekends may be non-working,
- some teams work six-day weeks,
- shutdowns or holidays may affect dates,
- different workstreams may use different calendars.
For a first version, simple day counts are usually enough as long as the assumption is clear. Later, you can improve the model by converting durations to business days or by simulating against a calendar-aware logic.
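If you later want working-day arithmetic, NumPy's busday_offset handles weekend skipping without extra libraries (it also accepts custom weekmasks and holiday lists). A small sketch; the helper function name is my own, not part of the notebook above:

```python
import numpy as np
import pandas as pd

PROJECT_START_DATE = pd.Timestamp("2025-05-01")

def finish_date_business_days(start, duration_days):
    """Advance duration_days working days from start, skipping weekends."""
    start_np = np.datetime64(start.date())
    # roll="forward" moves the start to the next business day if it
    # happens to fall on a weekend before the offset is applied.
    finish = np.busday_offset(start_np, int(np.ceil(duration_days)), roll="forward")
    return pd.Timestamp(finish)

# 10 working days from Thursday 1 May 2025 skips two weekends.
print(finish_date_business_days(PROJECT_START_DATE, 10))  # 2025-05-15
```

The same simulated durations then translate into later calendar dates, which is usually what stakeholders actually care about.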
Another important limitation: resources are not modeled here
This notebook respects task logic, but it does not model resource constraints.
That means it assumes:
- if tasks are logically parallel, they can run in parallel,
- no specialist is overloaded across several activities,
- there is no resource leveling effect.
In real projects, those assumptions are not always true.
So the right interpretation is this:
- if your schedule is mostly logic-driven, this notebook can be very informative,
- if your schedule is heavily resource-constrained, the model may still be directionally useful, but not fully realistic.
That is not a reason to avoid this approach. It is simply a reason to be honest about what the notebook does and does not represent.
Practical enhancements you can add later
Once the basic notebook is working, there are several good next steps.
1. Compare today versus yesterday
You can compare the latest CSV to the previous one and summarize changes such as:
- tasks added,
- tasks removed,
- durations changed,
- predecessors changed.
That helps explain why confidence moved.
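A pandas outer merge with indicator=True is one simple way to classify those changes. A sketch using two tiny inline exports in place of real files:

```python
import io
import pandas as pd

# Hypothetical "yesterday" and "today" exports, trimmed to the relevant columns.
yesterday = pd.read_csv(io.StringIO(
    "task_id,most_likely_days,predecessors\nA,5,\nB,6,A\nC,8,A\n"
))
today = pd.read_csv(io.StringIO(
    "task_id,most_likely_days,predecessors\nA,5,\nB,9,A\nD,4,B\n"
))

merged = yesterday.merge(today, on="task_id", how="outer",
                         suffixes=("_old", "_new"), indicator=True)

added = merged.loc[merged["_merge"] == "right_only", "task_id"].tolist()
removed = merged.loc[merged["_merge"] == "left_only", "task_id"].tolist()
changed = merged.loc[
    (merged["_merge"] == "both")
    & (merged["most_likely_days_old"] != merged["most_likely_days_new"]),
    "task_id",
].tolist()

print(added, removed, changed)  # ['D'] ['C'] ['B']
```

A one-line summary like "D added, C removed, B's duration changed" is often all it takes to explain why confidence moved overnight.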
2. Add an S-curve
A histogram is useful, but an S-curve gives a clearer view of cumulative confidence. It makes P50, P80, and target-date probability easier to explain.
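The S-curve is just the empirical cumulative distribution of the simulated finish durations. A minimal sketch with made-up sample values; feeding sorted_days and cum_prob to px.line would draw the curve itself:

```python
import numpy as np

# Made-up simulated finish durations (days), standing in for
# the full results["finish_days_all"] array from the notebook.
finish_days = np.array([38.0, 41.5, 43.0, 44.2, 45.1, 46.8, 48.0, 52.3])

# Empirical CDF: sort outcomes and assign cumulative probabilities.
sorted_days = np.sort(finish_days)
cum_prob = np.arange(1, len(sorted_days) + 1) / len(sorted_days)

# Probability of finishing within 45 days under this sample.
target_days = 45
prob = np.mean(finish_days <= target_days)
print(f"P(finish <= {target_days} days) = {prob:.2f}")
```

Reading a probability off the curve at the target date answers "what is our confidence?" in one glance, which is why S-curves travel well in review meetings.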
3. Track criticality frequency
A task is not simply critical or not critical once. In Monte Carlo analysis, it may be critical in some simulation runs and not in others. Estimating criticality frequency gives a more realistic view of schedule risk concentration.
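One way to estimate criticality frequency is to record, in each iteration, which chain of tasks actually drove the finish date. The sketch below does this for a made-up three-task network; the counting approach, not the particular numbers, is the point:

```python
import numpy as np
from collections import defaultdict

# Tiny illustrative network: A precedes B and C; the project finishes
# when the slower of the B and C chains finishes. Estimates are made up.
tasks = {"A": (3, 5, 8), "B": (4, 6, 10), "C": (5, 8, 15)}
preds = {"A": [], "B": ["A"], "C": ["A"]}
order = ["A", "B", "C"]

rng = np.random.default_rng(42)
critical_counts = defaultdict(int)
runs = 2000

for _ in range(runs):
    dur = {t: rng.triangular(*tasks[t]) for t in tasks}
    finish = {}
    driver = {}  # which predecessor drove each task's start
    for t in order:
        if preds[t]:
            p = max(preds[t], key=lambda x: finish[x])
            start, driver[t] = finish[p], p
        else:
            start, driver[t] = 0.0, None
        finish[t] = start + dur[t]
    # Walk back from the last-finishing task, counting every task on the driving chain.
    t = max(finish, key=finish.get)
    while t is not None:
        critical_counts[t] += 1
        t = driver[t]

for t in order:
    print(f"{t}: critical in {critical_counts[t] / runs:.0%} of runs")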
4. Phase-based thresholds
A single confidence threshold may be too simplistic. You may want lower thresholds in early planning and higher thresholds close to delivery.
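A simple way to implement phase-based thresholds is a date-ordered lookup table. The boundary dates and levels below are illustrative only:

```python
import pandas as pd

# Hypothetical phase boundaries: each threshold applies from its date onward.
PHASE_THRESHOLDS = [
    (pd.Timestamp("2025-05-01"), 0.60),  # early planning
    (pd.Timestamp("2025-05-20"), 0.70),  # build
    (pd.Timestamp("2025-06-05"), 0.80),  # close to delivery
]

def threshold_for(run_date):
    """Return the threshold of the latest phase boundary at or before run_date."""
    active = 0.60  # default before any boundary
    for boundary, level in PHASE_THRESHOLDS:
        if run_date >= boundary:
            active = level
    return active

print(threshold_for(pd.Timestamp("2025-05-25")))  # 0.7
```

The alert check then compares the day's confidence against threshold_for(today) instead of a fixed CONFIDENCE_THRESHOLD.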
5. Export a chart image
If you save the histogram or S-curve as an image, the notebook becomes easier to use in meetings and status packs.
How to keep the notebook trusted
The hardest part of this kind of workflow is usually not the Python. It is credibility.
Teams will only use the result if they believe it is:
- based on the latest schedule,
- using transparent assumptions,
- consistent from run to run,
- and not producing random noise.
A few habits help a lot:
- keep the input file structure stable,
- document the target date and confidence threshold clearly,
- test the notebook with known scenarios,
- compare results against common-sense expectations,
- and make the assumptions easy to explain.
If a one-day procurement slip suddenly moves P90 by three weeks, people will ask questions. That is good. It means they are paying attention. The notebook should be simple enough that you can explain the result.
When this approach works best
This style of notebook-based analysis is especially effective when:
- the schedule is updated frequently,
- the project already uses three-point duration thinking,
- leadership wants early warning rather than only monthly reporting,
- and the planning team can export a clean file reliably.
It is less effective when:
- the underlying schedule quality is poor,
- dependencies are incomplete or inaccurate,
- duration ranges are guessed once and never reviewed,
- or the team expects the notebook to replace planning judgment.
Monte Carlo is a decision-support tool, not a substitute for project control discipline.
The main idea to keep
You do not need a giant platform to make schedule risk visible more often.
A simple notebook can already do something valuable:
- read the latest schedule export,
- simulate uncertainty across the dependency network,
- compare the result with agreed thresholds,
- and show whether risk is worsening.
That turns schedule risk from a periodic analysis into a live signal.
And that is usually the point where Monte Carlo stops being interesting only to specialists and starts becoming useful to the actual project team.