Working with Model Data in Databases

Posted on Oct 1, 2023

Overview

Managing the creation and maintenance of a financial model results in multiple versions, or snapshots, initially as iterations lead toward an acceptable model, and then latterly as iterations to optimize as comparisons with actual activity lead to improved insights into the forecasting mechanics. See Using Metrics in Financial Forecasting for details on a recommended update and reforecast process.

The key outputs of these snapshots are the individual line items on the profit and loss statement, or balance sheet, or other metrics are the basis for charts and summarized tables that communicate trends in business performance and expectations on forecast activity.

An approach I’ve used is to capture the outputs of each version and store in an SQL database, tracked by model name, tag and date. Stored in such a format the output data can be imported into practically any visualization environment, and here I describe briefly an Excel and Python-based workflow.

Excel VBA Module for Data Extraction

The Excel VBA module relies upon a Source sheet in the model structured as:

first row, second cell := start date of the model
second row, second cell := number of periods of data to extract (typically in months and equal to 24, 36, 48,…)
third row := header ID and TITLE
fourth row downwards := the various IDs, which equal the named cells in the sheet that are the left most cell of the range to be extracted, and TITLEs that are retained in the data as fuller description of the data type.

The code below then extracts select period-based data using tags defined in the Source sheet:

A temporary TransposedData sheet is created and prefilled with dates on the first column and titles and descriptions in the top rows.
The code loops over the rows of the main ID and TITLE table, transposing the row-based cell range to column-based.
Temporary sheet is then saved to a CSV file with user-specified suffix and then deleted.

The resulting CSV files are then imported in the above Python code into the SQL database. The Excel VBA module code can be found below: Excel VBA Module Code.

Importing Snapshot Data into the SQL Database

The code below creates the SQL database and necessary tables. I’ve used SQLite here for simplicity as access performance is of no consideration.

import os
import re
import json
import sqlite3
import pandas as pd
from datetime import datetime

# Connect to SQLite database (or create it if it doesn't exist)
conn = sqlite3.connect('fm_data.db')
cur = conn.cursor()
print('Connected to SQLite database')

# Create tables
cur.execute('''
    CREATE TABLE IF NOT EXISTS models
    (id INTEGER PRIMARY KEY, modelname TEXT, date TEXT, tag TEXT, data TEXT)
''')
print('Created table models')

The code assumes a series of consistently named CSV files present in the folder, reads these in one by one into the SQL database.

# Regular expression to extract model name, date, and tag from filename
regex = r'(.+)-(\d{6})_(.+)\.csv'

# Walk through all CSV files in directory
for root, dirs, files in os.walk('FOLDER_NAME'):
    for file in files:
	if file.endswith('.csv'):
	    # Parse filename
	    print('Parsing file: ' + file)
	    match = re.fullmatch(regex, file)
	    if match:
		modelname, date, tag = match.groups()
		# Read CSV file into pandas DataFrame
		df = pd.read_csv(os.path.join(root, file))
		if df.empty:
		    print('DataFrame is empty.')
		else:
		    print(df)
		    # Convert DataFrame to JSON
		df_json = df.to_json()
		# Store in database
		cur.execute('''
		    INSERT INTO models (modelname, date, tag, data)
		    VALUES (?, ?, ?, ?)
		''', (modelname, date, tag, df_json))
		print('Inserted data into database')

Finally the database changes are committed and closed.

# Commit changes and close connection
conn.commit()
conn.close()
print('Committed changes and closed connection')

Accessing Stored Snapshot Data

Accessing the data is then a question of connecting to the SQLite database file and reading in the various table data. Example code is below.

# Connect to the sqlite database
conn = sqlite3.connect('complete_data.db')

# Get the names of all tables in the database
table_names = conn.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
table_names = [name[0] for name in table_names if name[0].startswith('snapshot')]

# Read each table into a DataFrame and store them in a list
tot_rev_data = [pd.read_sql_query(f"SELECT * from {name}", conn) for name in table_names]

# Read the 'models' table from the data base
models = pd.read_sql_query("SELECT * from models", conn)

# Convert 'data' column back to dictionaries
models['data'] = models['data'].apply(ast.literal_eval)

Conclusion

Model snapshots should have some permanency, they represent a set of concluded work tasks at a specific point in time. Aside from basic version control, tracking snapshots offers insights into:

How subsequent actuals periods compare to prior forecasts–a critical analysis in financial model management
How projected business activity in forecast periods evolves over time.

I’ve found an SQL-based workflow to be a robust and dependable method for storing snapshot data in an easily retrievable format. In combination with languages such as Python, ingesting, analyzing and visualizing the data is relatively easy compared to spreadsheet-based processes, especially when dealing with multiple datasets in different files.

Excel VBA Module Code

The code below can be found here.