File-Based Matplotlib Plotting

Published in Programming

In a recent engineering project I used matplotlib to create figures from text files. The data was stored in various subdirectories and each data item was in its own text file. This provided a number of advantages:

  • data is easier to update, add, and remove,
  • updating of figures/plots is automated,
  • data can be grouped by directories and is reflected in the file system.

This workflow works best when the data changes regularly and figures need to be regenerated. In my case the text files were outputs from an ANSYS run, so they were generated automatically. Updating the plots simply meant replacing text files in the source directory and re-running the scripts. This is faster than working in Excel, where the data is stored inside the spreadsheets and has to be re-imported before the figures update.

This workflow resembles the MVC (model-view-controller) pattern used in web development: the data source text files act as the model, while the script that creates the plots covers roughly both the view and controller roles.

For example, data can be set up like this:

project/
   |- Data/
      |- Set 1/
      |  |- file1.txt
      |  |- file2.txt
      |- Set 2/
         |- file3.txt 
         |- file4.txt
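To try the workflow before real output files exist, a layout like the one above can be filled with placeholder data. This is a minimal sketch, assuming two-column (x, y) text files; make_demo_project is a hypothetical helper and the sine, cosine and polynomial values are made up:

```python
import os

import numpy as np


def make_demo_project(root):
    """Create root/Data/Set 1 and root/Data/Set 2, each holding two text files.

    The values are synthetic placeholders: every file has two
    whitespace-separated columns (x, y) that np.genfromtxt can read back.
    Returns the path of the Data directory.
    """
    x = np.linspace(0, 10, 50)
    layout = {
        'Set 1': {'file1.txt': np.sin(x), 'file2.txt': np.cos(x)},
        'Set 2': {'file3.txt': 0.5 * x, 'file4.txt': x ** 2 / 10},
    }
    data_dir = os.path.join(root, 'Data')
    for set_name, files in layout.items():
        set_dir = os.path.join(data_dir, set_name)
        os.makedirs(set_dir, exist_ok=True)
        for fname, y in files.items():
            # column_stack builds an (n, 2) array; savetxt writes one row per line
            np.savetxt(os.path.join(set_dir, fname), np.column_stack((x, y)))
    return data_dir
```

Calling make_demo_project('project') creates project/Data with the two sets, ready to be read back by the loading functions below.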

Helper Module

With the structure above, a helper module (fileplot.py in the usage example) can provide some basic methods to load and plot data from directories, and it can be customized further for a specific application. The module depends on numpy and matplotlib:

import os, re
from glob import glob
import matplotlib.pyplot as plt
from numpy import genfromtxt, any

First, some methods to fetch data from a directory:

def get_fn(file):
    ''' Get the file name only, with any .txt extension stripped. '''
    file = re.sub(r'\.txt$', '', file)          # raw string avoids an invalid-escape warning
    return os.path.split(file)[-1]

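def get_data_single(file):
    '''
        Load data from a single text file.

        Args
        ====
        file : str
            Filename (including path and extension) of source data file

        Returns
        =======
        data : dict
            label   file name without path or .txt extension
            data    data from the text file converted to a Numpy array

        Raises
        ======
        OSError (e.g. FileNotFoundError) if the file cannot be opened,
        ValueError if the file contents cannot be parsed.
    '''
    label = get_fn(file)
    try:
        data = genfromtxt(file)
    except ValueError as err:
        # Raise with the offending file name rather than returning a string,
        # which would only fail later, and less clearly, when plotted
        raise ValueError(f'Invalid data formatting in {file}') from err
    return {'label': label, 'data': data}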

def get_data_from_dir(directory, skip=None):
    '''
        Load in data from multiple text files located in a directory.

        Args
        ====
        directory : str
            name of directory containing source data files
        skip : str, optional
            A string prefix added to filenames denoting them to be skipped
        
        Returns
        =======
        data_set : list
            A list containing each data file as a dictionary, having:
                label   the filename
                data    Numpy array containing the data
    '''
    # glob returns full paths, so test the skip prefix and README against the
    # file name only; sorting keeps the plotting (and legend) order reproducible
    data_set = []
    for file in sorted(glob(os.path.join(directory, '*.txt'))):
        name = os.path.basename(file)
        if skip and name.startswith(skip):
            continue
        if 'README' in name:                                    # skip README files
            continue
        try:
            data_set.append({'label': get_fn(file), 'data': genfromtxt(file)})
        except ValueError as err:
            raise ValueError(f'Invalid data formatting in {file}') from err
    return data_set
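The plotting functions choose how to draw each item from the shape of the array that genfromtxt returns, and the skip prefix has to be compared against the file name because glob returns full paths. A standalone sketch that checks both on throwaway files (describe_dir and the underscore prefix are hypothetical, chosen only for illustration):

```python
import os
import tempfile
from glob import glob

import numpy as np


def describe_dir(directory, skip=None):
    """Return {label: array shape} for each .txt file that would be loaded.

    The skip prefix and the README check are applied to the file name
    (os.path.basename), since glob returns full paths.
    """
    shapes = {}
    for path in sorted(glob(os.path.join(directory, '*.txt'))):
        name = os.path.basename(path)
        if (skip and name.startswith(skip)) or 'README' in name:
            continue
        label = os.path.splitext(name)[0]
        shapes[label] = np.genfromtxt(path).shape
    return shapes


# Throwaway files: one value per line, two values per line, and one to skip
with tempfile.TemporaryDirectory() as tmp:
    contents = {'bpm.txt': '72\n75\n71\n',
                'xy.txt': '0.0 72\n0.5 75\n1.0 71\n',
                '_old.txt': '1\n2\n'}
    for name, text in contents.items():
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(text)
    print(describe_dir(tmp, skip='_'))   # → {'bpm': (3,), 'xy': (3, 2)}
```

A single-column file comes back as a 1D array and a two-column file as an (n, 2) array, which is exactly the distinction the plotting code branches on.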

Then, some plotting methods:

# Two internal utility methods
def _plot_setup():
    ''' Set up the figure and axes for plotting.
        Returns matplotlib figure and axes objects.
    '''
    fig, ax = plt.subplots()

    # Add plot settings and styles relevant to your project
    ax.grid(which='both')
    # ...

    return fig, ax

def _plot_data(ax, data_item, x=None):
    ''' Plot one data item on ax.
        1D data is plotted against x (trimmed to the data length) or, if x is
        None, against the sample index. 2D data uses column 0 as x, column 1 as y.
    '''
    label = data_item.get('label')
    data = data_item.get('data')
    if data.ndim == 1:
        if x is not None:
            if len(x) < data.size:
                raise ValueError('X axis size must be equal to or greater than data size.')
            ax.plot(x[:data.size], data, label=label)   # trim any surplus x values
        else:
            ax.plot(data, label=label)
    else:
        ax.plot(data[:, 0], data[:, 1], label=label)
    return ax

# User methods
def plot_single(data, x=None):
    ''' Plot a single data item. '''
    fig, ax = _plot_setup()
    ax = _plot_data(ax,data,x)
    return fig, ax

def plot_data_set(data_set, x=None):
    ''' Plot all of the files from a directory.
        Used with get_data_from_dir() which returns file data as a list.

        Args
        ----
        data_set : list
            A list containing dictionaries of data items -> { label, data }
        x : list or numpy array, optional
            A 1D array or list to use as the x-axis for single-column data;
            must be at least as long as the data

        Returns
        -------
        fig, ax : matplotlib figure and axes objects
    '''

    fig, ax = _plot_setup()

    for each in data_set:
        ax = _plot_data(ax,each,x)

    return fig, ax

def add_to_plot(ax, data_item, x=None):
    ''' Add a single data set to a plot.
    
        Args
        ----
        ax : matplotlib Axes object
            Axes to add the data to
        data_item : dictionary having label, data entries
        x : list or numpy array, optional
            Used as the x-axis for single-column data

        Returns
        -------
        ax : matplotlib Axes object
    '''    
    ax = _plot_data(ax,data_item,x)
    return ax

def plot_dir(directory, skip=None, x=None):
    ''' Plot all files from a directory.
    
        Args
        ----
        directory : string
            path to directory
        skip : string, optional
            prefix of files to skip
        x : list or numpy array, optional
            x axis for single column data
        
        Returns
        -------
        fig, ax : matplotlib figure and axes objects
    '''
    data = get_data_from_dir(directory, skip=skip)
    return plot_data_set(data, x=x)
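The x-axis rules in _plot_data can be exercised without a display by switching matplotlib to its non-interactive Agg backend. In the usage example, x = arange(0, 900, 0.5)/60 has 900 / 0.5 = 1800 points, so any single-column trace with at most 1800 samples can be plotted against it. The sketch below re-implements the same rules standalone; plot_item is a hypothetical name and the 1200-sample trace is made up:

```python
import matplotlib
matplotlib.use('Agg')                     # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_item(ax, data, x=None, label=None):
    """Hypothetical standalone version of the dispatch rules in _plot_data.

    2D data: column 0 is x, column 1 is y.
    1D data: plotted against x (trimmed to the data length) or, if x is
    None, against the sample index.
    Returns the Line2D that was drawn.
    """
    if data.ndim > 1:
        return ax.plot(data[:, 0], data[:, 1], label=label)[0]
    if x is None:
        return ax.plot(data, label=label)[0]
    if len(x) < data.size:
        raise ValueError('X axis size must be equal to or greater than data size.')
    return ax.plot(x[:data.size], data, label=label)[0]


fig, ax = plt.subplots()
x = np.arange(0, 900, 0.5) / 60           # 1800 points: 0.5 s steps over 900 s, in minutes
short = np.full(1200, 70.0)               # made-up trace shorter than x
line = plot_item(ax, short, x, label='short')
print(len(line.get_xdata()))              # → 1200
plt.close(fig)
```

Trimming a longer x axis lets one shared time base serve recordings of different lengths, while a too-short axis fails loudly instead of silently misaligning the data.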

Usage

As an example, take some heart rate data in a project laid out like this:

project/
  |- myscript.py
  |- fileplot.py
  |- Data/
  |   |- Series 1-2/
  |   |   |- hr.7257.txt
  |   |   |- hr.11839.txt
  |   |- Series 3-4/
  |   |   |- hr.207.txt
  |   |   |- hr.237.txt
  |- Figures/

File myscript.py can be used as follows:

import os

from numpy import arange
from fileplot import *


DATA_DIR = 'Data'
FIGURES_DIR = 'Figures'

# Plot 1
# ======
x = arange(0, 900, 0.5)/60                   # Data is BPM values only, so create a uniform
                                             #  x axis to plot against (0.5 s steps over 900 s, in minutes)
fig, ax = plot_dir(os.path.join(DATA_DIR, 'Series 1-2'), x=x)   # Plot all files in the directory

# Add a single additional data set to the plot
hr207 = get_data_single(os.path.join(DATA_DIR, 'Series 3-4', 'hr.207.txt'))
ax = add_to_plot(ax, hr207, x=x)

# Add plot elements and save figure (os.path.join keeps paths portable)
ax.set_title('Plot 1')
ax.set_xlabel('Time, min')
ax.set_ylabel('Heart Rate, BPM')
fig.legend(loc='center right')
fig.savefig(os.path.join(FIGURES_DIR, 'Plot 1.png'), dpi=200)


# Plot 2
# ======
# Get and plot a single item from the 'Series 3-4' directory
hr237 = get_data_single(os.path.join(DATA_DIR, 'Series 3-4', 'hr.237.txt'))
fig, ax = plot_single(hr237)
ax.set_title('Plot 2')
ax.set_ylabel('Heart Rate, BPM')
fig.savefig(os.path.join(FIGURES_DIR, 'Plot 2.png'), dpi=200)

This creates two figures in the output 'Figures' directory, 'Plot 1.png' and 'Plot 2.png'.
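Because the figures are built directly from the files, regenerating everything after a new ANSYS run can be reduced to a loop over the data subdirectories. A minimal sketch, assuming one figure per subdirectory, named after it; regenerate_figures is hypothetical and reads the files with np.genfromtxt directly instead of importing fileplot, so it runs on its own:

```python
import os
from glob import glob

import matplotlib
matplotlib.use('Agg')                  # batch runs don't need a window
import matplotlib.pyplot as plt
import numpy as np


def regenerate_figures(data_dir, figures_dir):
    """Hypothetical batch step: write one PNG per subdirectory of data_dir.

    Single-column files are plotted against the sample index; two-column
    files use column 0 as x and column 1 as y. Returns the paths written.
    """
    os.makedirs(figures_dir, exist_ok=True)
    written = []
    # A trailing separator in the pattern makes glob match directories only
    for set_dir in sorted(glob(os.path.join(data_dir, '*', ''))):
        set_name = os.path.basename(os.path.normpath(set_dir))
        fig, ax = plt.subplots()
        for path in sorted(glob(os.path.join(set_dir, '*.txt'))):
            data = np.genfromtxt(path)
            label = os.path.splitext(os.path.basename(path))[0]
            if data.ndim == 1:
                ax.plot(data, label=label)
            else:
                ax.plot(data[:, 0], data[:, 1], label=label)
        ax.set_title(set_name)
        ax.legend()
        out = os.path.join(figures_dir, set_name + '.png')
        fig.savefig(out, dpi=200)
        plt.close(fig)                 # pyplot keeps open figures in memory
        written.append(out)
    return written
```

For the heart rate project, regenerate_figures('Data', 'Figures') would write 'Series 1-2.png' and 'Series 3-4.png'. Closing each figure matters in a batch loop, since otherwise every figure stays open until the script exits.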