The simulation results from dynamic wastewater treatment plant models are usually compared to measured time-series data during model calibration and validation studies. Typically, this comparison is qualitative, based on visual inspection. Although visual inspection is a useful way to check the fit between measured data and simulation results, it does not help in ranking one simulation result over another when many simulation results pass the qualitative assessment. In such situations it is more meaningful to use quantitative statistical criteria to select the results of one calibration parameter set over another.
Considering the usefulness of quantitative statistical criteria in evaluating and improving model calibration and validation, a set of selected statistical indices was implemented in GPS-X v6.4. Calculating the statistical indices directly inside GPS-X significantly reduces the effort of exporting and manipulating simulation and measured data. The implemented statistical measures are explained in the following sections using the predicted and measured BOD data shown in a typical GPS-X output graph (see Figure 15‑1).

Figure 15‑1 – Typical Time Series Plot of Predicted and Measured Data
Each point in a measured time series consists of a time stamp, at which the sample was collected, and a measured value. The time-series data may be available at regular or irregular time intervals. The data measured at a wastewater plant may come from different types of sampling. The following data types are common:
1. Grab Data: The sample is collected at a specific time. The on-line data and data from grab samples belong to this category. The data point is fully defined by the time stamp and the measured value.
2. Time-proportional composite data: Equal volumes of sample are collected at regular time intervals. These samples are mixed to give a composite sample. The data point is completely defined by the duration of the composite and the time interval for sample collection.
3. Flow-proportional composite data: Equal volumes of sample are collected for each equal volume of wastewater treated. This means more samples are collected during high-flow conditions and fewer during low-flow conditions. These samples are mixed to give a composite sample. The data point is completely defined by the duration of the composite and the volume interval for sample collection.
Unlike the composite data types above, the raw simulation data produced by wastewater models refers to a specific time and is therefore similar to the grab data type. When simulation data must be compared with composite measured data (Types 2 and 3 above), further processing of the simulation data is necessary for a meaningful comparison.
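As a sketch of how composite-equivalent values could be derived from grab-type simulation output (an illustration, not the GPS-X implementation), the two composite types reduce to a simple mean and a flow-weighted mean of the sampled concentrations:

```python
def time_proportional_composite(concs):
    """Equal-volume aliquots at regular time intervals: simple mean."""
    return sum(concs) / len(concs)

def flow_proportional_composite(concs, flows):
    """Equal-volume aliquots per equal volume treated: flow-weighted mean."""
    total_flow = sum(flows)
    return sum(c * q for c, q in zip(concs, flows)) / total_flow
```

The flow-weighted form reflects that more aliquots accumulate during high-flow periods, so high-flow concentrations dominate the composite.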
Accordingly, depending on the type of sample used for comparison, an equivalent sample value is estimated from the simulation data for use in the statistical analysis. The sample type can be specified in the “Measured Data Type” section of the statistical setup menu (Figure 15‑2).

Figure 15‑2 - Statistical Analysis Set up Menu for Data Type, Output Plots and Table
The main objective in this first implementation is to focus on the most important statistical indices that may be helpful in quantitative assessment of model calibration and validation. A brief description of the statistical indices that are calculated for a given dataset is provided in Table 15‑1.
Table 15‑1 – Statistical Measures and Equations
| No. | Statistical Measure | Equation |
|-----|---------------------|----------|
| 1 | Mean of Residuals | MR = (1/n)·Σ Ri |
| 2 | Mean of Absolute Residual | (1/n)·Σ abs(Ri) |
| 3 | Mean of Squared Residual | (1/n)·Σ Ri² |
| 4 | Absolute Maximum Residual | max abs(Ri) |
| 5 | Root of Mean Squared Residuals | sqrt[(1/n)·Σ Ri²] |
| 6 | Mean of Relative Residual | (1/n)·Σ (Ri/Oi) |
| 7 | Mean of Absolute Relative Residual | (1/n)·Σ abs(Ri/Oi) |
| 8 | Mean of Squared Relative Residual | (1/n)·Σ (Ri/Oi)² |
| 9 | Relative Volume Residuals | (Σ Ri)/(Σ Oi) |
| 10 | Absolute Relative Volume Residuals | abs(Σ Ri)/(Σ Oi) |
| 11 | Theil's Inequality Coefficient | sqrt(Σ Ri²)/[sqrt(Σ Oi²) + sqrt(Σ Pi²)] |
| 12 | Nash-Sutcliffe (R2) | 1 − (Σ Ri²)/(Σ (Oi − Om)²) |
| 13 | Standard deviation of Residuals (SDR) | sqrt[(Σ (Ri − MR)²)/(n − 1)] |
| 14 | Mean of Standardized Residuals | (1/n)·Σ (Ri/SDR) |

All sums are taken over i = 1 to n.
Legend:
Oi = the observed (measured) value
Pi = the predicted (simulated) value
Om = mean of the observed (measured) values
n = number of data points
i = the ith observation
Ri = (Oi − Pi) = residual (error)
MR = Mean of Residuals
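These indices can also be computed outside GPS-X. The sketch below uses the standard definitions of a few of them (mean residual, RMSE, Nash-Sutcliffe efficiency, and Theil's inequality coefficient); verify the results against the equations in Table 15‑1 before relying on them:

```python
import math

def fit_statistics(observed, predicted):
    """Standard goodness-of-fit indices for a measured vs. simulated series.
    Assumes at least two distinct observed values (otherwise the
    Nash-Sutcliffe denominator is zero)."""
    n = len(observed)
    r = [o - p for o, p in zip(observed, predicted)]   # residuals Ri = Oi - Pi
    om = sum(observed) / n                             # mean of observed, Om
    ss_res = sum(e * e for e in r)
    return {
        "MR": sum(r) / n,
        "RMSE": math.sqrt(ss_res / n),
        # Nash-Sutcliffe: 1 means perfect fit, 0 means no better than the mean
        "NSE": 1.0 - ss_res / sum((o - om) ** 2 for o in observed),
        # Theil's inequality coefficient: 0 is perfect, values near 1 are poor
        "TIC": math.sqrt(ss_res) / (math.sqrt(sum(o * o for o in observed))
                                    + math.sqrt(sum(p * p for p in predicted))),
    }
```

For example, a perfect fit gives MR = 0, RMSE = 0, NSE = 1 and TIC = 0.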
A typical GPS-X output showing the calculated statistical indices and the associated quality indicator is shown in Figure 15‑3.

Figure 15‑3 - Summary of the Statistical Measures Calculated for a Time Series Dataset
In this plot the predicted values are plotted against the measured values. A 45-degree line is also placed on the graph to show the deviation of the data points from it. With a perfect fit, all points would fall on the 45-degree line. Visual inspection of this plot can highlight model biases and other systematic errors. A typical output from the GPS-X analysis is shown in Figure 15‑4.

Figure 15‑4 - Predicted vs. Measured Data Plot
For a selected measured time series, a number of residuals, as shown in Table 15‑2, are estimated. The calculated residuals can be plotted against time or against the observed data. A histogram of the standardized residuals is also available for evaluating the model fit.
Table 15‑2 – Types of Residuals
| No. | Criteria | Equation |
|-----|----------|----------|
| 1 | Residuals | Ri = Oi − Pi |
| 2 | Absolute Residual | abs(Ri) |
| 3 | Squared Residual | Ri² |
| 4 | Relative Residual | Ri/Oi |
| 5 | Absolute Relative Residual | abs(Ri/Oi) |
| 6 | Squared Relative Residual | (Ri/Oi)² |
| 7 | Standardized Residuals | Ri/SDR |
A histogram of the standardized residuals is plotted to check the assumption that the errors are normally distributed. The histogram may also help to check whether the residuals are randomly scattered about zero and whether approximately 95 percent of them lie between −2 and +2. The histogram produced for the BOD dataset is shown in Figure 15‑5.
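The ±2 check can be scripted. The sketch below assumes standardized residuals defined as Ri/SDR (verify against the definition in Table 15‑2):

```python
import math

def standardized_residuals(observed, predicted):
    """Residuals scaled by their sample standard deviation (SDR).
    Standardized residual is assumed here to be Ri / SDR."""
    r = [o - p for o, p in zip(observed, predicted)]
    n = len(r)
    mr = sum(r) / n
    sdr = math.sqrt(sum((e - mr) ** 2 for e in r) / (n - 1))
    return [e / sdr for e in r]

def fraction_within_two(z):
    """For normally distributed errors, roughly 95% should lie in [-2, 2]."""
    return sum(1 for v in z if -2.0 <= v <= 2.0) / len(z)
```

A fraction well below 0.95 suggests heavy-tailed errors or outliers worth inspecting in the time plot.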
The estimated residuals can be plotted against the measured data points to reveal trends and biases in the model outputs. A typical output of this type of plot is shown in Figure 15‑6.
Time plots of the residuals may be useful for identifying trends. Model predictability may be affected by specific time-based events, and these plots can help identify periods of special events (rainfall, plant maintenance, etc.) whose plant operating conditions are not captured adequately in the model. A typical output for this type of plot is shown in Figure 15‑7.

Figure 15‑5 - Histogram of Standardized Residuals

Figure 15‑6 - Residuals Plotted against the Observed Data

Figure 15‑7 - Residuals Plotted against Simulation Time
This feature allows the user to set the speed of the simulation to real time or to a multiple of real time. This is useful in on-line SCADA applications or for operator training in which GPS‑X acts as a virtual plant.
The parameters for this feature can be accessed at the bottom of the General Data > System > Parameters > Simulation Tools Settings form (Figure 15‑8), by selecting the More… button in the On-line Operation sub-section.
The Real Time Synchronized Mode sub-section contains the parameter that switches the real-time clock ON or OFF. When the real-time clock is ON, the simulation speed is equal to a multiple of real time. The multiplication factor used is specified by the Real Time Accelerator Factor (e.g., setting the accelerator factor to 24 causes one simulated day to take one real hour). Note that the real-time clock feature forces the communication interval to be equal to one second.
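The accelerator-factor arithmetic can be sketched as follows (the helper name is illustrative, not a GPS-X function):

```python
def real_hours_per_sim_day(accelerator_factor):
    """With the real-time clock ON, simulated time advances
    accelerator_factor times faster than real time, so one simulated
    day (24 h) takes 24 / accelerator_factor real hours."""
    return 24.0 / accelerator_factor
```

A factor of 24 therefore gives one real hour per simulated day, and a factor of 1 runs the simulation in true real time.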

Figure 15‑8 - Real Time Clock Parameters
The steady-state solver used in GPS-X is a robust routine based on a direct search algorithm (that is, no gradients are used). Sometimes, the steady-state convergence appears slow or diverges due to a problem in the way that the model is set up. For example, if the underflow rate from a settler is too low, that settler will begin to fill up with solids. Since the concentration ranges of solids in the ten layers become large, the changes in these layers between loops of the steady-state solver also become large which may cause slow convergence or divergence.
In all cases of poor convergence, the user should closely examine all the unit processes for proper specification. The easiest way to do this is to examine the dsum# variable associated with each unit process where # represents the overflow stream label. You can do this by issuing the display command in the Command line of the Simulation Control window. The sum of the individual dsum values equals the dsum value displayed in the Log window. Having done that, there are a number of parameters associated with the steady-state solver that can be fine-tuned.
The forms shown in Figure 15‑9 and Figure 15‑10 are accessed by selecting General Data > System > Parameters > Steady-State Solver Settings. The parameters in these forms are described below.
The first menu item, number of retries on iteration, indicates the number of times the steady-state solver will retry to reach convergence if it fails.
The error limit on individual variables is a tolerance below which the steady-state solver ignores a variable. If the changes the solver makes to a state variable become smaller than this tolerance, the solver assumes the variable is at steady state and ignores it.
Steady-state convergence is triggered when the solver achieves a sum of state-variable derivatives below the iteration termination criterion. The default value is 10.0, but in some systems it may need to be increased when running in single precision.
The contract constant and expand constant control the step size taken by the steady-state solver between iterations: a larger step is taken if an improvement is found, while a smaller step is taken if no improvement is found. The performance of the steady-state solver can be greatly affected by these parameters. The expansion factor is limited by the maximum step size in one iteration. This maximum step size is dampened as the steady-state solution is approached; the damping factor on final approach controls the damping effect. A value of zero means no damping, while a value of 1.0 limits the maximum step size to one tenth of its original value.
The initial step size that the steady-state solver takes is governed by the initial perturbation parameter. This number is normally set smaller than the maximum step size to prevent a poor initial guess by the solver.
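The contract/expand behaviour can be illustrated with a toy one-dimensional direct search (an analogy to, not a reproduction of, the GPS-X solver; all parameter names are illustrative):

```python
def direct_search_min(f, x0, step=0.5, expand=1.5, contract=0.5,
                      max_step=2.0, tol=1e-8, max_iter=1000):
    """Toy gradient-free 1-D search: the step expands after an
    improvement and contracts after a failure, analogous to the
    solver's expand/contract constants; max_step caps the expansion."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        improved = False
        for d in (step, -step):        # probe both directions
            fn = f(x + d)
            if fn < fx:                # accept the first improving move
                x, fx = x + d, fn
                improved = True
                break
        step = min(step * expand, max_step) if improved else step * contract
        if step < tol:                 # step exhausted: treat as converged
            break
    return x
```

As in the real solver, an over-aggressive expand constant causes overshooting and repeated contractions, while an over-cautious one slows convergence.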

Figure 15‑9 - Steady State Parameters

Figure 15‑10 - More Steady-State Parameters
The next four items shown in these forms deal with other termination criteria and output control. The status of the solver is printed at a frequency controlled by the convergence output interval, while the steady-state loop counter initial value specifies which iteration loop will be first printed to the simulation Log window. The maximum number of iterations limits the steady-state solver to attempting only this many iterations. The loop counter is reset after each retry. Another steady-state termination criterion is the maximum number of unsuccessful iterations which will cause the steady-state solver to terminate if an improvement in the value of the sum of the derivatives is not made within this number of loops.
The trim parameters shown here refer to the output of the ACSL trim function. This is an alternative steady-state solver, which uses the Jacobian matrix for accelerated searches. However, this gradient type routine was not found to be very robust with the large models encountered within GPS-X.
The Advanced Steady-State Parameters sub-section contains parameters used to enhance and refine the contraction process for each state variable. The maximum concentration bounds for each state variable are found under the More… button.
The numerical issues discussed in this chapter are important because the success of a simulation depends not only on the input data and the models, but also on the numerical solver used to do the calculations.
Any modeller using dynamic simulation should have an understanding of the underlying numerical methods used to solve the equations to properly interpret the results. Understanding the numerical methods used will help to identify problems that are numerical in nature.
The choice of numerical solver is important. There are several numerical integration methods available in GPS-X. The numerical solver can be set in the Simulation Control window, or can be saved into a scenario or the layout by setting it in General Data > System > Input Parameters > Dynamic Solver Settings and scrolling down to the Integration Settings sub-section (Figure 15‑11).
Generally, the default integration method, Runge-Kutta-Fehlberg (2), works very well for most models; however, there are situations where the solver may need to be changed. In general the trade-off is between numerical accuracy and simulation speed. The fixed step algorithms (i.e. Euler, Runge-Kutta (1), and Runge-Kutta (2)) are faster than the variable step algorithms (i.e. Adams-Moulton, Runge-Kutta-Fehlberg (1), Runge-Kutta-Fehlberg (2), Gear's Stiff, and Differential Algebraic Solver) but are not as accurate.
If the system is stiff then you may need to use the Gear's Stiff algorithm. A stiff system is one in which there are processes occurring at very different time scales (e.g., a very fast process such as the transfer of oxygen into a tank and a very slow reaction occurring at the same time).
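Why explicit methods struggle with stiff systems can be demonstrated with a single fast first-order process: explicit Euler diverges once the step size exceeds the process time scale, while an implicit (backward Euler) step, the building block of stiff solvers such as Gear's method, remains stable at any step size. This is a generic numerical illustration, not GPS-X code:

```python
def explicit_euler(lam, y0, h, steps):
    """Explicit Euler on dy/dt = -lam*y. Unstable when h*lam > 2."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y

def implicit_euler(lam, y0, h, steps):
    """Backward Euler on the same equation: y_next = y / (1 + h*lam),
    which decays toward zero for any positive step size h."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y
```

With lam = 1000 (a fast process such as oxygen transfer) and h = 0.01, explicit Euler multiplies the solution by −9 each step and blows up, while backward Euler decays smoothly; a variable-step explicit solver would be forced to take tiny steps instead.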
In addition to the numerical solvers, the user can change some of the numerical parameters that control the simulations. Changing the default parameters however requires extreme caution because in some cases it may introduce errors in the results. The numerical parameters are accessed by making the following selections: General Data > System > Parameters > Numerical. The form is shown in Figure 15‑12.
These parameters include bounds on flow, state variables, state derivatives, parameters, exponentials, and volumes. The parameters for the implicit solver (IMPL - used for solving implicit functions evaluated as residuals, where the residuals are reduced to zero) used in the exact code option can be set here. Some of the parameters are discussed in more detail in the sub-sections below.

Figure 15‑11 – Integration Methods

Figure 15‑12 - Numerical Parameters Form
If the liquid volume in a tank falls below the ignore dilution rate below this volume parameter value, then the dilution rate (defined as the inverse of the hydraulic residence time) will not be used to calculate the concentrations in the tank as would normally be done. In doing so, the speed of simulation is greatly increased. The same principle is applied to the settlers and SBRs where the dilution rate in a particular layer is ignored if the height of the layer is smaller than the ignore dilution rate below this layer thickness parameter value.
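A toy mass balance shows the effect of dropping the dilution term below a volume threshold (a sketch of the idea, not the GPS-X implementation; names and the default threshold are illustrative):

```python
def dconc_dt(q_in, c_in, volume, conc, reaction, min_volume=1e-3):
    """Concentration derivative for an ideally mixed tank:
    dC/dt = (Q/V)*(Cin - C) + reaction. The dilution term (Q/V is the
    inverse of the hydraulic residence time) is skipped when the volume
    falls below min_volume, avoiding a huge derivative that would force
    tiny integration steps."""
    dilution = q_in / volume if volume >= min_volume else 0.0
    return dilution * (c_in - conc) + reaction
```

As the volume approaches zero, Q/V grows without bound; suppressing the term below the threshold trades a small local error for a large gain in simulation speed.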
The parameter protect against division by zero is fixed and cannot be changed by the user. This parameter is used in all calculations involving division. It is used to ensure that the denominator in an expression is never equal to zero. For example, the DO switching function, used in many rate equations (shown below), includes the protection against division by zero parameter.
Equation 15‑1

Switch = SO / (KO + SO + ε)

where SO is the dissolved oxygen concentration, KO is the half-saturation coefficient, and ε is the protect against division by zero parameter.
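In code, a Monod-type switching function protected in this way is simply a division with a small constant added to the denominator (an illustration of the idea; the names and the value of eps are assumptions, not GPS-X internals):

```python
def do_switch(do_conc, half_sat, eps=1e-30):
    """Monod-type DO switching function with a small epsilon in the
    denominator so the expression is defined even when both the DO
    concentration and the half-saturation coefficient are zero."""
    return do_conc / (half_sat + do_conc + eps)
```

Without eps, the call do_switch(0.0, 0.0) would raise a division-by-zero error; with it, the switch simply evaluates to zero.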
The parameters in the Speed sub-section of the Numerical form concern the pump flow rates used in all the objects. Since the switching of pumps on and off introduces sharp changes in the flows, the numerical integration routine may require a very small step size to reduce the numerical error. In some cases, these small step sizes result in a very slow simulation. If the smooth pump discharge at discontinuities option is turned ON (default is OFF), then the pumped flow rate is ramped up to the specified amount as follows:
Equation 15‑2

Q(τ) = Qset · (1 − e^(−par·τ/t))

where:
Q = pumped flow rate (m³/d),
Qset = specified (target) pumped flow rate (m³/d),
par = smoothing factor,
t = smoothing period (d),
τ = time elapsed since the flow change (d).
The smoothing function will only be applied to a user-specified percentage change (smooth at flow changes larger than). This allows the user the flexibility of smoothing only larger changes in a pump flow rate. The smoothing function will be applied to all pump flows in a layout including underflows, control splitters, etc. See Figure 15‑13 for an illustration of how the smoothing function affects the pumped flow.
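The percentage-threshold test can be sketched as follows (an illustration of the behaviour described above, not the GPS-X code):

```python
def should_smooth(old_q, new_q, threshold_pct):
    """Apply smoothing only when the relative flow change exceeds the
    user-specified percentage (the 'smooth at flow changes larger than'
    parameter). A start from zero flow is always treated as a large
    change in this sketch."""
    if old_q == 0.0:
        return new_q != 0.0
    return abs(new_q - old_q) / abs(old_q) * 100.0 > threshold_pct
```

Small flow adjustments below the threshold are then passed through as steps, and only genuine pump on/off transitions pay the cost of the smoothing ramp.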

Figure 15‑13 - Smoothing Function
Although GPS-X allows the user to build very large and complex layouts, there are some restrictions. First, the simulation language used, ACSL, currently has a limit on the number of discrete blocks of code that can be specified. What this means to the user is that the number of automatic controllers for a given layout cannot exceed 25. Since a number of objects have built-in controllers (e.g., a settler has controllers for both the pumped flow and underflow), the user can quickly use all the allowable discrete code blocks. To circumvent this problem, the user can select either the Big option, which combines all the automatic controller code into one discrete block, or the Big+ option, which disables all controllers. In selecting the Big option, the user can no longer specify the sampling time for each individual controller loop; one sampling interval overrides the rest. This sampling interval (controller sampling time) is found under General Data > System > Parameters > Miscellaneous.
A trade-off exists between the speed of the simulation and the robustness and accuracy of the model under different conditions (e.g. sudden flow discontinuities or tanks emptying completely). GPS-X is configured such that the average user with a typical continuous flow activated sludge plant need not worry about the accuracy and the speed of the simulation. However, when dealing with more complex processes it may be necessary for the user to switch to a different integration algorithm or change the values of some numerical parameters to improve the accuracy or speed of the simulation. In general, robustness and accuracy are more important than simulation speed.
If you suspect that the simulation results are not accurate, you should first check that the physical dimensions of the process objects and the values of the process flows are reasonable. A numerical value may have been entered incorrectly.
If you are confident that the model parameters are reasonable then the next step is to determine whether a modelling simplification or numerical problem is leading to inaccurate results. Some helpful information is listed below:
· If you suspect that the simulation results are not accurate, you may wish to try the Exact code option. (Build > Code menu item). The Quick option (default) results in a faster simulation but is not as accurate when recycle flows are present. When the Quick option is used the recycle flows at each time step are simply the values taken from the previous time step. When the Exact option is used the recycle flows are calculated exactly using an implicit nonlinear equation solver. In most situations using the Quick option is acceptable, because the integration time steps are small and recycle flows usually don't change dramatically. In an application with a number of recycle flows and sudden extreme dynamic changes this simplification may cause some problems.
· The choice of integration algorithm (IALG) can have a significant effect on the output of the model. For a detailed description of the available integration algorithms consult the ACSL Reference Manual (contact Hydromantis/Hatch for details). Generally, only variable step algorithms (i.e. Adams-Moulton, Runge-Kutta-Fehlberg(1), Runge-Kutta-Fehlberg(2), Gear's Stiff, and Differential Algebraic Solver) are acceptable for the simulation of highly dynamic systems. Try the Gear's Stiff algorithm if you suspect that your system is stiff (unless you are simulating an SBR or trickling filter, in which case only the Runge-Kutta-Fehlberg algorithms are allowed).
The Gear's Stiff routine is a self-tuning algorithm that adapts itself to rapidly changing derivatives; in some cases, though, it might require excessive simulation time to cross a discontinuity in the model. The Runge-Kutta-Fehlberg(1) algorithm is exactly the opposite: it is very forgiving of discontinuities, but experiences difficulties with stiff problems. A good compromise between these two solvers is the Adams-Moulton algorithm. It is very robust and does not give up easily, but it is slower. The Runge-Kutta-Fehlberg(2) algorithm is almost identical to Runge-Kutta-Fehlberg(1), but is more robust.
· It is possible to monitor the step sizes used by the integration algorithm during a run. You can either print the values to the Log window by entering the following command at the command line in the Simulation Control window: output truecssitg, or place the variable truecssitg on an output graph. You can select truecssitg for display in the General Data > System > Output Variables > General program variables form (it is labeled as Average integration step size). The average integration step size (truecssitg) is calculated over each communication interval. Start your simulation using a small communication interval (~0.01 or less). Sharp drops in truecssitg usually signal a discontinuity. Using the above method, the Sum of absolute values of the derivatives (dsum) can also be checked for sharp changes.
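Scanning a logged truecssitg series for sharp drops can be automated with a simple helper (an illustrative post-processing script, not a GPS-X feature; the drop ratio is an assumption):

```python
def flag_discontinuities(step_sizes, drop_ratio=0.1):
    """Return the indices of communication intervals where the average
    integration step size fell to less than drop_ratio times the
    previous interval's value - a likely discontinuity."""
    return [i for i in range(1, len(step_sizes))
            if step_sizes[i] < drop_ratio * step_sizes[i - 1]]
```

For example, a series like [1.0, 0.9, 0.01, 0.8] flags only the third interval, where the solver suddenly cut its step by nearly two orders of magnitude.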
· If your simulation is slow you can find which variable makes the simulation slow by selecting the Fullinfo option in the Options > Set up > Information menu of the Simulation Control window, and running the simulation using either the Adams-Moulton or Gear's Stiff integration algorithms. The error summary displayed at the end of the run in the Log window will point to the variable that controls the step size.
· The integration of systems with low DO (dissolved oxygen) concentrations (in the range of a few dozen micrograms per litre) is time consuming because the integration algorithm has to cut back on the step size or risk running into negative DO values.
· Sometimes the integration algorithms have difficulty at the start of a simulation. In such cases the initial concentrations of the state variables may be inappropriate, resulting in large initial derivative values. In this case the variable step integration algorithms keep cutting back on the step size to go over this initial bump. It may be helpful to use the steady-state solver to determine the initial conditions.
· Simulation speed depends strongly on the values of the derivatives of the state variables. You can reduce the absolute value of the lower and upper bounds on the derivatives to speed up the simulation of discontinuities by eliminating sharp peaks. These bounds can be found in the General Data > System > Parameters > Dynamic Solver Settings form. The best values to use depend on the system in question. Try plotting the DO derivative (found in the Process Data menu for each object), because this is the most sensitive state variable in most cases. Get the maximum value during the run and then select the bounds to be 50 percent of the highest value. Note that changing the bounds from the default values (-1.e33 and 1.e33) can restrict the accuracy of the integration.
· Sudden start-up or switching off of pumps (influent or other) can cause discontinuities in the derivatives, and as a consequence, a delay in the processing of the simulation. To prevent this, a smooth pump option is provided in the General Data > System > Parameters > Dynamic Solver Settings form, which starts and stops pumps smoothly. This is strictly intended as a numerical feature and does not try to simulate real pump discharges during transient periods. By default, the smoothing is not used. If you experience problems when pumps are turned off or on, try using smoothing.
· Check the Log window for any messages printed during the run.