This chapter describes the GPS-X optimizer and its associated forms. More detail on the maximum likelihood objective function and on the statistical tests provided in the solution report can be found in Appendix A and in Appendix B: The Optimizer Solution Report. The complete list of nomenclature used is found in Appendix C: Nomenclature. The references cited in this chapter are listed in Appendix D: References.
The optimizer is a module designed to minimize the value of a user-selected objective function by adjusting the free variables in this function. In the case of parameter estimation, these free variables are the unknown process parameters. The optimizer uses the Nelder-Mead simplex method (Press et al., 1986) for minimization. The algorithm has been modified to handle bounds on the optimization (i.e. free) variables.
The simplex method is a multi-dimensional procedure that does not rely on gradient information. The algorithm searches through the multidimensional "surface" using a direct search method to find a local minimum of the objective function.
The procedure starts with an initial point in the multi-dimensional parameter space and then generates new points in space by perturbing the initial point a scaled amount along each parameter direction. This leads to p + 1 points in space that define a polyhedron (the simplex) where p is the number of optimization variables. The points are called the vertices of the simplex.
At each iteration, the simplex method reflects the vertex with the highest function value (worst point) through the centroid of the remaining p points of the polyhedron. The amount of reflection is controlled by a reflection constant. If the reflected vertex is the new best point (lowest function value) then the polyhedron is expanded along the direction of reflection. The amount of expansion is controlled by an expansion constant. If the expanded vertex is better than the reflected vertex it is taken as the new best point.
If after the reflection step the reflected vertex is worse than the second worst vertex on the previous iteration, the polyhedron is contracted. The reflected vertex is contracted through the centroid of the remaining p vertices if it is the new worst point. If the reflected vertex is the new second worst point, the worst point is contracted through the centroid. The amount of contraction is controlled by a contraction constant.
When a contraction step is unsuccessful, the polyhedron is shrunk by moving the vertices toward the best point. The amount of shrinkage is controlled by a shrink constant.
At each step checks are made to ensure that the parameter bounds have not been violated. If they have been, the new vertex is moved back inside the bounds.
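As an illustration of the method, the sketch below runs a bounded Nelder-Mead minimization using SciPy. This is not the GPS-X implementation; the two-parameter quadratic objective and all settings are stand-in assumptions.

```python
# A minimal sketch of a bounded Nelder-Mead search using SciPy (not the
# GPS-X implementation). The quadratic objective is a stand-in example.
import numpy as np
from scipy.optimize import minimize

def objective(theta):
    # Stand-in for the mismatch between model predictions and data.
    return (theta[0] - 1.5) ** 2 + 10.0 * (theta[1] - 0.3) ** 2

x0 = np.array([0.5, 0.5])             # initial guess (first simplex vertex)
bounds = [(0.0, 5.0), (0.0, 1.0)]     # lower and upper bounds per parameter

result = minimize(
    objective,
    x0,
    method="Nelder-Mead",
    bounds=bounds,                    # vertices are kept inside the bounds
    options={"xatol": 1e-4, "fatol": 1e-6, "maxiter": 500},
)
print(result.x, result.fun)           # best point and objective value
```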
When the optimizer termination criteria are satisfied, the optimizer runs one additional simulation using the parameter values from the best point found during the iteration process. There is the possibility of encountering a local minimum that is not the global minimum of the objective function. This is a problem that is common to most optimization algorithms. As a result, it is important to solve optimization problems (e.g. parameter estimation or process optimization problems) from a number of different starting guesses to ensure that the optimizer has found the global minimum.
If the user requests the printing of confidence limits or derivative information (See Summary of the Optimizer Settings and Parameters section below) the optimizer conducts additional simulations to generate numerical derivatives.
The constants that control the reflection, expansion, contraction, and shrinkage of the polyhedron in the simplex method can be accessed by selecting Layout > General Data > System > Input Parameters > Simulation Tool Settings and then clicking on the More... button in the Optimizer sub-section. The resulting form is shown in Figure 14‑1. You can change any of the constant values if you wish. The parameter entitled scaled step size in initial guess controls the initial perturbation size.

Figure 14‑1 – Form Containing the Simplex Method Constants
The list of objective functions available in GPS-X is given below:
Absolute Difference
Equation 14‑1

$$F = \sum_{j=1}^{m}\sum_{i=1}^{n_j}\left|z_{i,j}-f_{i,j}\right|$$

Relative Difference
Equation 14‑2

$$F = \sum_{j=1}^{m}\sum_{i=1}^{n_j}\left|\frac{z_{i,j}-f_{i,j}}{z_{i,j}}\right|$$

Sum of Squares
Equation 14‑3

$$F = \sum_{j=1}^{m}\sum_{i=1}^{n_j}\left(z_{i,j}-f_{i,j}\right)^{2}$$

Relative Sum of Squares
Equation 14‑4

$$F = \sum_{j=1}^{m}\sum_{i=1}^{n_j}\left(\frac{z_{i,j}-f_{i,j}}{z_{i,j}}\right)^{2}$$
Maximum Likelihood
Equation 14‑5

$$F = \sum_{j=1}^{m}\left[\frac{n_j}{2}\ln\!\left(\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{\left(z_{i,j}-f_{i,j}\right)^{2}}{f_{i,j}^{\,g_j}}\right)+\frac{g_j}{2}\sum_{i=1}^{n_j}\ln f_{i,j}\right]$$
In the objective function expressions given above the following nomenclature is used:
z_{i,j} = the measured value of response j in experiment i
f_{i,j} = the value of response variable j predicted by the process model in experiment i
g_j = the heteroscedasticity parameter for response j
m = the number of measured response variables
n_j = the number of experiments (i.e. observations) for response j
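The sketch below writes out these objective functions in Python, assuming z and f are lists of per-target NumPy arrays (z[j][i] is measurement i of target j) and g[j] is the heteroscedasticity parameter for target j. The maximum likelihood form shown is the negative of the log-likelihood of Equation 14‑12 with the parameter-independent constants dropped; it is a sketch of the reconstructed equations above, not GPS-X source code.

```python
import numpy as np

# z, f: lists of per-target arrays of measured / predicted values.
# g: list of heteroscedasticity parameters, one per target.
def absolute_difference(z, f):
    return sum(np.sum(np.abs(zj - fj)) for zj, fj in zip(z, f))

def relative_difference(z, f):
    return sum(np.sum(np.abs((zj - fj) / zj)) for zj, fj in zip(z, f))

def sum_of_squares(z, f):
    return sum(np.sum((zj - fj) ** 2) for zj, fj in zip(z, f))

def relative_sum_of_squares(z, f):
    return sum(np.sum(((zj - fj) / zj) ** 2) for zj, fj in zip(z, f))

def maximum_likelihood(z, f, g):
    # Negative log-likelihood (Equation 14-12) without constant terms.
    total = 0.0
    for zj, fj, gj in zip(z, f, g):
        nj = len(zj)
        ssj = np.sum((zj - fj) ** 2 / fj ** gj)   # weighted residual SS
        total += 0.5 * nj * np.log(ssj / nj) + 0.5 * gj * np.sum(np.log(fj))
    return total
```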
These objective function types are accessed by clicking on the inverted triangle beside the Optimize icon and then selecting Type from the drop-down menu.
In general, the maximum likelihood objective function should be used when doing parameter estimation. This objective function calculates statistically optimal parameter estimates based on assumptions on the nature of the measurement errors. The sum of squares objective function is a special case of the maximum likelihood objective function derived using further simplifying assumptions and can also be used. It is equivalent to the maximum likelihood objective function when there is only one response or target variable. Further details on parameter estimation and the maximum likelihood and sum of squares objective functions can be found in Appendix A of this chapter.
The other objective functions can be used for curve fitting when calculating statistically optimal parameter estimates is not a concern.
In addition to data fitting applications, GPS-X can be used for process optimization. For example, GPS-X can calculate the operating conditions for your process model that optimize some measure of process performance, such as operating cost or effluent quality.
To solve this type of problem in GPS-X you need to treat the problem as a data fitting exercise. For example, if you want to minimize the value of a certain model variable you need to specify an arbitrarily small target value for the model variable in a .dat file and have the optimizer minimize the difference between the calculated variable value and the target value using the absolute difference objective function. This is equivalent to minimizing the model variable directly. The target value should be made small enough so that the optimizer cannot reach it.
You are not limited to using one performance measure. You can select a number of different performance measures and have GPS-X optimize these variables simultaneously by fitting them to user-supplied targets.
There are four different criteria used to terminate the optimizer:
1. Parameter Tolerance: The maximum size along a parameter dimension is defined as the difference between the largest value of this parameter from among all of the simplex vertices, and the smallest value of this parameter from among all of the simplex vertices. This maximum size is scaled by dividing it by the difference between the upper and lower bounds for this parameter. If the maximum sizes for all of the parameter dimensions are less than the parameter tolerance, the optimization process is terminated.
2. Objective function tolerance: If the range (largest minus smallest) of objective function values covered by the simplex vertices is less than the objective function tolerance, the optimization process is terminated. Meeting this criterion without satisfying criterion 1 may indicate that the objective function is not very sensitive to some parameters.
3. Scaled termination value for objective function: If the objective function value at the current best point in the simplex is less than the user-specified final objective function value, the optimization process is terminated. This final value should be positive and small in magnitude. This criterion is not used with the maximum likelihood objective function. It is only applicable for the other objective functions because their values have a lower bound of zero (a perfect fit).
4. Maximum number of optimizer iterations: The optimization process is terminated if the maximum number of iterations is reached.
Satisfaction of any one of these four criteria will result in termination of the iterative optimization process. These criteria provide a trade-off between the length of the optimization run and the proximity to an optimal solution.
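A sketch of how these four checks can be applied to the simplex at each iteration is shown below; the argument names and the exact order of the checks are illustrative assumptions.

```python
import numpy as np

def should_terminate(vertices, fvals, lower, upper,
                     param_tol, obj_tol, final_obj, iteration, max_iter):
    # vertices: (p + 1) x p array of simplex vertices; fvals: their values.
    # 1. Parameter tolerance: scaled spread along every parameter dimension.
    spread = (vertices.max(axis=0) - vertices.min(axis=0)) / (upper - lower)
    if np.all(spread < param_tol):
        return True
    # 2. Objective function tolerance: range of values over the simplex.
    if fvals.max() - fvals.min() < obj_tol:
        return True
    # 3. Scaled termination value (not used with maximum likelihood).
    if fvals.min() < final_obj:
        return True
    # 4. Maximum number of optimizer iterations.
    return iteration >= max_iter
```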
The specific values used for the termination criteria can be accessed from the General Data menu. Right-click in an empty region of the drawing board to display the menu and select System > Input Parameters > Simulation Tool Settings to display the form shown in Figure 14‑2.

Figure 14‑2 – Optimizer Form Containing the Termination Criteria Settings
A termination criterion can be disabled by specifying a large negative value for it. The objective function tolerance is disabled by default.
The optimizer module is equipped to handle three different types of process measurements: time series measurements, long term operational data that are averages of the original process measurements, and on-line measurements. Each type of measurement set leads to a different type of optimization problem in GPS-X. These optimization types can be accessed by clicking on the inverted triangle to the right of the Optimize icon and selecting Type.
This optimization type is the one normally used for both parameter estimation and process optimization in GPS-X. It is designed to handle both time series and steady-state measurements.
For this type of optimization, you enter your measured data into a text file with a .dat extension. This text file should follow the naming and formatting conventions discussed in the GPS-X User's Guide.
For parameter estimation involving a dynamic model, the data entered into the text file will be a set of time series values for each of the response variables. In GPS-X the response variables are referred to as target variables.
Steady-state optimization is a time series-type optimization with only one data point for each target variable. The steady-state solver is used and the simulation has a stop time of 0.0. This type of optimization is useful for calibrating the model to data reported as daily, weekly or monthly averages. Data of this type are typically obtained from composite samples and thus do not accurately reflect the time dynamics of the real process. In a steady-state optimization, the average data are used as the targets and selected model parameters are adjusted to fit these targets.
When doing process optimization, you enter single target values for your process performance measures at the desired points in time. As mentioned earlier, you should also make sure to use the absolute difference objective function.
GPS-X will fit your model to the measured data using the objective function that you select. If you prepare output graphs to display the predicted values of the model, GPS-X will automatically display the measured values provided in the .dat file on the graphs.
GPS-X will draw a new curve for the predicted values at each optimization iteration, in order to track the progress of the optimizer. At the end of the optimization process, final predicted responses are displayed so that you can visually assess the fit. An example of this type of graph is shown in Figure 14‑3.

Figure 14‑3 - Example GPS-X Output Graph Showing Measured Data (+ Markers) and the Predicted Response (Continuous Line)
GPS-X also has a sophisticated dynamic parameter estimation procedure (DPE). DPE is designed for the estimation of time-varying parameters. It can be used with on-line data or on a set of off-line time series data. For details on using on-line data, see the Advanced Control Module Manual.
The motivation behind DPE is that parameters in process models are often not constant, but vary with time. For example, the oxygen mass transfer coefficient in an aerated tank is often slowly time-varying.
Dynamic parameter estimation is also useful for estimating parameters in poorly understood processes. In these cases, the model structure is likely to be incorrect. As a result, the model may only be able to represent the data well over short time intervals. In this case, using DPE will help compensate for the model error and allow acceptable fitting of the measured data.
Another situation in which dynamic parameter estimation is useful is when you are interested in detecting process changes and upsets. If for example a model parameter is found to be relatively constant during normal process operation but is sensitive to process changes, you can track this parameter using the DPE feature and on-line data to help provide an early warning of process changes or disturbances.
In GPS-X, dynamic parameter estimation is done by applying the time series optimization approach mentioned earlier to a moving time window. Instead of estimating parameters from an entire set of data, GPS-X calculates a set of parameter estimates for each time window using the parameter estimates from the previous time window as a starting guess. This approach can be used on a data file that is continually updated with new blocks of data or on a static file of time series data. You can use any of the objective functions that are available for time series optimization when doing dynamic parameter estimation.
The length of the time window controls how often the parameters are updated. The shorter the time window, the more often the parameters are updated. When using short time windows, it may be necessary to filter the data to eliminate noise so GPS-X does not fit the noise.
To ensure proper termination of the optimization routine when using the DPE feature, it is suggested that the time window and the communication interval be chosen such that the time window is an integer multiple of the communication interval.
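The moving-window logic can be sketched as follows; load_window and estimate are hypothetical stand-ins for reading one block of data and running one time series optimization starting from the previous estimates.

```python
def dynamic_parameter_estimation(load_window, estimate,
                                 theta0, t_start, t_end, window):
    # load_window(t0, t1): hypothetical reader for the data in [t0, t1].
    # estimate(theta, data): hypothetical single-window optimization run.
    theta, history, t = theta0, [], t_start
    while t + window <= t_end:
        data = load_window(t, t + window)    # data block for this window
        theta = estimate(theta, data)        # previous estimate = new guess
        history.append((t + window, theta))  # parameter trajectory in time
        t += window                          # slide the window forward
    return history
```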
This section discusses the optimizer form that is accessed by selecting General Data > System > Input Parameters > Simulation Tool Settings > Optimizer > More…. The upper portion of this form is shown in Figure 14‑2. The lower portion of the form is shown below in Figure 14‑4.

Figure 14‑4 - Bottom of Optimizer Form
The termination criteria found on this form were discussed under Termination Criteria in this chapter. The form accessed from the More... button in the Optimizer sub-section contains the simplex method constants. The remaining settings and parameters found in the Optimizer form are described below. You can access the different sub-sections by scrolling down the form.
Number of optimized parameters: This sets the number of variables to be adjusted by the optimizer. It should correspond to the number of parameters specified as Optimize variables in the Controls Setup window.
Number of data points (at least 2): This number sets an upper bound on the number of rows that GPS-X will read in from the .dat file. The default value is large so that GPS-X can handle the majority of data sets without having to change this value.
Detailed statistical report (ON - OFF): If this option is set to ON and you are using either the maximum likelihood or sum of squares objective functions, the following statistics are provided in the solution report in addition to the parameter estimates and objective function value: variance-covariance matrix, correlation matrix, % variation explained by regression, significance of the regression, lack of fit test, observed values, predicted values, % error between predicted and observed values, residuals, weighted residuals, standardized residuals, standardized residual plots, and the Portmanteau test on the weighted residuals. The variance-covariance matrix and the correlation matrix are only reported if the printing of confidence limits option is set to ON. The lack of fit test is only reported if it is turned ON. The Portmanteau test is only reported if it is turned ON and the maximum likelihood objective function is used. For more information on the statistical tests consult Appendix B: The Optimizer Solution Report.
Solution report to file (ON - OFF): If this option is set to ON, the statistical output is appended to file stats.txt in the current directory.
Error distribution (Normal/Cauchy): Selects the probability distribution of the measurement errors used in the maximum likelihood objective function. The Normal option should be used in most cases. The Cauchy distribution looks similar to the normal distribution but has heavier tails, so that values far removed from the mean have a higher probability than they would with a corresponding normal distribution.
Estimate standard deviations of errors (ON - OFF): Determines if the standard deviations should be estimated from the data using Equation 14‑10, or if they should be taken as specified.
Standard deviation of errors: Vector of standard deviations of errors associated with each response variable (errors are assumed independent across variables and observations). These values are not used in the estimation problem if estimate standard deviations of errors is set to ON.
Use specified standard deviations as reference (ON - OFF): Determines if the standard deviation of errors should be used as reference values for the purpose of counting the proportion of weighted residuals falling outside reference bounds at a given level of significance. These reference bounds are calculated using the reference standard deviations. In addition, if this option is ON a chi-square test is performed on the sum of squares of the standard deviations of the weighted residuals divided by the reference standard deviations. This option is set to OFF by default. It does not affect the outcome of an optimization run.
Level of significance: This value is used for computing the reference bounds if use specified standard deviations as reference is set to ON. This value cannot affect the outcome of an optimization run.
Heteroscedasticity model (ON - OFF): Determines if the variance model given in Equation 14‑8 is used. If this option is turned off the heteroscedasticity parameters are set to zero.
Heteroscedasticity parameters: A vector containing the heteroscedasticity parameter for each response variable. If heteroscedasticity model is set to OFF, all heteroscedasticity parameters are ignored.
Report objective function gradient and Hessian (ON - OFF): Controls the printing of the gradient and the Hessian of the objective function at the solution. When this option is ON, the gradient and Hessian are printed to the Log window and if necessary to the stats.txt file. The gradient of the objective function is a vector containing the derivatives of the objective function with respect to each optimized parameter. The relative gradient is a scaled version of the gradient. The Hessian is a matrix containing the second derivatives of the objective function with respect to the optimized parameters. A Gauss-Newton Hessian approximation is used. The Hessian is only reported when the maximum likelihood or the sum of squares objective functions are used.
Report model sensitivity coefficients (ON - OFF): Controls the printing of the model sensitivity coefficients at the solution. When this option is ON the sensitivity coefficients are printed to the Log window and if necessary to the stats.txt file. For each target variable, the model sensitivity coefficients are the derivatives of the target variable with respect to each optimized parameter at each data point. These derivatives are used in the calculation of the Hessian, variance-covariance, and correlation matrices and the confidence limits.
Finite-difference relative perturbation size: The gradient vector elements and the model sensitivity coefficients are calculated using a finite-difference formula, specifically a forward-difference formula. The step size used in calculating the derivatives is calculated by multiplying the finite-difference relative perturbation size by the absolute value of the parameter of interest. Note that this setting affects the calculation of the confidence limits.
Printing of confidence limits (ON - OFF): Controls the calculation and printing of confidence limits for the parameter estimates. When this option is ON the confidence limits are reported to the Log window and if requested to the stats.txt file. When this option is OFF (default), the confidence limits are not calculated or printed to the Log window or the stats.txt file. Confidence limits are only reported when the maximum likelihood or the sum of squares objective functions are used. This switch does not affect the outcome of an optimization run.
Confidence level for confidence limits: The level of confidence used when calculating the confidence limits. The default value of 0.95 corresponds to 95 percent confidence limits.
Treat the different target variables as one target (ON - OFF): If there is more than one target variable and the maximum likelihood objective function is used, this switch controls whether the different targets are treated as the same target for the purposes of calculating the degrees of freedom used in the Student’s t-statistic in the confidence limits calculations. This feature is useful if you have a number of different target variables that are actually the same variable measured in different experiments.
Level of significance for significance of regression test: This is the level of significance used in the significance of the regression test. An appropriate message is printed by GPS-X depending on the calculated probability value and the level of significance. If the probability is larger than the chosen significance level (default value is 0.05) it provides evidence that the parameters are all zero and that the regression is not significant for the corresponding target variable. If the probability value is smaller than the significance level it indicates that the regression is significant for the corresponding target variable and that the variation explained by the regression is greater than expected by chance. This test is only reported when the maximum likelihood or the sum of squares objective functions are used. It does not affect the outcome of an optimization run.
Lack of fit test (ON - OFF): Controls the printing of the lack of fit test. If this option is set to ON and the detailed statistical report is set to ON, the lack of fit test is printed to the Log window and if necessary the stats.txt file. This test determines whether the variance of the residuals is acceptable compared to the user supplied estimate of the measurement variance or the measurement variance calculated using replicate measurements. This test is only reported when the maximum likelihood or the sum of squares objective functions are used. It does not affect the outcome of an optimization run.
Level of significance for lack of fit test: This is the level of significance used in the lack of fit test. An appropriate message is printed by GPS-X depending on the calculated probability value and the level of significance. If the probability for a certain target variable is smaller than the chosen significance level, it indicates that there is a lack of fit associated with this target variable.
Replication sum of squares (User Supplied / Calculated): This option allows the user to specify whether the replication sums of squares used in the lack of fit test are user supplied or calculated by GPS-X using repeat measurements.
Relative tolerance used to detect repeat measurements: This is the relative difference used when detecting repeat measurements in the data set. The default value is 1.0E-4. Repeat measurements are inserted into GPS-X by placing measurements very close together in time in the .dat file. GPS-X identifies repeat measurements by checking the relative difference between all of the times entered into the .dat file. If the relative difference between two time values is less than the replication tolerance, the corresponding measurements are considered to be repeats for the purposes of calculating the lack of fit test.
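One plausible sketch of this replicate detection is shown below; comparing each time against the first time of the current group is an assumption made for illustration.

```python
def group_replicates(times, rel_tol=1.0e-4):
    # times: sampling times as read from the .dat file, in ascending order.
    groups, current = [], [0]
    for i in range(1, len(times)):
        ref = max(abs(times[i]), abs(times[current[0]]), 1e-30)
        if abs(times[i] - times[current[0]]) / ref < rel_tol:
            current.append(i)      # same time: treat as a replicate
        else:
            groups.append(current) # close the group, start a new one
            current = [i]
    groups.append(current)
    return groups                  # measurement indices grouped by time
```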
This sub-section is accessed by clicking on the More... button in the Lack of Fit sub‑section.
Number of target variables: This is the number of target variables used in the optimization. It is only used to size the replication sum of squares array and the degrees of freedom for replication sum of squares array.
Replication sum of squares: This is an array for entering a replication sum of squares for each target variable. This is useful if you do not have repeat measurements but you have a good estimate of the measurement variance from past experience.
Degrees of freedom for replication sum of squares: This is an array for entering the degrees of freedom associated with each user supplied replication sum of squares. If you do not have this information you can enter rough estimates. Keep in mind that a large degrees of freedom value implies a high degree of confidence in the variance estimate.
Portmanteau test on weighted residuals (ON - OFF): This switch turns the Portmanteau test on or off. The Portmanteau test is used to detect trends in the weighted residuals. The Portmanteau test is designed for data taken in sequence (e.g. time or space). If trends are present, the residuals are not independent. This violates one of the assumptions of the maximum likelihood method and indicates that the model does not account for all of the non-random variability in the data. An appropriate message is printed depending upon the outcome of the test. This test is only reported when the maximum likelihood objective function is used. It does not affect the outcome of an optimization run.
Maximum number of lags used in Portmanteau statistic: By default, the Portmanteau statistic involves autocorrelations at lags up to half the length of the time series. This setting can be used to impose a further restriction on the number of autocorrelations taken into account. A large value effectively disables this option. This setting does not affect the outcome of an optimization run.
DPE Timewindow: This is the time window for dynamic parameter estimation.
Before using the optimizer tool to estimate optimal parameter values or optimize operating conditions, it is usually best to experiment with manual adjustment of the optimization variables. By conducting interactive simulations you can observe the effects of the model parameters on the response variables of interest. With this information you will be able to make better judgments on appropriate variables to use in a parameter estimation or optimization run.
For example, you can set up an interactive simulation with slider controls for the parameters and then adjust these variables to achieve a visually acceptable fit. You can plot actual data along with the target variables so that you can compare the simulated and measured values. This approach is useful for generating starting guesses for parameter estimation and process optimization runs.
In this appendix, the maximum likelihood method is discussed in more detail. First a general introduction on parameter estimation is given based on material found in Bard (1974). This is followed by a detailed discussion of the maximum likelihood method as implemented in GPS-X. Finally, a brief description of the sum of squares objective function is given.
Parameter estimation is the procedure of fitting a mathematical model of a process to measured data by calculating optimal estimates of the model parameters. Parameter estimation differs from simple curve fitting in that the criterion used to judge the best fit is not arbitrary but is based on statistical considerations. In addition, the model structure is based on theoretical principles and the model parameters often have physical significance. In curve fitting, the choice of model structure is more arbitrary and is often chosen to simplify the computations. The aim of parameter estimation is to not only fit a model to data but to calculate parameter values that are good estimates of the true values of the physical quantities.
Parameter estimation techniques can be applied to empirical models, but the statistical properties of the estimates may not be as meaningful in a physical sense. In addition, empirical models are not as well suited for extrapolation as mechanistic models.
Parameter estimation is an important step in the development of mathematical process models. Process models contain parameters with physical significance that may vary significantly from plant to plant. To develop process models that can be used for predictive purposes, it is important to estimate the unknown process parameters using measured process data. Using literature values for the unknown parameters will often result in a model that is not very useful for predicting actual plant behavior.
GPS-X uses the maximum likelihood method for parameter estimation. In the maximum likelihood method, the optimal parameter estimates are obtained by maximizing the joint probability density function of the measurements. This joint probability density function is a function of the parameters and is known as the likelihood function. The form of the likelihood function depends on the structure of the experimental error.
In GPS-X it is assumed that the experimental errors are normally distributed random variables with a mean of zero. As a result, the likelihood function used in GPS-X is:
Equation 14‑6

$$L(\theta) = (2\pi)^{-nm/2}\left[\prod_{i=1}^{n}\left|V_i\right|^{-1/2}\right]\exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}e_i^{T}V_i^{-1}e_i\right)$$
where:
n = number of experiments (i.e. observations)
m = number of measured response variables
V_i = variance-covariance matrix for the ith experiment
|V_i| = the determinant of V_i
e_i = the m × 1 residual vector that contains the differences between the measured values of the response variables and the values predicted by our mathematical process model
θ = the vector of parameters to be estimated in our mathematical process model
This expression is derived by substituting the residual vector, e_i, for the error vector in the multivariate normal probability density function (pdf). The error vector in the multivariate normal pdf contains the differences between the measured values of the response variables (the variables that we are fitting) and the true values. See Bard (1974) for details.
GPS-X also assumes that the measurement errors are independent from observation to observation and from response variable to response variable. This means that the variance-covariance matrices found in Equation 14‑6 are diagonal but not necessarily equal.
In GPS-X the log-likelihood function is used instead of the likelihood function for mathematical convenience. Maximizing the log-likelihood function is the same as maximizing the likelihood function. The log-likelihood function is derived by taking the natural logarithm of Equation 14‑6, as shown below:
Equation 14‑7

$$\ln L(\theta) = -\frac{nm}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\ln\sigma_{i,j}^{2} - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{e_{i,j}^{2}}{\sigma_{i,j}^{2}}$$

where:
σ²_{i,j} = variance of response j in experiment i
In this derivation, it is assumed that the variance of a response variable is not constant across all the observations. To take this into account, GPS-X uses the following expression for the estimated variance (Reilly et al., 1977):
Equation 14‑8

$$\hat{\sigma}_{i,j}^{2} = w_j^{2}\,f_{i,j}^{\,g_j}$$

where:
σ̂²_{i,j} = estimate of the variance, σ²_{i,j}
w_j = proportionality constant that will be called the standard deviation of the weighted residuals for response j
f_{i,j} = value of response variable j predicted by the process model in experiment i
g_j = heteroscedasticity parameter for response j
This expression relates the variability of response variable j to the magnitude of the predicted value for response j.
The heteroscedasticity parameter controls how the variance depends on the predicted values. This parameter is bounded between 0 and 2 and is continuous within this range. A value of 0 indicates constant absolute variability across all of the observations for response variable j. A value of 2 indicates constant relative variability across all of the observations for response variable j.
To determine the optimal value of wj, given the other adjustable parameters in the log-likelihood function, the estimated variance given by Equation 14‑8 is substituted into Equation 14‑7. The resulting equation is then differentiated with respect to wj and the derivative is set to zero, leading to the following expression:
Equation 14‑9

$$w_j^{2} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left(z_{i,j}-f_{i,j}\right)^{2}}{f_{i,j}^{\,g_j}}$$

where:
z_{i,j} = measured value of response j in experiment i
Substituting this expression into Equation 14‑8 yields the estimate for the variance that accounts for non-homogeneous measurement errors:
Equation 14‑10

$$\hat{\sigma}_{i,j}^{2} = \frac{f_{i,j}^{\,g_j}}{n}\sum_{k=1}^{n}\frac{\left(z_{k,j}-f_{k,j}\right)^{2}}{f_{k,j}^{\,g_j}}$$
If Equation 14‑10 is substituted into Equation 14‑7, the log-likelihood function becomes:
Equation 14‑11

$$\ln L(\theta) = -\frac{nm}{2}\ln(2\pi) - \frac{nm}{2} - \frac{n}{2}\sum_{j=1}^{m}\ln\!\left(\frac{1}{n}\sum_{i=1}^{n}\frac{e_{i,j}^{2}}{f_{i,j}^{\,g_j}}\right) - \frac{1}{2}\sum_{j=1}^{m}g_j\sum_{i=1}^{n}\ln f_{i,j}$$
To allow for the possibility that the number of observations is different for each response variable, Equation 14‑11 is re-written as (Steiner et al., 1990):
Equation 14‑12

$$\ln L(\theta) = -\frac{1}{2}\sum_{j=1}^{m}n_j\left[\ln(2\pi)+1\right] - \frac{1}{2}\sum_{j=1}^{m}n_j\ln\!\left(\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{e_{i,j}^{2}}{f_{i,j}^{\,g_j}}\right) - \frac{1}{2}\sum_{j=1}^{m}g_j\sum_{i=1}^{n_j}\ln f_{i,j}$$
This is the function that is maximized by GPS-X to fit process models to measured data and obtain optimal parameter estimates. Equation 14‑12 is a measure of the probability that the measurements were generated by the process model.
Note that you can set the heteroscedasticity factors yourself or have GPS-X estimate their optimal values for you.
As process models are generally nonlinear, the parameter values that maximize Equation 14‑12 cannot be determined analytically. An iterative optimization method is required to maximize Equation 14‑12. As mentioned earlier in this chapter, GPS-X uses the Nelder and Mead simplex method (Press et al., 1986) for optimization. This method is a type of direct search algorithm that does not require the calculation of partial derivatives. This is helpful when estimating parameters in systems of differential equations. The simplex method also has the advantage that it can handle objective functions containing discontinuities. This method is generally slower than derivative-based optimization methods, but can handle a greater variety of objective functions and is often found to be quite robust in finding solutions. The version of the simplex method implemented in GPS-X allows for bounds to be placed on the parameters.
Since the simplex method is designed for minimization, GPS-X minimizes the negative of Equation 14‑12 to determine the optimal parameter estimates.
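Putting the pieces together, a parameter estimation run can be sketched as the Nelder-Mead minimization of the negative of Equation 14‑12. Here run_model is a hypothetical stand-in for a simulation that returns the predicted responses at the trial parameter values.

```python
import numpy as np
from scipy.optimize import minimize

def fit(run_model, z, g, theta0, bounds):
    # run_model(theta): hypothetical; returns the list of predicted arrays f.
    def neg_log_likelihood(theta):
        f = run_model(theta)
        total = 0.0
        for zj, fj, gj in zip(z, f, g):
            nj = len(zj)
            ssj = np.sum((zj - fj) ** 2 / fj ** gj)
            total += 0.5 * nj * np.log(ssj / nj) \
                     + 0.5 * gj * np.sum(np.log(fj))
        return total                       # minimizing -ln L maximizes ln L
    return minimize(neg_log_likelihood, theta0,
                    method="Nelder-Mead", bounds=bounds)
```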
When there is only one target variable and the variance is constant across all observations (i.e. the heteroscedasticity parameter is zero), the sum of squares objective function is equivalent to the maximum likelihood objective function. For problems with more than one target variable, the sum of squares objective function is a special case of the maximum likelihood objective function that results if we make additional assumptions about the measurement errors. Our maximum likelihood objective function, given by Equation 14‑12, is derived using the following assumptions:
· The measurement errors are normally distributed random variables with a mean of zero.
· For each target variable, the measurement errors are independent from observation to observation. Each target variable has its own variance which varies from observation to observation according to a power-law. The variances are unknown and are calculated as part of the optimization process.
· There is no correlation between different target variables.
If we make the additional assumptions that all of the variances are equal (i.e. all responses have the same variance) and the variances do not change from observation to observation (i.e. the heteroscedasticities are all zero), then the maximum likelihood function reduces to the sum of squares objective function in the multi-response case. The assumptions used to derive the sum of squares objective function in the multi-response case do not apply in most practical situations. Therefore, it is recommended that the maximum likelihood objective function be used for calibration problems with more than one target variable.
In this appendix, the solution report provided in the Log window after a GPS-X optimization run is discussed. The solution report provides the user with the solution found by the optimizer and a number of additional statistics that are valuable when doing parameter estimation. This report may also be printed to the stats.txt file. First we discuss the basic report and the detailed statistical report. This is followed by a summary on using the statistical tests and a discussion of overparameterization.
NOTE: In the solution report, the parameters are given generic names and are numbered according to the order that the parameters are listed in the Control window containing these parameters. For example, the parameter at the top of the Control window is labeled Parameter 1.
Many of the statistics in the solution report are organized according to target variable names. Generic target variable names are used and the names are numbered according to the order given for the target variables in the target variables form that is accessed by selecting Target Variables... from the Optimize drop-down menu. For example, statistics corresponding to the first target variable in the form are presented under the heading Target 1.
The basic solution report, which is printed to the Log window, includes the number of iterations required by the optimizer, the initial and final values of the objective function, the initial and final values of the parameters (i.e. the optimization variables), the heteroscedasticity parameter values, and a summary of the chosen optimizer settings. In addition, the statistics discussed below are provided if requested in the Optimizer form.
The gradient vector of the objective function is defined as:
Equation 14‑13

$$\nabla F = \left[\frac{\partial F}{\partial\theta_1},\ \frac{\partial F}{\partial\theta_2},\ \ldots,\ \frac{\partial F}{\partial\theta_p}\right]^{T}$$
where:
F = the objective function
θ_1, θ_2, …, θ_p = the optimization variables
In the case of parameter estimation these variables are the parameters being estimated. The vector θ contains the optimization variables.
The partial derivatives are calculated numerically using the following forward-difference formula:
Equation 14‑14

$$\frac{\partial F}{\partial\theta_k} \approx \frac{F\!\left(\theta_1,\ldots,\theta_k+h_k,\ldots,\theta_p\right)-F\!\left(\theta_1,\ldots,\theta_k,\ldots,\theta_p\right)}{h_k}$$
where:
k = 1, 2, …, p
h_k = the step or perturbation size
The step size is calculated using the following formula:
Equation 14‑15

$$h_k = 10^{-7}\left|\theta_k\right|$$

In Equation 14‑15, the value of 10⁻⁷ is referred to as the finite-difference relative perturbation size in GPS-X and can be changed in the Derivative Information sub-section of the Optimizer form.
The gradient provided in the solution report is calculated at the solution. If the solution is a local minimum, the elements of the gradient vector should be close to zero. Because poor scaling of the parameters and the objective function can make the gradient appear large, the relative gradient is also reported. The elements of the relative gradient are calculated by scaling the gradient elements using the following equation:
Equation 14‑16

$$\left(\nabla F\right)_{k,\mathrm{rel}} = \frac{\partial F}{\partial\theta_k}\cdot\frac{\theta_k}{F}$$
The infinity norm of the relative gradient vector is also reported. If the solution is a local minimum, this norm should be close to zero.
The gradient is only reported when the report objective function gradient and Hessian switch in the Derivative Information sub-section in the Optimizer form is set to ON.
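The forward-difference gradient of Equations 14‑14 and 14‑15 can be sketched as below. The relative-gradient scaling shown (multiplying each element by θ_k / F) is one common convention and is an assumption here.

```python
import numpy as np

def gradient(F, theta, rel_step=1e-7):
    # theta: float array of optimization variables at the solution.
    F0 = F(theta)
    grad = np.zeros_like(theta, dtype=float)
    for k in range(len(theta)):
        h = rel_step * abs(theta[k])       # Equation 14-15 step size
        h = h if h > 0.0 else rel_step     # guard against a zero parameter
        perturbed = theta.copy()
        perturbed[k] += h
        grad[k] = (F(perturbed) - F0) / h  # Equation 14-14
    rel_grad = grad * theta / F0           # assumed scaling (Equation 14-16)
    return grad, rel_grad, np.max(np.abs(rel_grad))  # infinity norm
```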
The model sensitivity coefficients express the local sensitivity of the process model to infinitesimal changes in the optimization variables (subset of the model parameters). They are the partial derivatives of the model with respect to the optimization variables. A sensitivity coefficient can be calculated for each target (i.e. response) variable with respect to each optimization variable at each data point. The sensitivity coefficients are calculated using the forward-difference approximation shown below:
Equation 14‑17

$$\frac{\partial f_{i,j}}{\partial\theta_k} \approx \frac{f_{i,j}\!\left(\theta_1,\ldots,\theta_k+h_k,\ldots,\theta_p\right)-f_{i,j}\!\left(\theta_1,\ldots,\theta_k,\ldots,\theta_p\right)}{h_k}$$
In this equation f_{i,j} is the jth target variable at the ith data point. The step size is calculated in the same way as for the gradient.
The sensitivity coefficients are only reported when the report model sensitivity coefficients switch in the Derivative Information sub-section in the Optimizer form is set to ON.
The Hessian is a matrix of second partial derivatives of the objective function with respect to the optimization variables. For the case of two optimization variables, the Hessian is defined as:
Equation 14‑18

$$H = \begin{bmatrix}\dfrac{\partial^{2}F}{\partial\theta_1^{2}} & \dfrac{\partial^{2}F}{\partial\theta_1\,\partial\theta_2}\\[2ex]\dfrac{\partial^{2}F}{\partial\theta_2\,\partial\theta_1} & \dfrac{\partial^{2}F}{\partial\theta_2^{2}}\end{bmatrix}$$
In GPS-X the elements of the Hessian are calculated using the Gauss-Newton approximation. For the sum of squares objective function, each Hessian element is defined as:
Equation 14‑19

$$H_{k,l} \approx 2\sum_{j=1}^{m}\sum_{i=1}^{n_j}\frac{\partial f_{i,j}}{\partial\theta_k}\,\frac{\partial f_{i,j}}{\partial\theta_l}$$
where:
m = number of target or response variables
nj = number of data points for target variable j
For the maximum likelihood objective function, Equation 14‑5, each Hessian element is defined as:
Equation 14‑20

$$H_{k,l} \approx \sum_{j=1}^{m}\frac{n_j}{\displaystyle\sum_{i=1}^{n_j}\frac{e_{i,j}^{2}}{f_{i,j}^{\,g_j}}}\;\sum_{i=1}^{n_j}\frac{1}{f_{i,j}^{\,g_j}}\,\frac{\partial f_{i,j}}{\partial\theta_k}\,\frac{\partial f_{i,j}}{\partial\theta_l}$$
where g_j is the heteroscedasticity parameter of the jth response variable.
The variable e_{i,j} is the residual and is defined as:
Equation 14‑21

$$e_{i,j} = z_{i,j} - f_{i,j}$$

The variable z_{i,j} is the measured value of response j at the ith data point.
The expressions in Equation 14‑19 and Equation 14‑20 were derived assuming that the response variables are equivalent to the state variables. Even if this is not the case, this is not a limitation because the sensitivity coefficients are calculated using finite differences, so any dependencies in the responses are taken into account.
The Hessian matrices for the other objective functions are not calculated because they cannot be approximated using the Gauss-Newton approximation; they would require the calculation of second-order sensitivities.
If any of the optimization variables are at their bounds, the elements of the Hessian involving these variables are zero.
The Hessian matrix is only reported when the report objective function gradient and Hessian switch in the Derivative Information sub-section in the Optimizer form is set to ON.
GPS-X calculates linear-approximation confidence limits for the parameter estimates using the variance-covariance matrix. For the sum of squares objective function, the variance-covariance matrix of the parameter estimates is defined as (Bard, 1974):
Equation 14‑22

$$V_{\hat{\theta}} = 2\,s^{2}\,\hat{H}^{-1}$$
The matrix Ĥ⁻¹ is the inverse of the Gauss-Newton Hessian approximation to the sum of squares objective function at the solution. The variable s² is the variance estimate for the measurement errors and is defined as:
Equation 14‑23

$$s^{2} = \frac{F\big(\hat{\theta}\big)}{n-p}$$

where F(θ̂) is the value of the sum of squares objective function at the solution, n is the number of data points (total number considering all target variables), and p is the number of parameters (i.e. optimization variables).
For the maximum likelihood objective function, the variance-covariance matrix of the parameter estimates is defined as (Bard, 1974):
Equation 14‑24

$$V_{\hat{\theta}} = \hat{H}^{-1}$$

The matrix Ĥ⁻¹ is the inverse of the Gauss-Newton Hessian approximation to the maximum likelihood objective function at the solution.
For both the sum of squares and the maximum likelihood objective functions, the 100(1-a) % confidence limits for parameter k are:
Equation 14‑25

$$\hat{\theta}_k \pm t\!\left(n-p;\ a/2\right)\sqrt{V_{k,k}}$$
The variable t(n – p; a / 2) is the Student’s t-statistic, with n-p degrees of freedom, and a significance level of a / 2. In this case, n is the number of data points per response, except for the sum of squares objective function where it is the total number of data points. The variable a is the significance level for the parameter estimates. It is defined by the following expression:
Equation 14‑26

$$a = 1 - \delta$$
where δ is the confidence coefficient. The confidence coefficient corresponds to the confidence level for confidence limits parameter found in the Confidence Limits sub-section of the Optimizer form. The default value is 0.95 so that by default GPS-X reports 95 percent confidence limits.
The confidence limits are only reported when the printing of confidence limits switch in the Confidence Limits sub-section in the Optimizer form is set to ON.
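Given the variance-covariance matrix V of the parameter estimates, the confidence limit calculation of Equations 14‑25 and 14‑26 can be sketched as follows (using SciPy's Student's t quantile; not GPS-X code).

```python
import numpy as np
from scipy import stats

def confidence_limits(theta_hat, V, n, p, confidence=0.95):
    a = 1.0 - confidence                          # Equation 14-26
    t_val = stats.t.ppf(1.0 - a / 2.0, n - p)     # Student's t, n - p dof
    half_width = t_val * np.sqrt(np.diag(V))      # Equation 14-25
    return theta_hat - half_width, theta_hat + half_width
```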
The detailed statistical report provides additional information that is helpful when doing parameter estimation. It allows the user to assess how well the model fits the measured data. The detailed statistical report is activated by setting the detailed statistical report option to ON in the Static sub-section of the Optimizer form. The statistics included in the detailed statistical report are discussed below.
The variance-covariance matrix is defined by Equation 14‑22 for the sum of squares objective function and by Equation 14‑24 for the maximum likelihood objective function. This matrix contains the variances of the individual parameter estimates (diagonal elements) and the covariances between the different parameter estimates (off-diagonal elements). Parameters with large variances may be unnecessary.
The elements of the correlation matrix are calculated as follows:
Equation 14‑28

$$R_{k,l} = \frac{V_{k,l}}{\sqrt{V_{k,k}\,V_{l,l}}}$$

where V_{k,l} is the (k, l) element of the variance-covariance matrix.
By definition, the magnitudes of the elements of the correlation matrix are always between zero and one. Unless some parameters are on their bounds, the diagonal elements are always equal to one. The off-diagonal elements indicate the degree of correlation between pairs of parameters. A large off-diagonal element indicates a high degree of correlation between a pair of parameters. This provides evidence that some of the parameters in the model are unnecessary and the model is overparameterized. This may indicate that the model is not adequate for the task at hand or that the data do not provide enough information to allow estimation of all of the model parameters (Draper and Smith, 1981).
The variance-covariance and correlation matrices are only reported when the printing of confidence limits switch in the Confidence Limits sub-section in the Optimizer form is set to ON.
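A minimal sketch of Equation 14‑28, converting the variance-covariance matrix to the correlation matrix:

```python
import numpy as np

def correlation_matrix(V):
    s = np.sqrt(np.diag(V))        # standard deviations of the estimates
    return V / np.outer(s, s)      # R[k, l] = V[k, l] / (s[k] * s[l])
```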
This is a statistic often reported by linear least squares packages. In GPS-X it has been modified to handle nonlinear maximum likelihood problems with heteroscedasticity. The statistic, adapted from Steiner et al. (1990), is calculated as follows for each target variable:
Equation 14‑29

$$1 - \frac{SS_j}{SS_{mean_j}}$$
It is multiplied by 100 to get a percentage. The variable SS_j is the weighted residual sum of squares for response j and is defined by:
Equation 14‑30

$$SS_j = \sum_{i=1}^{n_j}\frac{\left(z_{i,j}-f_{i,j}\right)^{2}}{f_{i,j}^{\,g_j}}$$
The variable SS_{mean_j} is the total weighted sum of squares corrected for the mean and is defined below:
Equation 14‑31

$$SS_{mean_j} = \sum_{i=1}^{n_j}\frac{\left(z_{i,j}-\bar{z}_j\right)^{2}}{f_{i,j}^{\,g_j}}$$
where z̄_j is a weighted average of the measured values for response j and is calculated as:
Equation 14‑32

$$\bar{z}_j = \frac{\displaystyle\sum_{i=1}^{n_j}\frac{z_{i,j}}{f_{i,j}^{\,g_j}}}{\displaystyle\sum_{i=1}^{n_j}\frac{1}{f_{i,j}^{\,g_j}}}$$
The overall variation explained is calculated as:
Equation 14‑33

$$\left(1 - \frac{\displaystyle\sum_{j=1}^{m}SS_j}{\displaystyle\sum_{j=1}^{m}SS_{mean_j}}\right)\times 100$$
The % variation explained is not a model adequacy test but is useful because it shows how much of the variation in the data is accounted for by the fitted model. This test is only reported when the maximum likelihood or sum of squares objective functions are used.
This test involves testing the null hypothesis that all the model parameters are zero against the alternative hypothesis that the parameters are not all zero.
The following sum of squares ratio is calculated for each response variable:
Equation 14‑34

$$F_j = \frac{SS_{reg_j}/\left(p-1\right)}{SS_j/\left(n_j-p\right)}$$
where SS_{reg_j} is the weighted regression sum of squares for target variable j and is defined by:
Equation 14‑35

$$SS_{reg_j} = \sum_{i=1}^{n_j}\frac{\left(f_{i,j}-\bar{z}_j\right)^{2}}{f_{i,j}^{\,g_j}}$$
The sum of squares ratio follows an F-distribution if the errors in the measurements are independent, normally distributed random variables and if the model parameters are all zero.
GPS-X reports the F-value for each target variable and reports the probability that it is from an F-distribution with p − 1 and n_j − p degrees of freedom. If the probability is larger than the chosen significance level it provides evidence that the parameters are all zero and the regression is not significant for the corresponding target variable. The significance level is entered in the General Data > System > Parameters > Optimizer form in the Significance of the Regression sub-section.
If the probability value is smaller than the significance level, the F-value is unlikely to come from this F-distribution and we accept the alternative hypothesis that the model parameters are not all zero. In this case the regression is significant for the corresponding target variable and the variation explained by the regression is greater than expected by chance. For a model to be useful for predictive purposes the F-values should be large and the probability values should be close to zero.
GPS-X provides messages that summarize the results of the significance test. The significance test is based on linear regression theory and is only approximate for nonlinear models.
This test is only reported when the maximum likelihood or the sum of squares objective functions are used.
This test determines whether the variance of the residuals is acceptable compared to the user supplied estimate of the measurement variance or the measurement variance calculated using replicate measurements. This serves as a model adequacy test.
For each target variable, the following sum of squares ratio is calculated:
Equation 14‑36

$$F_j = \frac{SS_{lof_j}/\left(n_j-p-repdof_j\right)}{SS_{rep_j}/repdof_j}$$
The variable SS_{rep_j} is the weighted replication sum of squares for target variable j and is calculated as:
Equation 14‑37

$$SS_{rep_j} = \sum_{v=1}^{n}\sum_{u=1}^{n_{rv}}\frac{\left(z_{u,v}-\bar{z}_{v,j}\right)^{2}}{f_{v,j}^{\,g_j}}$$
where n is the number of independent variable values for which there are replicates, n_{rv} is the number of replicates for the vth value of the independent variable, z̄_{v,j} is the mean of the replicate measurements corresponding to the vth value, z_{u,v} is the uth replicate measurement corresponding to the vth value, and f_{v,j} is the predicted value of response variable j corresponding to the vth value of the independent variable.
The variable repdofj is the degrees of freedom for the weighted replication sum of squares for target variable j. It is defined as:
Equation 14‑38

$$repdof_j = \sum_{v=1}^{n}\left(n_{rv}-1\right)$$
As mentioned earlier in this chapter, repeat measurements are inserted into GPS-X by placing measurements very close together in time in the .dat file. See the Lack of Fit sub-section earlier in this chapter for details.
The weighted lack of fit sum of squares for the jth target variable, SS_{lof_j}, is calculated as:
Equation 14‑39

$$SS_{lof_j} = SS_j - SS_{rep_j}$$
As mentioned earlier in this chapter, the user can also provide the replication sum of squares and the replication degrees of freedom in the Lack of Fit sub-section of the Optimizer form. In order for these values to be used by GPS-X, the replication sum of squares option in the Lack of Fit sub-section of the Optimizer form must be set to User Supplied.
The sum of squares ratio or F-value follows an F-distribution if the errors in the measurements are independent, normally distributed random variables and if the weighted lack of fit sum of squares is not much larger than the weighted replication sum of squares.
GPS-X reports the F-value for each target variable and reports the probability that it is from an F-distribution with n_j − p − repdof_j and repdof_j degrees of freedom. If the probability for a certain target variable is smaller than the chosen significance level, it indicates that there is a lack of fit associated with this target variable. The significance level is entered in the General Data > System > Parameters > Optimizer form in the Lack of Fit sub-section.
GPS-X provides messages that summarize the results of the lack of fit test. The lack of fit test is based on linear regression theory and is only approximate for nonlinear models.
This test is only reported when the maximum likelihood or the sum of squares objective functions are used and the lack of fit switch in the Lack of Fit sub-section in the Optimizer form is set to ON.
The observed values are the measured target variable values provided in the .dat file. The predicted values are the target variable values calculated by the model that correspond to the observed values (i.e. they are calculated at the same points in time).
The % errors are useful for detecting any large discrepancies between the model and the measured data. The % error statistic is calculated as follows:
Equation 14‑40

$$\%E_{i,j} = \frac{z_{i,j}-f_{i,j}}{z_{i,j}}\times 100$$

where %E_{i,j} is the percent error between z_{i,j} and f_{i,j}.
As discussed earlier, the residuals are defined as:
Equation 14‑21

$$e_{i,j} = z_{i,j} - f_{i,j}$$
They are an estimate of the error in the measurement, assuming that the model is correct.
The weighted residuals and standardized residuals are only reported when the maximum likelihood objective function is used. The weighted residuals are defined as:
Equation 14‑41

$$e_{w,i,j} = \frac{z_{i,j}-f_{i,j}}{f_{i,j}^{\,g_j/2}}$$
These are scaled residuals that are not as dependent on the magnitude of the measurements and the predicted values as are the unscaled residuals.
The standardized residuals are defined as:
Equation 14‑42

$$d_{i,j} = \frac{e_{i,j}}{\sqrt{S_{i,j}}}$$

The variable S_{i,j} is the estimated variance of response j for the ith data point and is defined in Equation 14‑10.
The standardized residuals and their associated plots can be used to check whether the residuals are independent and normally distributed. This assumption is fundamental to the development of the maximum likelihood objective function used in GPS-X. The standardized residuals are normalized and should have a mean of zero and a variance of one if the residuals are normally distributed.
To check for violation of the independence assumption, the standardized residual plots should be examined for any noticeable trends. The presence of trends provides evidence of serial correlation. Serial correlation occurs when residuals taken in sequence are correlated with each other.
To check the assumption that the errors are normally distributed, the standardized residual plots should be examined to see if the residuals are randomly scattered about zero. In addition, the values of the standardized residuals should be examined to check whether approximately 95 percent of the residuals lie between +2 and -2.
The standardized residual plots are scaled to fit in the Log window. The largest standardized residual for each response variable is represented by the maximum number of " * " characters, which is seven. The remaining standardized residuals for a response variable are scaled relative to this maximum residual for the purposes of calculating how many " * " characters to print. You should consult the actual values of the standardized residuals provided in the Log window to determine whether the actual magnitudes of the residuals for a response variable are large.
If the assumption that the residuals are independent and normally distributed is violated it provides evidence that our model is inadequate to represent the experimental data. If the process model structure is correct, it should account for the non-random variability in the data.
The assumption that the residuals should be independent if the measurement errors are independent is not strictly correct but is fine for practical purposes. The residuals are always correlated to some extent as a result of the fact that there are n measurements but only (n - p) degrees of freedom, where p is the number of parameters to be estimated (Draper and Smith, 1981).
The Portmanteau test is used to detect trends in the weighted residuals. If trends are present, the residuals are not independent. This violates one of the assumptions of the maximum likelihood method and indicates that the model does not account for all of the non-random variability in the data. The Portmanteau test is designed for data taken in sequence (e.g. time or space).
The Portmanteau statistic for response variable j is calculated as the number of observations times the sum of the squared autocorrelations between the weighted residuals up to a certain number of time lags (Brockwell and Davis, 1996):
Equation 14‑43

$$Q_j = n_j\sum_{k=1}^{\tau}\rho_j^{2}(k)$$
where n_j is the number of data points for response j, τ is the number of time lags, and ρ_j(k) is the sample autocorrelation of the weighted residuals at lag k for response j.
The sample autocorrelation is defined as:
Equation 14‑44

$$\rho_j(k) = \frac{\Omega_j(k)}{\Omega_j(0)}$$
where Ω_j(k) is the sample autocovariance function of the weighted residuals at lag k for response j and is defined as:
Equation 14‑45

$$\Omega_j(k) = \frac{1}{n_j}\sum_{i=1}^{n_j-k}\left(e_{w,i+k,j}-\bar{e}_{w,j}\right)\left(e_{w,i,j}-\bar{e}_{w,j}\right)$$
The sample autocorrelations are approximately independent, normally distributed random variables with mean zero and a variance of 1/nj if the weighted residuals are independent and identically distributed (Brockwell and Davis, 1996). This approximation gets better as the number of measurements increases. Although the residuals themselves are not necessarily identically distributed (they can have different variances), the weighted residuals should be identically distributed because they are scaled.
If the sample autocorrelations for a response variable are independent and normally distributed as described above, then the variables √(n_j) ρ_j(k) should be independent and normally distributed with mean zero and variance one. Since the Portmanteau statistic is a sum of squares of these variables, it is distributed as a chi-squared variable.
GPS-X reports the Portmanteau statistic for each target variable and reports the probability that it is from a chi-squared distribution with τ degrees of freedom. If the probability for a certain target variable is smaller than the chosen significance level it provides evidence of a trend in this target variable's weighted residuals. The significance level is entered in the General Data > System > Parameters > Optimizer form in the Portmanteau sub-section.
GPS-X provides messages that summarize the results of the Portmanteau test. This test is only reported when the maximum likelihood objective function is used and the portmanteau test on weighted residuals switch in the Portmanteau sub-section in the Optimizer form is set to ON.
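The Portmanteau calculation for one target variable (Equations 14‑43 to 14‑45) can be sketched as follows, with the chi-squared probability evaluated using SciPy; the default of half the series length follows the description above.

```python
import numpy as np
from scipy import stats

def portmanteau(weighted_residuals, max_lags=None):
    e = np.asarray(weighted_residuals, dtype=float)
    n = len(e)
    tau = max_lags if max_lags is not None else n // 2
    e = e - e.mean()
    omega0 = np.sum(e * e) / n                 # lag-0 autocovariance
    q = 0.0
    for k in range(1, tau + 1):
        omega_k = np.sum(e[k:] * e[:-k]) / n   # Equation 14-45
        q += (omega_k / omega0) ** 2           # squared autocorrelation
    q *= n                                     # Equation 14-43
    p_value = stats.chi2.sf(q, df=tau)         # chi-squared with tau dof
    return q, p_value
```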
See Table 14‑1 for a summary of how to use the statistics given in the solution report to assess the adequacy of the fitted model. Keep in mind that many of the tests become more reliable as the number of measurements increases. Even if the tests indicate that the model is not adequate, it does not mean that you cannot use the model. A visual inspection of the plots provided by GPS-X, showing the measured values and the predicted values, may indicate that the model captures the major trends in the data. This is often good enough for practical purposes as it may only be important to model certain aspects of a physical system.
Overparameterization occurs when there are more parameters in the process model than necessary to fit the data. This situation leads to correlations between model parameters as mentioned earlier in the context of the correlation matrix. As a result, the objective function near the solution to the parameter estimation problem has elongated contours and the solution is not very sensitive to changes in certain parameters.
The user should be careful when choosing the adjustable parameters in a parameter estimation run. The model should be sensitive to these parameters. It is often not practical to select all of the model parameters as optimization variables because this slows the optimization process and will likely result in meaningless values for certain model parameters. It is preferable to choose only those parameters that have the greatest effect on the mismatch between the model and the data. See the Sensitivity Analysis chapter in the GPS-X User's Guide for details on how to conduct a sensitivity analysis of your model using GPS-X before doing a parameter estimation run.
Table 14‑1 - Summary of How to Use the Statistical Tests

| Statistic | Interpretation |
| --- | --- |
| Confidence Limits | If the confidence limits on a parameter include zero it indicates that this parameter is not significant to the model. |
| Correlation Matrix | Large off-diagonal elements (close to one in magnitude) suggest that certain parameters are correlated. |
| Significance of Regression Test | Probability values larger than the significance level indicate that the variation explained by the regression is no greater than expected by chance. |
| Lack of Fit Test | Probability values smaller than the significance level provide evidence for a lack of fit in the fitted model. |
| Standardized Residuals | The majority of standardized residuals should be between +2 and -2. |
| Standardized Residual Plots | The residuals should be randomly scattered about zero without any noticeable trends. |
| Portmanteau Probability | Probability values smaller than the significance level suggest that the residuals are correlated. |
Bard, Y. A., Nonlinear Parameter Estimation. Academic Press, New York (1974).
Brockwell, P. J. and Davis, R. A. Introduction to Time Series and Forecasting. Springer-Verlag, New York (1996).
Draper, N. R., and Smith, H. Applied Regression Analysis. John Wiley & Sons (1981).
Edgar, T. F. and Himmelblau, D. M. Optimization of Chemical Processes. McGraw-Hill (1988).
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York (1986).
Reilly, P. M., Bajramovic, R., Blau, G. E., Branson, D. R., and Sauerhoff, M. W. Guidelines for the Optimal Design of Experiments to Estimate Parameters in First Order Kinetic Models. Can. J. Chem. Eng., 55, 614 (1977).
Steiner, E. C., Rey, T. D., and McCroskey, P. S. SimuSolv Reference Guide. The Dow Chemical Company, Midland, MI, Vol. 2 (1990).
Dochain, D. and Vanrolleghem, P. Dynamic Modelling and Estimation in Wastewater Treatment Processes. IWA Publishing, London, UK (2001).
Hauduc, H., Neumann, M. B., Muschalla, D., Gamerith, V., Gillot, S. and Vanrolleghem, P. Towards Quantitative Quality Criteria to Evaluate Simulation Results in Wastewater Treatment - A Critical Review. Proceedings of the 8th IWA Symposium on Systems Analysis and Integrated Assessment (Watermatex), pp. 37-49 (2011).