Mathematical models for water environment
1 What is a model
There is no philosophical consensus on the definition of a model. For our purposes, we’ll go with this definition: a mathematical model consists of variables and functions (Molnar, 2022).
- Variables represent aspects of the data and the model.
- Functions relate the variables to each other.
Functions range from simple, like \(y = 5\cdot x\) (only one input variable), to complex, like a deep neural network with millions or even billions of parameters (GPT-3 has 175 billion).
2 First-principles model
A model derived from established theory, rather than fitted to data, is said to be built from first principles. For instance, the hydrolysis processes of the Activated Sludge Model (Henze et al., 2015) are described as follows.
Aerobic hydrolysis:
\[ K_{\mathrm{h}} \cdot \frac{S_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{1}\]
Anoxic hydrolysis:
\[ K_{\mathrm{h}} \cdot \eta_{\mathrm{NO}_{3}} \cdot \frac{K_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{S_{\mathrm{NO}_{3}}}{K_{\mathrm{NO}_{3}}+S_{\mathrm{NO}_{3}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{2}\]
Anaerobic hydrolysis:
\[ K_{\mathrm{h}} \cdot \eta_{\mathrm{fe}} \cdot \frac{K_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{K_{\mathrm{NO}_{3}}}{K_{\mathrm{NO}_{3}}+S_{\mathrm{NO}_{3}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{3}\]
3 Data-driven model
Machine learning (ML) is a subfield of Artificial Intelligence (AI) that deals with systems that are able to acquire their own “knowledge” by extracting patterns from data.
- Linear regression (LR)
- Tree-based models: decision tree (DT), random forest (RF), gradient-boosted decision trees (GBDT), extreme gradient boosting (XGBoost), LightGBM (LGBM)
- Support vector regression (SVR)
- Artificial neural network (ANN)
- Recurrent neural network (RNN)
- Long short-term memory (LSTM)
- Convolutional neural network (CNN)
4 Hybrid model
A hybrid model is a framework that incorporates both (incomplete) first-principles knowledge and data.
Dynamics that are not modeled explicitly by the first-principles component are captured by the machine learning component, thereby filling in knowledge gaps (Quaghebeur et al., 2021).
First-principles:
\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = f\left(\mathbf{X}(t);\mathbf{p}\right), \tag{4}\]
Data-driven:
\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = n\left(\mathbf{X}(t), \mathbf{Y}(t);\mathbf{w}\right), \tag{5}\]
Hybrid:
\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = f\left(\mathbf{X}(t);\mathbf{p}\right) + n\left(\mathbf{X}(t), \mathbf{Y}(t);\mathbf{w}\right). \tag{6}\]
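As a minimal sketch of Eq. 6 for the first-order case (k = 1), the two components can be summed inside a simple forward-Euler integrator. Everything concrete here is an assumption made for illustration: the first-principles part `f` is taken to be first-order decay, the data-driven part `n` is a linear stand-in for a trained network, and the names `simulate_hybrid`, `y_series`, and `dt` do not come from the source.

```python
def f(x, p):
    """First-principles part f(X; p): first-order decay (illustrative assumption)."""
    return -p * x


def n(x, y, w):
    """Data-driven part n(X, Y; w): a linear stand-in for a trained network."""
    return w[0] * x + w[1] * y


def simulate_hybrid(x0, y_series, p, w, dt):
    """Integrate dX/dt = f(X; p) + n(X, Y; w) with forward Euler.

    x0: initial state, y_series: one external input Y per time step,
    p: mechanistic parameter, w: weights of the data-driven part, dt: step size.
    """
    xs = [x0]
    for y in y_series:
        x = xs[-1]
        xs.append(x + dt * (f(x, p) + n(x, y, w)))
    return xs


# Ten Euler steps with a constant external input (hypothetical values).
trajectory = simulate_hybrid(x0=1.0, y_series=[0.5] * 10, p=0.3, w=(0.0, 0.1), dt=0.1)
```

Setting `w` to zeros recovers the purely first-principles model of Eq. 4, which is a convenient sanity check before training the data-driven part.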
5 Model implementations
Programming languages
- Python
  - Package management: Anaconda
  - Data manipulation: pandas (name derived from "panel data")
  - Scientific computing: NumPy (Numerical Python)
  - Visualization: Matplotlib (a MATLAB-style plotting library)
  - Third-party packages for machine learning: scikit-learn, PyTorch
- R, Julia, MATLAB
Example
We can implement Eq. 1 in Python as follows:
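The sketch below is one possible implementation of Eq. 1, written in plain Python. The function name, argument names, and default parameter values are assumptions chosen for illustration; they are not calibrated values and should be replaced with values appropriate to the plant being modeled.

```python
def aerobic_hydrolysis_rate(s_o2, x_s, x_h, k_h=3.0, k_o2=0.2, k_x=0.1):
    """Aerobic hydrolysis rate, Eq. 1.

    s_o2: dissolved oxygen S_O2
    x_s:  slowly biodegradable substrate X_S
    x_h:  heterotrophic biomass X_H
    k_h, k_o2, k_x: K_h, K_O2, K_X (placeholder defaults, not calibrated)
    """
    if x_h <= 0:
        # No biomass means no hydrolysis; this also avoids dividing by zero.
        return 0.0
    oxygen_term = s_o2 / (k_o2 + s_o2)      # Monod switch on oxygen
    ratio = x_s / x_h                       # substrate-to-biomass ratio
    substrate_term = ratio / (k_x + ratio)  # surface-limited saturation term
    return k_h * oxygen_term * substrate_term * x_h


# Example call with hypothetical concentrations.
rate = aerobic_hydrolysis_rate(s_o2=2.0, x_s=100.0, x_h=1000.0)
```

Each factor of Eq. 1 is kept as a separate variable so that the anoxic and anaerobic variants (Eqs. 2 and 3) can be written the same way by adding their extra switching terms.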
6 Application programming interface (API)
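Data cleaning with pandas
import pandas as pd

# tmp_file_path is defined elsewhere; each row is read as one column, then split on ";".
dataset = (
    pd.read_csv(tmp_file_path, header=None, index_col=None).iloc[:, 0]
    .str.split(";", expand=True)
)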
Random forest implemented with scikit-learn
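A minimal sketch of a random forest regressor with scikit-learn is given below. The data are synthetic and generated on the spot purely for illustration (three hypothetical input features and one target); in practice `X` and `y` would come from the cleaned dataset, and the hyperparameters shown are not tuned.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: 200 samples, 3 features; the third feature is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.05, size=200)

# Hold out 25% of the samples to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)    # predictions on unseen samples
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)
```

Fixing `random_state` makes the split and the forest reproducible, which helps when comparing the data-driven model against a first-principles baseline.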