Getting Started
Initial Setup
Install Python if it’s not already installed. Python 3.13 is recommended, and Python 3.12 is also supported. Python 3.14 is not supported yet.
Clone the repository:
To use the git command line, navigate to where you want to install traiNNer-redux, and enter this command (install git first if it’s not already installed):
git clone https://github.com/the-database/traiNNer-redux.git
To use a GUI for git, follow the instructions for that git client. For GitHub Desktop, for example, click on the green Code button near the top of this page, click Open with GitHub Desktop and follow the instructions.
For Windows users, double click install.bat, and for Linux users, from the terminal in the traiNNer-redux folder run chmod +x install.sh && ./install.sh to install all Python dependencies to a new virtual environment. The install.sh script is tested on Ubuntu and may need adjustments to work on other Linux distros.
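Before running the installer, it can help to confirm which Python version is on your PATH. The following is a minimal sketch, not part of traiNNer-redux; the function name check_python and the status strings are hypothetical, and the version sets simply restate the versions listed above.

```python
import sys

# Versions the setup notes above list as supported, as (major, minor).
SUPPORTED = {(3, 12), (3, 13)}
RECOMMENDED = (3, 13)


def check_python(version_info=None):
    """Return a short status string for the given (or running) interpreter."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    label = f"Python {major_minor[0]}.{major_minor[1]}"
    if major_minor == RECOMMENDED:
        return f"{label}: recommended"
    if major_minor in SUPPORTED:
        return f"{label}: supported"
    return f"{label}: not listed as supported"


if __name__ == "__main__":
    print(check_python())
```

Saving this to a scratch file and running it with the same python command you plan to use for the install shows which interpreter the installer would pick up.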
Training a Model
Do a quick test run
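The repository comes with several configs that are ready to use out of the box, as well as a tiny dataset for testing purposes only. To confirm that your PC can run the training software successfully, run the following commands from the traiNNer-redux folder (shown for Windows; on Linux the equivalents are typically source venv/bin/activate to activate the virtual environment and cp instead of copy):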
venv\Scripts\activate
copy "options/_templates/train/SPAN/SPAN_S_fidelity.yml" "options/train/SPAN/custom_SPAN_S_fidelity.yml"
python train.py --auto_resume -opt ./options/train/SPAN/custom_SPAN_S_fidelity.yml
You should see the following output within a few minutes, depending on your GPU speed:
...
2024-07-02 21:40:56,593 INFO: Model [SRModel] is created.
2024-07-02 21:40:56,668 INFO: Start training from epoch: 0, iter: 0
2024-07-02 21:41:17,816 INFO: [4x_SP..][epoch: 0, iter: 100, lr:(1.000e-04,)] [performance: 4.729] [eta: 14:11:33] l_g_mssim: 1.0000e+00 l_g_percep: 3.5436e+00 l_g_hsluv: 4.3935e-01 l_g_gan: 2.4346e+00 l_g_total: 7.4175e+00 l_d_real: 2.4136e-01 out_d_real: 2.9309e+00 l_d_fake: 5.2773e-02 out_d_fake: -2.4104e+01
The last line shows the progress of training after 100 iterations. If you get this far without any errors, your PC is able to train successfully. Press Ctrl+C to end the training run.
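If you want to track the loss values from a progress line like the one above, for instance to plot them later, a small parser is enough. This is a rough sketch that assumes the log format stays as shown in the sample output; the function name parse_progress_line and both regular expressions are illustrative and not part of traiNNer-redux.

```python
import re

# "iter: 100" inside the bracketed progress header.
ITER_RE = re.compile(r"iter:\s*(\d+)")
# "name: 1.2345e+00" pairs; only values in scientific notation are matched,
# so fields such as lr, performance and eta are skipped.
LOSS_RE = re.compile(r"\b([a-z][a-z0-9_]*):\s*(-?\d+\.\d+e[+-]\d+)")


def parse_progress_line(line):
    """Return (iteration, {name: value}) or None if the line has no iteration."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    values = {name: float(value) for name, value in LOSS_RE.findall(line)}
    return int(match.group(1)), values
```

Lines that mention an iteration but carry no loss values, such as the "Start training" line, come back with an empty dictionary.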
Set up config file
Navigate to traiNNer-redux/options/train, select the architecture you want to train, and open the yml file in that folder in a text editor. A text editor that supports YAML syntax highlighting is recommended, such as VS Code (with the YAML extension) or Sublime Text (with the LSP and LSP-yaml packages). For example, to train SPAN, open traiNNer-redux/options/train/SPAN/SPAN_fidelity.yml.
At the top of the file, set the name to the name of the model you want to train. Give it a unique name so you can differentiate it from other training runs.
Set the scale to the upscaling factor you want the model to learn. For example, a scale of 2 doubles the width and height of the image. Not all architectures support all scales. Supported scales appear next to the scale in a comment, so # 2, 4 means the architecture only supports a scale of 2 or 4.
Set the paths to your dataset HR and LR images at dataroot_gt and dataroot_lq under the train: section. The HR and LR images should match in number of images and in filenames. For each matching LR/HR pair, the image resolutions should work with the selected scale, so if a scale of 2 is selected then each HR image must be 2x the resolution of its matching LR image.
If you want to enable validation during training, set val_enabled to true and set the paths to your validation HR and LR images at dataroot_gt and dataroot_lq under the val section.
If you want to use a pretrained model, set the path of the pretrained model at pretrain_network_g and remove the # to uncomment that line.
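The dataset requirements above (same filenames in both folders, HR exactly scale times the LR size) are easy to get wrong on large datasets, so a quick check before training can save time. The following is a standard-library sketch under stated assumptions: the function names are hypothetical, the folders are assumed to be flat (no subfolders), and image sizes are passed in as (width, height) tuples rather than read from disk.

```python
from pathlib import Path


def compare_pair_folders(hr_dir, lr_dir):
    """Return (only_in_hr, only_in_lr): sorted filenames that lack a partner."""
    hr_names = {p.name for p in Path(hr_dir).iterdir() if p.is_file()}
    lr_names = {p.name for p in Path(lr_dir).iterdir() if p.is_file()}
    return sorted(hr_names - lr_names), sorted(lr_names - hr_names)


def sizes_match_scale(lr_size, hr_size, scale):
    """True if the HR (width, height) is exactly `scale` times the LR size."""
    lr_w, lr_h = lr_size
    hr_w, hr_h = hr_size
    return hr_w == lr_w * scale and hr_h == lr_h * scale
```

Two empty lists from compare_pair_folders mean every image has a partner. The sizes for sizes_match_scale can come from any imaging library you already have installed, for example the size attribute of an image opened with Pillow.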
Run command to start training
Single GPU
Run the following command to start training. Change ./options/train/arch/config.yml to point to the config file you set up in the previous step.
venv\Scripts\activate
python train.py --auto_resume -opt ./options/train/arch/config.yml
For example, to train with the SPAN config:
venv\Scripts\activate
python train.py --auto_resume -opt ./options/train/SPAN/SPAN_fidelity.yml
To pause training, press Ctrl+C or close the command window. To resume training, run the same command that was used to start training. The --auto_resume flag will resume training from where it was paused.
Multi GPU
Run the following command to start training with multiple GPUs. Change --nproc-per-node 4 to the number of GPUs you want to use for training.
venv\Scripts\activate
torchrun --nproc-per-node 4 train.py --launcher pytorch --auto_resume -opt ./options/train/arch/config.yml
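If you launch training from a script rather than typing the command, the single-GPU and multi-GPU forms above can be generated by one helper. This is a minimal sketch; build_train_command is a hypothetical name, and it only reuses the flags shown in the commands above.

```python
def build_train_command(config_path, num_gpus=1):
    """Return the training command as an argument list.

    Mirrors the two command forms shown above: plain `python train.py` for a
    single GPU, and `torchrun` with `--launcher pytorch` for several GPUs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    common = ["--auto_resume", "-opt", str(config_path)]
    if num_gpus == 1:
        return ["python", "train.py", *common]
    return [
        "torchrun", "--nproc-per-node", str(num_gpus),
        "train.py", "--launcher", "pytorch", *common,
    ]
```

The returned list can be passed to subprocess.run from the traiNNer-redux folder, provided the virtual environment is already activated in that shell.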
Test models
Models are saved in the safetensors format to traiNNer-redux/experiments/<name>/models, where name is whatever was used in the config file. chaiNNer can be used to run most models. If you want to run the model on images during training to monitor the progress of the model, set up validation in the config file, and find the validation results in traiNNer-redux/experiments/<name>/visualization.
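To see which checkpoints a run has produced so far, you can list the models folder described above. The sketch below assumes the saved files carry the .safetensors extension and that file modification time reflects save order; list_saved_models is a hypothetical helper, not part of traiNNer-redux.

```python
from pathlib import Path


def list_saved_models(experiments_dir, name):
    """Return the run's .safetensors files, oldest first by modification time."""
    models_dir = Path(experiments_dir) / name / "models"
    if not models_dir.is_dir():
        return []  # run not started yet, or a different name was used
    return sorted(models_dir.glob("*.safetensors"), key=lambda p: p.stat().st_mtime)
```

The last element of the returned list is the most recently written model, which is usually the one to load in chaiNNer or point the test config at.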
The test script can also be used to test trained models, which is required to test models with architectures not yet supported by chaiNNer. For example, to test a SPANPlus model, open the config file at ./options/test/SPANPlus/SPANPlus.yml, and update the following:
Edit the dataroot_lq option to point to a folder that contains the images you want to run the model on.
Make sure the options under network_g match the options under network_g in the training config file that you used. For example, if you trained SPANPlus_STS, then set the type to SPANPlus_STS under network_g in the test config file as well.
Update pretrain_network_g to point to the path of the model you want to test.
Then run this command to run the model on the images as specified in the config file:
venv\Scripts\activate
python test.py -opt ./options/test/SPANPlus/SPANPlus.yml
Convert models to ONNX
If you want to convert your PyTorch models to ONNX format, you can use the convert_to_onnx.py script to do so. First install the additional dependencies for ONNX:
venv\Scripts\activate
pip install .[onnx] --ignore-requires-python
Then open a config file corresponding to the architecture of the model you trained. For example, if you trained SPANPlus, open the config file at ./options/onnx/SPANPlus/SPANPlus.yml, and update the following:
Set the name to the name of your model; the ONNX filename will include this name.
Make sure the settings under network_g match the settings you used to train your model.
Set pretrain_network_g to point to the path of the .safetensors or .pth model that you want to convert.
Set the options in the onnx section of the config file as needed.
Then run this command to do the conversion (make sure the path points to the .yml file you edited):
venv\Scripts\activate
python convert_to_onnx.py -opt ./options/onnx/SPANPlus/SPANPlus.yml
The converted ONNX files will be saved to the onnx directory.