Run a case
============================

Tested Clusters
----------------------------

SmartFlow has been tested and verified on several HPC clusters in Europe and China. The following table summarizes the hardware environment and job scheduling system used on each cluster.

.. list-table:: HPC Clusters Tested with SmartFlow
   :widths: 15 25 20 20 20
   :header-rows: 1

   * - Country
     - Cluster
     - Partition
     - Scheduler
     - Network Interface
   * - Italy
     - CINECA
     - Booster
     - Slurm
     - ib0
   * - China
     - BSCC (Beijing Super Cloud Center)
     - N32EA14P
     - Slurm
     - ib0
   * - China
     - BSCC (Beijing Super Cloud Center)
     - BSCC-A
     - Slurm
     - ib0

Running on a standalone machine
-------------------------------

To run a case on a standalone machine, we can use:

.. code-block:: sh

   python /your/path/SmartFlow/src/smartflow/main.py

Note that ``main.py`` is located at ``/your/path/SmartFlow/src/smartflow/main.py``.

If the run fails, you may find an error message in the ``err`` file of your current run folder, ``../../SmartFlow/examples/train_retau_05200/err``. Check the message and fix the problem according to your own settings.

Some errors may come from ``wandb``, because SmartFlow imports the ``wandb`` library. If you see a ``wandb`` error, create a ``wandb`` account or log in to your existing one, then add your API key as follows:

.. code-block:: python

   import wandb

   wandb.login(key="your_api_key")

Running with SLURM on a CPU cluster
-----------------------------------

To run a case on a CPU cluster, we can use a SLURM script such as:

.. code-block:: sh

   #!/bin/bash
   #SBATCH --time=48:00:00
   #SBATCH --nodes=2
   #SBATCH --job-name=smartflow
   #SBATCH --account=user_account
   #SBATCH --qos=qos_name
   #SBATCH --ntasks-per-node=32
   #SBATCH --cpus-per-task=1
   #SBATCH --output=slurm-%j.out
   #SBATCH --error=slurm-%j.err

   python main.py

The script is submitted with the following command:

.. code-block:: sh

   sbatch slurm.sh

For this setting, we allocate 2 nodes with 32 tasks per node and 1 CPU per task (for a total of 64 tasks).
The job is submitted under the ``user_account`` account with the ``qos_name`` quality of service (QOS). The wall-time limit is 48 hours, so the job is terminated if it runs longer. The output and error logs are saved in the ``slurm-%j.out`` and ``slurm-%j.err`` files, respectively, where ``%j`` is replaced by the job ID.

Running with SLURM on a GPU-accelerated cluster
-----------------------------------------------

To run a case on a GPU-accelerated cluster, we can use a SLURM script such as:

.. code-block:: sh

   #!/bin/bash
   #SBATCH --time=48:00:00
   #SBATCH --nodes=2
   #SBATCH --job-name=smartflow
   #SBATCH --account=user_account
   #SBATCH --qos=qos_name
   #SBATCH --ntasks-per-node=32
   #SBATCH --cpus-per-task=8
   #SBATCH --gres=gpu:4
   #SBATCH --output=slurm-%j.out
   #SBATCH --error=slurm-%j.err

   python main.py

For this setting, we allocate 2 nodes with 32 tasks per node and 8 CPUs per task, along with 4 GPUs per node for acceleration. The job is submitted under the ``user_account`` account with the ``qos_name`` quality of service (QOS). The wall-time limit is 48 hours, so the job is terminated if it runs longer. The output and error logs are saved in the ``slurm-%j.out`` and ``slurm-%j.err`` files, respectively, where ``%j`` is replaced by the job ID.
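
As a quick sanity check before submitting, the resource totals implied by the ``#SBATCH`` settings above can be worked out by hand or with a few lines of Python. The helper below is a minimal sketch, not part of SmartFlow: the function name ``slurm_totals`` and its arguments are hypothetical and simply mirror the ``--nodes``, ``--ntasks-per-node``, ``--cpus-per-task`` and ``--gres=gpu:N`` options.

.. code-block:: python

   def slurm_totals(nodes, ntasks_per_node, cpus_per_task, gpus_per_node=0):
       """Return the total tasks, CPU cores, and GPUs requested by a job.

       Hypothetical helper for illustration only; it is not a SmartFlow
       or SLURM API.
       """
       total_tasks = nodes * ntasks_per_node
       return {
           "tasks": total_tasks,                  # nodes x tasks per node
           "cpus": total_tasks * cpus_per_task,   # each task gets its own CPUs
           "gpus": nodes * gpus_per_node,         # --gres is counted per node
       }


   # Settings from the CPU script: 2 nodes, 32 tasks per node, 1 CPU per task.
   cpu_job = slurm_totals(nodes=2, ntasks_per_node=32, cpus_per_task=1)

   # Settings from the GPU script: 8 CPUs per task and 4 GPUs per node.
   gpu_job = slurm_totals(nodes=2, ntasks_per_node=32, cpus_per_task=8,
                          gpus_per_node=4)

   print(cpu_job)  # {'tasks': 64, 'cpus': 64, 'gpus': 0}
   print(gpu_job)  # {'tasks': 64, 'cpus': 512, 'gpus': 8}

Comparing these totals with the cores and GPUs actually available per node on your partition can help catch a request that the scheduler would reject or leave pending.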