On a cluster, the computers are generally referred to as compute nodes rather than workstations. Often one or more computers in a cluster are designated as management nodes and are reserved for launching jobs and/or storing data files. For the purpose of these instructions, we will use the terminology of compute nodes and management node.
The process of running simulations on a cluster comes in two parts.
- The first would be for IT / System Administrators who are configuring the nodes for Lumerical products and FlexLM.
- The second part is for end users who want to run or submit simulation jobs to the cluster.
- We are focusing on FDTD and varFDTD, which offer the possibility of both distributed and concurrent computing.
- Running a simulation simultaneously across several computers/nodes will require an additional license for each additional node running the simulation.
- Product software must be of the same release version on all nodes.
- The example provided here is for the PBS scheduler.
- Please coordinate with your IT department Administrator on how to configure and run FDTD simulations or submit jobs to your cluster.
- The submission script and how you submit to your cluster will vary depending on your cluster and job scheduler.
Part 1: Setup and configuration
- Getting started
- FlexLM installation
- Product installation
- Configure license
- Product executables
Part 2: Running on cluster
1. Setup and Configuration
- Configure your system first, as shown in the Getting started with parallel computing page.
FlexNet license manager
- If FlexLM is going to be installed on the cluster, it is typically installed on the management node. See the installation process below.
- Lumerical Suite, on the other hand, can be installed on both the management node and the compute nodes.
- Install the product on each node of your cluster (or group of workstations).
- If you have a large number of compute nodes, you may find that manually running the install script is time consuming.
- One way to avoid this is to install on a network file system that is visible to all the compute nodes.
- This can be accomplished by extracting the rpm package into a location on that network file system.
- See the different installation process below.
- We recommend a system-wide license configuration, rather than configuring it per user.
- For non-graphical installation, use the system-wide "License.INI" preference file to set the license server information.
- Another way to configure the license is to set the environment variable LUMERICAL_LICENSE_FILE to point to your license server.
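As a minimal sketch, the environment variable can be set in a system or login profile. The port and hostname below are placeholders (FlexNet license variables use the port@host form); substitute the values your license server actually uses.

```shell
# Point Lumerical products at the FlexNet license server.
# 27011@mgmt-node is a placeholder; substitute the port and hostname
# that your license server actually uses.
export LUMERICAL_LICENSE_FILE=27011@mgmt-node
echo "License server: $LUMERICAL_LICENSE_FILE"
```

Placing this in a system-wide profile script makes the setting available to all users on the node.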
- When running parallel simulations, inter-process communication is accomplished using a Message Passing Interface (MPI) library.
- Different MPI library implementations allow parallel execution on a variety of different types of hardware.
- Several versions of MPI are included with the standard product installation.
- Use the MPI version that is most appropriate for your system, and select the corresponding engine executable (.lcl).
- A script determines which options are available on your system.
- The default MPI version is MPICH2 Nemesis; its mpiexec is located at /opt/lumerical/2019b/mpich2/nemesis/bin/mpiexec (used in the examples below).
- The corresponding simulation engine for FDTD is /opt/lumerical/2019b/bin/fdtd-engine-mpich2nem.
- The corresponding simulation engine for varFDTD in MODE follows the same naming convention and is located in the same bin directory.
- The eigensolver engine (fd-engine) is not a parallel application so there is no need to provide different versions of the engine for each MPI version.
See: Supported MPI variants for more information.
Testing the MPI
- Lumerical includes a simple test program, CPi, to test your MPI installation.
- If CPi runs successfully, you will see the value of pi (3.1415...) printed on your screen, as well as a message from each of the computers that participates in the calculation.
- Run CPi to test the default MPICH2 framework that comes with Lumerical:
/opt/lumerical/2019b/mpich2/nemesis/bin/mpiexec -n 4 /opt/lumerical/2019b/mpitest/cpi-mpich2nem
- The option '-n' specifies the number of processes to use in the computation.
- You may change the number '4' to match the number of processors in your node.
- Other advanced options for mpiexec/mpirun can be listed with the built-in help command (typically mpiexec -help).
- Try to run CPi with the same arguments that will be used to run the actual simulation engine:
/<your mpi installation path>/mpiexec -n 4 /opt/lumerical/2019b/mpitest/cpi-mpich2nem
- Download any of the example files for FDTD from the Application Gallery. The command required to run the simulation should be similar to what was required for the CPi test.
/opt/lumerical/2019b/mpich2/nemesis/bin/mpiexec -n 4 /opt/lumerical/2019b/bin/fdtd-engine-mpich2nem <example>.fsp
2. Running on cluster
We will consider the most common scenario when working on a cluster, which is usually also the most complex one.
- The user’s local computer has a full license of the product installed (both CAD and engine licenses).
- This machine is used to create the simulation files that will be run in the cluster.
- The cluster is on a separate network.
- The cluster uses a job scheduler.
(example templates for a Linux based PBS system are provided)
- The user cannot run the GUI on the cluster because only engine licenses are installed in the cluster and/or the user has no access to a graphical interface there.
- The scripting and data analysis tools are associated with the CAD license, which in this scenario is only available on the local computer, not the cluster.
- Therefore, any post-processing of the results after the simulations have been run happens on the local computer (e.g. visualization, using analysis groups, and processing data with scripts).
- The workflow described next can be modified when some of these conditions do not apply.
- However, by examining this scenario we can illustrate the most common challenges.
1. On local computer
- Create a small simulation file that can be used for testing purposes.
- Use this test file to check that the simulation runs properly on your local computer and to identify any potential issues with the simulation setup.
- Generate the desired set of simulation files to be run in the cluster.
- If this is the first time you are using the cluster, it is useful to start with small files for a quick test of the cluster setup.
- The set of files can be generated using a parameter sweep or with a script.
- Transfer the set of files from your local computer to the network drive that all the nodes can access.
2. On the cluster
- A job submission script must be created for each simulation file.
- This is commonly done using a shell script file where you specify the computing resources required for running the simulation (e.g. the amount of memory and maximum simulation time required).
- Once the submission script is generated, it can be submitted using the appropriate command (e.g. qsub in a PBS scheduler).
- These procedures can be automated with additional shell script files when you have multiple simulation files that require multiple job submissions.
- For more details about the submission process see the next subsection below, Submitting jobs to a scheduler.
- Once the jobs have been submitted, the scheduler will start each one as the resources in the cluster become available (in a PBS system, you can check the status of your jobs with qstat).
- Typically, once all your jobs have finished, you will be notified (e.g. by an email from the scheduler).
- Transfer the set of files from the cluster network drive back to your local computer.
- This step can be time consuming, depending on the speed of the connection and the size of the results stored in the simulation files.
- If your local computer can work in a folder shared with the cluster, you can avoid this step.
3. Back to local computer
- Visualize and analyze the results.
- If you created the set of simulation files using the parameter sweep tool you can load the results from the simulation file as shown here.
- Depending on the situation, this analysis step can be demanding in terms of time and computing resources.
- This can only be done on a machine with both a CAD license and access to a graphical interface.
- If none of the cluster nodes satisfies these conditions, you can only do it on your local machine.
Submitting jobs to a scheduler
- The first step is to obtain an example or template of a job submission script for your cluster from your cluster administrator.
- This script can differ depending on the scheduler configuration.
- The template for a PBS scheduler will probably be similar to the one shown here:
#PBS -S /bin/bash
#PBS -l mem=<total_memory>mb
#PBS -l walltime=<hours>:<minutes>:<seconds>
#PBS -l software=<product_license>
module load <your_application>
echo "Starting run at: $(date)"
<application_command_line>
echo "Job finished at: $(date)"
- The information inside <> signs must be provided according to the application you want to run.
- Update this with the appropriate information for your first FDTD simulation job submission.
- We recommend using a small simulation file for your tests.
- The updated template should look like the one below:
#PBS -S /bin/bash
#PBS -l mem=100mb
#PBS -l walltime=1:00:00
#PBS -l software=FDTD
module load lumerical
echo "Starting run at: $(date)"
/opt/lumerical/2019b/mpich2/nemesis/bin/mpiexec -n 4 /opt/lumerical/2019b/bin/fdtd-engine-mpich2nem ./simulation.fsp
echo "Job finished at: $(date)"
- The 'application_command_line' (the one that executes the simulation) depends on the MPI library you will be using.
- To submit the job in a PBS system, use qsub followed by the name of your submission script.
Submitting multiple simulations
Having tested the process to submit one simulation file, you can now automate the process for multiple simulation files. For this purpose, we provide a set of three shell script files for a Linux based PBS scheduling system, which can be easily reused and customized:
- Master script
- This script might require some minor customization.
- It takes the number of processes per job (optional, default is 8 processes) and the fsp files as arguments.
- It then generates a job submission script for each simulation file using the PBS template script and the simulation information provided by the process template script.
- Finally, the script submits the jobs using qsub.
- PBS template script
- This is the template script that is used to create the job submission and it must be customized to ensure that the submission script follows the requirements from your system administrator.
- The modifications are essentially the same as the ones described above for the submission script template.
- Process template script
- This script rarely needs customization.
- It replaces the tags in the PBS template script for:
- total memory required for all processes <total_memory>
- memory required by each process <processor_memory>
- total time for the simulation <hours>,<minutes>,<seconds>
- number of processes to use <n>
- name of the fsp project file <filename>
- The file name and the number of processes are provided when calling the master script.
- For all the other tags, default values will be used unless the appropriate options are provided when calling the process template script from the master script.
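The job-generation loop performed by the master script can be sketched as follows. The one-line template, the .fsp file names, and the tag substitution shown here are simplified assumptions for illustration, not the actual contents of the provided scripts.

```shell
# Simplified sketch of what the master script does: fill the <filename> and
# <n> tags in a template for each project file, then submit with qsub.
# The one-line template and the placeholder .fsp files are assumptions.
printf 'mpiexec -n <n> fdtd-engine-mpich2nem ./<filename>\n' > template.sh
touch sweep_1.fsp sweep_2.fsp

for fsp in *.fsp; do
    job="job_${fsp%.fsp}.sh"
    sed -e "s|<filename>|$fsp|" -e "s|<n>|8|" template.sh > "$job"
    # qsub only exists on the cluster, so guard the submission step.
    if command -v qsub >/dev/null 2>&1; then
        qsub "$job"
    fi
done
```

The default of 8 processes per job matches the master script's documented default; passing -n to the master script overrides it.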
- Once the shell script files are configured, you can submit jobs to the scheduler:
sh /opt/lumerical/2019b/bin/fdtd-run-pbs.sh -n <procs> fsp1 fsp2 ... [fspN]
- For example, to submit all the fsp files in a directory to the scheduler, you can use,
sh /opt/lumerical/2019b/bin/fdtd-run-pbs.sh *.fsp
- The script files are documented with comments in case they need to be customized.
- The process template script should not be modified unless some advanced customization is required.
- If you modify the master and/or PBS template scripts, we recommend that you make a copy of them under a different name so that your changes will not be overwritten when you upgrade the software. e.g. the master script can be renamed to 'myfdtd-run.sh'.
- If you change the name of the PBS template script, you must modify the TEMPLATE variable in the master script to specify the correct template file.