Requisites
- Install our simulation suite on your cluster, preferably on a shared filesystem.
- Running multiple simulations across several computers simultaneously (concurrent computing) requires as many licenses as the number of computers running the simulations, i.e. #licenses = #nodes running jobs at the same time (see the example after this list).
- Concurrent computing is currently supported by all products.
- Distributed computing is only available for FDTD and varFDTD.
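For instance (the numbers here are purely illustrative): a parameter sweep of 20 simulations run 4 at a time across 4 compute nodes consumes 4 licenses while it runs, whereas the same sweep run one simulation at a time on a single node consumes only 1.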
Configure your firewall
- Many Linux clusters communicate across a private network and therefore firewall security may not be required. If no firewall is in use in your network, this step may be skipped.
- The MPI processes communicate using a range of ports. It's easiest to simply disable the firewall on all nodes. An alternate solution is to configure MPI to use a specific range of ports, then create exceptions for those ports. See your MPI's documentation for details.
- If you want to leave the firewall turned on, two additional firewall exceptions are required (a sketch of the commands follows this list):
- In some configurations, MPI requires the use of the SSH programs to start remote processes on the compute nodes during parallel execution. Ensure that the SSH port 22 is allowed to accept incoming TCP/IP connections on all of your compute nodes.
- Open/allow the TCP ports used by FlexLM. See FlexLM Configuration for details.
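As a rough sketch, on a firewalld-based Linux node the exceptions might be created as follows. The port numbers are placeholders: use the ports from your own FlexLM configuration, and consult your MPI's documentation for how to pin its port range (for example, Intel MPI reads I_MPI_PORT_RANGE).
$ sudo firewall-cmd --permanent --add-service=ssh
$ sudo firewall-cmd --permanent --add-port=27011/tcp       # placeholder lmgrd port
$ sudo firewall-cmd --permanent --add-port=27012/tcp       # placeholder vendor daemon port
$ sudo firewall-cmd --permanent --add-port=50000-50100/tcp # placeholder MPI port range (e.g. I_MPI_PORT_RANGE=50000:50100 for Intel MPI)
$ sudo firewall-cmd --reload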
Shared network storage
When running a distributed job with MPI, the MPI process with Rank 0 is responsible for reading the project file from disk. As long as the localhost that has access to the simulation file is the first to appear in the host list, the MPI process with Rank 0 should be launched on that node. When running concurrent jobs (e.g. sweeps), the files are created on the local machine, but the job (Rank 0) is launched on a remote node, and that node needs access to the simulation file. The simple solution to both problems is to set up shared network storage (or, if that is not possible, we do support S3 storage). This shared storage must be accessible to all nodes under the same path or drive mapping, so that any node can reach your simulation files using the same path or Windows UNC pathname.
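For illustration, here is a minimal sketch of the host-list ordering for a distributed job launched from the command line. The host names, process counts, engine path, and mpiexec options are assumptions; the exact flags and engine binary depend on your MPI implementation and installation.
# hosts file: the node that can read the project file is listed first,
# so the Rank 0 process (which reads the file) starts there
node01:8
node02:8
node03:8
$ mpiexec -n 24 -machinefile hosts /opt/lumerical/bin/fdtd-engine /mnt/shared/lumerical/sim.fsp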
For example, suppose your simulation files are kept at the following network storage location:
\\server\public\sims
- On Windows, map the above network path to the same drive letter on all computers, for example:
Drive X:\
- On Linux, mount the network path under the same location on all nodes (see the sketch after this list), for example:
/mnt/shared/lumerical
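One way to set this up on Linux is sketched below, assuming the storage is exported over NFS from a host named server (the server name and export path are placeholders; adjust them to your environment and repeat on every node):
$ sudo mkdir -p /mnt/shared/lumerical
$ sudo mount -t nfs server:/public/sims /mnt/shared/lumerical
# to make the mount persistent, add the equivalent /etc/fstab entry on each node:
# server:/public/sims  /mnt/shared/lumerical  nfs  defaults  0  0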
On many Linux clusters/networks, each user's home directory is on a network file system that is common to all nodes. If this is the case, you may use your home directory to store your simulation files. For more information on creating a network file system, see your operating system documentation.
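To check whether your home directory is already on a network file system, a quick check on a typical Linux node is:
$ df -T $HOME
If the Type column shows nfs, nfs4, or another network file system, the home directory is shared; a local type such as ext4 or xfs means it is not.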
Configure login credentials
Windows
- Your user account should have a unique username and a password.
- We do not recommend using the default Administrator account on Windows.
- When using Intel MPI, register user credentials as shown here.
- For MPICH2 (deprecated), set the login credentials as shown here.
On Windows, use either MPICH2 or Intel MPI as the Job launching preset. Use Microsoft MPI or Local Computer when running only on the local machine.
Linux
Configure your compute nodes to allow remote login without a password, as the version of MPICH2 included with the installation package uses SSH to start remote jobs. If this is not configured, the user will have to type their password each time MPICH2 is called to run the simulation.
Creating a passwordless SSH login
- On your primary computer, enter the following commands to create a set of ssh keys.
$ ssh-keygen -t rsa
$ cd ~/.ssh
$ cat id_rsa.pub >> authorized_keys
- Press Enter several times to accept all the defaults and an empty passphrase.
- This creates your public/private keys and saves them under
$HOME/.ssh
- Next, place your public key in the text file $HOME/.ssh/authorized_keys on each compute node:
$ ssh <node name> "mkdir -p ~/.ssh; chmod 700 ~/.ssh"
$ cat ~/.ssh/id_rsa.pub | ssh <node name> "cat >> ~/.ssh/authorized_keys"
$ ssh <node name> "chmod 700 ~/.ssh/authorized_keys"
$ ssh <node name>
- The last command verifies that you can now log in to each compute node without being prompted for a password.
Note (shared network file system): if each user's home directory is on a network file system common to all nodes (see Shared network storage above), the SSH keys only need to be set up once, since every node sees the same $HOME/.ssh/authorized_keys.
Install Lumerical on a shared filesystem
See Shared filesystem installation on Linux for details.
Configure license
See Global license configuration on a shared filesystem for details.
Configure resources
- Open the Lumerical software.
- Open Resource Configuration.
- On MODE and DEVICE, resources are configured on a per-solver basis (a tab for each solver).
- varFDTD and FDTD allow for parallel computing and changing the number of (MPI) processes.
- 'Add' or 'Duplicate' a resource to the list.
- If you have a Job Scheduler installed on your cluster, see: Job scheduler integration - resource configuration
- Edit each resource property as needed.