AlphaFold Usage on HPCC

AlphaFold3

Loading the module

You can load AlphaFold3 using the following commands:

module load alphafold/3
singularity shell $ALPHAFOLD_SING

You can also run AlphaFold3 on a GPU. To do so, log into an A100 GPU node and then use the following commands:

module load alphafold/3
singularity shell --nv $ALPHAFOLD_SING
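
Before starting a long run, it can help to confirm that the GPU is actually visible from inside the container. The following is a minimal sketch, assuming nvidia-smi is available in the container when it is started with --nv; gpu_status is a hypothetical helper name, not part of AlphaFold or the module.

```shell
# Report whether an NVIDIA GPU is visible from the current shell.
# Intended to be run inside the container started with --nv.
gpu_status() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "GPU visible"
        # List the detected GPUs, one per line.
        nvidia-smi -L
    else
        echo "No GPU visible"
    fi
}

gpu_status
```

If this reports that no GPU is visible, check that you are on a GPU node and that the container was started with the --nv flag.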

Using AlphaFold databases

Several databases are available under $ALPHAFOLD_DB, an environment variable that is set after loading the alphafold/3 module.

An example command is as follows:

module load alphafold/3
singularity shell --nv $ALPHAFOLD_SING
# Commands from here on are run inside of the AlphaFold container
python3 /app/alphafold/run_alphafold.py \
--model_dir=$ALPHAFOLD_DB/model \
--db_dir=$ALPHAFOLD_DB \
--json_path=fold_input.json \
--output_dir=my_output_folder/

More information on using AlphaFold3 can be found in the AlphaFold3 GitHub repo, including the input documentation and output documentation.
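
The example command reads its job description from fold_input.json. As a rough sketch, a minimal single-protein input could be created as shown below. The job name and the sequence are made-up placeholders, and the field names reflect the AlphaFold3 input format as best understood here, so check them against the input documentation before relying on them.

```shell
# Write a minimal AlphaFold3 input file (placeholder name and sequence).
cat > fold_input.json <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Confirm the file is well-formed JSON before starting a long run.
python3 -m json.tool fold_input.json > /dev/null && echo "fold_input.json is valid JSON"
```

The JSON check only catches syntax errors such as a missing comma; it does not validate the fields themselves.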

Processing Large Datasets

Sometimes a dataset is too large to fit in the memory of a single GPU. In that case you’ll need to use Unified Memory (“combined” GPU and system memory). This comes with a drop in performance, but it may be the only way to process large datasets.

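To use Unified Memory, set the following environment variables inside the container. With Singularity, this is done by passing --env flags to the singularity command (they are not arguments to run_alphafold.py):

--env XLA_PYTHON_CLIENT_PREALLOCATE=false \
--env TF_FORCE_UNIFIED_MEMORY=true \
--env XLA_CLIENT_MEM_FRACTION=3.2

For example:

module load alphafold/3
singularity shell --nv \
--env XLA_PYTHON_CLIENT_PREALLOCATE=false \
--env TF_FORCE_UNIFIED_MEMORY=true \
--env XLA_CLIENT_MEM_FRACTION=3.2 \
$ALPHAFOLD_SING
# Commands from here on are run inside of the AlphaFold container
python3 /app/alphafold/run_alphafold.py \
--model_dir=$ALPHAFOLD_DB/model \
--db_dir=$ALPHAFOLD_DB \
--json_path=fold_input.json \
--output_dir=my_output_folder/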
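
Once the Unified Memory variables are set, a quick check inside the container can confirm that they reached the environment before a long run starts. This is a minimal sketch; check_unified_memory_env is a hypothetical helper name, not part of AlphaFold.

```shell
# Print each Unified Memory variable, or flag it if it is not set.
check_unified_memory_env() {
    for name in XLA_PYTHON_CLIENT_PREALLOCATE TF_FORCE_UNIFIED_MEMORY XLA_CLIENT_MEM_FRACTION; do
        # printenv succeeds only if the variable is present in the environment.
        if value=$(printenv "$name"); then
            echo "$name=$value"
        else
            echo "$name is NOT set"
        fi
    done
}

check_unified_memory_env
```

Any variable reported as not set will not be seen by AlphaFold, so the run would proceed without Unified Memory for that setting.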

AlphaFold2

Description of AlphaFold2

Loading the module

You can load AlphaFold2 using the following commands:

module load alphafold/2
singularity shell $ALPHAFOLD_SING

You can also run AlphaFold2 on a GPU. To do so, log into a P100 GPU node and then use the following commands:

module load alphafold/2
singularity shell --nv $ALPHAFOLD_SING

Using AlphaFold Databases

When running the alphafold command, you will need to supply paths to several databases. These databases are located under $DATABASE_DIR/alphafold/. They can also be reached through the $ALPHAFOLD_DB environment variable, which is set automatically after loading the alphafold module.

Here is an example of how to write your alphafold command using the monomer preset. The monomer preset uses --pdb70_database_path; the --pdb_seqres_database_path and --uniprot_database_path flags are for the multimer preset only, and run_alphafold.py rejects them when --model_preset=monomer is used:

python3 /app/alphafold/run_alphafold.py \
--model_preset=monomer \
--db_preset=reduced_dbs \
--use_gpu_relax=True \
--data_dir=$ALPHAFOLD_DB \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--small_bfd_database_path=$ALPHAFOLD_DB/small_bfd/bfd-first_non_consensus_sequences.fasta \
--pdb70_database_path=$ALPHAFOLD_DB/pdb70/pdb70 \
--fasta_paths=<path to fasta file> \
--output_dir=<path to output directory>

Here is an example using the multimer preset:

python3 /app/alphafold/run_alphafold.py \
--model_preset=multimer \
--db_preset=reduced_dbs \
--use_gpu_relax=True \
--data_dir=$ALPHAFOLD_DB \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--small_bfd_database_path=$ALPHAFOLD_DB/small_bfd/bfd-first_non_consensus_sequences.fasta \
--uniprot_database_path=$ALPHAFOLD_DB/uniprot/uniprot.fasta \
--pdb_seqres_database_path=$ALPHAFOLD_DB/pdb_seqres \
--fasta_paths=<path to fasta file> \
--output_dir=<path to output directory>

Remember to fill in your FASTA path and output directory if you use these templates.
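
Because these commands take many database paths, a mistyped path may only surface partway through a run. The sketch below checks a list of paths up front; missing_paths is a hypothetical helper name, and the paths are copied from the examples above, so adjust them to match the flags you actually pass. The pdb70 argument is left out because it is typically a database prefix rather than a single file.

```shell
# Print every argument that does not exist on disk; print nothing if all exist.
missing_paths() {
    for path in "$@"; do
        [ -e "$path" ] || echo "missing: $path"
    done
}

# Paths taken from the example commands; edit to match your own flags.
missing_paths \
    "$ALPHAFOLD_DB/uniref90/uniref90.fasta" \
    "$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa" \
    "$ALPHAFOLD_DB/pdb_mmcif/mmcif_files" \
    "$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat" \
    "$ALPHAFOLD_DB/small_bfd/bfd-first_non_consensus_sequences.fasta"
```

Run this after loading the alphafold module so that $ALPHAFOLD_DB is set. No output means every listed path was found.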

These are not the only two ways of running AlphaFold, and different modes may require different sets of arguments to be passed to run_alphafold.py. For more details on the available parameters, as well as more examples, please refer to the AlphaFold GitHub repo.

Last modified January 2, 2025: Update alphafold.md (6d4b4f012)