Sep 08, 2020

Homology modeling using SWISS-Model for Biochemistry I V.1

This protocol is a draft, published without a DOI.
  • James Madison University
Protocol Citation: Michael Friedman, Chris Berndsen 2020. Homology modeling using SWISS-Model for Biochemistry I. protocols.io https://protocols.io/view/homology-modeling-using-swiss-model-for-biochemist-bkmjku4n
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: September 01, 2020
Last Modified: September 08, 2020
Protocol Integer ID: 41355
Abstract
Protocol for homology modeling of proteins for use in Biochemistry I at James Madison University. The protocol guides students through using the SWISS-Model web server (citation below).

The protocol directs users to save data in the Open Science Framework (OSF). This is the preferred project management tool for the class and is required for JMU students using this protocol for the course. Other users can use whichever system they prefer.

Citations for servers:
  1. Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F. T., de Beer, T. A. P., Rempfer, C., Bordoli, L., Lepore, R., and Schwede, T. (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303.
Guidelines
This protocol guides students through homology modeling and analysis of the resulting model. The example results were generated using the CRX DNA-binding domain, so your images and results will differ from those shown.

Materials
SWISS-MODEL server: https://swissmodel.expasy.org/
A sequence in FASTA format
Internet connection
Structure viewing program such as YASARA or UCSF Chimera
Open Science Framework account (JMU students only)
Before start
Gather your sequence in FASTA format (an example is shown below)

>seq_name
MASDETEASETEAMDAET
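For students comfortable with a little scripting, the Python sketch below reads one FASTA record and checks that the sequence uses only standard one-letter amino acid codes before you paste it into BLAST or SWISS-MODEL. It is an optional illustration, not part of either server; the function name `read_fasta` and the allowed alphabet (the 20 standard residues plus X for unknown) are assumptions.

```python
# Optional FASTA check (illustrative sketch, not part of BLAST or SWISS-MODEL).
# Assumption: a single record; allowed letters are the 20 standard amino acids plus X.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}


def read_fasta(text: str) -> tuple:
    """Return (name, sequence) from a single FASTA record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    name = lines[0][1:].strip()
    # Sequence lines may be wrapped; join them and normalize to upper case.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence lines found after the header")
    bad = sorted(set(sequence) - STANDARD)
    if bad:
        raise ValueError(f"non-standard residue codes found: {bad}")
    return name, sequence


record = """>seq_name
MASDETEASETEAMDAET
"""
name, seq = read_fasta(record)
print(name, len(seq))  # seq_name 18
```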

NCBI BLAST
10m
Navigate to NCBI BLAST (Basic Local Alignment Search Tool) and paste your sequence into the "Enter Query Sequence" box.





The standard settings for the search are shown in the table.




Setting | Default setting | What it does
Enter Query Sequence
Query Subrange | (blank) | Limits the search to part of the sequence. Can be useful if there are common motifs/domains in the sequence.
Choose Search Set
Database | Non-redundant protein sequences (nr) | Limits the search to a subset of sequences. For homology modeling, searching Protein Data Bank proteins (pdb) is a good way to see whether your modeling is likely to succeed.
Organism | (blank) | Limits the search to a specific organism or other taxonomic group.
Exclude | (unchecked) | Reduces the results by removing certain classes of sequences.
Program Selection
Algorithm | blastp | Changes how the databases are searched. blastp is the most straightforward. PSI-BLAST is useful when the query sequence is not easily aligned to other sequences.


Record any changes to the settings in Step 2.1 below:

Press BLAST and wait until the results return.

The search can take up to 1 hour.

Analysis of BLAST results to ID sequence
Results will be returned as shown below:



Column definitions from the Descriptions tab of the results.
Table column | What it tells you
Description | Identity of the matching sequence. "Predicted" or "hypothetical" in the title indicates the protein has not been verified.
Max Score | During alignment, identities, similarities, and gaps are scored. If the query aligns to this sequence in several separate segments, this is the best single-segment score.
Total Score | If several disconnected parts matched, this is the sum of the scores of those segments.
Query Cover | Percentage of the query sequence included in the alignment. 100% means the whole query was aligned.
E value | E(xpect) value: the number of hits with an equal or better score expected by chance alone in a database of this size. 0 or very small numbers are good.
Per. Ident | Percentage of identical residues within the aligned region. Generally >40% is needed for a good homology model.
Accession | Accession number for the sequence. Click it to open the information card for that sequence.
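The rules of thumb in this table can be applied mechanically when choosing your best 5 sequences. The Python sketch below assumes you have typed each row of the Descriptions tab into a hypothetical `Hit` record; it keeps hits above the 40% identity guideline from this protocol and at or below an E value cutoff (1e-5 here, an illustrative choice rather than a BLAST default), then sorts the best hits first.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row of the BLAST Descriptions table, typed in by hand (hypothetical record)."""
    description: str
    query_cover: float  # percent of the query included in the alignment
    e_value: float
    per_ident: float    # percent identity over the aligned region
    accession: str


def best_hits(hits, min_ident=40.0, max_evalue=1e-5, top=5):
    """Keep hits passing the identity and E value rules of thumb, best first.

    The 40% identity guideline comes from this protocol; the 1e-5 E value
    cutoff is an assumption chosen for illustration.
    """
    kept = [h for h in hits if h.per_ident > min_ident and h.e_value <= max_evalue]
    # Smallest E value first; ties broken by higher identity, then higher coverage.
    kept.sort(key=lambda h: (h.e_value, -h.per_ident, -h.query_cover))
    return kept[:top]


# Usage: hits = [Hit(...), ...] filled in from your own results, then best_hits(hits)
```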

Record your best 5 sequences and their statistics in the table below.



Sequence Description | Max Score | Total Score | Query Coverage | E value | Per. Ident | Accession


In the Graphic Summary tab, you can view the domains in your sequence.

A domain is a part of the sequence with a known fold/shape/structure. A motif is a short sequence pattern associated with a particular shape or function. Typically, domains can fold on their own, while motifs are shorter pieces within domains.



Record any domains or motifs in the table below along with the approximate position within the sequence. This can help in the modeling and support the accuracy of your model later on.



Domain/Motif name | Position (a residue number or range of numbers)


In the Alignments tab, the actual sequence alignments (the data) are shown.



Each alignment shows the following key information:

  • Identities and their location within the sequence.
  • Positives and their location within the sequence.
  • Gaps and their location within the sequence.
  • The alignment: your sequence (Query) is the top row, the match line is the middle row (identical residues are shown as letters and similar residues as +), and the sequence from the database (called Sbjct) is the bottom row.
  • Position number of the sequence match. These are the numbers at each end of the sequences.
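To see how the Identities, Positives, and Gaps counts relate to the three alignment rows, here is a minimal Python sketch. It assumes you copied one alignment block without the Query/Sbjct labels and position numbers; `alignment_stats` is a hypothetical helper. Percent identity is computed over the full alignment length, gaps included.

```python
def alignment_stats(query: str, midline: str, sbjct: str) -> dict:
    """Count identities, positives, and gaps in one BLAST alignment block."""
    if len(query) != len(sbjct):
        raise ValueError("query and subject rows must be the same length")
    # Copying text often strips trailing spaces from the match line; pad it back.
    midline = midline.ljust(len(query))
    length = len(query)
    # In the match line, a letter marks an identical residue.
    identities = sum(1 for c in midline if c.isalpha())
    # Positives are identities plus similar substitutions, marked '+'.
    positives = identities + midline.count("+")
    # A '-' in either sequence row is a gap.
    gaps = query.count("-") + sbjct.count("-")
    return {
        "length": length,
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }


print(alignment_stats("MASDE-T", "MA+DE T", "MATDEAT"))
```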
Click the Download link at the top right of the alignment and select Text to get a complete file of your results. Upload this file to your OSF folder for this project and name it:
BLAST_alignment_[Group_name]_[sequence_name].txt
Replace [Group_name] with your name/group name without the brackets. Replace [sequence_name] with the name of the sequence.
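The same naming pattern is used for other data files in this protocol (for example the SWISS_model and SWISS_data files), so a small helper can keep names consistent. This Python sketch is optional; the helper name `data_file_name` and the choice to drop characters other than letters, digits, hyphens, and underscores are assumptions, not OSF requirements.

```python
import re


def data_file_name(prefix: str, group: str, sequence: str, ext: str) -> str:
    """Build names like BLAST_alignment_[Group_name]_[sequence_name].txt."""

    def clean(part: str) -> str:
        # Spaces become underscores; other unusual characters are dropped.
        part = part.strip().replace(" ", "_")
        return re.sub(r"[^A-Za-z0-9_-]", "", part)

    name = f"{prefix}_{clean(group)}_{clean(sequence)}"
    # An empty extension leaves the name bare (the Ramachandran image name has none).
    return f"{name}.{ext}" if ext else name


# "Team Helix" and "CRX DBD" are hypothetical group and sequence names.
print(data_file_name("BLAST_alignment", "Team Helix", "CRX DBD", "txt"))
# BLAST_alignment_Team_Helix_CRX_DBD.txt
```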

Indicate your OSF file location as a link within a note on this step.

THIS IS YOUR DATA FILE FOR THE SEARCH!
Critical
Analysis of BLAST results to ID potential modeling templates
Go to step #1 and repeat the search, but limit the Database to Protein Data Bank proteins (pdb). This search will identify proteins of known structure that match your protein and can suggest whether your modeling attempt will be successful. Record your sequence matches in the table.


The accession numbers here are PDB IDs. Each links to information on that structure, which may help when using SWISS-MODEL.
Sequence Description | Max Score | Total Score | Query Coverage | E value | Per. Ident | Accession
Table for recording results from PDB focused BLAST.

The top five structures here are potential template structures for modeling your sequence. Because these proteins are similar to your sequence at the sequence level, they are likely to have a similar structure.
Homology modeling using SWISS-Model
5m
Click the link for the SWISS-MODEL server (https://swissmodel.expasy.org/) to get to a page like the one shown below.


5m
Follow the instructions on the image above and click Build Model to start modeling. The initial steps can take up to 20 minutes.


Note
Clicking Build Model builds the model automatically, using the template that SWISS-MODEL ranks as best. Alternatively, you can click Search for Templates and select templates manually, guided by the templates you identified using BLAST.

If you choose to build manually via the Search for Templates tab, record your templates below.


Template name (second column in table)

Once model building (either manual or automated) is complete, a screen as shown below will appear.



Useful information from this screen
  • GMQE (Global Model Quality Estimation) ranges from 0 to 1 and indicates model quality based on the alignment; numbers closer to 1 indicate a more reliable model.
  • QMEAN indicates model quality based on structural features and the quality of the chemistry, such as torsion angles and solvation. Higher (less negative) scores are better, and a good model can still have a negative QMEAN score; below -4, the model has poor chemistry.
  • Local Quality Estimate indicates model quality on a per-residue basis and can show whether sections of the model are problematic (such as the ends of the model in the report above).
  • Model-Template alignment shows how well the template structure and your sequence align and which parts were used for the model. Blue indicates better alignment and modeling, while red indicates worse. Secondary structure is also indicated, with tubes for α-helices and arrows for β-strands.
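The per-residue Local Quality Estimate can also be read from the downloaded model file. The Python sketch below assumes that SWISS-MODEL writes each residue's local quality score into the B-factor column (columns 61-66) of the PDB file, and it flags residues below an assumed cutoff of 0.6 (check the SWISS-MODEL documentation for the current recommendation and adjust `cutoff` as needed). The function names are hypothetical; only the fixed-column PDB ATOM layout is standard.

```python
def local_quality(pdb_text: str) -> dict:
    """Map (chain, residue number, residue name) to its local quality score.

    Assumption: the score sits in the B-factor column of SWISS-MODEL output;
    it is read from each residue's CA atom.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # PDB ATOM records use fixed columns: atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]), line[17:20].strip())
            scores[key] = float(line[60:66])
    return scores


def low_quality_residues(scores: dict, cutoff: float = 0.6) -> list:
    """Residues whose local score falls below the (assumed) cutoff, in order."""
    return sorted(key for key, value in scores.items() if value < cutoff)


# Usage, with your own downloaded model file:
# with open("SWISS_model_[Group_name]_[sequence_name].pdb") as fh:
#     print(low_quality_residues(local_quality(fh.read())))
```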
The grey Model button leads to a menu to download information.

Two key options:
  1. PDB format results in just the homology model, which can be viewed in YASARA or Chimera
  2. Model Report downloads a .zip with the PDB file model and an HTML based report of the model process including the statistics shown in Step 8.1.

Download both files and upload them to OSF.

Name the PDB file:
SWISS_model_[Group_name]_[sequence_name].pdb
Replace [Group_name] with your name/group name without the brackets. Replace [sequence_name] with the name of the sequence.

Name the zip file:
SWISS_data_[Group_name]_[sequence_name].zip
Replace [Group_name] with your name/group name without the brackets. Replace [sequence_name] with the name of the sequence.

THESE ARE YOUR DATA FILES FOR SWISS MODEL!
Indicate your OSF file location as a link within a note on this step.

The Structure Assessment button leads to a new page showing a basic geometric and chemical assessment of the model.
The Ramachandran plot is interactive and indicates whether the phi/psi angles are appropriate for a protein structure. The phi (φ) angle is the dihedral angle for rotation around the N-Cα bond, while the psi (ψ) angle is for rotation around the Cα-C bond of the amino acid backbone. The ideal angles for helices, sheets, and coils, shown as green areas in the plot below, are known to minimize steric clashes between atoms.
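To make the phi/psi definitions concrete, the Python sketch below computes a signed dihedral angle from four atom positions, which is how each point on a Ramachandran plot is obtained: phi uses C(i-1), N, Cα, C and psi uses N, Cα, C, N(i+1). It uses the common projection-and-atan2 formulation; the function names are illustrative. For reference, residues in an α-helix cluster near φ ≈ -60°, ψ ≈ -45°.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) defined by four atoms.

    phi = dihedral(C[i-1], N[i], CA[i], C[i])   rotation about N-CA
    psi = dihedral(N[i], CA[i], C[i], N[i+1])   rotation about CA-C
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond (the axis of rotation).
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```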

Use the camera tool to record the Ramachandran plot and upload it to your OSF.

Name the image file:
SWISS_phipsi_[Group_name]_[sequence_name]
Replace [Group_name] with your name/group name without the brackets. Replace [sequence_name] with the name of the sequence. Add a note with the link to your file.

Ramachandran plot

The MolProbity results are numerical scores based on the model that indicate what percentage of amino acids fall in the ideal geometry categories and how many clashes are present. The check boxes let you visualize the problem amino acids, which helps show whether the model has general problems or localized issues. Localized issues can be fixed; general problems cannot.
Record your MolProbity numbers.



Deviant amino acids
Molprobity Score
Clash Score
Ramachandran favored
Ramachandran outliers
Rotamer outliers
C-beta deviations
Bad bonds
Bad angles


Make sure you have recorded all the required data.

When you have completed the SWISS-MODEL modeling, save the record, export it to PDF, and upload this file to OSF in the Notebook files.
Ligand identification using SWISS-Model
Clues to function can be gleaned by comparing unknown or predicted structures with previously characterized structures of known function and characteristics. These scans can be biased against novel proteins or proteins with similar structures but distinct functions, but they can be powerful for initial guesses. These methods align the structure and/or amino acid sequence to a database of structures with known ligand-binding sites and look for structures with similar amino acid composition, residue positions, and overall 3-D shape. The underlying idea is that similar structures tend to have similar functions.
Return to your SWISS-MODEL search and note any templates with ligands present. Ligands are listed in the far-right column of the Templates page.




Identify each ligand by clicking on its name. Record the ligands found in the top few hits in the table below.



Ligand name


Click the Name of the template to see where the ligands bind to the protein.
Generally, ligands are classified as non-functional binders, covalent binders, or non-covalent binders. The latter two categories are the most interesting. Hovering over the ligand name shows the molecule bound to the protein, and left-clicking on the name zooms the structure to show the specifics of ligand binding, including the weak interactions between the amino acids and the ligand.
Download the top two hits with ligands bound from the server, align each one to your model in YASARA, and record the RMSD value that YASARA returns.


Note
A perfect match has an RMSD of 0, while an RMSD > 3 Å indicates a poor match. However, a high RMSD value does not rule out regions of local similarity. A visual comparison is always helpful!
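RMSD itself is simple arithmetic: RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair of matched atoms. The Python sketch below assumes the two structures have already been superposed (YASARA does this when it aligns them) and that the coordinate lists are paired atom by atom, for example matched Cα atoms. It does not perform the superposition itself, so on unaligned coordinates it would report a larger value than YASARA. The function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between paired, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the i-th pair of matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Every atom shifted by 1 Å along x gives an RMSD of 1 Å.
print(rmsd([(0, 0, 0), (1, 1, 1)], [(1, 0, 0), (2, 1, 1)]))  # 1.0
```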

Observe if there is any match in the ligand/substrate binding sites between your model and the template structures with ligands bound.
Note
Does the ligand “fit” into the aligned sites? It will not be perfect, so look for places where bumps could fit into holes or nearby pockets! Weak interactions should also be analyzed.

Record the ligand name and possible interactions between your model and the ligand below.
Ligand name | Source structure PDB ID | Interacting amino acids in the model structure (three-letter code and amino acid number)