Pal_finder protocol in Galaxy and MiMi protocol

Ruth Madrigal-Brenes; Gilbert Barrantes; Luis Sandoval; Eric J. Fuchs

Jan 03, 2025


DOI: dx.doi.org/10.17504/protocols.io.dm6gp9bo8vzp/v1

Ruth Madrigal-Brenes¹,
Gilbert Barrantes¹,
Luis Sandoval¹,
Eric J. Fuchs¹

¹Universidad de Costa Rica


Protocol Citation: Ruth Madrigal-Brenes, Gilbert Barrantes, Luis Sandoval, Eric J. Fuchs 2025. Pal_finder protocol in Galaxy and MiMi protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.dm6gp9bo8vzp/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: November 29, 2024

Last Modified: January 03, 2025

Protocol Integer ID: 113161

Keywords: microsatellites, bioinformatics, tropical spider, Costa Rica, MiMi

Funders Acknowledgements:

Vicerrectoría de Investigación

Grant ID: C3-116

Abstract

The number of species threatened by fragmentation and loss of natural habitats due to changes in land use and unplanned urban expansion is rapidly increasing. Despite the seriousness of the situation, information on the impact of isolation caused by fragmentation and urbanization on spider genetic diversity is scarce, owing mostly to a lack of appropriate molecular markers. The main objective of this study was to develop microsatellite (SSR) primers for the spider Theridion evexum using low-coverage next-generation sequencing (NGS) and bioinformatic tools. To increase the yield of DNA extracted from small spiders like T. evexum, we also optimized a CTAB DNA extraction protocol. We sequenced eight individuals at 4X using paired-end sequencing on an Illumina NovaSeq 6000. Reads were cleaned and processed using the MiMi Python pipeline. MiMi produced a total of 3999 putative microsatellite primers. After filtering for polymorphic loci with an allelic richness greater than three and primers that were present in at least 5 of the 8 sequenced individuals, 34 final markers were identified. An in vivo test of 13 of these 34 markers showed that 10 loci were polymorphic with at least three detectable alleles, one locus was monomorphic, and two loci did not produce PCR products. These markers will allow a better assessment of the effects of fragmentation and isolation across populations of this spider species. Furthermore, developing markers using low-coverage NGS and bioinformatic methods provides a valuable approach for uncovering SSR markers at a reduced cost for other tropical species, thereby broadening the scope of molecular ecology research in the tropics.

Set up

Create a user account on https://palfinder.ls.manchester.ac.uk/.

Upload the raw sequences of each individual.

Sequence Quality Verification

Check sequences using FastQC with the following configuration: 

Short read data: Select the raw sequences, both forward (R1) and reverse (R2), separately for each individual.

Contaminant list: None.

Adapter list: Upload a text file (.txt) containing adapter names and sequences.

Submodule and limit specifying file: None.

Disable grouping of bases for reads >50 bp: None.

Lower limit on sequence length: Default setting (blank).

Length of Kmer: Default setting (7).
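The adapter list used here (tab-delimited text, name and sequence) and the custom adapter FASTA pasted into Trimmomatic in the next section can be generated from the same set of sequences. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and the single adapter entry is only a placeholder: the protocol does not state which adapters were used, so replace it with the adapters from your own library preparation.

```python
def adapters_to_fastqc_table(adapters):
    """Return tab-delimited 'name<TAB>sequence' lines, one adapter per line."""
    return "".join(f"{name}\t{seq.upper()}\n" for name, seq in adapters)


def adapters_to_fasta(adapters):
    """Return the same adapters as FASTA records (spaces in names become '_')."""
    return "".join(
        f">{name.replace(' ', '_')}\n{seq.upper()}\n" for name, seq in adapters
    )


# Placeholder entry for illustration only (assumption, not from the protocol).
example_adapters = [("Illumina Universal Adapter", "AGATCGGAAGAGC")]

if __name__ == "__main__":
    # Text to save as the .txt adapter list for FastQC
    print(adapters_to_fastqc_table(example_adapters), end="")
    # Text to paste as custom adapter sequences in FASTA format
    print(adapters_to_fasta(example_adapters), end="")
```

Keeping both files derived from one list helps ensure that the adapters checked during quality verification are the same ones removed during cleaning.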

Sequence Cleaning

Clean sequences using Trimmomatic-Galaxy Version 0.38.1 with the following configuration:

Single-end or paired-end reads?  Select paired-end (two separate input files).

Perform initial ILLUMINACLIP step? Yes.

Select standard adapter sequences or provide custom?  Select Custom.

Paste Customized adapter sequences in FASTA format.

Maximum mismatch count which will still allow a full match to be performed: 2

How accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment: 30.

How accurate the match between any adapter etc. sequence must be against a read: 10

Minimum length of adapter that needs to be detected (PE specific/palindrome mode): 8.

Always keep both reads (PE specific/palindrome mode)?  Yes

SLIDINGWINDOW: 4, 30

LEADING: 3

TRAILING: 3

MINLEN: 50

Output trimlog file? Yes. 

Output trimmomatic log messages? Yes. 

Save the outputs
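For readers who want to record these settings outside Galaxy, the values above can be written as Trimmomatic step strings. The sketch below is an illustration, not part of the authors' workflow: the function name and the adapter file name `adapters.fa` are hypothetical, and the step order simply follows the order in which the options are listed above. A full command-line run would also need the input and output file arguments, which are omitted here.

```python
def trimmomatic_steps(adapter_fasta="adapters.fa"):
    """Build Trimmomatic step strings from the settings listed in this section.

    ILLUMINACLIP fields, in order: adapter FASTA, seed mismatches (2),
    palindrome clip threshold (30), simple clip threshold (10),
    minimum adapter length (8), keep both reads (true).
    """
    illuminaclip = ":".join(
        ["ILLUMINACLIP", adapter_fasta, "2", "30", "10", "8", "true"]
    )
    return [
        illuminaclip,
        "SLIDINGWINDOW:4:30",  # window of 4 bases, required average quality 30
        "LEADING:3",           # trim leading bases with quality below 3
        "TRAILING:3",          # trim trailing bases with quality below 3
        "MINLEN:50",           # drop reads shorter than 50 bp after trimming
    ]


if __name__ == "__main__":
    print(" ".join(trimmomatic_steps()))
```

Storing the step strings alongside the trimmed reads makes it easier to document or repeat the cleaning with identical parameters.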

Individual Microsatellite Identification

Run pal_finder in the Galaxy server with the following configuration:

Primer prefix: Enter a text prefix that will identify each individual.

Sequencing platform used to generate data: Select the Illumina option.

Input Type: Choose the "Pair of datasets" option. Select the paired sequences obtained with Trimmomatic (outputs of step 4.16).

Use all reads for microsatellite detection? Yes, as all sequences had been filtered using Trimmomatic.

Filters to apply to the pal_finder results: Select all the default options.

Use PANDAseq to assemble paired-end reads and confirm primer sequences are present in high-quality assembly: Yes.

Minimum number of 2-mer repeat units to detect: 6 

Minimum number of 3-mer repeat units: 4 

Minimum number of 4-mer, 5-mer, and 6-mer repeat units: 3 

Mispriming library to use: Default from pal_finder

Primer settings to use: Default from pal_finder, except for the minimum and maximum GC percentage, which should be set to 40% and 60%, respectively.
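To make the repeat-unit thresholds and the GC range above concrete, here is a small Python sketch that scans a sequence for perfect tandem repeats using the same minimum counts (2-mer: 6, 3-mer: 4, 4-mer to 6-mer: 3) and checks whether a primer falls within 40% to 60% GC. This is a simplified regular-expression illustration, not pal_finder's actual algorithm; all function names are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as configured above.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_reducible(motif):
    """True if the motif is a repetition of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_units) for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_units in sorted(min_repeats.items()):
        # A motif followed by at least (min_units - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_units - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already reported under the shorter motif, or a homopolymer
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def gc_in_range(primer, low=40.0, high=60.0):
    """True if the primer GC content lies within [low, high] percent."""
    return low <= gc_percent(primer) <= high


if __name__ == "__main__":
    print(find_ssrs("TTACACACACACACGG"))
    print(gc_in_range("ACGTACGTAC"))
```

A dinucleotide run of only five units would not be reported with these settings, which is the practical effect of the thresholds chosen in this step.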

SSR development with MiMi

Create a user account on an HPC cluster and ensure sufficient storage space (approximately 100 GB).

Download the MiMi program (https://github.com/graemefox/mimi).

Load the necessary modules for the program's functionality (Biopython, PANDAseq, MUSCLE, MiMi).

Upload the cleaned sequences for each individual and decompress them on the HPC cluster.

Edit the MiMi configuration file to adjust paths to the MiMi program location and the files uploaded in step 9.

Run MiMi with the following command:

python MiMi_v0.1.2.py -c 'path to configuration file'

Download the output file named: MiMi_output_all_loci

SSR filtering

Open the file MiMi_output_all_loci in any spreadsheet software.

Remove monomorphic microsatellites by filtering the column “Size_Range” to keep only loci with a value greater than 0.

From the potentially polymorphic microsatellites, keep those present in at least 6 of the 8 analyzed individuals (75% of the samples) by filtering the column “Samples_containing_raw_reads_(respectively)” to 6 or more samples. Then keep loci with an allelic richness (Ar) of 3 or higher by filtering the column “Alleles_present” to three or more alleles.
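The same three filters can be applied with a script instead of a spreadsheet. The sketch below is a rough illustration under strong assumptions: it treats the file as tab-delimited and assumes each of the three named columns holds a single number per locus. The real MiMi output may store these values differently and need extra parsing. The `Locus` column and the toy rows are invented purely for demonstration and are not results from the study.

```python
import csv
import io


def filter_loci(handle, min_samples=6, min_alleles=3, delimiter="\t"):
    """Keep loci that pass the three filters described above.

    Assumes (hypothetically) that each column contains one numeric value.
    """
    kept = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        if float(row["Size_Range"]) <= 0:
            continue  # monomorphic locus
        if int(row["Samples_containing_raw_reads_(respectively)"]) < min_samples:
            continue  # present in too few individuals
        if int(row["Alleles_present"]) < min_alleles:
            continue  # fewer than three alleles
        kept.append(row)
    return kept


# Toy table for illustration only (not real data).
TOY_TABLE = (
    "Locus\tSize_Range\tSamples_containing_raw_reads_(respectively)\tAlleles_present\n"
    "locus_A\t0\t8\t1\n"
    "locus_B\t12\t7\t4\n"
    "locus_C\t6\t5\t3\n"
    "locus_D\t4\t6\t2\n"
)

if __name__ == "__main__":
    for locus in filter_loci(io.StringIO(TOY_TABLE)):
        print(locus["Locus"])
```

To use it on a real file, pass an open file handle in place of the `io.StringIO` object, after confirming the delimiter and the column contents.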
