Dec 09, 2024

Concatenate Sequenced Data for Bioinformatics Analysis

Dwi Cahyani, NCSU
Protocol Citation: Dwi Cahyani 2024. Concatenate Sequenced Data for Bioinformatics Analysis. protocols.io https://dx.doi.org/10.17504/protocols.io.81wgbr6jnlpk/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: December 02, 2024
Last Modified: December 09, 2024
Protocol Integer ID: 113433
Abstract
This protocol describes how to combine multiple FASTQ files from a sequencing run into a single concatenated file, using either Galaxy or the Windows command line.
Step 1: Prepare Your Data
Go to the server where your sequencing data is stored (e.g., Oxford Nanopore’s EPI2ME or MinKNOW output directory).
Download only the passed reads (fastq.gz files); failed reads are typically of lower quality and are excluded from most analyses. Extract all of the downloaded data.
Make sure all your FASTQ.gz files are stored in one folder for easy access during the concatenation process. Name the files systematically (e.g., sample1.fastq.gz, sample2.fastq.gz) to avoid confusion.
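Before concatenating, it can help to confirm that every file in the folder downloaded intact. The following is a minimal sketch, assuming a bash-style shell with gzip available (for example Git Bash or WSL on Windows); the function name check_fastq_gz is hypothetical and not part of the protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original protocol):
# list every .fastq.gz file in a folder and test its gzip integrity.
check_fastq_gz() {
    local dir="$1"
    local status=0
    local f
    for f in "$dir"/*.fastq.gz; do
        # If the glob matched nothing, the literal pattern is returned; report and stop.
        [ -e "$f" ] || { echo "no .fastq.gz files found in $dir" >&2; return 1; }
        # gzip -t checks the compressed data without writing any output file.
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    # Non-zero exit status if any file failed the check.
    return "$status"
}
```

Calling check_fastq_gz with the path to your data folder prints one line per file; any file reported as CORRUPT should be downloaded again before continuing.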
Step 2: Concatenate Using Galaxy
Open the Galaxy Website
Navigate to usegalaxy.org or the Galaxy server you’re using.
Log In or Create an Account
If you don’t already have an account, create one—it’s free! Logging in will save your workflow and results.
Upload Your Data by clicking Upload data in the Galaxy interface. Drag and drop your FASTQ.gz files into the upload window or select them manually. Wait for the files to be fully uploaded and visible in your Galaxy history panel.
Concatenate Datasets by searching for the "concatenate dataset" tool in the Galaxy tools panel. Use the "Concatenate Dataset (tail to head cat)" option for a fast and straightforward concatenation. Select your FASTQ.gz files in the order you want them concatenated, then run the tool.
Download the Concatenated File. Once the process is complete, download the concatenated FASTQ.gz file to your local machine for further analysis.
Step 3: Concatenate Using the Windows Command Line
Prepare Your Files
Make sure all your FASTQ.gz files are in the same folder.
Unzip the Files.
If your FASTQ files are compressed (.gz), unzip them first. You can use a tool like 7-Zip, or run the following command in a terminal that provides gzip (for example Git Bash or WSL; gzip is not built into Command Prompt):
gzip -d *.gz

Open Command Prompt. On your Windows computer, press Windows + R, type cmd, and hit Enter.
Navigate to the Folder. Use the cd command to move to the folder containing your files. For example:
cd C:\Users\YourName\SequencingData
Concatenate the Files. At the command prompt, run the following command; the files are joined in the order listed:
type file1.fastq file2.fastq > concatenated.fastq
Replace file1.fastq and file2.fastq with the actual file names.
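If you are working in a bash-style shell rather than Command Prompt, cat plays the same role as type. The sketch below is one possible way to concatenate and then sanity-check the result; the function name concat_fastq is hypothetical, and the read count assumes each FASTQ record occupies exactly four lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original protocol):
# concatenate uncompressed FASTQ files and verify the line counts.
concat_fastq() {
    local out="$1"; shift
    # cat writes the inputs to the output in the order they are given.
    cat "$@" > "$out"

    # Sum the line counts of the inputs.
    local in_lines=0 n f
    for f in "$@"; do
        n=$(wc -l < "$f")
        in_lines=$((in_lines + n))
    done

    local out_lines
    out_lines=$(wc -l < "$out")
    out_lines=$((out_lines + 0))   # strip any padding wc may add

    # The output must hold exactly the input lines, in whole 4-line records
    # (assumption: standard four-line FASTQ records).
    if [ "$out_lines" -ne "$in_lines" ] || [ $((out_lines % 4)) -ne 0 ]; then
        echo "line count check failed for $out" >&2
        return 1
    fi
    echo "$out: $((out_lines / 4)) reads"
}
```

For example, concat_fastq concatenated.fastq file1.fastq file2.fastq writes the combined file and prints how many reads it contains. A failed check often means an input file is missing its final newline, which would fuse two records together.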

Recompress the File. If needed, recompress the concatenated file using:
gzip concatenated.fastq
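As a possible alternative to the unzip, concatenate, and recompress steps above: the gzip format allows several compressed members to be joined back to back, and gzip decompresses such a file as if it were one stream. The sketch below assumes a bash-style shell; the function name concat_gz is hypothetical, and some downstream tools may read only the first member, so check the result with your own pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original protocol):
# join .gz files directly, without decompressing them first.
concat_gz() {
    local out="$1"; shift
    # A byte-for-byte join of gzip files is itself a valid multi-member gzip file.
    cat "$@" > "$out"
    # gzip -t confirms the combined file still decompresses cleanly.
    gzip -t "$out"
}
```

For example, concat_gz concatenated.fastq.gz sample1.fastq.gz sample2.fastq.gz produces the combined file in one step. In Command Prompt, a binary copy such as copy /b sample1.fastq.gz + sample2.fastq.gz concatenated.fastq.gz performs the same kind of join.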

Finish!