Sep 01, 2024

ICA Data Ingestion Protocol

  • Gordon Qian¹,
  • Ryan Davis¹,
  • Jennifer Johnston²
  • ¹University of Sydney;
  • ²NysnoBio
Protocol Citation: Gordon Qian, Ryan Davis, Jennifer Johnston 2024. ICA Data Ingestion Protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.n92ld8188v5b/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working.
Created: August 30, 2024
Last Modified: September 01, 2024
Protocol Integer ID: 106735
Keywords: ASAPCRN, Neuronal Genome Atlas for Parkinsons (NGAP), Illumina Connected Analytics (ICA), Whole Genome Sequencing, Variants, iPSC
Funders Acknowledgement:
Michael J Fox Foundation
Grant ID: ASAP-000497
Disclaimer
User account required to access ICA
Abstract
Step-by-step protocol for uploading genome sequencing data into the Neuronal Genome Atlas for Parkinsons (NGAP), a resource hosted on Illumina Connected Analytics (ICA). The protocol describes the primary upload method, which uses a command line interface (CLI). It also lists alternative ingestion methods to provide broad access across the ASAP CRN: Amazon Web Services (AWS), the Illumina Service Connector, and Illumina BaseSpace.
Materials
  • Cell line genome sequencing data
Before start
User account required to access ICA.
This protocol details the use of a cloud-based platform for analysing and storing human patient cell line sequencing data, including iPSC and fibroblast whole genome sequences. It relies on the Illumina suite of analysis software and assets. The platform provides DRAGEN and other DNA sequence analysis tools, and it can host custom pipeline designs, dockerised tools, and various other analyses.
Data ingestion methods
Data can be ingested by several methods. Which one is preferable depends on where the data are stored and on the user’s operating system.
Primary Upload method
Illumina’s Command Line Interface (CLI) can be run locally or on the high-performance computing (HPC) servers where large sequencing datasets are typically stored. We provide detailed instructions below for uploading data into the Illumina Connected Analytics (ICA) platform using this method.
Other Upload Methods

Data Upload from AWS
Data Upload via Service Connector

Data Upload from BaseSpace
  • If your data is located on BaseSpace, you may copy your data onto ICA through Illumina’s “BaseSpace to ICA Data Copy” application.
Command Line Interface (CLI)
CLI Installation
Mac/Linux Instructions:
  1. Place the CLI in a folder that is listed in your $PATH environment variable, typically /usr/local/bin.
  2. You will also need to make the file executable so that the CLI can run:
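The two Mac/Linux steps above can be sketched in Python, assuming Python 3 is available on the machine. The sketch copies a downloaded CLI binary into a target directory and adds the execute permission bits, which is similar in effect to `chmod +x`. It then checks whether that directory is listed in `$PATH`. The helper names `install_cli` and `dir_on_path` and the example paths are hypothetical and are not part of Illumina's tooling. Writing to /usr/local/bin normally needs administrator rights.

```python
# Sketch only: mirrors "place the CLI on $PATH" and "make it executable".
# The binary path, target directory and helper names are hypothetical.
import os
import shutil
import stat
from pathlib import Path
from typing import Optional


def install_cli(binary: str, target_dir: str = "/usr/local/bin") -> Path:
    """Copy the CLI binary into target_dir and add execute permission."""
    dest = Path(target_dir) / Path(binary).name
    shutil.copy2(binary, dest)
    # Add execute bits for user, group and others (similar to `chmod +x`).
    mode = dest.stat().st_mode
    dest.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest


def dir_on_path(target_dir: str, path_env: Optional[str] = None) -> bool:
    """Return True if target_dir is one of the entries in PATH."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    wanted = os.path.realpath(target_dir)
    return any(
        os.path.realpath(entry) == wanted
        for entry in path_env.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    # Check the default location before installing into it.
    print(dir_on_path("/usr/local/bin"))
```

Running the check first shows whether /usr/local/bin needs adding to `$PATH`. After that, `install_cli("icav2")` run from the download folder would copy the binary there and mark it executable.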


Windows Instructions:
  1. Place the CLI in a folder that is listed in your PATH environment variable. On Windows this is typically the “C:\Program Files” folder.
  2. If you do not have write access to that folder, open a CMD window as administrator and type the following commands:
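Whichever folder is used, Windows only finds the CLI if that folder appears in PATH. A small Python sketch of that check follows, assuming Python 3. Windows separates PATH entries with `;` and compares folder names case-insensitively. The standard `ntpath` module applies those Windows rules on any platform. The function names and the `C:\Program Files\icav2` folder are hypothetical examples.

```python
# Sketch only: check that the folder holding the CLI is listed in a
# Windows PATH value. Folder and function names are hypothetical.
import ntpath
import os


def _normalise(entry: str) -> str:
    """Drop surrounding spaces/quotes, unify separators, ignore case."""
    return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))


def on_windows_path(folder: str, path_env: str) -> bool:
    """Return True if folder is one of the ';'-separated PATH entries."""
    wanted = _normalise(folder)
    return any(
        _normalise(entry) == wanted
        for entry in path_env.split(";")
        if entry.strip()
    )


if __name__ == "__main__":
    # Hypothetical folder under C:\Program Files that contains the CLI.
    print(on_windows_path(r"C:\Program Files\icav2", os.environ.get("PATH", "")))
```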



Authentication
Using Illumina’s CLI requires an API key associated with the user’s Illumina Connected Analytics (ICA) account. The key can be obtained through the following steps:

1. Locate the “Manage API Keys” menu on Illumina’s platform home page.

2. Click the button to generate a new API Key and give it a name. Then either include all workgroups or select specific workgroups. The API Key grants access to the selected workgroups.


3. Click to generate the API Key. The key is displayed hidden, with a button to reveal and copy it and a link to download it to a file. Once the window is closed, the key cannot be retrieved again through the domain login page, so store it securely for future reference.


4. Authenticate the CLI using the following commands:

Input the API Key generated from the product dashboard when prompted for x-api-key.
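Step 3 asks for the API key to be stored securely. One way to do that is sketched below in Python. The key is written to a file that only the owning user can read and write (permissions 0600), and it is read back when the CLI prompts for x-api-key. The file location and function names are hypothetical choices, not ICA defaults. The permission bits only take full effect on POSIX systems such as Mac/Linux; on Windows, `os.chmod` can only toggle the read-only flag.

```python
# Sketch only: keep the ICA API key in a file readable by your user alone.
# KEY_FILE and the function names are hypothetical, not ICA defaults.
import os
import stat
from pathlib import Path

KEY_FILE = Path.home() / ".secrets" / "ica_api_key"  # hypothetical path


def save_api_key(key: str, path: Path = KEY_FILE) -> None:
    """Write the key with owner-only read/write permissions (0o600)."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(key.strip() + "\n")
    # The mode given to os.open only applies to newly created files,
    # so tighten permissions explicitly in case the file already existed.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def load_api_key(path: Path = KEY_FILE) -> str:
    """Return the stored key without surrounding whitespace."""
    return path.read_text().strip()
```

This keeps the key out of shell history and scripts. `load_api_key()` can be used to retrieve it when pasting into the x-api-key prompt.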
Data Upload
  • To upload data to ICA, you must first retrieve the ID of the project you wish to upload the data to. This can be found by listing all projects with the command:

  • The first column of the output shows the project ID. Save it, as it will be needed for the data upload.
  • To upload a file called “Sample-1_S1_L001_R1_001.fastq.gz” to the project, copy the project ID and use the command syntax below:


  • To verify that the file has uploaded, run the following command to list all files stored within the specified project:


  • The general upload syntax is as follows: icav2 projectdata upload <localFileFolder> <remote-path> --project-id <project-id>
  • Only one file or folder can be uploaded per command, so we recommend storing all files in one folder and uploading that folder with a single command.
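The data upload steps above can be scripted, for example to run on an HPC server. The Python sketch below does three things:

* builds argument lists for the upload syntax given in this protocol;
* pulls project IDs from the first column of a projects listing;
* runs commands only when `dry_run=False`.

It assumes that the listing has one header row followed by one project per row. It also assumes that `icav2 projects list` and `icav2 projectdata list --project-id <project-id>` are the listing commands in your CLI version. These commands, the helper names and the example paths are assumptions to check against the ICA CLI documentation.

```python
# Sketch only: wraps the icav2 commands described in this protocol.
# Assumptions (hypothetical, check against your CLI version):
#   * `icav2 projects list` prints a header row, then one project per row
#     with the project ID in the first column;
#   * `icav2 projectdata list --project-id <id>` lists a project's files;
#   * remote paths such as "/fastq/" are illustrative only.
import shlex
import subprocess
from typing import List


def parse_project_ids(listing: str) -> List[str]:
    """Return the first column of every non-empty row after the header."""
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    return [row[0] for row in rows[1:]]


def upload_command(local_path: str, remote_path: str, project_id: str) -> List[str]:
    """icav2 projectdata upload <localFileFolder> <remote-path> --project-id <project-id>"""
    return ["icav2", "projectdata", "upload", local_path, remote_path,
            "--project-id", project_id]


def list_data_command(project_id: str) -> List[str]:
    """Assumed command for listing all files stored in a project."""
    return ["icav2", "projectdata", "list", "--project-id", project_id]


def run(argv: List[str], dry_run: bool = True) -> str:
    """Print the command line; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if dry_run:
        return ""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical listing text standing in for real CLI output.
    listing = "ID        NAME\nproj-123  my-project\n"
    project_id = parse_project_ids(listing)[0]
    # One command per file or folder: upload the whole folder at once.
    run(upload_command("fastq_folder", "/fastq/", project_id))
    run(list_data_command(project_id))
```

With `dry_run=True` (the default), each command is printed so it can be reviewed before anything is sent to ICA. Because only one file or folder is uploaded per command, pass the folder holding all sample files as `local_path`.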