OncoMark

Shreyansh Priyadarshi

Feb 25, 2025

DOI

dx.doi.org/10.17504/protocols.io.yxmvm98xol3p/v1

Shreyansh Priyadarshi¹

¹Ashoka University

Protocol Citation: Shreyansh Priyadarshi 2025. OncoMark. protocols.io https://dx.doi.org/10.17504/protocols.io.yxmvm98xol3p/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working.

Created: February 24, 2025

Last Modified: February 25, 2025

Protocol Integer ID: 123312

Disclaimer

DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK

The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.

Abstract

This protocol details the methodology for developing and validating OncoMark, a neural multi-task learning framework designed to predict cancer hallmark activity from transcriptomic data. It covers synthetic data generation from single-cell transcriptomics, hallmark annotation, model architecture, training, validation, and statistical analyses.

OncoMark: A Neural Multi-Task Learning Framework for Predicting Cancer Hallmark Activity from Biopsy Transcriptomic Data

1. Materials & Resources

1.1. Software & Tools
Python (RRID: SCR_008394)
TensorFlow (RRID: SCR_016345)
Scikit-learn (RRID: SCR_002577)
ComBat for batch effect correction (RRID: SCR_012835)
UCell for Digital Scoring (GitHub Repository)

1.2. Data Sources
Single-cell transcriptomic data: Weizmann 3CA repository
Bulk transcriptomic datasets:
The Cancer Genome Atlas (TCGA)
MET500
POG570
Cancer Cell Line Encyclopedia (CCLE)
Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
Pan-Cancer Analysis of Whole Genomes (PCAWG)
Normal datasets:
GTEx, ANTE

1.3. Computational Environment
Operating System: Linux Ubuntu 20.04
RAM Requirements: Minimum 64GB

2. Step-by-Step Protocol

Step 1: Data Collection and Preprocessing
Download single-cell transcriptomic data from the Weizmann 3CA repository.
Perform quality control (QC):
Exclude cells with mitochondrial transcript content > 15%.
Remove cells with < 200 or > 6000 detected transcripts.
Download bulk transcriptomic datasets from public repositories.
Curate hallmark gene sets from literature and databases.
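
The QC thresholds in this step can be expressed as a boolean mask over cells. The following is a minimal sketch, not the OncoMark code itself. It assumes a cells-by-genes matrix of raw counts, that mitochondrial genes are identified by the human "MT-" prefix, and that "detected transcripts" means the number of genes with a nonzero count in a cell. The function name qc_mask and its parameters are hypothetical.

```python
import numpy as np

def qc_mask(counts, gene_names, max_mito_frac=0.15,
            min_detected=200, max_detected=6000):
    """Return a boolean mask of cells that pass the QC thresholds.

    counts: (n_cells, n_genes) array of raw counts.
    gene_names: sequence of gene symbols, one per column of counts.
    """
    counts = np.asarray(counts)
    # Assumption: mitochondrial genes carry the human "MT-" prefix.
    is_mito = np.array([g.upper().startswith("MT-") for g in gene_names])
    total = counts.sum(axis=1)
    # Fraction of each cell's counts that come from mitochondrial genes;
    # cells with zero total counts get a fraction of 0.
    mito_frac = np.divide(counts[:, is_mito].sum(axis=1), total,
                          out=np.zeros(len(total), dtype=float),
                          where=total > 0)
    # Assumption: "detected transcripts" = genes with a nonzero count.
    detected = (counts > 0).sum(axis=1)
    return ((mito_frac <= max_mito_frac)
            & (detected >= min_detected)
            & (detected <= max_detected))
```

Cells are then kept with counts[qc_mask(counts, gene_names)]. The defaults mirror the thresholds stated above (15%, 200, 6000).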

Step 2: Digital Scoring of Hallmarks
Compute digital scores using UCell for each hallmark per single cell.
Binarize scores using Otsu’s thresholding to classify hallmark presence or absence.
Calculate tissue-specific thresholds to refine hallmark classification.
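
Otsu's method picks the cut that maximizes the between-class variance of the score distribution. The sketch below shows that binarization step in NumPy, assuming the UCell scores have already been computed elsewhere and exported as a one-dimensional array. The function name otsu_threshold and the bin count are illustrative choices, not values taken from the protocol.

```python
import numpy as np

def otsu_threshold(scores, n_bins=256):
    """Return the Otsu threshold for a 1-D array of hallmark scores."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # A cut after bin i puts bins 0..i in the low class, the rest in the high class.
    w0 = np.cumsum(hist)                  # number of scores in the low class
    w1 = w0[-1] - w0                      # number of scores in the high class
    s0 = np.cumsum(hist * centers)        # running sum of scores in the low class
    m0 = s0 / np.maximum(w0, 1)           # mean of the low class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the high class
    between = w0 * w1 * (m0 - m1) ** 2    # between-class variance (unnormalized)
    # Use the upper edge of the last low-class bin as the threshold.
    return edges[1:][np.argmax(between)]

def binarize(scores, threshold):
    """Label a cell hallmark-positive when its score reaches the threshold."""
    return np.asarray(scores) >= threshold
```

One possible reading of the tissue-specific refinement is to call otsu_threshold separately on the cells of each tissue. That interpretation is an assumption; the protocol does not spell out the procedure.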

Step 3: Synthetic Biopsy Data Generation
Aggregate 200 hallmark-specific cells from each patient sample to simulate biopsies.
Ensure non-overlapping hallmark-positive and hallmark-negative samples to avoid cross-contamination.
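
The aggregation step can be sketched as sampling 200 cells from one patient's hallmark-positive (or hallmark-negative) pool and collapsing them into a single pseudo-bulk profile. The version below assumes sampling without replacement and summation of counts as the aggregation rule. Neither detail is stated in the protocol, and make_pseudo_biopsy is a hypothetical name.

```python
import numpy as np

def make_pseudo_biopsy(counts, hallmark_labels, positive=True,
                       n_cells=200, rng=None):
    """Aggregate n_cells cells of one hallmark class into a pseudo-bulk profile.

    counts: (n_cells_total, n_genes) counts for a single patient sample.
    hallmark_labels: boolean array, True where the cell is hallmark-positive.
    Returns a length-n_genes vector, or None if the pool is too small.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts)
    labels = np.asarray(hallmark_labels, dtype=bool)
    # Draw only from the requested class so that positive and negative
    # biopsies never share cells.
    pool = np.flatnonzero(labels == positive)
    if pool.size < n_cells:
        return None
    chosen = rng.choice(pool, size=n_cells, replace=False)
    # Assumption: sum counts across the sampled cells.
    return counts[chosen].sum(axis=0)
```

Because each biopsy is drawn from exactly one class, a positive and a negative biopsy built from the same patient cannot contain the same cell.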

Step 4: Model Architecture & Training
Define a multi-task learning model with a shared base layer and hallmark-specific outputs.
Split data:
Training: 85% (57,735 samples)
Validation: 15% (10,195 samples)
Use Adam optimizer (learning rate: 0.0001) and binary cross-entropy loss.
Train model for 50 epochs with early stopping (patience = 6 epochs).
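
A shared base with one sigmoid head per hallmark could be wired up in Keras roughly as follows. This is a sketch under stated assumptions, not the published OncoMark architecture. The number of hallmarks (10), the hidden width, the dropout rate, and the function name build_multitask_model are all placeholders. Only the Adam learning rate, the loss, the epoch count, and the patience come from the step above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_multitask_model(n_genes, n_hallmarks=10, hidden_units=256, dropout=0.3):
    """Shared dense base feeding one binary output head per hallmark."""
    inputs = keras.Input(shape=(n_genes,), name="expression")
    # Shared representation used by every hallmark head.
    # Assumption: a single dense layer; the real base may be deeper.
    shared = layers.Dense(hidden_units, activation="relu", name="shared_base")(inputs)
    shared = layers.Dropout(dropout)(shared)
    # One sigmoid unit per hallmark (hallmark-specific outputs).
    outputs = [
        layers.Dense(1, activation="sigmoid", name=f"hallmark_{i}")(shared)
        for i in range(n_hallmarks)
    ]
    model = keras.Model(inputs, outputs, name="multitask_sketch")
    # Values from the protocol: Adam, learning rate 0.0001, binary cross-entropy.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy")
    return model

# Early stopping with patience of 6 epochs.
# Assumption: validation loss is the monitored quantity.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=6)

# Training call, with x_train / y_train_list as hypothetical arrays
# (y_train_list holds one label vector per hallmark head):
# model.fit(x_train, y_train_list, validation_data=(x_val, y_val_list),
#           epochs=50, callbacks=[early_stop])
```

Separate heads let each hallmark have its own decision boundary while the shared layer learns features common to all of them.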

Step 5: Model Validation
Use five-fold cross-validation repeated twice.
Validate on external datasets (n=95 patients).
Evaluate using F1-score, accuracy, precision-recall, and AUC-ROC.
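
The cross-validation scheme and the listed metrics can be sketched with scikit-learn for a single hallmark. The example uses simulated data and a logistic regression purely as a stand-in classifier. Stratified splitting is an assumption, since the protocol only specifies five folds repeated twice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import RepeatedStratifiedKFold

# Simulated stand-in for one hallmark's features and binary labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Five folds, repeated twice, giving 10 train/test splits in total.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

fold_metrics = []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]
    pred = (prob >= 0.5).astype(int)
    fold_metrics.append({
        "f1": f1_score(y[test_idx], pred),
        "accuracy": accuracy_score(y[test_idx], pred),
        "pr_auc": average_precision_score(y[test_idx], prob),
        "roc_auc": roc_auc_score(y[test_idx], prob),
    })

# Average each metric over the 10 splits.
mean_metrics = {k: float(np.mean([m[k] for m in fold_metrics]))
                for k in fold_metrics[0]}
```

For the multi-task model, the same loop would be run with the neural network in place of the stand-in classifier and the metrics computed per hallmark.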

Step 6: Statistical Analysis
Compare hallmark distributions in cancer vs. normal samples using the Kolmogorov-Smirnov test.
Perform logistic regression to analyze hallmark-drug associations.
Compute odds ratios (ORs) for hallmark co-occurrences.
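
Two of these analyses are simple enough to sketch directly. The example below runs a two-sample Kolmogorov-Smirnov test with SciPy on made-up score vectors and computes a co-occurrence odds ratio from a 2x2 table. The data are illustrative only, and cooccurrence_odds_ratio is a hypothetical helper. The protocol does not say how zero cells in the table are handled, so this sketch simply raises an error.

```python
import numpy as np
from scipy.stats import ks_2samp

def cooccurrence_odds_ratio(a, b):
    """Odds ratio for two binary hallmark indicators over the same samples."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n11 = int(np.sum(a & b))     # both hallmarks present
    n10 = int(np.sum(a & ~b))    # only the first present
    n01 = int(np.sum(~a & b))    # only the second present
    n00 = int(np.sum(~a & ~b))   # neither present
    if n10 * n01 == 0:
        # Assumption: no continuity correction is applied in this sketch.
        raise ValueError("odds ratio undefined: a discordant cell is zero")
    return (n11 * n00) / (n10 * n01)

# Made-up hallmark activity scores for cancer and normal samples.
cancer_scores = np.linspace(0.5, 1.0, 50)
normal_scores = np.linspace(0.0, 0.4, 50)
ks_result = ks_2samp(cancer_scores, normal_scores)

# Made-up presence/absence calls for two hallmarks across eight samples.
hallmark_a = [1, 1, 1, 0, 0, 0, 1, 0]
hallmark_b = [1, 1, 0, 0, 0, 1, 1, 0]
odds_ratio = cooccurrence_odds_ratio(hallmark_a, hallmark_b)
```

An odds ratio above 1 indicates that the two hallmarks appear together more often than expected if they were independent. ks_result.statistic and ks_result.pvalue hold the test statistic and p-value.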

Step 7: Deployment & Clinical Application
Deploy the trained model as a web tool (OncoMark Web Server).
Allow clinicians to upload tumor transcriptomic data for real-time hallmark activity prediction.

3. Expected Results
High accuracy in predicting hallmark presence.
Clear separation of hallmark distributions in cancer vs. normal tissues.
Identification of hallmark-drug interactions.

4. Troubleshooting

Issue	Possible Cause	Solution
Low model accuracy	Overfitting	Increase dropout, apply L2 regularization
Batch effects	Dataset heterogeneity	Use ComBat for batch correction
Imbalanced hallmark labels	Unequal representation	Perform synthetic data augmentation

5. Data Sharing & Accessibility
Synthetic Data: https://doi.org/10.5061/dryad.zw3r228jc
Codebase: OncoMark GitHub
Python Package: OncoMark on PyPI
Web Server: OncoMark Web Server
This protocol follows best practices in reproducible AI-driven biomarker discovery and can be modified for additional hallmark analyses.

For questions, refer to the full documentation at OncoMark Docs.
