Protocol Citation: Shreyansh Priyadarshi 2025. OncoMark. protocols.io https://dx.doi.org/10.17504/protocols.io.yxmvm98xol3p/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working.
Created: February 24, 2025
Last Modified: February 25, 2025
Protocol Integer ID: 123312
Disclaimer
DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK

The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.
Abstract
This protocol details the methodology for developing and validating OncoMark, a neural multi-task learning framework designed to predict cancer hallmark activity from transcriptomic data. It covers synthetic data generation from single-cell transcriptomics, hallmark annotation, model architecture, training, validation, and statistical analyses.
OncoMark: A Neural Multi-Task Learning Framework for Predicting Cancer Hallmark Activity from Biopsy Transcriptomic Data
1. Materials & Resources
1.1. Software & Tools
  • Python (RRID: SCR_008394)
  • TensorFlow (RRID: SCR_016345)
  • Scikit-learn (RRID: SCR_002577)
  • ComBat for batch effect correction (RRID: SCR_012835)
  • UCell for Digital Scoring (GitHub Repository)
1.3. Computational Environment
  • Operating System: Linux Ubuntu 20.04
  • RAM Requirements: Minimum 64GB
2. Step-by-Step Protocol
Step 1: Data Collection and Preprocessing
  1. Download single-cell transcriptomic data from the Weizmann 3CA repository.
  2. Perform quality control (QC):
  • Exclude cells with mitochondrial transcript content > 15%.
  • Remove cells with < 200 or > 6000 detected transcripts.

  3. Download bulk transcriptomic datasets from public repositories.
  4. Curate hallmark gene sets from literature and databases.
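The QC thresholds in this step can be sketched as a boolean cell filter. This is a minimal illustration, not the protocol's actual code: the function name `qc_filter`, the dense count matrix, and the mitochondrial gene mask are all hypothetical, and "detected transcripts" is interpreted here as the number of genes with a non-zero count per cell, which is an assumption.

```python
import numpy as np

def qc_filter(counts, mito_mask, max_mito_frac=0.15,
              min_detected=200, max_detected=6000):
    """Return a boolean mask of cells that pass QC.

    counts:    (n_cells, n_genes) array of raw counts (hypothetical input).
    mito_mask: (n_genes,) boolean array marking mitochondrial genes.
    """
    counts = np.asarray(counts)
    mito_mask = np.asarray(mito_mask, dtype=bool)

    total = counts.sum(axis=1)
    mito = counts[:, mito_mask].sum(axis=1)
    # Avoid division by zero for empty cells; their fraction is set to 0.
    mito_frac = np.divide(mito, total,
                          out=np.zeros(len(total), dtype=float),
                          where=total > 0)

    # Assumption: "detected transcripts" = genes with a non-zero count.
    detected = (counts > 0).sum(axis=1)

    return ((mito_frac <= max_mito_frac)
            & (detected >= min_detected)
            & (detected <= max_detected))
```

Applying the mask as `counts[qc_filter(counts, mito_mask)]` keeps only cells with at most 15% mitochondrial content and between 200 and 6000 detected features. Empty cells are removed by the lower bound.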
Step 2: Digital Scoring of Hallmarks
  1. Compute digital scores using UCell for each hallmark per single cell.
  2. Binarize scores using Otsu’s thresholding to classify hallmark presence or absence.
  3. Calculate tissue-specific thresholds to refine hallmark classification.
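The binarization step can be illustrated with a small NumPy version of Otsu's method applied to precomputed per-cell scores. UCell scoring itself is not reproduced here. The helper names are hypothetical, and computing one threshold per tissue is only one possible reading of "tissue-specific thresholds", so treat this as a sketch under that assumption.

```python
import numpy as np

def otsu_threshold(scores, n_bins=256):
    """Threshold that maximizes between-class variance of a 1-D score histogram."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Candidate split k puts bins 0..k in the low class, the rest in the high class.
    n0 = np.cumsum(hist)[:-1]
    n1 = hist.sum() - n0
    s0 = np.cumsum(hist * centers)[:-1]
    s1 = (hist * centers).sum() - s0

    valid = (n0 > 0) & (n1 > 0)
    between = np.zeros(n_bins - 1)
    mu0 = s0[valid] / n0[valid]
    mu1 = s1[valid] / n1[valid]
    # Proportional to the between-class variance w0 * w1 * (mu0 - mu1)^2.
    between[valid] = n0[valid] * n1[valid] * (mu0 - mu1) ** 2

    k = int(np.argmax(between))
    return float(edges[k + 1])

def binarize_by_tissue(scores, tissues):
    """Assumption: one Otsu threshold is computed separately for each tissue."""
    scores = np.asarray(scores, dtype=float)
    tissues = np.asarray(tissues)
    labels = np.zeros(len(scores), dtype=bool)
    for tissue in np.unique(tissues):
        in_tissue = tissues == tissue
        labels[in_tissue] = scores[in_tissue] > otsu_threshold(scores[in_tissue])
    return labels
```

A cell is called hallmark-positive when its score exceeds the threshold for its tissue. Scores with a different baseline in each tissue are therefore split relative to their own distribution.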
Step 3: Synthetic Biopsy Data Generation
  1. Aggregate 200 hallmark-specific cells from each patient sample to simulate biopsies.
  2. Ensure non-overlapping hallmark-positive and hallmark-negative samples to avoid cross-contamination.
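One way to picture the aggregation is as pseudo-bulk summation over a random draw of cells from a single patient. The sketch below assumes that counts are summed and that cells are sampled without replacement. Neither detail is stated in the protocol, and `simulate_biopsy` is a hypothetical helper name.

```python
import numpy as np

def simulate_biopsy(counts, eligible_idx, n_cells=200, rng=None):
    """Sum the counts of n_cells randomly chosen cells into one profile.

    counts:       (n_cells_total, n_genes) matrix for one patient sample.
    eligible_idx: indices of cells sharing one hallmark label
                  (all positive or all negative, never mixed).
    """
    rng = np.random.default_rng() if rng is None else rng
    eligible_idx = np.asarray(eligible_idx)
    if len(eligible_idx) < n_cells:
        raise ValueError("not enough eligible cells for one synthetic biopsy")
    # Assumption: sampling without replacement within the eligible pool.
    chosen = rng.choice(eligible_idx, size=n_cells, replace=False)
    return np.asarray(counts)[chosen].sum(axis=0)
```

Drawing positive biopsies only from `np.flatnonzero(labels)` and negative biopsies only from `np.flatnonzero(~labels)` keeps the two pools disjoint, which is the non-overlap condition described above.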
Step 4: Model Architecture & Training
  1. Define a multi-task learning model with a shared base layer and hallmark-specific outputs.
  2. Split data:
  • Training: 85% (57,735 samples)
  • Validation: 15% (10,195 samples)

  3. Use Adam optimizer (learning rate: 0.0001) and binary cross-entropy loss.
  4. Train model for 50 epochs with early stopping (patience = 6 epochs).
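To make the architecture concrete, the sketch below shows a framework-free forward pass for a shared layer feeding one sigmoid output per hallmark, the summed binary cross-entropy, and the early-stopping rule. It is not the TensorFlow model used in the protocol. The single hidden layer, its size, the weight initialization, and every function name are assumptions chosen for illustration.

```python
import numpy as np

def init_params(n_genes, n_hidden, n_hallmarks, rng):
    """Hypothetical parameter layout: one shared layer, one head per hallmark."""
    return {
        "W_shared": rng.normal(0.0, 0.01, (n_genes, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(0.0, 0.01, (n_hallmarks, n_hidden)),
        "b_heads": np.zeros(n_hallmarks),
    }

def forward(params, x):
    """Shared ReLU representation followed by hallmark-specific sigmoid outputs."""
    hidden = np.maximum(0.0, x @ params["W_shared"] + params["b_shared"])
    logits = hidden @ params["W_heads"].T + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))

def multitask_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy per hallmark, summed over hallmarks."""
    p = np.clip(probs, eps, 1.0 - eps)
    per_task = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean(axis=0)
    return float(per_task.sum()), per_task

def early_stopping_epoch(val_losses, patience=6):
    """Index of the epoch at which training stops under a patience rule."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1
```

Because every hallmark head reads the same hidden representation, gradients from all hallmark losses update the shared weights, which is the core idea of multi-task learning. With a patience of 6, training halts once six consecutive epochs fail to improve the validation loss; otherwise it runs for the full 50 epochs.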
Step 5: Model Validation
  1. Use five-fold cross-validation repeated twice.
  2. Validate on external datasets (n=95 patients).
  3. Evaluate using F1-score, accuracy, precision-recall, and AUC-ROC.
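The evaluation scheme can be sketched as follows: a generator for five-fold cross-validation repeated twice (ten train/test splits in total) and threshold-based metrics for one hallmark. The function names, the shuffling seed, and the convention of returning 0.0 for undefined ratios are assumptions. AUC-ROC is left out of the sketch; scikit-learn's `roc_auc_score` is one existing option for it.

```python
import numpy as np

def repeated_kfold(n_samples, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles the samples."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n_samples), n_splits)
        for i in range(n_splits):
            train = np.concatenate([folds[j] for j in range(n_splits) if j != i])
            yield train, folds[i]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for one hallmark's binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    # Convention (assumption): undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In a multi-task setting, `binary_metrics` would be called once per hallmark column, for example on `probs[:, j] >= 0.5`, and the results averaged across the ten splits.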
Step 6: Statistical Analysis
  1. Compare hallmark distributions in cancer vs. normal samples using the Kolmogorov-Smirnov test.
  2. Perform logistic regression to analyze hallmark-drug associations.
  3. Compute odds ratios (ORs) for hallmark co-occurrences.
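Two of these statistics are simple enough to write out directly. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) and the odds ratio of a 2x2 co-occurrence table. It returns no p-values; `scipy.stats.ks_2samp` is one existing routine that does. The function names and the handling of zero cells are assumptions.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def odds_ratio(hallmark_x, hallmark_y):
    """Odds ratio for co-occurrence of two binary hallmark calls."""
    x = np.asarray(hallmark_x).astype(bool)
    y = np.asarray(hallmark_y).astype(bool)
    both = int(np.sum(x & y))
    x_only = int(np.sum(x & ~y))
    y_only = int(np.sum(~x & y))
    neither = int(np.sum(~x & ~y))
    # Assumption: zero off-diagonal cells are reported as inf (or nan if 0/0)
    # rather than applying a continuity correction.
    if x_only * y_only == 0:
        return float("inf") if both * neither > 0 else float("nan")
    return (both * neither) / (x_only * y_only)
```

An odds ratio above 1 indicates that the two hallmarks are called together more often than expected if they were independent. A value below 1 indicates that they tend to exclude each other.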
Step 7: Deployment & Clinical Application
  1. Deploy the trained model as a web tool (OncoMark Web Server).
  2. Allow clinicians to upload tumor transcriptomic data for real-time hallmark activity prediction.
3. Expected Results
  • High accuracy in predicting hallmark presence.
  • Clear separation of hallmark distributions in cancer vs. normal tissues.
  • Identification of hallmark-drug interactions.
4. Troubleshooting

Issue | Possible Cause | Solution
Low model accuracy | Overfitting | Increase dropout, apply L2 regularization
Batch effects | Dataset heterogeneity | Use ComBat for batch correction
Imbalanced hallmark labels | Unequal representation | Perform synthetic data augmentation

5. Data Sharing & Accessibility
This protocol follows best practices in reproducible AI-driven biomarker discovery and can be modified for additional hallmark analyses. For questions, refer to the full documentation at OncoMark Docs.