Examples

Example Analysis of ALK Protein Sequences in Mus musculus

The input file ALK_mice.faa contains amino acid sequences of ALK (anaplastic lymphoma kinase) proteins identified in Mus musculus (house mouse). Each sequence is labeled with a unique identifier. When you upload this sequence set, the pipeline reports protein properties separately for each sequence.
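Because results are reported per sequence, it can help to confirm before uploading that every record in a FASTA file has a distinct identifier. The following is a minimal sketch using only the Python standard library; it is not part of the pipeline, the function name `parse_fasta` is hypothetical, and the two records are made-up placeholders rather than sequences from ALK_mice.faa. It assumes the identifier is the first whitespace-delimited token after `>`.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence}; raise ValueError on duplicate identifiers."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("header line without an identifier")
            # Assumption: the identifier is the first token of the header.
            current_id = tokens[0]
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Sequence lines may be wrapped; join them into one string.
            records[current_id] += line
    return records


# Hypothetical two-record input for illustration only.
example = """>seq1 placeholder protein A
MKTAYIAK
QRQISFVK
>seq2 placeholder protein B
MSHHWGYG
""".splitlines()

records = parse_fasta(example)
for seq_id, seq in records.items():
    print(seq_id, len(seq))
```

To check a real file, pass an open file handle instead of the example lines, for instance `parse_fasta(open("ALK_mice.faa"))`.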

The output file protein_analysis_results-2.zip contains the full results of running the pipeline on this input, including physicochemical property tables, a pairwise sequence similarity table, and clustering visualizations.
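One way to see which files a results archive contains before extracting it is to list its members. This is a sketch with Python's standard `zipfile` module; the helper name `list_archive` is hypothetical, and the demo archive and its member names are placeholders created on the fly, not the actual contents of protein_analysis_results-2.zip.

```python
import os
import tempfile
import zipfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the sorted member names of a zip archive without extracting it."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


# Build a small placeholder archive so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("placeholder_table.csv", "id,length\n")
        archive.writestr("placeholder_plot.png", b"")
    demo_members = list_archive(demo_path)

print(demo_members)
```

For the example output, the call would be `list_archive("protein_analysis_results-2.zip")`, assuming the file is in the working directory.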


Typical Workflow

  1. Open the notebook in Google Colab
  2. Run the setup cells
  3. Upload ALK_mice.faa when prompted for a FASTA file
  4. Run the remaining cells in order

Expected Outputs

| Output | Description |
|---|---|
| Physicochemical properties table | Sequence length, molecular weight, amino acid composition, and isoelectric point (pI) per protein |
| Pairwise similarity table | Alignment score for every protein pair in the input set |
| Dendrogram (physicochemical) | Hierarchical clustering of proteins based on physicochemical features |
| Dendrogram (sequence similarity) | Hierarchical clustering based on pairwise alignment scores |
| Combined dendrogram | Clustering using both physicochemical and similarity data together |
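
To make two of the per-protein properties concrete, the sketch below computes sequence length and amino acid composition (the fraction of each residue) in plain Python. It illustrates the quantities only and is not the pipeline's own code: the function name `composition` and the toy sequences are hypothetical, and molecular weight and pI are deliberately left out because they depend on residue mass and pKa tables that this page does not specify.

```python
from collections import Counter
from typing import Dict


def composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each residue in an amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Fractions sum to 1.0; keys are sorted for stable output.
    return {res: n / len(seq) for res, n in sorted(counts.items())}


# Toy sequences for illustration only; not taken from ALK_mice.faa.
toy = {"seq1": "MKKA", "seq2": "GGGG"}

summary = {
    seq_id: {"length": len(seq), "composition": composition(seq)}
    for seq_id, seq in toy.items()
}

for seq_id, props in summary.items():
    print(seq_id, props["length"], props["composition"])
```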
