Examples
Example Analysis of ALK Protein Sequences in Mus musculus
The input file ALK_mice.faa contains amino acid sequences for ALK (Anaplastic Lymphoma Kinase) proteins identified in Mus musculus (house mouse). Each sequence is labeled with a unique identifier. When you upload this file to the pipeline, it reports protein properties separately for each sequence.
The output file protein_analysis_results-2.zip contains the full results of running the pipeline on this input, including physicochemical property tables, a pairwise sequence similarity table, and clustering visualizations.
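
To make the "one result per labeled sequence" idea concrete, here is a minimal sketch of reading FASTA records with the Python standard library. It is not the pipeline's own parser, which this README does not show; the function name `read_fasta` and the two made-up records are assumptions used only for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            identifier = parts[0] if parts else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# Hypothetical input: these are NOT the real ALK sequences.
example = StringIO(
    ">seq1 made-up record\nMGAIGLLWLL\nPLLLS\n>seq2 made-up record\nMKT\n"
)
records = list(read_fasta(example))
for identifier, sequence in records:
    print(identifier, len(sequence))
```

To try the same function on the example input, replace the `StringIO` object with an open file handle for ALK_mice.faa.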
Typical Workflow
- Open the notebook in Google Colab
- Run the setup cells
- Upload ALK_mice.faa when prompted for a FASTA file
- Run the remaining cells in order
Expected Outputs
| Output | Description |
|---|---|
| Physicochemical properties table | Sequence length, molecular weight, amino acid composition, and isoelectric point (pI) per protein |
| Pairwise similarity table | Alignment score for every protein pair in the input set |
| Dendrogram (physicochemical) | Hierarchical clustering of proteins based on physicochemical features |
| Dendrogram (sequence similarity) | Hierarchical clustering based on pairwise alignment scores |
| Combined dendrogram | Hierarchical clustering that combines the physicochemical features with the pairwise similarity data |
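
As a rough illustration of what the first row of the table contains, the sketch below computes sequence length, amino acid composition, and an approximate molecular weight for one protein. It is an assumption-laden example, not the pipeline's implementation: the function name `protein_properties` is hypothetical, the weight is estimated from standard average residue masses plus one water, and the isoelectric point is left out because it depends on a chosen pKa set and an iterative charge calculation.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def protein_properties(sequence):
    """Return length, approximate molecular weight, and composition fractions."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        # Ambiguity codes such as X or B have no single mass, so reject them.
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    counts = Counter(sequence)
    length = len(sequence)
    weight = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER_MASS
    composition = {aa: n / length for aa, n in sorted(counts.items())}
    return {
        "length": length,
        "molecular_weight": weight,
        "composition": composition,
    }


# Hypothetical dipeptide (glycylglycine), used only as a sanity check.
props = protein_properties("GG")
print(props["length"], round(props["molecular_weight"], 2), props["composition"])
```

Per-protein dictionaries like this one could then be stacked into a feature table, which is the kind of input a hierarchical clustering step would need for the physicochemical dendrogram.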