Introduction to the tools
Pankegg is composed of two main tools:
- Pankegg Make DB (Pankegg_make_db.py), the data parser and SQL database creator. It processes a CSV file in which each line lists the required input files for a single sample, and compiles the information from these files into a structured SQL database so that downstream exploration is fast.
- Pankegg APP (Pankegg_app.py), the web server for interactive data exploration. It reads the database generated by the parser and provides a browser-based interface to explore your results.
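To make the "one CSV line per sample, compiled into a SQL database" idea concrete, here is a minimal sketch using only the Python standard library. It is not the code in Pankegg_make_db.py: the column names (`sample_name`, `annotation_file`, `classification_file`), the table name `sample_input`, the file paths, and the choice of an in-memory SQLite database are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample sheet: one line per sample, listing its input files.
# The real CSV expected by Pankegg may use different columns.
EXAMPLE_CSV = """sample_name,annotation_file,classification_file
sampleA,/data/sampleA/annotations.tsv,/data/sampleA/classification.tsv
sampleB,/data/sampleB/annotations.tsv,/data/sampleB/classification.tsv
"""


def load_sample_sheet(csv_text, conn):
    """Store one row per (sample, input file) pair and return the sample count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample_input ("
        "sample_name TEXT, file_role TEXT, file_path TEXT)"
    )
    n_samples = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = row["sample_name"]
        for role, path in row.items():
            if role == "sample_name":
                continue  # every other column names an input file
            conn.execute(
                "INSERT INTO sample_input VALUES (?, ?, ?)",
                (sample, role, path),
            )
        n_samples += 1
    conn.commit()
    return n_samples


conn = sqlite3.connect(":memory:")  # assumption: SQLite as the SQL engine
n_samples = load_sample_sheet(EXAMPLE_CSV, conn)
rows = conn.execute(
    "SELECT sample_name, COUNT(*) FROM sample_input "
    "GROUP BY sample_name ORDER BY sample_name"
).fetchall()
print(n_samples, rows)
```

Once the data sits in a database like this, a web front end can answer questions with indexed SQL queries instead of re-parsing the per-sample files on every request.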
The web interface is divided into two main categories:
- Navigation and Search: Effortlessly browse and filter all available data to find exactly what you need.
- Features: Visualize data, compare bins or samples, and generate insightful plots.
Feature Pages
Pankegg provides multiple interactive features:
- Sample vs Sample: Heatmap of selected pathways, scatter plots for bin quality, PCA of bins, tables/plots showing KEGGs unique or shared between samples/pathways.
- Bin vs Bin: Plot and table of KEGGs shared or unique between two bins for metabolic pathways.
- Taxonomic Comparison: For a given rank, see sample-wise composition plots (abundance = number of bins classified as a taxon / total number of bins), and pathway-vs-taxa heatmaps.
- PCA: Dedicated page for visualizing principal component analysis (PCA) of each sample, based on KOs, maps, or taxonomy.
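Two of the computations behind these pages can be sketched in a few lines of Python. The first follows the abundance definition given for Taxonomic Comparison. The second mirrors the shared/unique split shown on the Bin vs Bin page. The function names, the taxon labels, and the KO identifiers are hypothetical, and the sketch assumes that "total bins" means the bins of a single sample. It is not taken from Pankegg_app.py.

```python
from collections import Counter


def taxon_abundance(bin_taxa):
    """Per taxon: (bins classified as that taxon) / (total bins in the sample)."""
    total = len(bin_taxa)
    if total == 0:
        return {}
    counts = Counter(bin_taxa)
    return {taxon: n / total for taxon, n in counts.items()}


def compare_kos(kos_a, kos_b):
    """Split the KOs of two bins into shared and bin-specific sets."""
    a, b = set(kos_a), set(kos_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# One taxon label per bin of a single sample (illustrative values).
sample_bins = ["Firmicutes", "Firmicutes", "Bacteroidota", "Proteobacteria"]
abundance = taxon_abundance(sample_bins)
print(abundance)

# KO identifiers found in two bins (illustrative values).
comparison = compare_kos({"K00001", "K00002"}, {"K00002", "K00003"})
print(comparison)
```

Note that this abundance counts bins, not reads: a taxon represented by two of four bins gets 0.5 regardless of how many sequences each bin contains.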
Note:
In this documentation, the terms “map” and “pathway” are sometimes used interchangeably. Typically, “map” refers to the KEGG database’s map ID (e.g., map00010), while “pathway” refers to the biological pathway name. Although the KEGG database provides both a pathway ID and a map ID, this tool focuses on the map ID, which is generally more complete and reliable for referencing metabolic pathways.
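As an illustration of the identifier format, a KEGG map ID is the prefix `map` followed by five digits, and the KO-based pathway ID for the same pathway uses the prefix `ko` with the same number. The sketch below checks and normalizes such identifiers. The helper names `is_map_id` and `to_map_id` are hypothetical, and whether Pankegg performs this kind of normalization internally is an assumption.

```python
import re

# "map" followed by exactly five digits, e.g. map00010.
MAP_ID = re.compile(r"map\d{5}")
# Accept either the "map" or the "ko" prefix and capture the number.
MAP_OR_KO_ID = re.compile(r"(?:map|ko)(\d{5})")


def is_map_id(identifier):
    """True if the whole string is a KEGG map ID."""
    return MAP_ID.fullmatch(identifier) is not None


def to_map_id(identifier):
    """Rewrite a 'koNNNNN' or 'mapNNNNN' identifier as 'mapNNNNN'."""
    match = MAP_OR_KO_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a map or ko pathway identifier: {identifier!r}")
    return "map" + match.group(1)


print(is_map_id("map00010"))
print(to_map_id("ko00010"))
```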