Introduction to the tools

Pankegg is composed of two main tools:

Pankegg Make DB (Pankegg_make_db.py): the data parser and SQL database creator. It processes a CSV file in which each line lists the required input files for a single sample, then compiles the information from those files into a structured SQL database so that downstream exploration is fast.
Pankegg APP (Pankegg_app.py): the web server for interactive data exploration. It reads the database generated by the parser and provides a browser-based interface for exploring your results.
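
To make the "one CSV line per sample, compiled into a SQL database" idea concrete, here is a minimal sketch. It is not the code of Pankegg_make_db.py: the column names (sample_name, annotation_file, classification_file, quality_file), the table name sample_files, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample sheet: the real Pankegg input CSV may use other columns.
EXAMPLE_CSV = """sample_name,annotation_file,classification_file,quality_file
sampleA,a_annot.tsv,a_class.tsv,a_quality.tsv
sampleB,b_annot.tsv,b_class.tsv,b_quality.tsv
"""


def load_sample_sheet(handle, conn):
    """Store one (sample, role, path) row per file listed on each CSV line."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample_files ("
        "sample_name TEXT, file_role TEXT, file_path TEXT)"
    )
    n_samples = 0
    for row in csv.DictReader(handle):
        name = row.pop("sample_name")
        # Every remaining column is treated as an input file for this sample.
        conn.executemany(
            "INSERT INTO sample_files VALUES (?, ?, ?)",
            [(name, role, path) for role, path in row.items()],
        )
        n_samples += 1
    conn.commit()
    return n_samples


conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in SQL engine
n_samples = load_sample_sheet(io.StringIO(EXAMPLE_CSV), conn)

# Once the data is in SQL, per-sample lookups are a single query.
rows = conn.execute(
    "SELECT file_path FROM sample_files WHERE sample_name = ? ORDER BY file_role",
    ("sampleB",),
).fetchall()
```

Storing the parsed information once and querying it afterwards is what lets a web interface filter by sample, bin, or taxonomy without re-reading the input files.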

The web interface is divided into two main categories:

Navigation and Search: Effortlessly browse and filter all available data to find exactly what you need.
Features: Visualize data, compare bins or samples, and generate insightful plots.

Navigation and Search Pages

Page Name	Usage	Possible Filters
Bins	Visualize all bins for each sample, review CheckM2 quality and classifications. Generate quality plots, see maps/KEGGs for bins.	Sample Name, Bin Name, Taxonomy, Maps, KEGGs
Maps	Visualize all pathways detected in your data, check pathway “completeness”, highlight orthologs present per pathway.	Sample Name, Bin Name, Taxonomy, Map (ID/Name), KEGGs
KEGGs	List all KEGGs present globally or in a specific bin; find patterns in ortholog/bin names.	Ortholog Name Pattern, Bin Name (via Bins page)
Taxonomy	View tables for each taxonomic rank, listing taxa found and their abundance.	Taxonomic Rank

Feature Pages

Pankegg provides multiple interactive features:

Sample vs Sample: Heatmap of selected pathways, scatter plots for bin quality, PCA of bins, tables/plots showing KEGGs unique or shared between samples/pathways.
Bin vs Bin: Plot and table of KEGGs shared or unique between two bins for metabolic pathways.
Taxonomic Comparison: For a given rank, see sample-wise composition plots (abundance = number of bins classified as a given taxon / total number of bins in the sample) and pathway-vs-taxa heatmaps.
PCA: Dedicated page for visualizing a principal component analysis (PCA) of the samples, based on KOs, maps, or taxonomy.
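
Two of the computations mentioned in the list above are simple enough to sketch: the bin-count abundance used in Taxonomic Comparison, and the shared/unique split used when comparing two bins. The function names and example data below are hypothetical and only illustrate the arithmetic; they are not taken from the Pankegg source.

```python
from collections import Counter


def taxon_abundance(bin_taxa):
    """Abundance per taxon = bins classified as that taxon / total bins.

    bin_taxa holds one taxon label per bin of a single sample, at one rank.
    """
    total = len(bin_taxa)
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in Counter(bin_taxa).items()}


def compare_kos(kos_a, kos_b):
    """Split the KEGG orthologs of two bins into shared and unique sets."""
    a, b = set(kos_a), set(kos_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Illustrative sample with four bins classified at phylum rank.
bins_at_phylum = ["Bacteroidota", "Bacteroidota", "Bacillota", "Pseudomonadota"]
abundance = taxon_abundance(bins_at_phylum)  # Bacteroidota: 2 / 4 = 0.5

# Illustrative KO sets for two bins.
comparison = compare_kos(
    {"K00844", "K01810", "K00850"},
    {"K01810", "K00850", "K01623"},
)
```

Note that this abundance counts bins, not reads or coverage, so a taxon represented by many small bins weighs more than one represented by a single large bin.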

Note:
In this documentation, the terms “map” and “pathway” are sometimes used interchangeably. Typically, “map” refers to the KEGG database’s map ID (e.g., map00010), while “pathway” refers to the biological pathway name. Although the KEGG database provides both a pathway ID and a map ID, this tool focuses on the map ID, which is generally more complete and reliable for referencing metabolic pathways.
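
As an illustration of the map ID versus pathway name distinction, the sketch below resolves a query given in either form to a map ID. KEGG map IDs have the form "map" followed by five digits; the two-entry lookup table and the resolve_map helper are hypothetical and stand in for whatever lookup the tool actually uses.

```python
import re

# KEGG reference map IDs look like "map00010": "map" plus five digits.
MAP_ID = re.compile(r"map\d{5}")

# Tiny illustrative lookup table; a real one would come from the database.
MAP_NAMES = {
    "map00010": "Glycolysis / Gluconeogenesis",
    "map00020": "Citrate cycle (TCA cycle)",
}


def resolve_map(query):
    """Return the map ID for a query given as a map ID or a pathway name."""
    query = query.strip()
    if MAP_ID.fullmatch(query):
        return query if query in MAP_NAMES else None
    for map_id, name in MAP_NAMES.items():
        if name.lower() == query.lower():
            return map_id
    return None


by_id = resolve_map("map00010")
by_name = resolve_map("citrate cycle (tca cycle)")
malformed = resolve_map("map0001")  # only four digits, so not a map ID
```

Keying everything on the map ID keeps lookups unambiguous, while the name is kept only for display and search.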