cdr_bench

Benchmarking framework for dimensionality reduction techniques on chemical datasets.

License: MIT | Python 3.11


cdr_bench is a benchmarking framework for evaluating and comparing dimensionality reduction (DR) methods on chemical datasets. It implements a systematic pipeline for optimizing hyperparameters, computing quality metrics, and visualizing results across multiple DR techniques and molecular descriptor types.

This project accompanies the publication:

Orlov, A. A., Akhmetshin, T. N., Horvath, D., Marcou, G., & Varnek, A. "From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization." Molecular Informatics, 2024, 44(1). DOI: 10.1002/minf.202400265

Supported Methods

Method | Library        | Description
-------|----------------|----------------------------------------------
PCA    | scikit-learn   | Principal Component Analysis
UMAP   | umap-learn     | Uniform Manifold Approximation and Projection
t-SNE  | openTSNE       | t-distributed Stochastic Neighbor Embedding
GTM    | ChemographyKit | Generative Topographic Mapping
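
To illustrate what the simplest method in the table does, here is a minimal sketch of a 2D PCA projection. It uses plain NumPy (SVD of the centered data) as a stand-in so the example has no extra dependencies; the framework itself is listed as using scikit-learn, and the function name `pca_project` and the random descriptor matrix are assumptions made for this example only.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    # Center each descriptor column so the components describe variance.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are principal axes, S are singular values.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Coordinates in the reduced space (equivalent to X_centered @ Vt[:k].T).
    coords = U[:, :n_components] * S[:n_components]
    # Fraction of total variance captured by each retained component.
    explained_ratio = (S ** 2) / (S ** 2).sum()
    return coords, explained_ratio[:n_components]


# Synthetic stand-in for a descriptor matrix (molecules x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))

coords, ratio = pca_project(X, n_components=2)
print(coords.shape)
print(ratio)
```

The nonlinear methods (UMAP, t-SNE, GTM) replace this linear projection with their own optimization, which is why they expose hyperparameters that the benchmarking pipeline tunes.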

Supported Descriptors

  • Morgan fingerprints (count-based, configurable radius and size) via RDKit
  • MACCS keys (167-bit structural keys) via RDKit
  • ChemDist embeddings (graph neural network learned representations) via DGL-LifeSci

Quality Metrics

  • Nearest-neighbor overlap (PNN)
  • Co-ranking matrix analysis (QNN, LCMC, Qlocal, Qglobal)
  • Trustworthiness and continuity
  • Distance correlation and residual variance
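
The metrics above all compare neighborhoods or distances before and after projection. The sketch below shows one common way to compute a few of them with NumPy: the co-ranking matrix, Q_NN(k), LCMC(k) = Q_NN(k) - k/(N-1), and a residual variance derived from the correlation of pairwise distances. It is not taken from the cdr_bench code; the function names, the Euclidean metric, and the reading of "distance correlation" as a Pearson correlation between pairwise distances are all assumptions, and the framework's own definitions may differ.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rank_matrix(D):
    """ranks[i, j] = rank of j among the neighbors of i (i itself gets rank 0)."""
    D = D.copy()
    np.fill_diagonal(D, -1.0)  # force each point to be its own rank-0 neighbor
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    ranks[rows, order] = np.arange(D.shape[1])[None, :]
    return ranks


def coranking_matrix(D_high, D_low):
    """Q[k-1, l-1] = number of pairs with rank k originally and rank l in the embedding."""
    n = D_high.shape[0]
    r_high, r_low = rank_matrix(D_high), rank_matrix(D_low)
    off_diag = ~np.eye(n, dtype=bool)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q


def q_nn(Q, k):
    """Average fraction of the k nearest neighbors preserved by the embedding."""
    n = Q.shape[0] + 1
    return Q[:k, :k].sum() / (k * n)


def lcmc(Q, k):
    """Local continuity meta-criterion: Q_NN(k) corrected for chance overlap."""
    n = Q.shape[0] + 1
    return q_nn(Q, k) - k / (n - 1)


def distance_correlation(D_high, D_low):
    """Pearson correlation between original and embedded pairwise distances (assumed definition)."""
    iu = np.triu_indices_from(D_high, k=1)
    return np.corrcoef(D_high[iu], D_low[iu])[0, 1]


# Synthetic example: the "embedding" simply keeps the first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
Y = X[:, :2]

D_high, D_low = pairwise_distances(X), pairwise_distances(Y)
Q = coranking_matrix(D_high, D_low)
r = distance_correlation(D_high, D_low)

print("Q_NN(10):", q_nn(Q, 10))
print("LCMC(10):", lcmc(Q, 10))
print("residual variance:", 1.0 - r ** 2)
```

A perfect embedding puts all counts on the diagonal of the co-ranking matrix, giving Q_NN(k) = 1 for every k. The dense distance matrices used here scale quadratically with the number of molecules, so this form is only practical for small samples.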

Quick Start

# Install
git clone https://github.com/AxelRolov/cdr_bench.git
cd cdr_bench
uv sync

# Run benchmarking
python scripts/run_benchmarking.py --config bench_configs/run_benchmarking.toml

# Analyze results
python scripts/analyze_results.py --input_dir results/ --output_dir results/ --k_hit 20

See the Installation and Quickstart guides for details.

Citation

If you use this code, please cite:

@article{orlov2024high,
  title={From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization},
  author={Orlov, Alexey A. and Akhmetshin, Tagir N. and Horvath, Dragos and Marcou, Gilles and Varnek, Alexandre},
  journal={Molecular Informatics},
  volume={44},
  number={1},
  pages={e202400265},
  year={2024},
  doi={10.1002/minf.202400265}
}

Datasets are available on Zenodo.