v1.0.0: VGAE applied to GM12878 vs IMR90 chr21 Hi-C at 25 kb

Fully reproducible pipeline: .mcool contact maps + ChIP-seq bigWig tracks
→ latent embeddings → A/B compartment calls → cross-cell comparison.
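The commit does not say how embeddings become A/B calls. Below is a minimal sketch of one plausible approach, under stated assumptions. It runs spherical 2-means on the per-bin latent vectors. The cluster with the higher mean active-mark ChIP-seq signal is labelled A. It then scores the split with a cosine silhouette, which is the metric quoted in the results. `call_compartments` and `cosine_silhouette` are hypothetical names, not the repository's code. The per-bin `active_signal` array is also an assumption: for example, a bigWig track averaged into 25 kb bins.

```python
import numpy as np


def _unit_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalise each row so that dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def call_compartments(z: np.ndarray, active_signal: np.ndarray,
                      n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Hypothetical A/B caller: spherical 2-means on bin embeddings z (n_bins x dim).

    The cluster whose bins have the higher mean active-mark signal
    (assumed: a ChIP-seq track averaged per bin) is labelled "A".
    """
    u = _unit_rows(z)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: a random bin, then the bin least similar to it.
    first = int(rng.integers(len(u)))
    second = int(np.argmin(u @ u[first]))
    centroids = u[[first, second]].copy()
    labels = np.full(len(u), -1)
    for _ in range(n_iter):
        # Assign each bin to the centroid with the highest cosine similarity.
        new_labels = np.argmax(u @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for k in range(2):
            members = u[labels == k]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[k] = _unit_rows(members.mean(axis=0, keepdims=True))[0]
    # Mean active-mark signal per cluster (-inf for an empty cluster).
    means = [active_signal[labels == k].mean() if np.any(labels == k) else -np.inf
             for k in range(2)]
    a_cluster = int(np.argmax(means))
    return np.where(labels == a_cluster, "A", "B")


def cosine_silhouette(z: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette score using cosine distance (1 - cosine similarity).

    Written for the two-cluster A/B case; assumes each cluster has >= 2 bins.
    """
    u = _unit_rows(z)
    d = 1.0 - u @ u.T  # pairwise cosine distances
    scores = []
    for i in range(len(u)):
        same = labels == labels[i]
        same[i] = False  # a(i) excludes the bin itself
        a = d[i, same].mean()                 # mean intra-cluster distance
        b = d[i, labels != labels[i]].mean()  # mean distance to the other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))
```

Orienting the clusters by an active-mark track follows the usual convention that A compartments are gene-rich and enriched for active chromatin. Whether this pipeline orients them that way, or uses clustering at all, is an assumption.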

  Key results (chr21, 25 kb, latent dim=32):
  - Test link-prediction AUC=0.777, AP=0.759 (converged at epoch 31 of 300)
  - GM12878 A/B silhouette (cosine) = 0.775
  - IMR90 zero-shot silhouette = 0.443
  - A-compartment bins stable across cell types (mean cosine Δ=0.042)
  - B-compartment bins shift substantially (mean cosine Δ=0.451)
  - 101 B→A and 70 A→B compartment switches from GM12878 to IMR90
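Below is a minimal sketch of how the cross-cell figures could be computed. It rests on three assumptions. First, "cosine Δ" is the cosine distance between the same bin's GM12878 and IMR90 embeddings, both produced by the GM12878-trained encoder so the two latent spaces are comparable. Second, bins are grouped by their GM12878 call. Third, a switch is any bin whose call differs between the two cell types. The embedding rows must be aligned bin-for-bin (same chr21 25 kb bins). `per_bin_cosine_delta` and `compare_cell_types` are hypothetical names, not the repository's code.

```python
import numpy as np
from collections import Counter


def per_bin_cosine_delta(z_ref: np.ndarray, z_query: np.ndarray) -> np.ndarray:
    """Cosine distance (1 - cos) between row i of z_ref and row i of z_query."""
    dot = np.sum(z_ref * z_query, axis=1)
    norms = np.linalg.norm(z_ref, axis=1) * np.linalg.norm(z_query, axis=1)
    return 1.0 - dot / np.clip(norms, 1e-12, None)


def compare_cell_types(z_ref, z_query, calls_ref, calls_query):
    """Hypothetical cross-cell summary for bin-aligned embeddings and A/B calls.

    Returns (mean cosine delta per reference compartment,
             Counter of switches keyed by (reference_call, query_call)).
    """
    calls_ref = np.asarray(calls_ref)
    calls_query = np.asarray(calls_query)
    delta = per_bin_cosine_delta(z_ref, z_query)
    # Average the per-bin delta separately over reference A and B bins.
    mean_delta = {c: float(delta[calls_ref == c].mean())
                  for c in ("A", "B") if np.any(calls_ref == c)}
    # Count only bins whose call changed, e.g. ("B", "A") for a B→A switch.
    switches = Counter((r, q) for r, q in zip(calls_ref, calls_query) if r != q)
    return mean_delta, switches
```

With GM12878 as the reference, `switches[("B", "A")]` and `switches[("A", "B")]` correspond to the B→A and A→B counts above.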
2026-05-15 01:53:04 +02:00
parent 6c91af655d
commit acadbd780c
27 changed files with 6764 additions and 201 deletions

48
.gitignore

@@ -1,30 +1,30 @@
# Raw sequencing and contact data (large files; download via run_pipeline.sh)
data/raw/
# Python
__pycache__/
*.pyc
*.py[cod]
*.pyo
.pytest_cache/
*.egg-info/
dist/
build/
# Conda / mamba envs
.env/
*.yml.lock
# Environments
.env
*.env
.venv/
# Data
*.hic
*.mcool
*.cool
*.bam
*.bw
*.bigwig
*.bed
*.pairs*
*.pt
*.npy
*.csv
*.png
# Editors
.vscode/
.idea/
*.swp
*~
# Jupyter and logs
*.ipynb_checkpoints/
*.log
# Jupyter
.ipynb_checkpoints/
*.ipynb
# OS
.DS_Store
# Results / temp
results/
data/
Thumbs.db