BlueDot Impact Scholar · Seeking Fully-Funded PhD 2026

Partha Pratim Saha

AI Researcher  ·  Mechanistic Interpretability  ·  Geometric Deep Learning  ·  AI Safety & Alignment

I build differential-geometric and information-geometric frameworks to understand how LLMs encode, propagate, and transform beliefs across layers — and how alignment, fine-tuning, and cultural model merging alter that internal geometry. Published at NeurIPS 2025 and SpringerNature ICOMP’24.

★ NeurIPS 2025 · SpringerNature · BITS Pilani · 9.08 GPA · Top 5% · NeurIPS 2025 Reviewer · AWS Scholar · BlueDot Impact Scholar
Open Research

github.com/pps121  ·  torsional-belief-vector-field  ·  pps121.github.io

01 · About
About Me

I am an AI researcher and lecturer with over 12 years of combined experience in industry (Infosys, BirlaSoft/J&J, Wipro, IIT Kanpur) and academia (BITS Pilani M.Tech — 9.08 GPA, top 5%), now focused full-time on foundational research in mechanistic interpretability, AI alignment, and geometric deep learning.

My research programme develops information-geometric frameworks for tracing how LLMs encode, propagate, and transform beliefs across layers — and how alignment, fine-tuning, cultural model merging, and distillation alter that internal geometry. Three active research threads: (1) AI Safety & Mechanistic Interpretability via Torsional Belief Vector Fields; (2) Cross-cultural & Multilingual AI via Semantic Helix / nDNA; and (3) Privacy-Preserving Federated Learning for healthcare (published SpringerNature ICOMP’24).

My long-term vision: AI systems aligned not only behaviourally but geometrically — with transparent, auditable internal representations that can be verified, not just tested. I am particularly motivated by applications where responsible, interpretable AI can have measurable global impact: global health, diagnostic AI, equitable NLP, and human-centred AI systems for underserved communities.

I currently serve as Lecturer in Computer Science at Nalhati Government Polytechnic College (West Bengal, India), where I teach ML, Deep Learning, IoT, and Python, and supervise 50+ students on AI/NLP research projects. I have been selected as a NeurIPS 2025 Reviewer and an AWS AI & ML Scholar, and have received full scholarships to several premier summer schools (Duke ML, Cohere, Armenian LLM, University of Chicago DSI).

I am deeply concerned about the risks from advanced AI systems — particularly power-seeking behaviour, concentration of control, and the failure modes that emerge when capable AI systems pursue proxy objectives misaligned with human values. These are not abstract worries: they shape every architectural choice I make in my research. My goal is not to merely describe alignment failures but to build the mathematical tools that let us detect them inside the model — before they manifest as harmful outputs.

“AI systems must be aligned not only behaviourally but geometrically — with transparent, auditable internal representations that can be verified, not just tested.”
Education
M.Tech — Data Science & Engineering
BITS Pilani, India
GPA 9.08/10 · Distinction · Top 5% (2019–2021)
B.Tech — Computer Science & Engineering
WBUT, Kolkata, India
GPA 8.49/10 · Top 5% (2006–2010)
Research Interests
Mechanistic Interpretability · Geometric ML · AI Safety & Alignment · AI Deception Detection · Misalignment Monitoring · Cross-cultural LLMs · Privacy-Preserving FL · LLM Reasoning
Current Position
Lecturer in CS
Nalhati Govt. Polytechnic College
West Bengal, India · Dec 2021–Present
02 · Skills & Expertise
Technical Expertise
NLP & Foundation Models
LLM Alignment & Safety · Mechanistic Interpretability · AI Deception Detection · Misalignment Monitoring · Belief & Knowledge Editing · AI Agents · XAI / Explainable AI · Fine-tuning (HuggingFace) · Hybrid & Multi-hop RAG · Conversational AI
Deep Learning & ML
PyTorch · TensorFlow / Keras · Transformers (GPT, Llama, Mistral, Qwen, Gemma, DeepSeek) · CNN, RNN, Autoregressive Models · Regression / Classification / Clustering · Federated Learning
Geometric & Mathematical ML
Riemannian Geometry · Fisher-Rao Metric · Cartan Torsion · Persistent Homology · Information Geometry · DTW Analysis · Spectral Methods · Frenet-Serret Framework
Frameworks & Tools
Python · LangChain · LlamaIndex · Scikit-learn · Numpy / Pandas / NLTK / SpaCy · Docker / Kubernetes · Azure / AWS / IBM Cloud · PostgreSQL / MySQL
Domain Knowledge
AI Safety & Responsible AI · Global Health AI · Cancer Genomics & Bioinformatics · Healthcare NLP · Education Technology · Finance & Banking · Cross-cultural & Multilingual AI
Research & Communication
NeurIPS / ICML / SpringerNature Publication · Academic Mentorship (50+ students) · Workshop & Seminar Facilitation · LaTeX & Technical Writing · Cross-functional Team Leadership
Technical AI Safety
BlueDot Impact: AGI Strategy · BlueDot Impact: Technical AI Safety · Power-seeking & Goal Misgeneralisation · Mechanistic Interpretability (Circuits) · Geometric Alignment Probing · RLHF / DPO Safety Analysis · Catastrophic Risk Evaluation · Representation Engineering
03 · Research
Research Projects
AI Safety Research Programme — Path to Impact
Core Research Focus

My core concern is straightforward but urgent: as AI systems grow more capable, misalignment between their internal objectives and human values becomes catastrophically consequential — not merely inconvenient. I am particularly worried about power-seeking behaviour in strategically capable agents, concentration of control in military and governance domains, and the fundamental challenge that behavioural safety does not imply geometric safety. A model can pass all safety evaluations while encoding deeply misaligned belief structures internally.

Geometric Torsion Framework — measuring alignment not as behaviour but as geometry:
Torsion norm T1_ℓ = ‖S_ℓ‖_F  ·  H1 amplification 1500× for normative concepts  ·  Thermodynamic gap 10× (normative vs factual)  ·  Entropy–torsion bridge ρ = −0.387 (Mistral, p = 5.43 × 10⁻³⁰)

How I plan to create impact: (1) Develop post-hoc geometric probes that are compute-efficient enough for deployment-time monitoring; (2) Collaborate with world-leading AI safety labs (Anthropic, Redwood, ARC Evals, Mech Interp groups at Oxford/Cambridge/MIT/CMU) to validate geometric torsion findings against circuit-level mechanistic analysis; (3) Pursue a fully-funded PhD at a programme with strong AI safety infrastructure to work on scalable interpretability tools deployable beyond 7B-parameter models. I am actively applying for research fellowships, internships, and PhD positions starting 2026.

8 geometric torsion metrics (8-scale)  ·  3×2 IT/PA model pairs tested  ·  20,439 LITMUS benchmark prompts
BlueDot Impact: AGI Strategy · BlueDot Impact: Technical AI Safety · Mechanistic Interpretability · Geometric Alignment · Power-seeking Analysis · Catastrophic Risk
Torsional Belief Vector Fields (TBVF) — AI Safety & Mechanistic Interpretability

Models transformer hidden-state trajectories as discrete curves on a Riemannian belief manifold (ℳ, g_F) equipped with the Fisher-Rao metric. The torsion tensor S = (M − Mᵀ) / 2 captures the rotational mismatch between consecutive belief updates, a non-commutative geometric signature that attention patterns and activation magnitudes do not reveal. Key finding: DPO alignment creates geometrically localised “brake layers” that systematically suppress torsion. The effect was probed across OLMo-7B, Mistral-7B, and Zephyr-7B using 500 unsafe prompts (Litmus) and 17 geometric metrics.
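
The torsion tensor S = (M − Mᵀ) / 2 is the antisymmetric part of M, and the torsion norm is its Frobenius norm. A minimal NumPy sketch of those two steps follows. How M is estimated from consecutive hidden-state updates is not specified here, so the functions simply take M as a square array; the names `torsion_tensor` and `torsion_norm` are illustrative and are not taken from the TBVF codebase.

```python
import numpy as np


def torsion_tensor(M: np.ndarray) -> np.ndarray:
    """Return the antisymmetric part S = (M - M^T) / 2 of a square matrix M."""
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        raise ValueError("M must be a square 2-D array")
    return 0.5 * (M - M.T)


def torsion_norm(M: np.ndarray) -> float:
    """Return the Frobenius norm ||S||_F of the torsion tensor of M."""
    return float(np.linalg.norm(torsion_tensor(M), ord="fro"))


# Toy 2x2 input (illustrative values, not model data).
M_example = np.array([[0.0, 1.0],
                      [0.0, 0.0]])
S_example = torsion_tensor(M_example)
```

A symmetric M yields zero torsion, so the norm responds only to the antisymmetric component of the update.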

Peak result — Layer 27, Mistral-7B (Bonferroni-corrected, n=500):
DPO torsion suppression: 44.4%  |  Cohen’s d = 0.741  |  p = 7.7 × 10⁻¹³
DTW–Torsion theorem: DC(w) ≥ 0.875 · | Σ ‖S^IT‖_F − Σ ‖S^PA‖_F |
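
The headline numbers combine three standard quantities: a relative suppression percentage, Cohen's d with a pooled standard deviation, and a Bonferroni-adjusted significance threshold. The sketch below shows one common way to compute them, assuming per-prompt torsion norms for an SFT model and its DPO counterpart at a single layer. The arrays are synthetic placeholders, the function names are hypothetical, and the exact test and correction family used in the paper may differ.

```python
import numpy as np


def percent_suppression(sft: np.ndarray, dpo: np.ndarray) -> float:
    """Relative drop in mean torsion from SFT to DPO, in percent."""
    return float(100.0 * (np.mean(sft) - np.mean(dpo)) / np.mean(sft))


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return float((np.mean(a) - np.mean(b)) / np.sqrt(pooled_var))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Synthetic per-prompt torsion norms (placeholders, not experimental data).
sft = np.array([2.0, 3.0, 4.0])
dpo = np.array([1.0, 2.0, 3.0])

suppression = percent_suppression(sft, dpo)
effect_size = cohens_d(sft, dpo)
# Assumes one test per layer, e.g. 31 layer-level tests.
per_layer_alpha = bonferroni_threshold(0.05, 31)
```

A layer-level p-value would then be compared against `per_layer_alpha` rather than the uncorrected 0.05.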
17 geometric metrics developed  ·  3×2 SFT/DPO model pairs  ·  500 unsafe prompts (Litmus)  ·  20 concepts (DTW analysis)  ·  31 layer-level t-tests  ·  16pp full paper + appendix
Torsion Tensor · Fisher-Rao Metric · Frenet-Serret · Holonomy Defect · Spectral Torsion · DTW · DPO · OLMo-7B · Mistral-7B · Zephyr-7B
Semantic Helix of LLMs (nDNA) — Cross-cultural & Multilingual AI
Active

Unifies fine-tuning, alignment, distillation, and merging as measurable deformations of the same depth-wise semantic flow via spectral curvature κ and thermodynamic length ℒ (epistemic effort across layers). Investigates epistemic inheritance in merged LLMs using Fisher-Rao geometry, producing neural offspring; emergent cultural nDNA measured via spectral curvature deviation Δκ and thermodynamic length divergence Δℒ. Cultures studied: African, Latin American, South Asian, East Asian, Arabic, Indigenous, European, Pacific Islander.
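
Thermodynamic length can be read as the accumulated distance that a model's predictive distribution travels from layer to layer. One hedged way to make that concrete is to summarise each layer by a categorical distribution (for example, a softmax over probe logits) and sum the Fisher-Rao geodesic distances between consecutive layers, which for categorical distributions equal 2·arccos(Σ√(p_i q_i)). This is an illustrative discretisation under that assumption, not necessarily the definition of ℒ used in the nDNA work; `fisher_rao_distance` and `thermodynamic_length` are hypothetical names.

```python
import numpy as np


def fisher_rao_distance(p, q) -> float:
    """Fisher-Rao geodesic distance between two categorical distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise defensively
    q = q / q.sum()
    # Bhattacharyya coefficient, clipped to guard against rounding above 1.
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return float(2.0 * np.arccos(bc))


def thermodynamic_length(layer_dists) -> float:
    """Sum of Fisher-Rao step lengths along a sequence of per-layer distributions."""
    dists = [np.asarray(d, dtype=float) for d in layer_dists]
    return float(sum(fisher_rao_distance(a, b) for a, b in zip(dists[:-1], dists[1:])))


# Toy three-layer path on a two-outcome simplex (illustrative values only).
path = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
length = thermodynamic_length(path)
```

Under this reading, a divergence such as Δℒ between two models would be the difference of their path lengths computed on the same prompts.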

Spectral Curvature · Thermodynamic Length · Model Merging · Cultural AI · Llama3-instruct · DeepSeek-R1 · Qwen
Privacy-Preserving Federated Learning for Healthcare (CFL)
Published · SpringerNature ICOMP’24

The Collaborative Federated Learning (CFL) cloud-based system separates datasets into public and private sets based on the removal of PHI/PII, enabling personalised GPT-like systems without centralising sensitive data. This supports AI for healthcare at scale while preserving patient privacy, a critical requirement for responsible, equitable AI deployment globally.

Federated Learning · Privacy-Preserving AI · Healthcare AI · PHI/PII Separation · SpringerNature
Neural Robustness Learning in Dense Transformers (Preliminary)
In Progress

Investigates whether formal robustness certificates for LLMs can be derived from data contamination, label noise, and adversarial attacks rather than input-perturbation bounds. DPO-aligned models show compressed Fisher-norm spectra versus SFT counterparts, suggesting that alignment induces a geometric contraction that doubles as a robustness mechanism. Framework: Lipschitz robustness bounds intended to guarantee near-natural robustness under up to 40% data contamination. Models: ViT, MedSigLIP, CLIP.

Robustness Certificates · Fisher-norm Spectra · ViT · MedSigLIP · CLIP
04 · Publications
Papers & Publications
2025
★ NeurIPS 2025 Workshop
Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations
Shaina Raza, Maximus Powers, Partha Pratim Saha, Elham Dolatabadi, Usman Naseem
NeurIPS 2025 Workshop on Algorithmic Fairness · Empirical bias audit of DALL·E, Midjourney, Stable Diffusion across occupational stereotypes
2024
SpringerNature ICOMP'24
Collaborative Federated Learning Cloud Based System for Privacy-Preserving Healthcare AI
Partha Pratim Saha
First-author · Privacy-preserving federated learning system separating PHI/PII for GPT-style personalised healthcare without centralising sensitive data
Under Review & In Preparation
2026
Preprint · Under Review
GRAFT: Geometric Representations of Alignment's Fingerprint in Transformer Belief Trajectories
Partha Pratim Saha
Preprint · Under review · T2 torsion is 8× more concept-discriminative than CKA (AUC 0.89); three pre-registered hypotheses confirmed on LITMUS (20,439 prompts)
2026
Under Review · NeurIPS 2026
MENTIS: What Belief Changes Under Alignment? Multi-Scale Latent Torsion in Language Models
Partha Pratim Saha, Samarth Raina, Mayur Parvatikar, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das
Under review (NeurIPS 2026) · 8 new torsion metrics, full LITMUS benchmark study, DPO suppression 44.4% (Cohen’s d = 0.741, p = 7.7 × 10⁻¹³), entropy–torsion bridge ρ = −0.387
2025
Preprint
Scaling-law and Preference Integration in Neural Alignment Layers (SPINAL)
Arion Das, Partha Pratim Saha, Aman Chadha, Vinija Jain, Amitava Das
Contribution: model experiments, paper writing
2025
Journal Paper
Enhancing Human Empathy in Conversations Using Transformer-Based Models
Cherishma Kumar Subhasa, Endriyas Zenagebriel, Partha Pratim Saha, Zarah Rezaei, Joseph Akinyemi
Sciencematch · Impact Scholar Program 2025 · Top contributor; provided all technical ideas and process-pipeline
2024
SpringerNature
Collaborative Federated Learning Cloud Based System
Partha Pratim Saha, Naresh K. Sehgal, Miad Faezipour
International Conference on Internet Computing & IoT (ICOMP’24) · Computer Engineering & Applied Computing (CSCE), USA
05 · Experience
Work Experience
Teaching Assistant (M.Tech Programme)
BITS Pilani, India
2021 – 2023
Teaching Assistant for three graduate courses: NLP Applications [Winter 2023], Deep Learning [Fall 2021], and Deep Reinforcement Learning [Spring 2021]. Conducted tutorials, graded assignments, and mentored students. Honorarium: USD 2,513.11 across all three courses.
Lecturer in Computer Science
WBSCTED — Nalhati Government Polytechnic College, West Bengal, India
Dec 2021 – Present
Teaching ML, Deep Learning, IoT, Python, and Java. Project supervisor for 50+ final-year students in AI, NLP, Agentic AI, and Empathetic Chatbot development. Head of Department (CS) responsibilities. Four active research projects on AI Safety, nDNA/Semantic Helix, Cultural LLMs, and Neural Robustness.
Lead Data Scientist — Conversational Dialog System
Wipro Limited, Bangalore, India
Sept 2021 – Nov 2021
Developed a conversational chatbot system that resolves ambiguous user queries. Led a team of 5; implemented 50+ custom intents and dialog flows with IBM Watson. Impact: 300,000 users worldwide.
Senior Data Scientist — Medical Search Engine (J&J R&D)
BirlaSoft · Johnson & Johnson R&D, New Delhi, India
Dec 2017 – Aug 2019
Built a medical search engine using SciBERT and a SpaCy NLP pipeline. Impact: over 100,000 J&J product users. Tools: Python, Word2Vec, SciSpacy, Fuzzy Search, Flask.
Project Engineer — Threat Intelligence System
Indian Institute of Technology (IIT) Kanpur, India
Nov 2016 – Jul 2017
Developed a secure threat management system for academic institutions. Researched cyber-security measures against attacks on integrity, confidentiality, and non-repudiation. Tools: Python, Drupal, Django.
Senior Systems Engineer — Alignment & Cancer Genomics in AI
Infosys Technologies Limited, Chennai, India
Jan 2011 – Jul 2015
Applied Edit Distance and Needleman-Wunsch algorithms on DNA sequences (FASTQ) to identify minimum insertions/deletions. Identified top 10 genes driving Multiple Myeloma blood cancer; implemented 3 research papers. Impact: biological hierarchy determination for new species, drug design, life expectancy improvements. Tools: Python, Word2Vec, Numpy, Pandas, Dynamic Programming.
06 · Recognition
Awards & Achievements
🛡️
BlueDot Impact Scholar — Selected for both AGI Strategy and Technical AI Safety courses (2025–2026). Rigorous training in catastrophic risk, power-seeking, and technical safety evaluation.
🔬
LASR Labs: Progressed through the initial selection rounds of the LASR (London AI Safety Research) Labs programme for mechanistic interpretability research.
🖥️
5× Google Colab Pro grants (A100/H100 GPUs, 300 compute units each) from Neuromatch Academy for AI Safety research
📋
NeurIPS 2025 Reviewer — Selected to serve as reviewer for MTI-LLM Workshop at NeurIPS 2025
☁️
AWS AI & ML Scholar by Udacity, 2025
🌐
Duke Machine Learning Summer School 2025 & Cohere Summer School 2025 attendee
🏆
SPAR Demo Day 2025 — Accepted for AI Safety & Alignment research demonstration (Neuromatch / AI Safety cohort)
🏙️
University of Chicago DSI Summer School 2024 — AI-Science Research Program; Eric & Wendy Schmidt Postdoctoral Fellowship (Schmidt Futures)
🎓
MLx Generative AI Fellowship — Oxford ML Summer School 2024 & 2025 — competitive scholarship award for generative AI research
🌍
Athens NLP Summer School 2024 — competitive international selection for NLP & large language models
🗼
diiP Summer School 2024, Paris — Deep Learning & Interpretability in Practice; competitive selection
🧠
Neuromatch Academy — Deep Learning — competitive global selection for the intensive summer school
🗽
NYU AI Summer School 2022 — New York University; competitive selection in AI & ML
🤖
AI4 IMPACT Scholar 2021 — AI Singapore; selected practitioner programme for applied AI impact
💡
Google Developer's Program 2019 — Google Developer Expert community; competitive global selection
🎓
Udacity Bertelsmann Technology Scholarship — Google-sponsored; competitive global selection in AI & ML
07 · Contact
Get in Touch

I am actively seeking fully-funded PhD positions starting in 2026 at research universities with strong programmes in AI safety, mechanistic interpretability, geometric/theoretical ML, or responsible AI for global impact.

I am also open to research fellowships, collaborations, and discussions with faculty and research scientists working on interpretable, responsible AI — particularly in applications relevant to global health, education, or equitable systems.

If you are a professor, research scientist, or program director interested in my work, please reach out directly. I respond within 24 hours.