Research Focus
Building systems that can reason about information flow and causality through formal verification, from privacy policies to software systems.
Our research develops formal methods and machine learning techniques to understand and verify how information flows through complex systems. We focus on bridging the gap between human-interpretable policies and machine-verifiable specifications, preserving ambiguity where necessary while enabling automated reasoning where possible. This work spans from analyzing legal privacy documents to ensuring AI systems follow specified constraints.
Research Contributions
The Privacy Quagmire: Where Computer Scientists and Lawyers May Disagree
HotNets '25
Privacy policies dictate how systems handle user data, yet engineers struggle to verify compliance because policies use intentionally vague legal language. Current automated analyzers extract data practices using NLP but fail on clauses like "share data for legitimate purposes", terms that have no computational definition. This mismatch between legal flexibility and formal verification creates a fundamental barrier to automated compliance checking.
We identify four systematic challenges: vague terms, evolving terminology, exception patterns that appear contradictory, and external legal dependencies. We propose an approach that preserves this ambiguity: LLMs extract structured parameters and convert them to first-order logic while keeping vague conditions as explicit placeholders for human interpretation.
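A minimal sketch of the placeholder idea, assuming a toy rule representation (the Atom/Rule types and all predicate names are illustrative, not the system's actual schema): the vague clause becomes a structured rule whose condition stays an uninterpreted predicate that the checker surfaces for human resolution rather than silently defining.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """A first-order atom, e.g. MayShare(user_data, third_party)."""
    predicate: str
    args: tuple

@dataclass
class Rule:
    """Horn-style rule: the head holds when every body atom holds."""
    head: Atom
    body: list

# Hypothetical extraction of "we may share data for legitimate
# purposes": the vague term survives as an explicit placeholder
# predicate instead of being given an invented definition.
rule = Rule(
    head=Atom("MayShare", ("user_data", "third_party")),
    body=[
        Atom("Collected", ("user_data",)),
        Atom("LegitimatePurpose", ("purpose",)),  # placeholder
    ],
)

def unresolved(rule, defined=frozenset({"Collected"})):
    """Body predicates the checker cannot evaluate mechanically."""
    return [a.predicate for a in rule.body if a.predicate not in defined]

print(unresolved(rule))  # ['LegitimatePurpose'] -> route to a human
```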
In-Context Alignment at Scale: When More is Less
ICML 2025 Workshop
A systematic study of LLMs' ability to adopt novel rules and facts purely via prompts as the number of behaviors increases. We find that accuracy degrades, often by up to ~50%, as more in-context rules are added, and that models begin to "cheat" via superficial cues rather than truly following the rules. The paper introduces synthetic rule-following setups and the NewNews dataset to evaluate belief updating and instruction fidelity.
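A sketch of the kind of evaluation harness this implies, not the paper's actual benchmark: the rule format, the scale points, and the dummy query_model below are all invented for illustration; a real run would swap in an LLM API client.

```python
import random

def build_prompt(rules, query):
    """Pack n synthetic token->label rules into one prompt."""
    lines = [f"If the input is {t}, answer {l}." for t, l in rules]
    return "\n".join(lines) + f"\nInput: {query}\nAnswer:"

def query_model(prompt):
    """Stand-in for a real LLM call; a random guesser so the
    harness runs end to end. Replace with an API client."""
    return f"label_{random.randrange(5)}"

def accuracy_by_rule_count(counts=(4, 16, 64, 256), trials=100):
    """Measure rule-following accuracy as the in-context rule set
    scales up; degradation shows up as falling accuracy at large n."""
    out = {}
    for n in counts:
        rules = [(f"token_{i}", f"label_{i % 5}") for i in range(n)]
        hits = 0
        for _ in range(trials):
            tok, lab = random.choice(rules)
            hits += query_model(build_prompt(rules, tok)).strip() == lab
        out[n] = hits / trials
    return out

print(accuracy_by_rule_count())
```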
Learning Conditional Granger Causal Temporal Networks
CLeaR 2023
A framework for learning causal temporal networks that captures how causal relationships evolve over time. The method learns conditional Granger causality patterns from temporal data, enabling the discovery of dynamic causal structures that change based on context or time period. This work bridges temporal causality analysis with practical applications in understanding evolving systems.
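A toy illustration of the underlying test, not the paper's estimator: conditional Granger causality asks whether the past of x improves prediction of y only within a given context, scored here by the drop in residual variance of a lagged linear regression restricted to that context.

```python
import numpy as np

def lagged(s, lags):
    """Rows [s_{t-1}, ..., s_{t-lags}] for t = lags .. T-1."""
    return np.column_stack([s[lags - k:-k] for k in range(1, lags + 1)])

def resid_var(y, X):
    """Residual variance of a least-squares fit with intercept."""
    X1 = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return (y - X1 @ beta).var()

def conditional_granger_score(x, y, context_mask, lags=2):
    """Granger score of x -> y restricted to steps where the context
    holds; contrasting contexts reveals links that switch on and off."""
    yt, m = y[lags:], context_mask[lags:]
    Zy = lagged(y, lags)                           # restricted model
    Zxy = np.column_stack([Zy, lagged(x, lags)])   # + past of x
    return np.log(resid_var(yt[m], Zy[m]) / resid_var(yt[m], Zxy[m]))

# Synthetic demo: x drives y only when the (hypothetical) context holds.
rng = np.random.default_rng(0)
T = 2000
x = rng.normal(size=T)
ctx = np.arange(T) % 2 == 0
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t-1] + (0.8 * x[t-1] if ctx[t] else 0.0) + 0.1 * rng.normal()

print(conditional_granger_score(x, y, ctx))   # clearly positive
print(conditional_granger_score(x, y, ~ctx))  # near zero
```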
Targeted Policy Recommendations using Outcome-aware Clustering
ACM COMPASS 2022
A partially supervised segmentation method that ties feature selection and the distance metric to an outcome of interest. Balanced, near-homogeneous clusters enable cluster-level policy recommendations rather than one-size-fits-all prescriptions. Applied to LSMS-ISA survey data for sub-Saharan Africa, the method surfaced policy heterogeneity that is invisible at the population level.
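A simplified sketch of outcome-aware distance, assuming correlation-based feature weighting as a stand-in for the paper's joint feature-selection and metric construction: clustering happens in a space stretched along outcome-relevant features, so segments differ where the outcome differs.

```python
import numpy as np
from sklearn.cluster import KMeans

def outcome_aware_clusters(X, y, k=4):
    """Cluster units in a feature space reweighted by each feature's
    |correlation| with the outcome, so segments form along
    outcome-relevant dimensions rather than arbitrary ones."""
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-9)
    w = np.array([abs(np.corrcoef(Xs[:, j], y)[0, 1])
                  for j in range(X.shape[1])])
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(Xs * w)
    return labels, w

# Cluster-level recommendation sketch: summarize the outcome per
# segment and target interventions at segments where it lags.
# labels, w = outcome_aware_clusters(X, y)
# for c in range(4): print(c, y[labels == c].mean())
```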
Enhancing Neural Recommender Models through Domain-Specific Concordance
WSDM 2021
A regularization framework that makes recommenders obey expert-defined category mappings under within-category perturbations. It improved category-robustness distance by ~101-126% and accuracy by up to ~12% on MovieLens, Last.fm, and MIMIC-III, bridging learned embeddings with human-understandable rules for robust generalization.
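One way to realize such a regularizer, sketched in PyTorch (the toy matrix-factorization scorer and the same-category swap sampling are illustrative, not the paper's model): score changes under within-category item substitutions are penalized alongside the usual recommendation loss.

```python
import torch

class ToyRecommender(torch.nn.Module):
    """Minimal matrix-factorization scorer, standing in for a
    neural recommender."""
    def __init__(self, n_users, n_items, d=16):
        super().__init__()
        self.U = torch.nn.Embedding(n_users, d)
        self.V = torch.nn.Embedding(n_items, d)

    def score(self, u, i):
        return (self.U(u) * self.V(i)).sum(-1)

def concordance_penalty(model, u, i, i_swap):
    """Penalize prediction shifts when item i is replaced by i_swap,
    an item experts map to the same category: the model should treat
    within-category substitutes consistently."""
    return ((model.score(u, i) - model.score(u, i_swap)) ** 2).mean()

# Training step sketch (lam trades accuracy against concordance;
# same_category_sample is a hypothetical helper):
# loss = rec_loss(model.score(u, i), labels) \
#        + lam * concordance_penalty(model, u, i, same_category_sample(i))
```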
VACCINE: Using Contextual Integrity for Data Leakage Detection
WWW 2019
A contextual integrity (CI) based formalism that decouples flow extraction from policy enforcement in data leakage prevention (DLP) systems. NLP-driven flow extraction is paired with declarative policies compiled to operational SQL rules, supporting temporal and consent-aware constraints and improving precision and expressivity on the Enron email corpus.
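A sketch of the compile step under invented table and column names (the CIRule fields follow the standard CI flow tuple; none of this is the system's actual schema): a declarative rule over extracted flows becomes a SQL query that flags violations.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CIRule:
    """A contextual-integrity pattern over extracted flow tuples
    (sender, receiver, subject, attribute, principle); None = wildcard."""
    sender: Optional[str] = None
    receiver: Optional[str] = None
    subject: Optional[str] = None
    attribute: Optional[str] = None
    principle: Optional[str] = None

def compile_to_sql(rule, table="flows"):
    """Compile a declarative rule into an operational SQL check."""
    fields = [("sender", rule.sender), ("receiver", rule.receiver),
              ("subject", rule.subject), ("attribute", rule.attribute),
              ("principle", rule.principle)]
    conds = [f"{col} = '{val}'" for col, val in fields if val is not None]
    return f"SELECT * FROM {table} WHERE " + " AND ".join(conds)

# e.g., flag salary attributes flowing to external recipients:
print(compile_to_sql(CIRule(receiver="external", attribute="salary")))
# SELECT * FROM flows WHERE receiver = 'external' AND attribute = 'salary'
```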
Publications
- "The Privacy Quagmire: Where Computer Scientists and Lawyers May Disagree."
  Yunwei Zhao, Varun Chandrasekaran, Thomas Wies, Lakshminarayanan Subramanian.
  HotNets '25, November 17-18, 2025, College Park, MD, USA.
- "In-Context Alignment at Scale: When More is Less."
  Neelabh Madan, Lakshminarayanan Subramanian.
  ICML Workshop on Models of Human Feedback for AI Alignment, 2025. [OpenReview]
- "Learning Conditional Granger Causal Temporal Networks."
  Ananth Balashankar, Srikanth Jagabathula, Lakshminarayanan Subramanian.
  Causal Learning and Reasoning Conference (CLeaR), 2023. [PDF]
- "Targeted Policy Recommendations using Outcome-aware Clustering."
  Ananth Balashankar, Samuel Fraiberger, Eric M. Deregt, Marelize Görgens, Lakshminarayanan Subramanian.
  ACM COMPASS, 2022. [ACM DL]
- "Enhancing Neural Recommender Models through Domain-Specific Concordance."
  Ananth Balashankar, Alex Beutel, Lakshminarayanan Subramanian.
  Web Search and Data Mining (WSDM), 2021. [ACM DL]
- "VACCINE: Using Contextual Integrity for Data Leakage Detection."
  Yan Shvartzshnaider, Zvonimir Pavlinovic, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian, Helen Nissenbaum, Prateek Mittal.
  World Wide Web Conference (WWW), 2019. [PDF]
Research Team
Current Members
- Yunwei Zhao - PhD Student, Courant Institute, NYU
- Neelabh Madan - PhD Student, Courant Institute, NYU
- Lakshminarayanan Subramanian - Professor, Courant Institute, NYU
Collaborators
- Thomas Wies - Professor, Courant Institute, NYU
- Varun Chandrasekaran - University of Illinois Urbana-Champaign
- Ananth Balashankar - Researcher
- Mukund Sudarshan - Researcher
- Helen Nissenbaum - Cornell Tech
- Srikanth Jagabathula - NYU Stern
Software & Resources
- Privacy Policy Knowledge System: FOL-based reasoning for privacy policy analysis