Machine Learning + Safety

Open Networks and Big Data Lab

Research Focus

Building systems that can reason about information flow and causality through formal verification, from privacy policies to software systems.

Our research develops formal methods and machine learning techniques to understand and verify how information flows through complex systems. We focus on bridging the gap between human-interpretable policies and machine-verifiable specifications, preserving ambiguity where necessary while enabling automated reasoning where possible. This work spans from analyzing legal privacy documents to ensuring AI systems follow specified constraints.

Research Contributions

The Privacy Quagmire: Where Computer Scientists and Lawyers May Disagree

HotNets '25

Privacy policies dictate how systems handle user data, yet engineers struggle to verify compliance because policies use intentionally vague legal language. Current automated analyzers extract data practices using NLP but fail when policies say things like "share data for legitimate purposes," terms that have no computational definition. This mismatch between legal flexibility and formal verification creates a fundamental barrier to automated compliance checking.

We identify four systematic challenges: vague terms, evolving terminology, exception patterns that appear contradictory, and external legal dependencies. We propose an ambiguity-preserving approach: LLMs extract structured parameters and convert them to first-order logic, while vague conditions remain explicit placeholders for human interpretation.
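
The sketch below illustrates this idea in minimal form. It is a hypothetical encoding, not the paper's actual pipeline or schema: an extracted clause becomes a first-order-logic-style rule whose vague conditions are tagged as explicit placeholders awaiting human interpretation.

```python
# Hypothetical sketch: one extracted policy clause encoded as a
# first-order-logic-style rule, with vague terms kept as explicit
# placeholders rather than forced into a computational definition.
from dataclasses import dataclass

@dataclass
class Atom:
    predicate: str        # e.g. "Share"
    args: tuple           # e.g. ("user_data", "recipient")
    vague: bool = False   # True if the term needs human interpretation

@dataclass
class PolicyRule:
    conditions: list      # antecedent atoms (conjunction)
    action: Atom          # consequent atom

# "We may share data with partners for legitimate purposes."
rule = PolicyRule(
    conditions=[
        Atom("Partner", ("recipient",)),
        Atom("LegitimatePurpose", ("purpose",), vague=True),  # placeholder
    ],
    action=Atom("Share", ("user_data", "recipient")),
)

def open_questions(rule: PolicyRule) -> list:
    """Return vague conditions that must be resolved by a human or legal expert."""
    return [a.predicate for a in rule.conditions if a.vague]

print(open_questions(rule))  # ['LegitimatePurpose']
```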

In-Context Alignment at Scale: When More is Less

ICML 2025 Workshop

A systematic study of LLMs' ability to adopt novel rules and facts purely via prompts as the number of specified behaviors grows. Accuracy degrades, often by up to ~50%, as more in-context rules are added, and models increasingly rely on superficial cues ("cheating") rather than genuinely following the rules. The work introduces synthetic rule-following setups and the NewNews dataset to evaluate belief updating and instruction fidelity.
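
The following is a minimal, hypothetical harness in the spirit of the synthetic rule-following setup, not the released evaluation code; `query_model` is a placeholder for whatever LLM client is used.

```python
# Hypothetical evaluation sketch: measure how rule-following accuracy
# changes as the number of in-context rules grows.
import random

def make_rules(n):
    """Synthetic rules of the form: if the input mentions X, reply with Y."""
    return [(f"topic_{i}", f"label_{i}") for i in range(n)]

def build_prompt(rules, query_topic):
    rule_text = "\n".join(f"- If the user mentions {t}, answer '{l}'." for t, l in rules)
    return f"Follow these rules exactly:\n{rule_text}\n\nUser mentions: {query_topic}\nAnswer:"

def query_model(prompt):
    """Placeholder for an actual LLM call (e.g. an API client)."""
    raise NotImplementedError

def accuracy_at(n_rules, n_trials=50):
    """Fraction of queries answered with the label the matching rule prescribes."""
    rules = make_rules(n_rules)
    correct = 0
    for _ in range(n_trials):
        topic, label = random.choice(rules)
        answer = query_model(build_prompt(rules, topic))
        correct += (answer.strip() == label)
    return correct / n_trials

# Sweep the number of in-context rules to observe degradation, e.g.:
# for n in (4, 16, 64, 256):
#     print(n, accuracy_at(n))
```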

Learning Conditional Granger Causal Temporal Networks

CLeaR 2023

A framework for learning causal temporal networks that capture how causal relationships evolve over time. The method learns conditional Granger causality patterns from time-series data, enabling discovery of dynamic causal structures that change with context or time period, and connects temporal causality analysis to the practical study of evolving systems.
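
As background for readers unfamiliar with Granger causality, the sketch below runs a standard pairwise Granger test with statsmodels on synthetic data; it is not the paper's conditional, time-varying method.

```python
# Background sketch only: a standard pairwise Granger-causality test,
# not the conditional / time-varying approach developed in the paper.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    # y depends on lagged x, so x should Granger-cause y.
    y[t] = 0.6 * x[t - 1] + 0.2 * y[t - 1] + rng.normal(scale=0.5)

# Column order: test whether the second column (x) Granger-causes the first (y).
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)
for lag, (tests, _) in results.items():
    f_stat, p_value, _, _ = tests["ssr_ftest"]
    print(f"lag={lag}: F={f_stat:.2f}, p={p_value:.4f}")
```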

Targeted Policy Recommendations using Outcome-aware Clustering

ACM COMPASS 2022

A partially supervised segmentation method that ties feature selection and the distance metric to an outcome of interest. Balanced, near-homogeneous clusters enable cluster-level policy recommendations rather than one-size-fits-all policies. Applied to LSMS-ISA survey data from sub-Saharan Africa, the method surfaced policy-relevant heterogeneity that is invisible at the population level.
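
A simplified illustration of the outcome-aware idea, assuming a naive correlation-based feature weighting and standard k-means rather than the paper's algorithm:

```python
# Simplified illustration (not the paper's algorithm): bias a clustering toward
# an outcome of interest by weighting features by their correlation with it.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                               # household features
outcome = 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=1000)    # e.g. a welfare metric

Xs = StandardScaler().fit_transform(X)
weights = np.abs([np.corrcoef(Xs[:, j], outcome)[0, 1] for j in range(Xs.shape[1])])
Xw = Xs * weights                                            # outcome-aware distance

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Xw)

# Cluster-level summaries support per-cluster rather than population-wide policies.
for k in range(5):
    print(k, round(outcome[clusters == k].mean(), 2), int((clusters == k).sum()))
```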

Enhancing Neural Recommender Models through Domain-Specific Concordance

WSDM 2021

A regularization framework that makes recommenders respect expert-defined category mappings under within-category perturbations. It improves category-robustness distance by ~101-126% and accuracy by up to ~12% on MovieLens, Last.fm, and MIMIC-III, bridging learned embeddings with human-understandable rules for more robust generalization.
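
A hedged sketch of one way such a regularizer could look, using an assumed formulation and a toy model rather than the paper's exact loss: penalize score changes when a history item is swapped for another item in the same expert-defined category.

```python
# Assumed formulation (toy model, not the paper's exact loss): regularize a
# recommender so its score is stable under within-category item swaps.
import torch
import torch.nn as nn

class TinyRecommender(nn.Module):
    """Toy scorer: dot product of user embedding and mean history embedding."""
    def __init__(self, n_users=100, n_items=50, dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, user, history):
        return (self.user_emb(user) * self.item_emb(history).mean(dim=0)).sum()

def concordance_penalty(model, user, history, category_of):
    """Squared score change under one random within-category swap."""
    base = model(user, history)
    idx = torch.randint(len(history), (1,)).item()
    cat = category_of[history[idx].item()]
    same_cat = [i for i, c in category_of.items() if c == cat and i != history[idx].item()]
    if not same_cat:
        return torch.tensor(0.0)
    perturbed = history.clone()
    perturbed[idx] = same_cat[torch.randint(len(same_cat), (1,)).item()]
    return (model(user, perturbed) - base).pow(2)

model = TinyRecommender()
user = torch.tensor(3)
history = torch.tensor([1, 7, 12])
category_of = {i: i % 5 for i in range(50)}   # toy expert-defined category mapping
penalty = concordance_penalty(model, user, history, category_of)
# total_loss = recommendation_loss + lambda_reg * penalty
```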

VACCINE: Using Contextual Integrity for Data Leakage Detection

WWW 2019

A contextual integrity (CI) based formalism that decouples information-flow extraction from policy enforcement in data leakage prevention (DLP) systems. NLP-driven flow extraction is combined with declarative policies compiled into operational SQL rules, supporting temporal and consent-aware policies and improving precision and expressivity on the Enron email corpus.
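
A toy illustration of the compile-to-SQL idea, with an invented schema and policy rather than the system's actual policy language: extracted flows are stored as contextual-integrity tuples, and a policy is expressed as a SQL rule that flags violating flows.

```python
# Toy illustration (invented schema and policy, not the system's actual ones).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flows (
        sender TEXT, recipient TEXT, subject TEXT,
        attribute TEXT, consent INTEGER, ts INTEGER
    )
""")
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("hr",      "manager",  "alice", "salary", 1, 100),
        ("hr",      "external", "alice", "salary", 0, 200),  # leak candidate
        ("billing", "external", "bob",   "email",  1, 300),
    ],
)

# Policy: salary data may leave the organization only with the subject's consent.
violations = conn.execute("""
    SELECT sender, recipient, subject, attribute
    FROM flows
    WHERE attribute = 'salary' AND recipient = 'external' AND consent = 0
""").fetchall()
print(violations)  # [('hr', 'external', 'alice', 'salary')]
```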

Publications

  1. "The Privacy Quagmire: Where Computer Scientists and Lawyers May Disagree."
    Yunwei Zhao, Varun Chandrasekaran, Thomas Wies, Lakshminarayanan Subramanian.
    HotNets '25, November 17-18, 2025, College Park, MD, USA.
  2. "In-Context Alignment at Scale: When More is Less."
    Neelabh Madan, Lakshminarayanan Subramanian.
    ICML Workshop on Models of Human Feedback for AI Alignment, 2025.
    OpenReview
  3. "Learning Conditional Granger Causal Temporal Networks."
    Ananth Balashankar, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    Causal Learning and Reasoning Conference (CLeaR), 2023.
    PDF
  4. "Targeted Policy Recommendations using Outcome-aware Clustering."
    Ananth Balashankar, Samuel Fraiberger, Eric M. Deregt, Marelize Görgens, Lakshminarayanan Subramanian.
    ACM COMPASS, 2022.
    ACM DL
  5. "Enhancing Neural Recommender Models through Domain-Specific Concordance."
    Ananth Balashankar, Alex Beutel, Lakshminarayanan Subramanian.
    Web Search and Data Mining (WSDM), 2021.
    ACM DL
  6. "VACCINE: Using Contextual Integrity for Data Leakage Detection."
    Yan Shvartzshnaider, Zvonimir Pavlinovic, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian, Helen Nissenbaum, Prateek Mittal.
    World Wide Web Conference (WWW), 2019.
    PDF

Research Team

Current Members
  • Yunwei Zhao - PhD Student, Courant Institute, NYU
  • Neelabh Madan - PhD Student, Courant Institute, NYU
  • Lakshminarayanan Subramanian - Professor, Courant Institute, NYU
Collaborators
  • Thomas Wies - Professor, Courant Institute, NYU
  • Varun Chandrasekaran - University of Illinois Urbana-Champaign
  • Ananth Balashankar - Researcher
  • Mukund Sudarshan - Researcher
  • Helen Nissenbaum - Cornell Tech
  • Srikanth Jagabathula - NYU Stern

Software & Resources