Ongoing Projects

Data-Driven Customer Segmentation

Understanding customer behavior and preferences is one of the most important challenges for any firm. It involves problems such as customer segmentation, modeling and predicting consumer choices, and determining the impact of assortments and prices on consumer demand. Many of these problems have become increasingly important in the current age of Big Data, where firms can collect granular individual-level data at various points of interaction with the customer, both online and offline. The availability of such granular data allows a firm to benefit greatly from micro-segmentation of customers, leading to micro-targeting in the form of personalized product offerings, promotions, and recommendations. The existing approaches in the marketing and operations literature do not scale to such large amounts of data. Our goal is to propose alternative techniques for solving these problems that address some of the existing limitations. We aim for techniques that (1) come with provable guarantees on the quality of the obtained solution, (2) are robust to modeling assumptions, and (3) are fast and scalable to large amounts of data. Some of our techniques are inspired by recent work in the machine learning literature on large-scale and sparse optimization.

Event Analytics from News Data

The goal is to understand important socio-economic indicators based on news and other data available on the Web. This involves using text mining techniques to extract events from online news articles, learning networks of real-world events, and showing how such event networks can be used to predict real-world social phenomena such as drought, price variations, and disease outbreaks. Our longer-term vision is to build an automated analytics engine that learns and infers socio-economic phenomena from highly diverse and noisy data sources on the Web.

Crowdsourced Learning

The conventional education ecosystem in developing regions is plagued by a lack of good-quality textbooks and educational resources, a lack of skilled teachers, and high variability in student skill and motivation levels. This paper makes the case for establishing a crowdsourced learning ecosystem that leverages the collective intelligence of educators around the world to design a collaborative platform [Arias et al. 2000] for easily sharing, searching, organizing, rating, and presenting educational materials for teachers and students around the world. In particular, we make two important contributions: (a) Modeling learning outcomes in crowdsourced learning: We propose a mathematical framework that enables systematic modeling and comparison of different education paradigms. Our framework provides a means to quantify student learning under a given paradigm based on critical factors such as student skill (or ability) and the quality of the reading material (or the teacher). (b) Crowdsourced Learning Platform: We describe the design of YeSua, an initial prototype of our crowdsourced learning platform that uses an inquiry-based framework for generating annotated lesson plans for different subjects.

Sentiment Analysis on Financial Articles

Here we analyze the effects of Federal Open Market Committee (FOMC) communications, such as meeting minutes, statements, and press conferences, on interest rates.


Older Projects

Reputations in Crowdsourcing

The growing popularity of online crowdsourcing services like Amazon Mechanical Turk and CrowdFlower, and of citizen science projects like FoldIt and Galaxy Zoo, has made it very easy to leverage the power of the masses to tackle complex tasks that cannot be solved by automated algorithms and systems alone. However, these applications are vulnerable to noisy responses introduced either unintentionally by unreliable workers or intentionally by spammers and malicious workers. It is therefore important to identify the underlying quality of each worker so that we can determine which workers' responses we can trust. We propose a novel reputation system for crowd workers based only on the responses submitted by the workers. Intuitively, a worker has a high reputation if she performs the tasks honestly, as opposed to spamming or providing malicious labels. For instance, it has been shown that many workers complete tasks in a short period of time by providing uniform or random labels, just to earn the payment associated with completing the task; such workers should be assigned a low reputation. Our goal is to design efficient algorithms for computing worker reputations in crowdsourcing systems that are robust to manipulation and able to detect dishonest workers.
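
To make the idea of a response-only reputation concrete, here is a minimal sketch of a common baseline: score each worker by how often her label agrees with the majority label on the same task. This is not the reputation system proposed in this project; it is a simple illustration under the assumption that most workers are honest. The function names and the input layout (worker -> task -> label) are hypothetical.

```python
from collections import Counter, defaultdict


def majority_labels(responses):
    """Return the most frequent label per task.

    `responses` maps worker id -> {task id -> label} (an assumed layout).
    Ties are broken by whichever label was counted first.
    """
    votes = defaultdict(Counter)
    for answers in responses.values():
        for task, label in answers.items():
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def agreement_reputation(responses):
    """Score each worker by the fraction of her labels that match the majority.

    This is a baseline heuristic, not a manipulation-robust algorithm:
    colluding workers who form the majority would still score highly.
    """
    majority = majority_labels(responses)
    scores = {}
    for worker, answers in responses.items():
        if not answers:
            scores[worker] = 0.0  # no evidence, so no trust by default
            continue
        agree = sum(1 for task, label in answers.items() if majority[task] == label)
        scores[worker] = agree / len(answers)
    return scores


if __name__ == "__main__":
    # Three careful workers and one worker who labels every task 1.
    responses = {
        "a": {"t1": 1, "t2": 0, "t3": 1, "t4": 0},
        "b": {"t1": 1, "t2": 0, "t3": 1, "t4": 0},
        "c": {"t1": 1, "t2": 0, "t3": 1, "t4": 0},
        "s": {"t1": 1, "t2": 1, "t3": 1, "t4": 1},
    }
    for worker, score in sorted(agreement_reputation(responses).items()):
        print(worker, score)
```

In this toy input the uniform labeler agrees with the majority on only half of the tasks and therefore receives a lower score than the other three workers. A baseline like this breaks down when dishonest workers are numerous or coordinated, which is the setting that motivates algorithms that are robust to manipulation.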

Satellite Image Analysis to Detect Changing Land Patterns

Changing patterns in agricultural land availability are one of the fundamental problems affecting food security in developing regions like India. We implemented a tool that analyzes satellite images of a region to compute temporal changes in land patterns for categories such as arable land, developed land, and water bodies.

Traffic Congestion Detection

Our goal is to design mechanisms that detect the state of traffic congestion in and around critical congestion areas, along with simple preventive mechanisms that keep these areas from reaching congestion collapse.

Feasibility of Web-based Educational Lesson Plans

The Web has a wealth of educational information across different topics, which can potentially be used to improve teaching. In an attempt to harness this potential, we developed a Contextual Information Portal (CIP) for education and deployed it in schools in Kenya and India. The portal crawls the Web, collects documents for different subjects, and builds a repository of relevant information.