Copyright Statement: The copyrights for all the publications rest with the respective publishers. The versions posted here are author's versions and is meant for personal use and not for redistribution.
A link to my DBLP/DBLPvis page.
Conference & Journal Papers
Technical Reports
Talks
Conference & Journal Papers
2010
, in the 26th International Conference on Data Engineering (ICDE) 2010.
The increasing popularity of social networks has initiated a fertile research area in information extraction and data mining. Although such analysis can facilitate better understanding of sociological, behavioral, and other interesting phenomena, there is growing concern about personal privacy being breached, thereby requiring effective anonymization techniques. In this paper, we consider edge weight anonymization in social graphs. Our approach builds a linear programming (LP) model which preserves properties of the graph that are expressible as linear functions of the edge weights. Such properties form the foundations of many important graph-theoretic algorithms such as shortest paths, k-nearest neighbors, minimum spanning tree, etc. Off-the-shelf LP solvers can then be used to find solutions to the resulting model where the computed solution constitutes the weights in the anonymized graph. As a proof of concept, we choose the shortest paths problem, and experimentally evaluate the proposed techniques using real social network data sets.
Hide Abstract
2009
-
Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi, "Thread Cooperation in Multicore Architectures for Frequency Counting Over Multiple Data Streams",[Paper], [Talk] in the 35th International Conference on Very Large Data Bases (VLDB) 2009 [PVLDB 2(1) 217-228].
Many real-world data stream analysis applications such as network
monitoring, click stream analysis, and others, require combining
multiple streams of data arriving from multiple sources. This is
referred to as multi-stream analysis. To deal with high stream
arrival rates, it is desirable that such systems be capable of supporting
very high processing throughput. The advent of multicore
processors and powerful servers driven by these processors
calls for efficient parallel designs that can effectively utilize the
parallelism of the multicores, since performance improvement is
possible only through effective parallelism. In this paper, we address
the problem of parallelizing multi-stream analysis in the context
of multicore processors. Specifically, we concentrate on parallelizing
frequent elements, top-k, and frequency counting over
multiple streams. We discuss the challenges in designing an efficient
parallel system for multi-stream processing. Our evaluation
and analysis reveals that traditional contention based locking results
in excessive overhead and wait, which in turn leads to severe
performance degradation in modern multicore architectures.
Based on our analysis, we propose a cooperation based locking
paradigm for efficient parallelization of frequency counting. The
proposed cooperation based paradigm removes waits associated
with synchronization, and allows replacing locks by much cheaper
atomic synchronization primitives. Our implementation of the proposed
paradigm to parallelize a well known frequency counting algorithm
shows the benefits of the proposed cooperation based
locking paradigm when compared to the traditional contention
based locking paradigm. In our experiments, the proposed cooperation
based design outperforms the traditional contention based
design by a factor of 2 - 5.5X for synthetic zipfian data sets.
Hide Abstract
-
Sudipto Das, Divyakant Agrawal, Amr El Abbadi, "ElasTraS: An Elastic Transactional Data Store in the Cloud",[Paper], [Talk] USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '09), in conjunction with USENIX '09.
Over the last couple of years, "Cloud Computing" or
"Elastic Computing" has emerged as a compelling and
successful paradigm for internet scale computing. One
of the major contributing factors to this success is the
elasticity of resources. But in spite of the elasticity provided
by the infrastructure and the scalable design of the
applications, the elephant (or the underlying database),
which drives most of these web-based applications, is
not very elastic and scalable, and hence limits scalability.
In this paper, we propose ElasTraS which addresses
this issue of scalability and elasticity of the data store in a
cloud computing environment to leverage from the elastic
nature of the underlying infrastructure, while providing
scalable transactional data access. This paper aims
at providing the design of a system in progress, highlighting
the major design choices, analyzing the different
guarantees provided by the system, and identifying
several important challenges for the research community
striving for computing in the cloud.
Hide Abstract
-
Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi, "CoTS: A Scalable Framework for Parallelizing Frequency Counting over Data Streams",[Paper], [Poster] 25th IEEE International Conference on Data Engineering (ICDE 2009), pages 1323-1326.
Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of Chip Multi-Processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the naive techniques. Our experimental analysis of the naive designs shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. We then propose an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (Co-operative Thread Scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip-multiprocessor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak processing throughput of more than $60$ million elements per second.
Hide Abstract
2008
-
Sudipto Das, Divyakant Agrawal, Amr El Abbadi, "CAM Conscious Integrated Answering of Frequent Elements and Top-k Queries over Data Streams", [Paper], [Talk] 4th International Workshop on Data Management on New Hardware, DaMoN 2008, in conjunction with SIGMOD/PODS 2008, pages 1-10.
Frequent elements and top-k queries constitute an important class of queries for data stream analysis applications. Certain applications require answers for both frequent elements and top-k queries on the same stream. In addition, the ever increasing data rates call for providing fast answers to the queries, and researchers have been looking towards exploiting specialized hardware for this purpose. Content Addressable Memory(CAM) provides an efficient way of looking up elements and hence are well suited for the class of algorithms that involve lookups. In this paper, we present a fast and efficient CAM conscious integrated solution for answering both frequent elements and top-k queries on the same stream. We call our scheme CAM conscious Space Saving with Stream Summary (CSSwSS), and it can efficiently answer continuous queries. We provide an implementation of the proposed scheme using commodity CAM chips, and the experimental evaluation demonstrates that not only does the proposed scheme outperforms existing CAM conscious techniques by an order of magnitude at query loads of about 10%, but the proposed scheme can also efficiently answer continuous queries.
Hide Abstract
-
Vinod Kone, Sudipto Das, Ben Y. Zhao, Haitao Zheng, "QUORUM: Quality of Service in Wireless Mesh Networks", [Paper] Mobile Networks and Applications Journal, Springer, Netherlands, April 2008 [MONET 12(5-6), 358-369].
Wireless Mesh Networks (WMNs) can provide seamless broadband connectivity to network users with low setup and maintenance costs. To support next-generation applications with real-time requirements, however, these networks must provide improved quality of service guarantees. Current mesh protocols use techniques that fail to accurately predict the performance of end-to-end paths, and do not optimize performance based on knowledge of mesh network structures. In this paper, we propose QUORUM, a routing protocol optimized for WMNs that provides accurate QoS properties by correctly predicting delay and loss characteristics of data traffic. QUORUM integrates a novel end-to-end packet delay estimation mechanism with stability-aware routing policies, allowing it to more accurately follow QoS requirements while minimizing misbehavior of selfish nodes.
Hide Abstract
2007
-
Anup K. Ghosh, Sudipto Das, Rajesh Roy, Amitava Mukherjee, "Sender Side Intelligence for TCP Throughput Enhancement in Wired-cum-Wireless Networks", [Paper] The 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC.07), September 2007, Athens, Greece.
Performance of the TCP Congestion Control Algorithm has been the focus of research over the last decade. In this paper we propose modifications to TCP Congestion Control to improve its performance in wired-cum-wireless networks The key idea to determine the Optimal Congestion Window for a TCP Sender, in a particular network scenario (that corresponds to the fair share of that connection) and keep this congestion window a constant to a point where the fair share in the network has changed considerably from the instance of the calculation of the size of the last window. At this point, the TCP Congestion Window is recalculated according to the nature of new scenario. The proposed mechanism is particularly effective over wireless links, which have an inherently loss-prone nature, as Modified TCP.s congestion window being independent of packet losses (be it corruption losses or it congestion losses), keeps transmitting at the same rate at before.
Hide Abstract
-
Vinod Kone, Sudipto Das, Ben Y. Zhao and Haitao Zheng, "QUORUM - QUality Of service RoUting in wireless Mesh networks", [Paper] Proceedings of International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine) Vancouver, Canada, August 2007.
Wireless Mesh Networks (WMNs) can provide seamless broadband connectivity to network users, with the advantage of low setup and maintenance costs. To support next-generation applications with real-time requirements, however, these networks must provide improved Quality of Service guarantees. Most current mesh network routing protocols are adapted from MANET protocols, and do not optimize for mesh network properties. In this paper, we propose QUORUM (QUality Of service RoUting in wireless Mesh networks ), a routing protocol optimized for WMNs that addresses these drawbacks. QUORUM integrates a novel end-to-end packet delay estimation mechanism with stability-aware routing policies, allowing it to more accurately follow QoS requirements while minimizing misbehavior of selfish nodes.
Hide Abstract
2006
-
Rajesh Roy, Sudipto Das, Anup K. Ghosh, Amitava Mukherjee, "Modified TCP Congestion Control Algorithm for Throughput Enhancement in Wired-cum-Wireless Networks", [Paper] In Proc. of 4th Swedish National Computer Networking Workshop, Oct 2006, Luleå, Sweden (SNCNW 2006).
The key idea proposed in this paper is to determine the Optimal Congestion Window for a TCP Sender in a particular network scenario (that corresponds to the fair share of that connection) and keep this congestion window a constant to a point where the fair share in the network has changed considerably from the instance of the calculation of the size of the last window. At this point, the TCP Congestion Window is recalculated according to the nature of new scenario. The proposed mechanism is particularly effective over wireless links, which have an inherently loss-prone nature, as Modified TCP.s congestion window being independent of packet losses (be it corruption losses or it congestion losses), keeps transmitting at the same rate as before.
Hide Abstract
-
Sudipto Das, Rajesh Roy, Pradip K. Das, "Optimizations to Multipath Routing Protocols in Mobile Ad hoc Networks", [Paper] [Talk] In Proc of International Conference on Emerging Applications of IT Feb 2006, Kolkata, India(EAIT 2006).
Mobile Ad hoc Networks are typically characterized by high mobility and frequent link failures. As a result, routing algorithms selecting a single path during route creation have to make frequent route discoveries resulting in decreased throughput and high endto- end delay. Multipath routing approaches like AOMDV make use of pre-computed routes determined during route discovery. This solution, however, suffers during high mobility because the alternate paths are not actively maintained. Hence, precisely when needed, the routes are often broken. In this paper, the information gathered by a node about its neighbor, in addition to those proposed in [1], is used to dynamically determine the node to which a particular data packet has to be forwarded. Using this approach a better load balancing can be obtained in addition to utilization of the additional routes, if feasible, and in the process maintaining these routes. We also explore the possibility of implementing QoS using such a scheme.
Hide Abstract
-
Rajesh Roy, Sudipto Das, Pradip K. Das,"A Pragmatic Approach towards the Improvement of Performance of Ad Hoc Routing Protocols",[Paper] [Talk] In Proc. of 4th Asian International Mobile Computing Conference, Jan 2006, Kolkata, India (AMOC 2006).
Mobile ad hoc networks are typically characterized by high mobility and frequent link failures that result in low throughput and high end-to-end delay. In order to facilitate communication within the network, a routing protocol is used to discover routes between nodes. The primary goal of such an ad hoc network routing protocol is correct and efficient route establishment between a pair of nodes so that messages may be delivered in a timely manner. Though a large number of routing protocols have been developed for this kind of networks, most of them suffer from implementation difficulties and some inherent incapabilities to cope with highly dynamic scenarios as well as support for Quality of Services (QoS). In this paper we propose to introduce a new layer in between network layer and IEEE 802.11 MAC sub layer. This layer will interact with the underlying MAC layer and provide link connectivity information as well as various other valuable parameters (e.g. received signal strength, network resource usage etc.), which can be used for further enhancement of the routing algorithms.
Hide Abstract
2005
-
Anup K. Ghosh, Sudipto Das, Rajesh Roy, Amitava Mukherjee,
"Constant Congestion Window approach for TCP - effect on Fairness", [Paper] [Talk] In Proc. of 3rd Swedish National Computer Networking Workshop, Sep 2005, Halmstad, Sweden (SNCNW 2005).
This is an extension of the work done by Ghosh et al. [1] whereby it was indicated that a fixed congestion window performs better than a TCP Reno implementation. This paper investigates the impact of this model on the fairness of connections and comes up with suggestion to improve this.
Hide Abstract
Top of Page
Technical Reports
2009
, UCSB CS Tecnical Report 2009-04, March 2009.
Many real-world data stream analysis applications such as network monitoring, click stream analysis etc., require combining multiple streams of data arriving from multiple sources. This is referred to as multi-stream analysis. To deal with high stream arrival rates, it is desirable that such systems be capable of supporting very high processing throughput. The advent of multicore processors and powerful servers driven by these processors open many research challenges for exploiting parallelism for high throughput processing. This calls for efficient parallel designs that can effectively utilize the parallelism of the multicores. In this paper, we address this problem of parallelizing multi-stream analysis in the context of multicore processors. Parallelizing stream analysis is extremely important since in the multicore era, performance improvement is possible only through effective parallelization. Specifically, we concentrate on parallelizing frequent elements, top-k, and frequency counting over multiple streams. We discuss the challenges in designing an efficient parallel system for processing streams originating from multiple sources. Our evaluation and analysis reveals that traditional .contention. based locking results in excessive overhead which lead to severe performance degradation in modern multicore architectures. Based on our analysis, we propose a .cooperation. based locking paradigm for efficient parallelization of frequency counting. Our implementation of the proposed paradigm to parallelize a well known frequency counting algorithm shows the benefits of the proposed .cooperation. based locking paradigm when compared to the traditional .contention. based locking paradigm. In our experiments, the proposed .cooperation. based design outperforms the traditional .contention. based design by a factor of 2 . 5.5X for synthetic zipfian data sets.
Hide Abstract
Sudipto Das, Omer Egecioglu, Amr El Abbadi, "Anonymizing Edge-Weighted Social Network Graphs", UCSB CS Tecnical Report 2009-03, March 2009.
The increasing popularity of social networks has initiated a fertile research area in information extraction and data mining. Although such analysis can facilitate better understanding of sociological, behavioral, and other interesting phenomena, there is growing concern about personal privacy being breached, thereby requiring effective anonymization techniques. If we consider the social graph to be a weighted graph, then the problem of anonymization can be of various types: node identity anonymization, structural anonymization, or edge weight anonymization. In this paper, we consider edge weight anonymization in which the anonymization preserves important properties of the underlying graphs. Our approach builds a linear programming (LP) model which preserves properties of the graph that are expressible as linear functions of the edge weights. Such properties form the foundations of many prevalent graph-theoretic algorithms such as single source shortest paths tree, all-pairs shortest paths, k-nearest neighbors, minimum cost spanning tree, etc. Off-the-shelf LP solvers can be used to find solutions to the resulting model where the computed solutions form the weights of the anonymized graph. We demonstrate how this abstract model can be used for different applications, prove the correctness of the extended models, analyze their complexity for selected graph algorithms, and experimentally evaluate the proposed techniques using real social network data sets.
Hide Abstract
2008
-
Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi, "Clouded Data: Comprehending Scalable Data Management Systems", [Paper] UCSB CS Tecnical Report 2008-18, November 2008.
Managing petabytes of data for millions of users has been a challenge for big internet based enterprises such as Google, Yahoo!, and Amazon. Even though database management systems have a long history of managing enterprise level data and information, they are deemed to be unsuitable in this context. This resulted in an architectural redesign of data management systems with an eye towards the requirements of high scalability, high availability, and low latency while providing weaker consistency and lower application generality. In this paper, we try to comprehend what is fundamentally different in the internet-scale applications that allowed these data management systems to achieve orders of magnitude higher levels of scalability compared to traditional databases. With an understanding of these modern systems, we also make an attempt to predict future application requirements and raise two fundamental questions: where do we really stand in terms of scalable data management, and how far are we from providing scalable data management as a service, just as computing is provided as a service in large scale infrastructures?
Hide Abstract
-
Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi, "CoTS: A Scalable Framework for Parallelizing Frequency Counting over Data Streams", [Paper] UCSB CS Tecnical Report 2008-08, June 2008.
Applications involving analysis of data streams have gained significant popularity and importance. Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of Chip Multi-Processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first explore the challenges in parallelizing frequent elements and top-k queries in the context of the inherent parallelism available in multi-core processors, evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the different parallelization efforts. Our experimental analysis of the naive designs implemented in the paper shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. Based on the lessons learnt from this analysis, we design an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (Co-operative Thread Scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip multiprocessor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak throughput of more that 60 million elements per second. In addition, for skewed data distributions, despite using heavy weight synchronization primitives, the implementation of the proposed framework outperforms the sequential implementation by a factor of 2 - 4X.
Hide Abstract
Top of Page
Talks & Presentations
2009
2008
2007
2006
2005