Scalable Big Data Management and Analytics

The rise of big data is changing the way we think about the world by providing new insights and creating new forms of value. The challenges of big data come not only from its volume but also from its complexity, such as its multi-dimensional nature and temporal dynamics. With the processing power of computers increasing rapidly while prices fall, we envision a future in which processing complex big data is as convenient as processing ordinary data is today.

Our research goal in big data management and analytics is to address the challenges of delivering effective, scalable, and high-performance software systems for managing, querying, and mining complex big data in multiple dimensions, including 2D and 3D spatial and imaging data, temporal data, spatial-temporal data, and sequencing data. This work is driven by emerging spatial big data problems in geospatial applications, location-based services, and social network applications, enabled by cost-effective ubiquitous positioning technologies and collaborative spatial data collection. Meanwhile, rapid improvements in data acquisition technologies have produced tremendous amounts of scientific data, such as high-resolution digital pathology images in both 2D and 3D and next-generation whole-genome sequencing data. Managing and analyzing such data poses several major challenges, including explosive growth in data volume, high data complexity, and temporal dynamics.

Our research will ultimately create novel open-source software systems to support challenging applications in multiple domains. To that end, we investigate architectures for such software that account for different forms of heterogeneity, processing patterns, massive parallelism, and storage layers with different characteristics.