Reading List for Database & Scalable Data Management & Analytics
Foundation Papers
There are many good lists of basic database papers. A good one to start with is the University of Wisconsin-Madison Database Reading List:
Similarly, there are several lists of basic scalable system papers. A great one is from Worcester Polytechnic Institute (WPI):
Recent Papers
- Main Database Engines/Vendors:
- Oracle (ICDE 2015)
- IBM DB2 (many)
- SAP HANA (ICDE 2015)
- Microsoft Hekaton, MySQL (many)
- CouchDB
- MemSQL
- Silo
- Analytics Platforms (stream-based, batch-based, etc.) on NoSQL data storage:
- Hadoop
- Spark
- Flink
- Impala
- Database Storage Formats (column-store, row-store, hybrid, etc.)
- Abadi, Daniel J., Samuel R. Madden, and Nabil Hachem. Column-stores vs. row-stores: How different are they really? SIGMOD 2008.
- V. Srinivasan, Brian Bulkowski, Wei-Ling Chu, Sunil Sayyaparaju, Andrew Gooding, Rajkumar Iyer, Ashish Shinde, Thomas Lopatic. Aerospike: Architecture of a Real-Time Operational DBMS. VLDB 2016.
- Harald Lang, Tobias Muehlbauer, Florian Funke, Peter Boncz, Thomas Neumann, Alfons Kemper. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. SIGMOD 2016.
- Mohammed Al-Kateb, Paul Sinclair, Grace Au, Carrie Ballinger. Hybrid Row-Column Partitioning in Teradata. VLDB 2016.
- Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden. Amoeba: A Shape changing Storage System for Big Data. VLDB 2016.
- Graph-Database paper? TBU.
- Database Partitioning and Indexing - Access Methods
- V. Leis, A. Kemper, and T. Neumann. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE 2013.
- Victor Alvarez, Stefan Richter, Xiao Chen, Jens Dittrich. A comparison of adaptive radix trees and hash tables. ICDE 2015.
- Martin Schäler, Alexander Grebhahn, Reimar Schröter, Sandro Schulze, Veit Köppen, Gunter Saake. QuEval: Beyond high-dimensional indexing a la carte. VLDB 2014.
- Sarath Lakshman, Sriram Melkote, John Liang, Ravi Mayuram. Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index. VLDB 2016.
- J. Levandoski, D. B. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for New Hardware Platforms. ICDE 2013.
- Pedro Pedreira, Chris Croswhite, Luis Bona. Cubrick: Indexing Millions of Records per Second for Interactive Analytics. VLDB 2016.
- Onur Kocberber, Babak Falsafi, Boris Grot. Asynchronous Memory Access Chaining. VLDB 2016.
- Amirhesam Shahvarani, Hans-Arno Jacobsen. A Hybrid B+-Tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms. SIGMOD 2016.
- Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, Alistair Veitch. In-Memory Performance for Big Data. VLDB 2015.
- Query Execution and Processing on Scalable Systems
- Ahmed M. Aly, Ahmed R. Mahmood, Mohamed S. Hassan, Walid G. Aref, Mourad Ouzzani, Hazem Elmeleegy, Thamir Qadah. AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data. VLDB 2015.
- Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, Wilfred Ng. Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor Search. VLDB 2016.
- Juwei Shi, Yunjie Qiu, Umar Farooq Minhas, Limei Jiao, Chen Wang, Berthold Reinwald, Fatma Ozcan. Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics. VLDB 2016.
- Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., … & Zaharia, M. Spark SQL: Relational Data Processing in Spark. SIGMOD 2015.
- Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Panos Kalnis. Lightning Fast and Space Efficient Inequality Joins. VLDB 2015.
- Ben Kimmett, Venkatesh Srinivasan, Alex Thomo. Fuzzy Joins in MapReduce: An Experimental Study. VLDB 2015.
- Parth Nagarkar, K. Selçuk Candan, Aneesha Bhat. Compressed Spatial Hierarchical Bitmap (cSHB) Indexes for Efficiently Processing Spatial Range Query Workloads. VLDB 2015.
- Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matt Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding. Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters. SIGMOD 2016.
- Guoqiang Jerry Chen, Janet L. Wiener, Shridhar Iyer, Anshul Jaiswal, Ran Lei, Nikhil Simha, Wei Wang, Kevin Wilfong, Tim Williamson, Serhat Yilmaz. Realtime Data Processing at Facebook. SIGMOD 2016.
- Eyal Altshuler, Tova Milo. An Efficient MapReduce Cube Algorithm for Varied Data Distributions. SIGMOD 2016.
- Manos Athanassoulis, Anastasia Ailamaki. BF-Tree: Approximate Tree Indexing. VLDB 2014.
- Amirhesam Shahvarani, Hans-Arno Jacobsen. A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms. SIGMOD 2016.
- Dong-Wan Choi, Chin-Wan Chung. Nearest neighborhood search in spatial databases. ICDE 2015.
- Query Analytics on modern systems
- Wail Y. Alkowaileet, Sattam Alsubaiee, Michael J. Carey, Till Westmann, Yingyi Bu. Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark. VLDB 2016.
- Wenjian Xu, Ziqiang Feng, Eric Lo. Fast Multi-column Sorting in Main-Memory Column-Stores. SIGMOD 2016.
- Query Planning and Optimizing
- T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. VLDB 2011.
- Concurrency Control
- Cong Yan, Alvin Cheung. Leveraging Lock Contention to Improve OLTP Application Performance. VLDB 2016.
- Other research topics for traditional DBMS
- Non-volatile memory
- Hardware transactional memory
- Cold-data management
- RDMA/Fast networks
- Spatial Data Query Processing
- Badrish Chandramouli, Jonathan Goldstein, Abdul Quamar. Scalable Progressive Analytics on Big Data in the Cloud. VLDB 2014.
- Mingjie Tang, Yongyang Yu, Qutaibah M. Malluhi, Mourad Ouzzani, Walid G. Aref. LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data. VLDB 2016.
- Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, Anis Moussa. TrafficDB: HERE’s High Performance Shared-Memory Data Store. VLDB 2016.
- Spatial & Spatio-temporal Analytics
- Shiming Zhang, Yin Yang, Wei Fan, Marianne Winslet. Design and Implementation of a Real-Time Interactive Analytics System for Large Spatio-Temporal Data. VLDB 2014.
- Harish Doraiswamy, Huy Vo, Claudio Silva and Juliana Freire. A GPU-Based Index to Support Interactive Spatio-Temporal Queries over Historical Data. ICDE 2015.
- Social & Healthcare Data Analytics
- Zheng Jye Ling, Quoc Trung Tran, Ju Fan, Gerald C.H. Koh, Thi Nguyen, Chuen Seng Tan, James W. L. Yip, Meihui Zhang. GEMINI: An Integrative Healthcare Analytics System. VLDB 2014.
- TBU