Data management is increasingly critical for deriving value from existing applications and services. This course covers advanced data management concepts including (but not limited to) concurrency control, transaction management, query processing, indexing, mobile data management, spatial databases, and the handling of WWW & social media data. The course has a significant hands-on lab component in which students complete programming assignments to deepen their expertise in the concepts and implementation of advanced data management systems.
Unit 1: Concurrency Control
This unit will cover topics such as the need for concurrency control, serializability, recoverability, optimistic and pessimistic concurrency control mechanisms, two-phase locking, two-phase commit, timestamp ordering, and multi-version concurrency control.
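As an illustration of the serializability topic, the following is a minimal sketch of a conflict-serializability test: it builds a precedence graph from a schedule and reports whether the graph is acyclic. The schedule encoding (tuples of transaction name, operation, data item) and the function names are assumptions made for this example, not part of the course material.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build edges Ti -> Tj for every pair of conflicting operations.

    `schedule` is a list of (txn, op, item) tuples with op in {'r', 'w'};
    this encoding is hypothetical and chosen only for the sketch.
    """
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and 'w' in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True  # back edge: cycle found
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))

def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    return not has_cycle(precedence_graph(schedule))

interleaved = [('T1', 'r', 'A'), ('T2', 'w', 'A'), ('T1', 'w', 'A')]
serial = [('T1', 'r', 'A'), ('T1', 'w', 'A'), ('T2', 'r', 'A'), ('T2', 'w', 'A')]
```

In the interleaved schedule, T1 reads A before T2 writes it and T2 writes A before T1 writes it, so the graph contains both T1 -> T2 and T2 -> T1 and the schedule is rejected; the serial schedule yields only T1 -> T2.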
Unit 2: Transaction Management
This unit will cover topics such as the ACID properties of transactions and the relaxation of some of these properties for new-age applications, as well as rollback, deadlocks, compensating transactions, and recovery.
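To make the rollback topic concrete, here is a minimal sketch of an in-memory key-value store that keeps an undo log so that an aborted transaction leaves no trace (the atomicity part of ACID). The class `UndoLogStore` and its methods are hypothetical names invented for this example; a real system would also persist the log for crash recovery, which this sketch does not attempt.

```python
_MISSING = object()  # sentinel: the key did not exist before the write

class UndoLogStore:
    """Hypothetical single-transaction store with an undo log."""

    def __init__(self):
        self.data = {}
        self.undo_log = []

    def begin(self):
        # Start a new transaction with an empty undo log.
        self.undo_log = []

    def write(self, key, value):
        # Record the old value BEFORE overwriting it.
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # Changes are already in place; just discard the undo records.
        self.undo_log.clear()

    def rollback(self):
        # Undo writes in reverse order so the earliest old value wins.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old

store = UndoLogStore()
store.begin()
store.write('balance', 100)
store.commit()

store.begin()
store.write('balance', 40)   # overwrite an existing key
store.write('pending', 60)   # create a new key
store.rollback()             # both writes are undone
```

After the rollback the store holds only the committed value of `balance`, and the `pending` key no longer exists.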
Unit 3: Indexing
This unit discusses important single-dimensional and multi-dimensional database indexes as well as their variants. Examples include B-trees, R-trees, and quadtrees. In this unit, students will also learn how to create variants of these fundamental indexes to improve query response times for complex real-world user queries in domains such as smart cities. The unit also covers the inherent trade-offs associated with each index so that students can learn how to choose the appropriate index for a given application scenario.
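As one example of a multi-dimensional index, the sketch below implements a small point quadtree with insertion and rectangular range search. The class name, the `capacity` and `max_depth` parameters, and the half-open cell convention are all assumptions made for this illustration rather than a prescribed design.

```python
class QuadTree:
    """Hypothetical point quadtree over the half-open cell [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # guards against endless splitting on duplicates
        self.points = []
        self.children = None

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point lies outside this cell
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        # Divide the cell into four quadrants and push stored points down.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # prune: the query does not overlap this cell
        found = [(x, y) for x, y in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children or []:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found

tree = QuadTree(0, 0, 10, 10, capacity=2)
for p in [(1, 1), (2, 2), (3, 3), (8, 8), (9, 9), (5, 5)]:
    tree.insert(*p)
nearby = sorted(tree.query(0, 0, 3, 3))
```

The trade-off hinted at in the unit shows up even here: a smaller `capacity` gives deeper trees and finer pruning, at the cost of more nodes to maintain on insertion.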
Unit 4: Complex Query Processing & Optimization
This unit covers query processing and optimization. Topics include (but are not limited to) query plans, query size estimation, and disk I/O cost estimation. The unit will also cover the processing of complex spatial database queries such as multi-way spatial joins, keyword search queries in spatial databases, k-Nearest Neighbor queries, and m-closest descriptors queries. Furthermore, it will cover aspects of distributed query processing, such as query processing in a cluster environment, and related issues including data migration, data replication, index migration and replication, data availability, performance, and scalability.
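For reference, a k-Nearest Neighbor query can be stated very compactly as a full scan. The sketch below is a brute-force baseline under the assumption of 2-D points and Euclidean distance; the function name `knn` is hypothetical. It examines every point, which is exactly the work that a spatial index is meant to avoid.

```python
import heapq
import math

def knn(points, query, k):
    """Return the k points closest to `query`, nearest first.

    Brute-force baseline: computes the distance to every point.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

points = [(0, 0), (1, 1), (5, 5), (2, 2)]
closest = knn(points, query=(0, 0), k=2)
```

With the sample data, the two nearest neighbors of the origin are `(0, 0)` and `(1, 1)`, returned in that order.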
Unit 5: Mobile Data Management
Given the ever-increasing popularity and prevalence of mobile devices and apps, the need for effective mobile data management continues to grow. This unit describes key mobile data management issues such as mobile resource constraints (e.g., energy, bandwidth), incentives for participatory crowdsourcing/crowdsensing, reliability, and scalability.
Unit 6: Handling WWW & Social Media Data
This unit will discuss existing as well as emerging applications of data management for WWW & social media data. Key issues associated with handling WWW & social media data will also be covered. Examples of such issues include noisy data and data reliability, data heterogeneity, data integration, data semantics, knowledge management, unstructured data, and scalability.