Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical databases and other software tools to capture, store, analyze, and manage. This course provides a rapid immersion into the area of big data and the technologies that have recently emerged to manage it.
We start with an introduction to the characteristics of big data and an overview of the associated technology landscape, then move on to an in-depth exploration of Hadoop, the leading open-source framework for big data processing. Here the focus is on the most important Hadoop components, such as Hive, Pig, stream processing, and Spark, as well as architectural patterns for applying these components. We conclude with an exploration of the range of specialized (NoSQL) database systems architected to address the challenges of managing large volumes of data.
Overall, the objective is to develop a sense of how to make sound decisions in the adoption and use of these technologies, and how to deploy them economically on modern cloud computing infrastructure.
Skills You'll Learn
Databases, Apache Hive, Apache Hadoop, Data Management, Real Time Data, Data Processing, Data Architecture, Database Management Systems, Software Design Patterns, Data Infrastructure, Scalability, Distributed Computing, NoSQL, Apache Kafka, Data Lakes, Apache Spark, Big Data, Cloud Computing, MongoDB, Apache Cassandra
From the lesson
Module 8: Key-Value, Wide-Column & Document Stores
In Module 8, students will explore specific NoSQL database types, namely key-value, wide-column, and document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and the associated CP/AP trade-offs. Consistency and availability will be discussed in the context of specific usage scenarios for both HBase and Cassandra, and the general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language and text processing use cases, and MongoDB usage and architecture will be compared with those of a traditional RDBMS.
Taught By
Yousef Elmehdwi
Associate Teaching Professor of CS