This page is an archived version of our first offering. For more information and to register for our current offerings, please visit the Digital Programs microsite.
Tackling the Challenges of Big Data
This Digital course will survey state-of-the-art topics in Big Data, looking at data collection (smartphones, sensors, the Web), data storage and processing (scalable relational databases, Hadoop, Spark, etc.), extracting structured data from unstructured data, systems issues (exploiting multicore, security), analytics (machine learning, data compression, efficient algorithms), visualization, and a range of applications.
Each module will introduce broad concepts and present the most recent developments in research.
The course will be taught by a team of world experts from MIT and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in each of these areas.
CSAIL is the largest research laboratory at MIT and one of the world’s most important centers of information technology research. CSAIL and its members have played a key role in the computer revolution. The lab’s researchers have been major contributors to developments such as time-sharing, massively parallel computers, public-key encryption, the mass commercialization of robots, and much of the technology underlying the ARPANET, the Internet, and the World Wide Web.
CSAIL members (former and current) have launched more than 100 companies, including RSA Data Security, Akamai, iRobot, Meraki, ITA Software, and Vertica. The Lab is home to the World Wide Web Consortium (W3C).
With backgrounds in data, programming, finance, multicore technology, database systems, robotics, transportation, hardware, and operating systems, each professor teaching Tackling the Challenges of Big Data brings unique experience and expertise to the course.
Daniela Rus, Director, MIT Computer Science and Artificial Intelligence Laboratory; Professor, Electrical Engineering and Computer Science: “Through our partnerships with our key industrial stakeholders, we understand that Big Data is the next frontier to tackle...”
Sam Madden, Director, Big Data Initiative, MIT Computer Science and Artificial Intelligence Laboratory; Professor, Electrical Engineering and Computer Science: “At CSAIL, we think of Big Data as a big opportunity to develop the next generation of technologies to store, manage, analyze, share, and understand the huge quantities of data we are now collecting...”
Sanjay Sarma, Director of the MIT Office of Digital Learning and Fred Fort Flowers and Daniel Fort Flowers Professor of Mechanical Engineering: “I am thrilled with MIT Professional Education's Digital course offering of Tackling the Challenges of Big Data as part of MIT's digital learning portfolio...”
Bhaskar Pant, Executive Director, MIT Professional Education: “MIT Professional Education is pleased to be able to offer a unique and comprehensive online course addressing a very important challenge facing industry today...”
What the Media Says:
CIO Today: "The big data course offered by MIT should be required in any enterprise where business users interact with data. Business users crave big data and analytics tools, but without an understanding of what makes data good or bad they may make decisions based on insight that’s fallacious. MIT's big data course is an important step for the industry."
BostInno: "Through the course, companies will have the ability to offer training and education to employees on big data, a field ripe for innovation that executives can tap into to make better decisions."
Course Overview
To accommodate participants across time zones, the course is self-paced and available online 24/7. Lectures are prerecorded, and you can follow along whenever it is convenient, as long as you finish by April 1. You may find it helpful to keep to a weekly schedule so you can stay current with the discussion forums. There are approximately five hours of video each week. You will spend additional time on multiple-choice assessments, readings, and discussion forums; most participants spend about seven hours a week on course-related activities.
The course is held over four weeks and will provide the following:
- Five modules covering 18 topic areas, with 20 hours of video
- Five assessments to reinforce key learning concepts of each module
- Case studies
- Discussion forums where participants can discuss thought-provoking questions in medicine, social media, finance, and transportation posed by the MIT faculty teaching the course, and share ideas and engage with other participants
- Community Wiki for sharing additional resources, suggested readings, and related links
Participants will also take away:
- Course materials from all presentations
- 30-day access to the archived course (including videos, discussion boards, content, and Wiki)
Key Benefits
- Position yourself as a vital subject matter expert in your organization on the major technologies and applications driving the Big Data revolution in your industry, and help your company move forward and stay competitive
- Engage confidently with management on the Big Data opportunities and challenges your industry faces; analyze emerging technologies and how they can be applied effectively to real business problems, unlocking the value of data and its potential for company growth
- Learn to assess scalability issues so you can make your work more productive and save time and money
- Gain valuable insights from, and access to, CSAIL research that will change how you and your company break down Big Data problems, saving time and money while making work more efficient
- Convenient, flexible schedule with access 24 hours a day from anywhere in the world: no travel time, low cost, and instruction by world-renowned MIT faculty
- Earn a Certificate of Completion and 2.0 CEUs from MIT Professional Education
MIT Professional Education Alumni Benefits
After completing Tackling the Challenges of Big Data, participants will become alumni of MIT Professional Education and will receive all the associated benefits and courtesies listed below.
- Receive exclusive discounts on all future Short Programs and Digital Programs courses
- Access to our restricted MIT Professional Education alumni group on LinkedIn, including invitations to join all MIT Professional Education social media platforms
- Networking opportunities with other individuals from around the globe working in a variety of industries interested in technology, computer science, entrepreneurship, science, research, and Big Data, among many others
- Email distribution of our MIT Professional Education newsletter
- Finally, participants will join the MIT Professional Education alumni mailing list, through which they will receive advance notice of special announcements about upcoming courses, programs, and events
Earn a Certificate of Completion
Upon successful completion of the course, a Certificate of Completion will be awarded by MIT Professional Education.
To earn a Certificate of Completion in this course, participants should watch all the videos, actively participate in the discussion boards, and complete all assessments by April 1, 2014, with an average score of at least 80 percent.
The Certificate of Completion will be awarded by MIT Professional Education on April 2, 2014.
Grading: Grades are not awarded for this course.
Earn MIT Professional Education Continuing Education Units (CEUs)
Participants in Tackling the Challenges of Big Data (offered March 4 to April 1, 2014) who successfully complete all course requirements are eligible to receive 2.0 Continuing Education Units (CEUs).
CEUs are a nationally recognized means of recording noncredit, non-degree study. They are accepted by many employers, licensing agencies, and professional associations as evidence of a participant’s serious commitment to developing professional competence.
CEUs are based on hours of instruction: one CEU equals 10 hours of instruction, so the 2.0 CEUs for this course correspond to its 20 hours of instruction.
CEUs may not be applied toward any MIT undergraduate or graduate level course.
Who Should Participate
Prerequisite(s): This course is designed to be suitable for anyone with a bachelor’s-level education in computer science or equivalent work experience. No programming experience or knowledge of programming languages is required.
Tackling the Challenges of Big Data is designed to be valuable to both individuals and companies because it provides a platform for discussion from numerous technical perspectives. The concepts delivered in this course can spark idea generation among team members, and the knowledge gained can be applied to their company’s approach to Big Data problems and shape the way businesses operate today.
The course applies broadly, to early-career professionals and senior technical managers alike.
Participants will benefit the most from the concepts taught in this course if they have at least three years of work experience.
Participants may include:
- Engineers who need to understand the new Big Data technologies and concepts to apply in their work
- Technical managers who want to familiarize themselves with these emerging technologies
- Entrepreneurs who would like to gain perspective on trends and future capabilities of Big Data technology
Participants live and work around the world.
Learning Objectives
Participants will learn the state of the art in Big Data. The course aims to shorten the time from research to industry dissemination and to expose participants to some of the most recent ideas and techniques in Big Data.
After taking this course, participants will:
- Distinguish what Big Data is (volume, velocity, variety), where it comes from, and what its key challenges are
- Determine how and where Big Data challenges arise in a number of domains, including social media, transportation, finance, and medicine
- Investigate multicore challenges and how to engineer around them
- Explore the relational model, SQL, and capabilities of new relational systems in terms of scalability and performance
- Understand the capabilities and pitfalls of NoSQL systems, and how the NewSQL movement addresses these issues
- Learn how to make the most of the MapReduce programming model: its benefits, how it compares to relational systems, and new developments that improve its performance and robustness
- Learn why building secure Big Data systems is so hard and survey recent techniques that help, including direct processing on encrypted data, information flow control, auditing, and replay
- Discover user interfaces for Big Data and what makes building them difficult
- Assess the need for sublinear-time algorithms and understand how to create them
- Manage the development of data compression algorithms
- Formulate the “data integration problem” (semantic and schematic heterogeneity) and discuss recent breakthroughs in solving it
- Understand the benefits and challenges of linked open data
- Comprehend machine learning and algorithms for data analytics
Modules, Topics, and Faculty
Module One: Introduction and Use Cases
The introductory module aims to give a broad survey of Big Data challenges and opportunities and highlights applications as case studies.
Introduction: Big Data Challenges (Sam Madden)
- Identify and understand the application of existing tools and new technologies needed to solve next generation data challenges
- Challenges posed by the ability to scale and the constraints of today's computing platforms and algorithms
- Addressing the universal issue of Big Data and how to use the data to align with a company’s mission and goals
Case Study: Transportation (Daniela Rus)
- Data-driven models for transportation
- Coresets for Global Positioning System (GPS) data streams
- Congestion-aware planning
Case Study: Visualizing Twitter (Sam Madden)
- Understand the power of geocoded Twitter data
- Learn how Graphics Processing Units (GPUs) can be used for extremely high-throughput data processing
- See MapD, a new GPU-based database system, in action visualizing Twitter
Module Two: Big Data Collection
The data capture module surveys approaches to data collection, cleaning, and integration.
Data Cleaning and Integration (Michael Stonebraker)
- Available tools and protocols for performing data integration
- Curation issues (cleaning, transforming, and consolidating data)
Hosted Data Platforms and the Cloud (Matei Zaharia)
- How performance, scalability, and cost models are impacted by hosted data platforms in the cloud
- Internal and external platforms to store data
Module Three: Big Data Storage
The module on Big Data storage describes modern approaches to databases and computing platforms.
Modern Databases (Michael Stonebraker)
- Survey data management solutions in today’s marketplace, including traditional RDBMS, NoSQL, NewSQL, and Hadoop
- Strategic aspects of database management
Distributed Computing Platforms (Matei Zaharia)
- Parallel computing systems that enable distributed data processing on clusters, including MapReduce, Dryad, and Spark
- Programming models for batch, interactive, and streaming applications
- Tradeoffs between programming models
NoSQL, NewSQL (Sam Madden)
- Survey of emerging database and storage systems for Big Data
- Tradeoffs between reduced consistency, performance, and availability
- Understanding how rethinking the design of database systems can lead to order-of-magnitude performance improvements
Module Four: Big Data Systems
The systems module discusses solutions to creating and deploying working Big Data systems and applications.
Security (Nickolai Zeldovich)
- Protecting confidential data in a large database using encryption
- Techniques for executing database queries over encrypted data without decryption
Multicore Scalability (Nickolai Zeldovich)
- Understanding what affects the scalability of concurrent programs on multicore systems
- Lock-free synchronization for data structures in cache-coherent shared memory
User Interfaces for Data (David Karger)
- Principles of and tools for data visualization and exploratory data analysis
- Research in data-oriented user interfaces
Module Five: Big Data Analytics
The analytics module covers state-of-the-art algorithms for very large data sets and streaming computation.
Fast Algorithms I (Ronitt Rubinfeld)
- Efficiency in data analysis
Fast Algorithms II (Piotr Indyk)
- Advanced applications of efficient algorithms
- Scale-up properties
Data Compression (Daniela Rus)
- Reducing the size of Big Data files and the effect on storage and transmission capacity
- Design of data compression schemes, such as coresets, for Big Data sets
Machine Learning Tools (Tommi Jaakkola)
- Computational capabilities of the latest advances in machine learning
- Advanced machine learning algorithms and techniques for application to large data sets
Case Study: Information Summarization (Regina Barzilay)
Applications: Medicine (John Guttag)
- Utilize data to improve operational efficiency and reduce costs
- Analytics and tools to improve patient care and control risks
- Using Big Data to improve hospital performance and equipment management
Applications: Finance (Andrew Lo)
- Learn how big data and machine learning can be applied to financial forecasting and risk management
- Analyze the dynamics of the consumer credit card business of a major commercial bank
- Recognize and acquire intuition for business cases where big data is useful and where it isn't
Instructors
Daniela Rus is a professor of Electrical Engineering and Computer Science and Director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. Rus’ research interests include distributed robotics, mobile computing, and programmable matter. At CSAIL, she has led numerous groundbreaking research projects in the areas of transportation, security, environmental modeling and monitoring, underwater exploration, and agriculture. Her research group, the Distributed Robotics Lab, has developed modular and self-reconfiguring robots, systems of self-organizing robots, networks of robots and sensors for first responders, mobile sensor networks, techniques for cooperative underwater robotics, and new technology for desktop robotics. The lab has built robots that can tend a garden, bake cookies from scratch, cut birthday cake, fly in swarms without human aid to perform surveillance functions, and dance with humans.
Sam Madden is a computer scientist specializing in database management systems. He is the faculty director of MIT’s Big Data Initiative at CSAIL and co-director of the Intel Science and Technology Center (ISTC) in Big Data at CSAIL. Recent projects include CarTel, a distributed wireless platform that monitors traffic and onboard diagnostic conditions in order to generate road surface reports, and Relational Cloud, a project investigating research issues in building a database as a service. In 2005, Madden was named one of Technology Review magazine’s “Top 35 Under 35.” He is also a cofounder of Vertica (acquired by HP).
Additional Faculty Instructors:
Regina Barzilay is a professor of Computer Science at MIT. Her research interests are in natural language processing. For her doctoral dissertation at Columbia University, she led the development of Newsblaster, which does what no computer program could do before: recognize stories from different news services as being about the same basic subject, and then paraphrase elements from all of the stories to create a summary. Though humans can easily infer the meaning of a word from its context, computers cannot. Barzilay uses statistical machine learning software to teach computers to make educated guesses. Barzilay is a recipient of the NSF CAREER Award and a Microsoft Faculty Fellowship, and has been named one of the "Top 35 Innovators Under 35" by Technology Review magazine. One of her current projects is Condensr, a review summarization system that processes Yelp restaurant reviews and categorizes them, breaking reviews down into comments about food, ambiance, service, and value, as well as giving an overall summary of reviewer sentiment. The goal is to go beyond a simple star rating to convey diners' overall consensus about various aspects of a restaurant experience.
John Guttag is the Dugald C. Jackson Professor of Electrical Engineering and Computer Science at MIT. He is also co-head of the MIT Computer Science and Artificial Intelligence Laboratory's Networks and Mobile Systems Group. This group studies issues related to computer networks, applications of networked and mobile systems, and advanced software-based medical instrumentation and decision systems. In addition to his academic activities, Guttag has had long-term consulting relationships with a number of industrial research and advanced development organizations. He has also worked for many years as a consultant specializing in the analysis of business opportunities and risks related to information systems. Guttag currently serves on the technical advisory board of Vanu, Inc., on the Boards of Directors of Empirix and Avid Technology, and on the Board of Trustees of the Massachusetts General Hospital Institute of Health Professions.
Piotr Indyk is a professor in the Theory of Computation Group. His research focuses primarily on computational geometry in high dimensions, streaming algorithms, and computational learning theory. He has made a range of contributions to these fields, particularly in the study of low-distortion embeddings, algorithmic coding theory, and geometric and combinatorial pattern matching. Among his many awards, in 2012 Indyk's work on a faster Fourier transform was recognized in Technology Review’s “10 Emerging Technologies That Will Change the World.” He is a member of the Wireless@MIT and Big Data@CSAIL research initiatives.
Tommi Jaakkola's research interests include many aspects of machine learning, statistical inference and estimation, and the analysis and development of algorithms for modern estimation problems, such as those involving predominantly incomplete data sources. One of his current projects is in the energy area: hydrocarbon exploration. The goal of this project is to identify boundaries between different types of underground rock using seismic sensors. Such boundaries are of interest in hydrocarbon exploration because oil is often present there. These sensors produce massive streams of data that must be mined to locate the boundaries. Researchers are working on these mining algorithms, as well as on advanced compression and encoding techniques to summarize the data streams compactly.
David R. Karger is a professor of Electrical Engineering and Computer Science at MIT. He leads the Haystack group, devoted to making it easier for people to create, find, organize, manipulate, and share information individually and socially. He co-led MIT's SIMILE project, a collaboration with MIT Libraries and the World Wide Web Consortium developing tools to improve the management and retrieval of information for libraries. He is known for Karger's algorithm, a Monte Carlo method for computing the minimum cut of a connected graph. Karger has spent time working at Akamai and consults for Google, Microsoft, and Vanu Inc.
Andrew W. Lo is the Charles E. and Susan T. Harris professor, a professor of Finance, and the Director of the Laboratory for Financial Engineering at the MIT Sloan School of Management. He was cited by Time magazine as one of “The World's 100 Most Influential People: 2012.” He is founder and chief scientific officer of AlphaSimplex Group, LLC, a quantitative investment management company based in Cambridge, Massachusetts. He has received awards for teaching from both MIT and Wharton.
Ronitt Rubinfeld is a professor in the Department of Electrical Engineering and Computer Science at MIT. Her current interests include randomized and sublinear time algorithms. In particular, she is interested in what can be understood about data by looking at only a very small portion of it. Rubinfeld has served on the program committees of numerous conferences in theoretical computer science and is on the editorial board of the Theory of Computing Systems Journal. She is a co-chair of the DIMACS Special Focus on Network Security.
Michael Stonebraker
Adjunct Professor, Electrical Engineering and Computer Science
Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. Founder or co-founder of nine companies, he has been a pioneer of database research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS, the object-relational DBMS, POSTGRES, and the federated data system, Mariposa. All three prototypes were developed at the University of California, Berkeley where Stonebraker was a professor of Computer Science for 25 years. Forbes magazine named him one of the eight innovators driving the Silicon Valley wealth explosion during its 80th anniversary edition in 1998.
Matei Zaharia
Assistant Professor, Electrical Engineering and Computer Science
Matei Zaharia earned a PhD from the University of California, Berkeley, where he worked with Scott Shenker and Ion Stoica on topics in computer systems, large-scale data processing, and cloud computing. He created the popular Apache Spark cluster computing project and is also a committer on Apache Hadoop. In 2014, Zaharia joined MIT as an assistant professor. Before then, he was the CTO of Databricks, a startup company based on Spark.
Nickolai Zeldovich
Zeldovich's research interests are in building practical secure systems, from operating systems and hardware to programming languages and security analysis tools. He received his PhD from Stanford University in 2008, where he developed HiStar, an operating system designed to minimize the amount of trusted code by controlling information flow. In 2005, he co-founded MokaFive, a company focused on improving desktop management and mobility using x86 virtualization.
Among several awards, Zeldovich received the MIT EECS Spira teaching award in 2013, a Sloan fellowship in 2010, and an NSF CAREER award in 2011.
Course Vision
MIT aims to help solve the world’s biggest and most important problems, including those posed by Big Data. Tackling the Challenges of Big Data is an online course developed by the MIT Computer Science and Artificial Intelligence Laboratory in collaboration with MIT Professional Education and edX.
Location
This course takes place online.