Hadoop: Building Big-Data Apps in the Cloud

March 7-11 2016, JHU Dorsey Center, Elkridge MD
Co-Sponsored by Johns Hopkins Engineering for Professionals


This class is now over. The next public version is tentatively set for fall 2016 at the same Johns Hopkins location in Maryland. In the meantime, please contact hall@coreservlets.com for information on a customized onsite version at your location: lower price, more convenient for your developers, and customizable content. Full-day courses can be held at any location worldwide, but for clients in the Baltimore/Washington area, late afternoon, evening, or weekend sessions can also be arranged.

“Wonderful. In 20 years, this is the best organized, most pragmatic and enjoyable course I've taken.”

“The best instructor-led course I have attended, by far.”

“Best short course ever!”

“Compared to the other short courses I have taken, this one completely redefined my scale from 1-10.”

“In my 35+ years of taking technical courses, Marty's classes consistently come out ranking #1 on my list. Highly relevant material is delivered with enthusiasm, humor, and a high degree of class interaction that is unmatched anywhere.”


This page describes the public (open enrollment) training course on Hadoop development to be held March 7-11 2016 in Elkridge, MD (co-sponsored by the Johns Hopkins University Engineering for Professionals program). The entire course is personally developed and taught by experienced Hadoop developer and instructor Karthik Shyamsunder. No contract instructor regurgitating someone else's materials! Coreservlets.com has presented Java-related courses onsite for dozens of organizations in the US, Canada, Mexico, Australia, Japan, Puerto Rico, India, Cambodia, Norway, and the Philippines, all to rave reviews.

If you are looking for customized training courses on Java 7 or 8, JSF 2, PrimeFaces, Android, Ajax, jQuery, Hadoop (and Hadoop certification), GWT, Spring, Hibernate, Servlets, JSP, HTML5, or RESTful Web Services taught on-site at your company, please see this page.

Register Early! Five of coreservlets.com's previous public short courses were full, so reserve your spot today. Registrations are taken in the order they are received.


Course Overview

Apache Hadoop is a framework that allows for the distributed processing of massive data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop has established itself as an industry-leading platform for deploying cloud-based applications and services. The Hadoop ecosystem is large, and it includes such popular products as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Pig, and Hive. However, with such versatility comes complexity and difficulty in deciding on appropriate use cases. This course breaks down the walls of complexity by providing a practical approach to developing Java applications on top of the Hadoop platform.

The course presents the material as small building blocks with thorough coverage of each component in the Hadoop stack. We begin by looking at Hadoop's architecture and its underlying parts, with top-down identification of component interactions within the Hadoop ecosystem. The course then provides in-depth coverage of the Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig, and Hive. To reinforce concepts, each section is followed by a set of hands-on exercises. The exercises vary in complexity to accommodate developers with different levels of expertise.

Coreservlets normally runs on-site training courses at customer locations. This is easier administratively, is better for clients since the topics and schedule can be customized, and is more cost effective for students since no travel is required. However, due to demand from those who do not have enough students for an on-site course, Coreservlets will be running a public (open enrollment) Hadoop training course at the Johns Hopkins Dorsey Center in Elkridge MD. This course is personally developed by the instructor Karthik Shyamsunder, an experienced Java Enterprise developer who builds and deploys Hadoop-based applications on a daily basis.

About the Instructor

Karthik Shyamsunder has been working with Apache and Big Data since 2009, and was the main architect for a company-wide storage and computing platform at a major Internet Infrastructure company. Besides having extensive industry experience, Karthik also designed and taught the first Big Data Processing using Apache Hadoop course at the Johns Hopkins University "Engineering for Professionals" program, where he was awarded the Excellence in Teaching award. Karthik also gave three talks on Big Data at the national JavaOne conference.

Karthik is now the Principal Architect and consultant for a major Internet Infrastructure company, architecting solutions, and advising the software engineering department on industry and technology trends. He has given talks and lectures at various conferences including JavaOne, Northern Virginia Software Symposium, and AJAX World. In 2011, Karthik was inducted into the Oracle/JavaOne hall of fame and obtained the "Rock Star" award for his JavaOne presentations.

Prerequisites

The course consists of an approximately equal mixture of lecture and hands-on lab time. The course assumes that all students already have at least moderate previous Java experience, but not necessarily any experience with Hadoop or cloud computing. Although the course will use Java 7, previous experience with earlier Java versions is sufficient. However, the course will definitely move too fast for those with little or no previous experience with Java. Working knowledge of XML is helpful but not absolutely required.

Venue

The course will be held at the Johns Hopkins Dorsey Center in Elkridge, Maryland. This is a modern, comfortable venue with separate computers for each student, fast internet connections, and with coffee, snacks, and meals included. Class meets from 8:30 am to 4:30 pm daily. For students who prefer to bring their own laptops, fast wifi is available, and you can email the instructor for information on installing the class software in advance.

For Maryland residents, the venue is centrally located, 5 minutes from BWI airport, and has plenty of free parking. For out-of-town students, there are many hotels within 1 mile.

Registration

The five-day course costs $2695 per student and includes an extensive course notebook, a commercial textbook, exercises, and exercise solutions. Breakfast, snacks, and lunch are free. Compare this price to courses from Learning Tree, GlobalKnowledge, and Oracle University that cost $3500-$4200 for five-day courses and $2800-$3000 for four-day courses and that do not include textbooks or meals. Besides, those courses almost always use an unknown instructor who did not develop the course materials and often lacks significant real-world development experience.

To register, fill out and send in the course registration form. Space is limited: five previous offerings of coreservlets.com courses were full. Bonus: Register at least two weeks in advance and get a $100 gift certificate from amazon.com.

Questions and More Info



  • Guinea pigs? No! Marty's courses are well-tested, having been taught in 8 countries and dozens of US venues. We don't use your developers as guinea pigs for new materials.
  • Regurgitation? No! Marty developed all his own materials. No contract instructor regurgitating memorized PowerPoint slides.
  • Green? No! Marty is an experienced developer, and is the author of 6 popular Java EE texts from Prentice Hall. The course gives best practices and real-world strategies. No newbie instructor dodging tough questions.

Syllabus

The Big Data Economy

  • Data! Data! Data!
  • Data Economy
  • Data Analytics
  • Data Science
  • Traditional Data Processing Technologies

Apache Hadoop Architecture and Ecosystem

  • Hadoop Background
  • Hadoop Architecture
  • Hadoop and RDBMS
  • Hadoop Subprojects
  • Hadoop Distributions
  • Hadoop Documentation

Setting up Hadoop

  • Installing Hadoop
  • Configuring Hadoop
  • Starting Hadoop
  • Running Hadoop Clients
  • Browsing Hadoop UI Consoles

HDFS Architecture

  • Hadoop 1.0 HDFS Architecture
  • Hadoop 1.0 HDFS Architectural Capabilities – Performance, Scalability, Availability, Installability, Configurability, Operability, Usability, Security
  • Hadoop 2.0 HDFS Architecture

HDFS Programming Basics

  • Hadoop Configuration API
  • HDFS API Overview
  • HDFS File CRUD API
  • HDFS Directory CRUD API

HDFS Programming Advanced

  • File Compression and Decompression
  • Type Serialization and Deserialization
  • Sequence Files

MapReduce Architecture

  • Hadoop 1.0 MapReduce Architecture
  • Hadoop 1.0 MapReduce Architectural Capabilities – Performance, Scalability, Availability, Installability, Configurability, Operability, Usability, Programmability
  • Hadoop 2.0 MapReduce Architecture

MapReduce Programming Basics

  • MapReduce Programming Concepts – Map Phase and Reduce Phase
  • MapReduce API – Key Java Classes and their Hierarchy
  • Steps to Write a MapReduce Program
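
As a rough illustration of the map phase and reduce phase named above, here is a minimal plain-Java sketch of a word count. It is not taken from the course materials and deliberately uses no Hadoop classes: the class name WordCountSketch and its map, shuffle, reduce, and run methods are hypothetical, and a real Hadoop job would instead extend the framework's Mapper and Reducer classes and let the cluster do the grouping.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases (word count).
// No Hadoop classes are used; every name here is hypothetical.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key, sorted by key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a sum.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each key.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {apps=1, big=2, data=2}
        System.out.println(run(Arrays.asList("big data big", "data apps")));
    }
}
```

The sketch runs in a single JVM, so it shows only the shape of the programming model (independent map calls, grouping by key, independent reduce calls), not distribution or fault tolerance.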

MapReduce Programming Intermediate

  • Setting Mapper Counts and Reducer Counts
  • MapReduce Configuration
  • Combiners
  • Partitioners
  • Speculative Execution
  • Task JVM Reuse
  • Compression

MapReduce Programming Advanced

  • Output Format
  • Custom Data Format
  • Input Format
  • Built-in Mappers and Reducers
  • Counters
  • Multithreading
  • Distributed Cache

MapReduce Streaming and Pipes

  • MapReduce using Hadoop Streaming
  • MapReduce using Hadoop Pipes

MapReduce Development Best Practices

  • Logging in Hadoop
  • Exception Handling
  • Running Jobs Locally
  • Unit Testing with MRUnit
  • Top 10 Hadoop Anti-Patterns

Querying Data using Hive

  • Hive Background
  • Hive Architecture
  • Downloading, Installing and Configuring Hive
  • Simple Hive Example
  • Loading Data into Hive
  • Hive Query Statements
  • Hive Schema Violations
  • Using Built-in Hive Functions
  • Partitioning Data using Hive
  • Joining Data

Querying Data using Pig

  • Pig Background
  • Architecture
  • Downloading, Installing and Configuring Pig
  • Running Pig
  • Pig Latin Language Basics
  • Core Relational Operators – DISTINCT, FILTER, SPLIT, ORDER BY, LIMIT, GROUP, FOREACH
  • Built-in Functions
  • Relational Join Operators
  • Debug Operators

Realtime Database using HBase

  • HBase Overview
  • Data Model
  • Architecture
  • Downloading, Installing and Configuring HBase
  • HBase Shell
  • HBase Java API for CRUD Operations