Customized Hadoop Training Courses

Looking for practical, hands-on training on Hadoop taught onsite at your organization? Look no further! These courses are personally developed and taught by an experienced Hadoop developer who has spoken several times on Hadoop at JavaOne and uses Hadoop daily for real-life apps. No contract instructor regurgitating someone else's materials! Coreservlets.com has given courses on Hadoop, Java 7, Java 8, JSF, PrimeFaces, Android, Ajax/jQuery, Spring, Hibernate, RESTful Web Services, servlets, JSP, GWT, and other Java EE topics to dozens of organizations in the US, Canada, Mexico, Australia, Japan, Puerto Rico, India, Norway, Cambodia, and the Philippines, all to rave reviews.

If you have a group of at least eight interested developers (10 for courses outside North America), contact Marty to arrange a course at your location. Onsite courses are easier administratively, are better for clients since the topics and pace can be customized, are more cost effective for students since no travel is required, and are more convenient (for companies in the Baltimore/Washington area) because the schedule is flexible (e.g. afternoons or evenings instead of n consecutive days). However, if you have too few developers for an onsite course, check out the upcoming public Hadoop training course in Maryland.

See the following sections for more details and various course options. Then email hall@coreservlets.com to discuss which options would work best for your developers.

Overview and Course Options

Apache Hadoop is a framework that allows for the distributed processing of massive data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop has established itself as an industry-leading platform for deploying cloud-based applications and services. The Hadoop ecosystem is large, and it includes such popular products as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Pig, and Hive. However, with such versatility comes complexity and difficulty in deciding on appropriate use cases. This course breaks down the walls of complexity by providing a practical approach to developing Java applications on top of the Hadoop platform.
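
For a flavor of that programming model, here is a minimal sketch in plain Java with no Hadoop dependency. It counts words by running a map step, grouping the results by key, and running a reduce step, all in memory on one machine. The class and method names (WordCountSketch, map, reduce, run) are hypothetical and are not part of the Hadoop API; a real job would extend Hadoop's Mapper and Reducer classes and let the framework distribute the work across a cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: combine all values emitted for a single key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group values by key, then reduce each group.
    // In Hadoop, the framework does the grouping between the two steps.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big clusters", "data everywhere");
        // Prints {big=2, clusters=1, data=2, everywhere=1}
        System.out.println(run(lines));
    }
}
```

The sketch only illustrates the shape of the model: because each map call and each reduce call is independent of the others, the same logic can in principle be spread over many machines.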

The course presents the material as small building blocks with thorough coverage of each component in the Hadoop stack. We begin by looking at Hadoop’s architecture and its underlying parts with top-down identification of component interactions within the Hadoop ecosystem. The course then provides in-depth coverage of Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig, and Hive. To reinforce concepts, each section is followed by a set of hands-on exercises. The exercises come in various complexities to accommodate developers with various levels of expertise.

There are two main variations of the course:

  • Hadoop training
  • Hadoop training with prep for the Cloudera certification exam

For most organizations, the first option works best, since the exam prep distracts a bit from the main course topics. However, it does not distract too much, and if having certification is important for your organization, including exam prep is worth the small cost in overall course effectiveness.

Email hall@coreservlets.com to discuss which options would work best for your developers.

About the Instructor

Karthik Shyamsunder has been working with Apache and Big Data since 2009, and was the main architect for a company-wide storage and computing platform at a major Internet Infrastructure company. Besides having extensive industry experience, Karthik also designed and taught the first Big Data Processing using Apache Hadoop course at the Johns Hopkins University "Engineering for Professionals" program, where he was awarded the Excellence in Teaching award. Karthik also gave three talks on Big Data at the national JavaOne conference.

Karthik is now the Principal Architect and consultant for a major Internet Infrastructure company, architecting solutions and advising the software engineering department on industry and technology trends. He has given talks and lectures at various conferences including JavaOne, Northern Virginia Software Symposium, and AJAX World. In 2011, Karthik was inducted into the Oracle/JavaOne hall of fame and obtained the "Rock Star" award for his JavaOne presentations.

Intended Audience

The course is aimed at developers with moderate-to-strong previous Java experience. The course will move much too fast for newcomers to Java.

Syllabus Choices

Here is a potpourri of possible topics. As discussed above, the topics covered in any course are customizable, but the two most common options are: with or without time spent preparing for the Cloudera Hadoop certification exam. When you book a course, we will first decide on the exact topics based on your needs and level of experience. Email hall@coreservlets.com to inquire about a custom course at your location.

The Big Data Economy

  • Data! Data! Data!
  • Data Economy
  • Data Analytics
  • Data Science
  • Traditional Data Processing Technologies

Apache Hadoop Architecture and Ecosystem

  • Hadoop Background
  • Hadoop Architecture
  • Hadoop and RDBMS
  • Hadoop Subprojects
  • Hadoop Distributions
  • Hadoop Documentation

Setting up Hadoop

  • Installing Hadoop
  • Configuring Hadoop
  • Starting Hadoop
  • Running Hadoop Clients
  • Browsing Hadoop UI Consoles

HDFS Architecture

  • Hadoop 1.0 HDFS Architecture
  • Hadoop 1.0 HDFS Architectural Capabilities – Performance, Scalability, Availability, Installability, Configurability, Operability, Usability, Security
  • Hadoop 2.0 HDFS Architecture

HDFS Programming Basics

  • Hadoop Configuration API
  • HDFS API Overview
  • HDFS File CRUD API
  • HDFS Directory CRUD API

HDFS Programming Advanced

  • File Compression and Decompression
  • Type Serialization and Deserialization
  • Sequence Files

MapReduce Architecture

  • Hadoop 1.0 MapReduce Architecture
  • Hadoop 1.0 MapReduce Architectural Capabilities – Performance, Scalability, Availability, Installability, Configurability, Operability, Usability, Programmability
  • Hadoop 2.0 MapReduce Architecture

MapReduce Programming Basics

  • MapReduce Programming Concepts – Map Phase and Reduce Phase
  • MapReduce API – Key Java Classes and their Hierarchy
  • Steps to Write a MapReduce Program

MapReduce Programming Intermediate

  • Setting Mapper Counts and Reducer Counts
  • MapReduce Configuration
  • Combiners
  • Partitioners
  • Speculative Execution
  • Task JVM Reuse
  • Compression

MapReduce Programming Advanced

  • Output Format
  • Custom Data Format
  • Input Format
  • Built-in Mappers and Reducers
  • Counters
  • Multithreading
  • Distributed Cache

MapReduce Streaming and Pipes

  • MapReduce using Hadoop Streaming
  • MapReduce using Hadoop Pipes

MapReduce Development Best Practices

  • Logging in Hadoop
  • Exception Handling
  • Running Jobs Locally
  • Unit Testing with MRUnit
  • Top 10 Hadoop Anti-Patterns

Querying Data using Hive

  • Hive Background
  • Hive Architecture
  • Downloading, Installing and Configuring Hive
  • Simple Hive Example
  • Loading Data into Hive
  • Hive Query Statements
  • Hive Schema Violations
  • Using Built-in Hive Functions
  • Partitioning Data using Hive
  • Joining Data

Querying Data using Pig

  • Pig Background
  • Architecture
  • Downloading, Installing and Configuring Pig
  • Running Pig
  • Pig Latin Language Basics
  • Core Relational Operators – DISTINCT, FILTER, SPLIT, ORDER BY, LIMIT, GROUP, FOREACH
  • Built-in Functions
  • Relational Join Operators
  • Debug Operators

Realtime Database using HBase

  • HBase Overview
  • Data Model
  • Architecture
  • Downloading, Installing and Configuring HBase
  • HBase Shell
  • HBase Java API for CRUD Operations

Course Reviews

Here are a few of the reactions of previous students in coreservlets classes; we are confident that you will have the same reaction. So confident, in fact, that we offer an unconditional guarantee: if you are not satisfied with the course, we will refund the full cost.

“In my 35+ years of taking technical courses, Marty's classes consistently come out ranking #1 on my list. Highly relevant material is delivered with enthusiasm, humor, and a high degree of class interaction that is unmatched anywhere.”

“Masterful, quick-paced presentation. Witty, but never trite. Discussed but never belabored. A Java ed-venture. A gaggle of Goslings could not have done better!”

“Wonderful. In 20 years, this is the best organized, most pragmatic, and enjoyable course I've taken.”

“Excellent course. The best instructor-led course I have attended, by far. The course was exactly what I was hoping for.”

“Best short course ever!”

“Compared to the other short courses I have taken, this one completely redefined my scale from 1-10.”

“This course was AWESOME. I came with very little knowledge of JSF and now I look forward to using it on my next project.”

“GREAT class [JSF]. Do you make house calls?”

Ads for Marty at GIDS conference in India

“I'm not easily pleased by industry courses. Luckily, not all presenters are as good as Marty, otherwise University lecturers like myself would be out of work.”

“This was, by far, the best Java training course I have attended... After 4 days, I feel prepared to dive into JSF development with a solid understanding of the basics. I know this is going to make my life easier over the next year. Thank you!”

“Marty is a fantastic teacher and communicator. I thoroughly enjoyed the course and it was timely for my current work.”

For more reviews, please see the course review page.

Other Onsite Java EE Training Courses

Coreservlets.com offers customized onsite courses on Java 7, Java 8, JSF 2, PrimeFaces, Hadoop (including certification prep), Android programming, Ajax/jQuery, Spring, Hibernate, RESTful Web Services, GWT, and custom combinations of topics. Available at any location worldwide.

  • Guinea pigs? No! Our courses are well-tested, having been taught in 9 countries and dozens of US venues. We don't use your developers as guinea pigs for new materials.
  • Regurgitation? No! Our instructors developed all their own materials. No contract instructor regurgitating memorized PowerPoint slides.
  • Green? No! Our instructors are experienced developers, and most have authored popular Java EE texts, spoken at JavaOne, and done extensive onsite training. The course gives best practices and real-world strategies. No newbie instructor dodging tough questions.

For more details, please see the training course home page, or email hall@coreservlets.com.

Public Training Courses

Coreservlets normally runs on-site training courses at customer locations. This is easier administratively, is better for clients since the topics and schedule can be customized, and is more cost effective for students since no travel is required. However, due to demand from those who do not have enough students for an on-site course, we periodically run public training courses at the Johns Hopkins Dorsey Center in Elkridge, MD. These courses feature the same experienced instructors as our onsite courses, and are co-sponsored by Johns Hopkins Engineering for Professionals.

For more details, please see the public course schedule.