Big Data Analysis & Spark Developer

iConnect Group - Sterling, VA


  • 11+ years of experience in the IT industry across complete project lifecycles, including Design, Development, Testing and Production Support in Big Data and Java/J2EE technologies
  • 4 years of hands-on experience in Big Data analysis using Hadoop, HDFS, MapReduce, Hive, Spark, Scala, Python, Avro, Parquet, Sqoop, Flume, Kafka, HBase, Cassandra, Cloudera Manager and Shell Scripting.
  • Experience in Java, J2EE technologies, Web Services, WebLogic, Oracle SOA Suite and Oracle Service Bus.
  • Strong experience creating real-time data streaming solutions using Spark Core, Spark SQL & DataFrames, Spark Streaming and Kafka.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS
  • Expertise in writing MapReduce programs and UDFs to incorporate complex business logic into Hive queries when performing high-level data analysis using Hive
  • Experience in Avro, Parquet and NoSQL technologies like HBase and Cassandra
  • Experience in designing time-driven and data-driven automated Oozie workflows
  • Hands-on experience in Spring Boot and microservices development using REST
  • Hands-on experience in Spring, Hibernate, MVC architecture and the Struts framework
  • Experience in JSP, Servlets, JDBC, SOAP, XSD, XML, AJAX, ANT, JUnit and TestNG
  • Experience in Web Services: RESTful, SOAP, WSDL, Apache Axis2, JAXB, XMLBeans
  • Good experience with and exposure to relational databases like Oracle, MySQL and SQL Server
  • Experience with and exposure to application servers like WebLogic, WebSphere and JBoss.
  • Experience in cloud computing technologies: AWS and Azure
  • Architected and developed highly scalable, large-scale distributed data processing systems on a cloud platform - Amazon Web Services (AWS)
  • Expertise in developing Machine Learning algorithms using Apache Spark
  • Architected and provisioned cloud infrastructure using AWS services like EC2, S3, EBS, SQS, ELB, VPC, DynamoDB and Redshift.
  • Adept in dealing with people and leading teams; mentored developers and evaluated their performance.
  • Expertise in triggering Spark jobs on AWS Elastic MapReduce (EMR) clusters and fine-tuning them based on cluster scalability.


SCJP (Sun Certified Java Programmer)


Master of Computer Applications (MCA), Acharya Nagarjuna University, India, 2005


Operating Systems: Windows and UNIX

Languages: J2SE & J2EE

Big Data Frameworks: Hadoop (HDFS, MapReduce, Spark, Hive, HBase, Cassandra, Phoenix, ZooKeeper, Flume, Oozie, Sqoop and Kafka)

Cloud Services: AWS services like S3, EC2, VPC, DynamoDB and Redshift; Azure.

DevOps tools: Jenkins, Docker, GitHub, Chef, Puppet

Web Business Logic Technologies: Servlets

Mailing Services: Java Mailing API

GUI: JSP, JavaScript, AJAX, jQuery

Enterprise Technologies: EJB2.x & EJB 3.x and JMS

Middleware Technologies: Web Services, IBM MQ and Oracle SOA 12c (BPEL, OSB, BAM) and OAG (Oracle API Gateway)

Web Frameworks: Struts Framework and Spring Framework

ORM Frameworks: Hibernate, JPA (Java Persistence API)

Tools & Utilities: MyEclipse, ANT, Maven and RAD

Parsing Technologies: SAX, DOM and JAX-B binding framework

Web/App Servers: Tomcat / WebLogic / WebSphere

Testing: Unit Testing, SOAP UI Testing, SOA Testing, Performance Testing, Java Performance Tuning, Profiling and Memory Management, CA LISA Service Virtualization automation tool


Company: Wipro Technologies Pvt. Ltd, April 2017 - June 2018

Client: National Grid, USA

Role: Big Data & Spark Developer

Project Name: Global Risk Analytics Platform (GRAP)

Team Size: 20


GRAP is a migration project from Netezza, Teradata and other RDBMS sources to an HDFS-based system. GRAP is an enterprise credit risk and risk appetite data, analytics and reporting application supporting a variety of business use cases for enterprise risk appetite (ERA) and global portfolio strategies (GRA) across lines of business, namely risk appetite governance, monitoring and reporting, Moody's RiskFrontier simulations, concentration impact analysis, responsible growth, credit risk analytics, commercial loss forecasting, the benchmarking process and ad hoc reporting and analytics.


  • Involved in requirement study, design, development and unit testing
  • Imported data in various formats into the HDFS environment.
  • Imported and exported data between Netezza/Teradata and HDFS using Sqoop
  • Implemented shell scripts
  • Sourced data into HDFS from Excel sheets using Java
  • Wrote and configured Autosys jobs
  • Implemented Data Quality and Data Integrity checks
  • Created Hive tables, loaded them with data, formatted results, wrote Hive queries and wrote UDFs for Hive
  • Developed Spark workflows
  • Implemented workflows using Oozie for various business scenarios.
  • Involved in all phases of the SDLC using the Agile Scrum methodology.

Environment: Hadoop, HDFS (Hadoop Distributed File System), MapReduce (YARN), Hive, HBase, Phoenix, Sqoop, Flume, Oozie, Spark, Python, Kafka, Cloudera Manager, Hue UI, Java (JDK 1.6), Eclipse, SVN, DB2, Netezza, Teradata, UNIX Shell, Rally, Cloudera Distribution, Autosys, Tableau, Git, Linux
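As context for the Hive UDF work on this project, a transform-style Hive script can be sketched in plain Python: Hive's SELECT TRANSFORM streams tab-separated rows through a script's stdin/stdout. The column layout and the masking rule below are hypothetical, purely for illustration:

```python
import sys

def mask_account(fields):
    """Hypothetical rule: mask all but the last 4 characters of the
    first column (an account number); pass the remaining columns through."""
    account = fields[0]
    masked = "*" * max(len(account) - 4, 0) + account[-4:]
    return [masked] + fields[1:]

def transform(lines):
    """Apply the rule row by row; Hive streams TSV in and expects TSV out."""
    return ["\t".join(mask_account(line.rstrip("\n").split("\t")))
            for line in lines]

def main():
    # When wired into Hive, roughly:
    #   SELECT TRANSFORM (acct, amt) USING 'python udf.py' AS (acct, amt) FROM t;
    sys.stdout.write("\n".join(transform(sys.stdin)) + "\n")
```

Permanent Hive UDFs would instead be written in Java against the UDF API; the streaming form above is just the quickest way to show the row-in, row-out contract.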

Company: UST Global Pvt. Ltd, Aug 2015 - Jan 2017

Client: Dell International Services, USA

Role: Java/ Big Data & Spark Developer

Project Name: Global Dell Cloud Platform

Project Description:

The platform pulls RDBMS data (Oracle/MySQL) and SFTP data and stores it in AWS S3. Data is imported and exported via OraOop (through Sqoop) and stored in S3. Transformation rules are applied on top of the different datasets, and the result is stored in the desired output format (CSV to JSON). Tasks are scheduled in AWS pipelines and scale up automatically based on data volume. The final data is stored in DynamoDB, Redshift and S3 in the desired format, and dashboards and reports are created using Splunk.


  • Ingested data from Oracle and MySQL into S3, where it can be queried through Hive and Spark SQL tables.
  • Worked on Sqoop jobs for ingesting data from MySQL into Amazon S3
  • Created Hive external tables for querying the data
  • Used the Spark DataFrame API to ingest Oracle data into S3 and store it in Redshift.
  • Wrote a script to load RDBMS data into Redshift.
  • Processed the datasets and applied different transformation rules on top of them.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Automatically scaled up EMR instances based on data volume.
  • Applied transformation rules on top of DataFrames.
  • Ran and scheduled the Spark script in EMR pipelines.
  • Processed Hive, CSV, JSON and Oracle data together (POC).
  • Validated and debugged the script between source and destination.
  • Validated the source and final output data.
  • Tested the data using the Dataset API instead of RDDs
  • Debugged and tested whether the process met the client's expectations.
  • Triggered query execution and improved processing time.
  • Applied different optimization rules based on newer Spark versions.
  • Debugged the script to minimize data shuffling

Environment: Spark SQL, Spark Streaming, Sqoop, AWS (EMR, Elasticsearch, S3, DynamoDB, Pipes, Redshift)
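The CSV-to-JSON step described for this project can be sketched with the standard library alone; the column names and the normalization rule here are hypothetical stand-ins for the actual transformation rules:

```python
import csv
import io
import json

def normalize_amount(row):
    """Hypothetical transformation rule: cast the 'amount' column to float."""
    row["amount"] = float(row["amount"])
    return row

def csv_to_json(csv_text, rule=None):
    """Parse CSV text into a list of dicts, apply an optional per-row
    transformation rule, and serialize the result as JSON."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append(rule(row) if rule else row)
    return json.dumps(rows)
```

In the actual pipeline this per-row logic would run as a Spark DataFrame transformation over S3 objects rather than in a single process, but the shape of the rule is the same.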

Company: UST Global Pvt. Ltd, Aug 2014 - July 2015

Client: Dell International Services, USA

Role: Java/Big Data Developer

Project Name: Agile PLM (Project Lifecycle Management)

Team Size: 20

Project Description:

The objective of this program is to provide a better reporting experience and a consolidated solution to host product data reports. The existing implementation uses OBIEE as the reporting tool for Agile PLM. The PLM BI solution does not meet current business needs; it is experiencing performance degradation and scalability issues. The PLM BI solution was originally designed to provide user operational reporting; however, it was extended to allow a reduced number of downstream systems to access the data mart directly. Today, ~20 systems query the data mart. The PLM BI solution does not contain current data, as it takes ~10 hours for the data upload to ADM to complete and another ~10 hours for the data upload to MDS, a process that used to take ~4 hours back in 2010. Several BI reports contain stale, insufficient or unnecessary data; other reports or searches are labor intensive, while complex reports require IT development resources.


  • Involved in a POC (Proof of Concept) for processing large amounts of data (in TB) and performing manipulations on the data.
  • Involved in Informatica Big Data Edition configuration.
  • Wrote Java code to merge large split XML files into a single XML file.
  • Implemented multithreaded batch processing for the source XML.
  • Wrote shell scripts to unzip zipped XML files in the HDFS system.
  • Created tables in Hive.
  • Loaded data into Hive.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance. Solved performance issues in Hive scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Installed and configured Hadoop clusters in the Test and Production environments.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Developed Sqoop jobs to transfer data from several data sources, such as Oracle, to HDFS

Environment: Teradata, Hive, HDFS (Hadoop Distributed File System), Informatica Big Data Edition, Maven, J2EE, Web Services, JSON, XML, Oracle Service Bus, WebLogic, Linux, TFS.
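The bucketing concept used on this project is worth a sketch: Hive's CLUSTERED BY assigns each row to one of N bucket files by hashing the bucket column modulo N, so equal keys always land in the same bucket, which is what enables bucketed map-side joins and efficient sampling. A minimal Python illustration (the hash below is a stand-in for Hive's internal hash function, not the real one):

```python
def hash_key(key):
    # Stand-in for Hive's internal hash; any stable hash works for the
    # illustration, since the property we need is "equal keys, equal bucket".
    return sum(ord(c) for c in str(key))

def bucket_for(key, num_buckets):
    """Bucket assignment, conceptually what CLUSTERED BY ... INTO N BUCKETS does."""
    return hash_key(key) % num_buckets

def bucketize(rows, key_col, num_buckets):
    """Split rows (dicts) into num_buckets lists, mirroring how a bucketed
    Hive table is split into a fixed number of files on disk."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key_col], num_buckets)].append(row)
    return buckets
```

Because two tables bucketed the same way on the join key have matching files, a join can proceed bucket-by-bucket without a full shuffle.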

Client: Dell International Services, Oct 2013 - July 2014

Company: UST Global Pvt. Ltd

Role: Java/ Hadoop Developer

Team Size: 6

Project Name: GOLF Application


Dell has various applications deployed on the WebLogic 11g application server. The migration from WL 11g to WL 12c started in Q4 2013. The projects below are part of this migration.


  • Involved in analysis for migrating Oracle WebLogic 11g to Oracle WebLogic 12.1.2.
  • Involved in the design, estimation, development and unit testing phases for all the applications.
  • Involved in environment setup for the WebLogic 12c track.
  • Involved in preparing the design documents (Technical Design Specification document).
  • Involved in new WebLogic track setup for the Dev, SIT, PERF, UAT and PROD environments.
  • Involved in configuring BPEL (Business Process Execution Language) and composite deployment using the Hudson deployment tool.
  • Played the role of a developer and technical lead.
  • Involved in deploying applications to the WebLogic 12c server and monitoring them.
  • Involved in deploying applications to various non-production environments using script and console modes on the WebLogic application server
  • Troubleshot issues raised in the Dev, SIT, UAT and Production environments.
  • Involved in installing and configuring Oracle SOA Suite 11g components in Oracle WebLogic Server domains.
  • Monitored and managed SOA components using the Oracle Enterprise Manager Fusion Middleware Control Console to perform administrative tasks.

Environment: Java/J2EE, WSDL, SOAP, BPEL, OSB, Big Data, Oracle 11g, Hadoop Hive, Web Services (Apache Axis2), EJB 2.x, EJB 3.x, WebLogic Server 12c, IBM MQ, SOA (BPEL, OSB and BAM), MyEclipse Blue Edition 10.5, JDeveloper 10g, MQ Explorer, TOAD, ANT, Windows 7 Basic

Company: Computer Sciences Corporation (CSC) Ltd, Sep 2012 - Oct 2013

Client: Avaya System Pvt. Ltd (USA)

Role: Java Application Developer

Team Size: 3

Project Name: Tango Big Red Button

Project Description:

This is a telecom-domain web-based application for Avaya. Currently, no functionality exists for providing immediate support to customers who are out of service. At this time, customers with an out-of-service condition can generate SRs using the Avaya support site. A customer in a "service down" situation navigates to the Avaya support page. Once there, the signed-in user sees and presses the Big Red Button.


  • Gathered requirements and updated the design document.
  • Involved in estimating upcoming projects in the same business line.
  • Performed impact analysis of new enhancements on the existing implementation
  • Extensively used jQuery, the Struts 2 framework, the Spring framework, the Hibernate framework and Web Services for the presentation, control and model layers
  • Developed Business Delegate classes to minimize tight coupling between the presentation tier and the business tier.
  • Involved in application deployment to the dev, stage and production servers.
  • Responsible for fixing bugs reported in Mercury Quality Center.
  • Involved in monthly and quarterly production releases.
  • Moved code from dev to the stage and QC environments

Environment: Java/J2EE, JSP, jQuery, Ajax, JavaScript, Oracle 11g, Siebel, Struts 2.2, Spring 2.5 & 3.0, Hibernate 2.5 & 3.0, Web Services (WS), Java Mail API, WebLogic Server 10.6, SOA (BPEL, OSB, BAM), MyEclipse Blue Edition 10.5, Maven, SVN, Windows 7 Basic

Client: Zurich Insurance, USA, June 2011 - Aug 2012

Company: Computer Sciences Corporation (CSC) Ltd

Role: Java Application Developer

Team Size: 3

Project Name: eZACCESS (Advanced Computer Claim Entry Support Sys)

Project Description:

Zurich Financial Services is an insurance-based financial services provider with a global network that focuses its activities on its key markets in North America and Europe. Founded in 1872, Zurich is headquartered in Zurich, Switzerland. Claims are processed through the Advanced Computer Claim Entry Support System


  • Gathered requirements and updated the design document.
  • Performed impact analysis of new enhancements on the existing implementation
  • Extensively used the Struts and Hibernate frameworks and Web Services for the presentation, control and model layers
  • Involved Web Services in the Care Point Customer Search project
  • Wrote business logic in stateful session beans.
  • Generated WSDL files for exposed services.
  • Created customer information responses in an XML tree structure format
  • Moved code into ClearCase.
  • Deployed to the dev environment.
  • Supported enhanced project development.
  • Released monthly and quarterly production EARs
  • Unit testing

Environment: Java/J2EE, DB2, Struts, EJB, Hibernate, Web Services, WebSphere 6.1, RAD, ClearCase & ClearQuest tools, Windows XP.

Company: IBM India Pvt. Ltd, Dec 2008 - May 2011

Client: Cisco Systems, USA

Role: Java Developer

Team Size: 3

Project Name: Commitment Compliance and Revenue Manager (CCRM)

Project Description:

CCRM (Commitment Compliance and Revenue Manager) is the tool for managing revenue deferrals related to non-standard deals. Its primary user group is Deal Accounting Operations, yet it provides visibility to others who need insight into specific elements of the transactions, and it is Corporate Revenue’s sub ledger for all revenue and COGS (cost of goods sold) deferrals.


  • Gathered requirements and updated the design document.
  • Performed impact analysis of new enhancements on the existing implementation
  • Extensively used the ADF, Spring and Hibernate frameworks for the presentation, control and model layers
  • Developed Business Delegate classes to minimize tight coupling between the presentation tier and the business tier.
  • Developed the required SQL/PLSQL queries.
  • Responsible for maintaining and implementing enhancements for the Batch Job Engine
  • Involved in application deployment to the dev, stage and production servers.
  • Responsible for fixing bugs reported in Mercury Quality Center.
  • Involved in monthly and quarterly production releases.
  • Moved code from dev to the stage environment

Environment: JAVA/J2EE, Struts, Spring, Hibernate, ADF Framework, Web Services, OC4J, Eclipse, CVS, Oracle 9i/PLSQL, Windows XP

Client: AT & T, USA

Role: Java Developer

Team Size: 30

Project Name: GCSM

Project Description:

Global Customer Solution Manager (GCSM) is a telecom-domain application.

GCSM is a web-based application that allows users to perform critical steps in the sales process. These steps are collectively known as DPPCO (Design, Price, Propose, Contract, Order). It supports IP and Managed Services for multiple user communities performing the business functions of Design, Price, Propose, Contract Renewal and Addendum, and Ordering. The application also provides workflow functions to connect these business functions within GCSM, as well as data interfaces with other applications.


  • Resolved production ticket issues daily.
  • Released monthly and quarterly production EARs.
  • Moved code into the ST1 & ST2 environments.
  • Unit testing
  • Gathered requirements from System Engineers (SEs)
  • Supported enhanced project development

Environment: Core Java, Servlets, JSP, Web Services, Oracle 8i, Web Server 6.1, Eclipse, ClearCase & ClearQuest tools, Windows XP.

Company: Satyam Computer Services Ltd., Dec 2006 - Dec 2008

Client: Yes OPTUS Telecom, Australia

Role: Java Developer

Team Size: 2

Project Name: Optus-eFulfillment Optimization

Project Description:

The root cause of the eFulfillment performance issue is the manner in which the EJBs are implemented: the way the EJBs interact in the service layers is poor. Our investigation found that the data feed to the EJBs from the GUI layer is the primary bottleneck. The implementation needs to be optimized, as the loop execution takes a long time to complete when the number of objects reaches 4599.


  • Analysis and design
  • Prepared an algorithm for the existing code
  • Prepared an algorithm for the PL/SQL code
  • Debugged the previous Java code through logs.
  • Analyzed the business logic and implementation
  • Coding and implementation (Action class)
  • Unit testing

Environment: Java, JSP, JDBC, JSTL, EJB (Session and Entity Beans), PL/SQL programming, Struts, Oracle 9i, WebLogic 8.1, MyEclipse 5.5.1 GA, Toad, CVS, Windows 2000 Server, Maven 2.0.4

Client: PSB (Punjab & Sind Bank), India

Role: Java Developer

Team Size: 5

Project Name: Electronic Bill Payment (EBP)

Project Description

Electronic Bill Payment (EBP) is an application designed to work with electronic bill payments. Users can make payments to all the billers and payees that are registered with BillDesk, with easy accessibility. This project contains 3 modules:

  • Account Holder
  • Bank Admin
  • Bill Desk.


  • Involved in View Biller and Delete Biller module.


  • Involved in URD and Prototype development.
  • Code Review

Environment: Java, Struts, JSP, JDBC, JSTL, ISO Messages, Oracle 8i, JDeveloper, OC4J (Oracle Containers for J2EE) server, CVS, Windows 2000 Server.

Job Types: Full-time, Contract

Required work authorization:

  • United States