Big Data/Hadoop Developer

iConnect Group - Sterling, VA


BIG DATA (Hadoop & Spark) Developer

Murthy

Voice: +1 (571) 635-9796

PROFESSIONAL SUMMARY:

  • 11+ years of experience in the IT industry across complete project lifecycles, including design, development, testing, and production support, in Big Data and Java/J2EE technologies.
  • 4 years of hands-on experience in Big Data analysis using Hadoop, HDFS, MapReduce, Hive, Spark, Scala, Python, Avro, Parquet, Sqoop, Flume, Kafka, HBase, Cassandra, Cloudera Manager, and shell scripting (see the sketch following this summary).
  • Experience in Java, J2EE technologies, Web Services, WebLogic, Oracle SOA Suite, and Oracle Service Bus.
  • Strong experience creating real-time data streaming solutions using Spark Core, Spark SQL & DataFrames, Spark Streaming, and Kafka.
  • Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Expertise in writing MapReduce programs and UDFs to incorporate complex business logic into Hive queries.
  • Experience in Avro, Parquet, and NoSQL technologies such as HBase and Cassandra.
  • Experience in designing time-driven and data-driven automated Oozie workflows.
  • Hands-on experience in Spring Boot and microservices development using REST.
  • Hands-on experience in Spring, Hibernate, MVC architecture, and the Struts framework.
  • Experience in JSP, Servlets, JDBC, SOAP, XSD, XML, AJAX, ANT, JUnit, and TestNG.
  • Experience in RESTful and SOAP web services, WSDL, Apache Axis2, JAXB, and XMLBeans.
  • Experience with and exposure to application servers such as WebLogic, WebSphere, and JBoss.
  • Experience in cloud computing technologies: AWS and Azure.
  • Architected and developed highly scalable, large-scale distributed data processing systems on the Amazon Web Services (AWS) cloud platform.
  • Expertise in triggering Spark jobs on AWS Elastic MapReduce (EMR) clusters and performing fine-tuning based on cluster scalability.
  • Expertise in developing machine learning algorithms using Apache Spark.
  • Architected and provisioned cloud infrastructure using AWS services such as EC2, S3, EBS, SQS, ELB, VPC, DynamoDB, and Redshift.
  • Designed and developed Big Data analytical solutions using Hadoop, Hive, Spark, and Amazon EMR.
  • Extensive experience in administering and deploying applications on WebLogic Server and WebSphere Application Server.
  • Hands-on experience in developing ETL jobs using Informatica.
  • Adept at dealing with people and leading teams; mentored developers and evaluated their work.
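
For illustration, a minimal PySpark sketch of the kind of batch analysis summarized above: reading a Parquet dataset from HDFS and aggregating it with both the DataFrame API and Spark SQL. The paths, dataset, and column names (orders, region, amount) are hypothetical examples, not taken from any project below.

```python
# A purely illustrative PySpark sketch (hypothetical paths and column names):
# batch analysis of a Parquet dataset in HDFS via the DataFrame API and Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-batch-analysis").getOrCreate()

# Read a Parquet dataset previously landed in HDFS (example path).
orders = spark.read.parquet("hdfs:///data/raw/orders")

# DataFrame API: aggregate order amounts per region.
summary = (orders
           .groupBy("region")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("order_count")))

# Equivalent Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
summary_sql = spark.sql(
    "SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count "
    "FROM orders GROUP BY region")

# Persist the result back to HDFS in Parquet for downstream reporting.
summary.write.mode("overwrite").parquet("hdfs:///data/curated/order_summary")
```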

CERTIFICATIONS:

  • SCJP (Sun Certified Java Programmer)

EDUCATION:

M.C.A, Acharya Nagarjuna University, India (Jun 2005)

TECHNICAL SKILLS:

Operating Systems: Windows and UNIX
Languages: Java (J2SE & J2EE)
Big Data Frameworks: Hadoop, HDFS, MapReduce, Spark, Hive, HBase, Cassandra, Phoenix, ZooKeeper, Flume, Oozie, Sqoop, Kafka, YARN
DevOps Tools: Jenkins, Docker, GitHub, Chef, Puppet, Ansible
Cloud Services: AWS (S3, EC2, VPC, DynamoDB, Redshift) and Azure
Web Business Logic Technologies: Servlets
Mailing Services: JavaMail API
GUI: JSP, JavaScript, AJAX, jQuery
Enterprise Technologies: EJB 2.x, EJB 3.x, JMS
Middleware Technologies: Web Services, MQ, Oracle SOA 12c (BPEL, OSB, BAM), OAG
Web Frameworks: Struts and Spring
ORM Frameworks: Hibernate, JPA (Java Persistence API)
Tools & Utilities: MyEclipse, ANT, Maven, RAD
Parsing Technologies: SAX, DOM, JAXB binding framework
Web/App Servers: Tomcat, WebLogic, WebSphere
Testing: Unit testing, SOAP UI testing, SOA testing, performance testing, Java performance tuning, profiling and memory management, CA LISA service virtualization

PROJECT PROFILE:

Client: HSBC Bank April 2017 - June 2018

Role: Big Data Analyst

Description:

GRAP is a migration project from Netezza, Teradata, and other RDBMS sources to an HDFS-based system.

GRAP is an enterprise credit risk and risk appetite data, analytics, and reporting application supporting a variety of business use cases for the enterprise risk appetite (ERA) and global portfolio strategies (GRA) lines of business, namely risk appetite governance, monitoring and reporting, Moody's RiskFrontier simulations, concentration impact analysis, responsible growth, credit risk analytics, commercial loss forecasting, benchmarking processes, and ad hoc reporting and analytics.

Responsibilities:

  • Involved in requirements study, design, development, and unit testing.
  • Involved in importing data from various formats into the HDFS environment.
  • Imported and exported data between Netezza/Teradata and HDFS using Sqoop.
  • Involved in implementing shell scripts.
  • Involved in sourcing data to HDFS from Excel sheets using Java.
  • Involved in writing and configuring Autosys jobs.
  • Involved in Data Quality and Data Integrity implementation.
  • Involved in creating Hive tables, loading data, shaping results, and writing Hive queries and UDFs for Hive (see the sketch after this list).
  • Involved in developing Spark workflows.
  • Implemented workflows using Oozie for various business scenarios.
  • Involved in all phases of the SDLC using Agile Scrum methodology.
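
A minimal PySpark sketch of the Hive work described above: an external Hive table over data landed in HDFS, plus a simple UDF applied in a query. The table name, columns, HDFS location, and the UDF itself are hypothetical stand-ins for the actual GRAP business logic, and Hive support on the Spark session is assumed.

```python
# Illustrative only: external Hive table over ingested HDFS data and a simple UDF.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("grap-hive-example")
         .enableHiveSupport()
         .getOrCreate())

# External Hive table over files ingested into HDFS via Sqoop
# (schema and location are illustrative, not the actual GRAP model).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS exposure_raw (
        facility_id   STRING,
        counterparty  STRING,
        exposure_amt  DOUBLE,
        as_of_date    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/grap/exposure_raw'
""")

# Register a simple Python UDF standing in for the business logic that the
# real Hive UDFs encapsulate.
spark.udf.register("normalize_id",
                   lambda s: s.strip().upper() if s else None,
                   StringType())

result = spark.sql("""
    SELECT normalize_id(facility_id) AS facility_id,
           SUM(exposure_amt)         AS total_exposure
    FROM exposure_raw
    GROUP BY normalize_id(facility_id)
""")
result.show(10)
```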

Environment: Hadoop, MapReduce (YARN), HDFS, Hive, HBase, Phoenix, Sqoop, Flume, Oozie, Spark, Python, Kafka, Cloudera Manager, Hue UI, Java (JDK 1.6), Eclipse, SVN, DB2, Netezza, Teradata, UNIX Shell, Rally, Cloudera Distribution, Autosys, Tableau, Git

Client: Dell Aug 2016 - Jan 2017

Role: Big Data Analyst

Project Description:

Dell pulls RDBMS data (Oracle/MySQL) and SFTP data and stores it in AWS S3. Data is imported and exported with OraOop (through Sqoop) and landed in S3. Transformation rules are applied on top of the different datasets, and the results are stored in the desired output format (CSV to JSON). Tasks are scheduled in AWS Data Pipeline, with the cluster scaling up automatically based on data volume. Finally, the curated data is stored in DynamoDB, Redshift, and S3 in the desired format, and dashboards and reports are created using Splunk.

Responsibilities:

  • Ingested data from Oracle and MySQL into S3, where it can be queried through Hive and Spark SQL tables.
  • Worked on Sqoop jobs for ingesting data from MySQL to Amazon S3.
  • Created Hive external tables for querying the data.
  • Used Spark DataFrame APIs to ingest Oracle data into S3 and store it in Redshift (see the sketch after this list).
  • Wrote a script to load RDBMS data into Redshift.
  • Processed the datasets and applied different transformation rules on top of them.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Automatically scaled up EMR instances based on data volume.
  • Applied transformation rules on top of DataFrames.
  • Ran and scheduled the Spark scripts through AWS Data Pipeline on EMR.
  • Processed Hive, CSV, JSON, and Oracle data together in a proof of concept (POC).
  • Validated and debugged the scripts between source and destination.
  • Validated the source and final output data.
  • Tested the data using the Dataset API instead of RDDs.
  • Debugged and tested whether the process met the client's expectations.
  • Tuned query execution to improve processing time.
  • Applied different optimization and transformation rules based on newer Spark versions.
  • Debugged the scripts to minimize data shuffling.
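
A minimal sketch of the RDBMS-to-S3/Redshift flow outlined above, using the Spark DataFrame and JDBC APIs. The JDBC URLs, credentials, table names, and bucket are placeholders, and the Oracle/Redshift JDBC drivers and S3 (hadoop-aws) support are assumed to be available on the cluster.

```python
# Sketch of the RDBMS-to-S3/Redshift flow (placeholder URLs, credentials,
# tables, and bucket; JDBC drivers and S3 support assumed to be installed).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdbms-to-s3-redshift").getOrCreate()

# Pull a source table from Oracle over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", "REDACTED")
          .load())

# Apply transformation rules, e.g. derive a date column and drop bad rows.
curated = (orders
           .withColumn("order_date", F.to_date("ORDER_TS"))
           .dropna(subset=["ORDER_ID"]))

# Land the curated dataset in S3 as JSON (the CSV-to-JSON conversion above).
curated.write.mode("overwrite").json("s3a://example-bucket/curated/orders/")

# Push the same data to Redshift over JDBC (simplified; a COPY from S3 via a
# Spark-Redshift connector is the more common production pattern).
(curated.write.format("jdbc")
 .option("url", "jdbc:redshift://example-cluster:5439/dev")
 .option("dbtable", "public.orders_curated")
 .option("user", "etl_user")
 .option("password", "REDACTED")
 .mode("append")
 .save())
```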

Environment: Spark SQL, Spark Streaming, Sqoop, AWS (EMR, Elasticsearch, S3, DynamoDB, Data Pipeline, Redshift)

Client: Dell July 2014 - June 2015

Role: Big Data Analyst

Project Description:

The objective of this project is to collect offline and real-time data from various sources and provide a consolidated solution for hosting order and product data reports. The existing implementation uses OSB and an OBIEE data mart, no longer meets current business needs, and is experiencing performance degradation and scalability issues.

The proposed EBI interlock integrates order information into the enterprise to allow advanced analytics with other product data sources using Hadoop, HDFS, Hive, Spark, Kafka, and NoSQL.

It reduces the cost of ownership of the Business Intelligence solution and delivers a BI architecture covering loading, processing, and storing data in a NoSQL data model for reporting, searchability, monitoring, dashboards, and analytics. The integration decouples order data consumption from downstream systems and user reporting, provides a universal order data consumption mechanism, and aligns the BI solution with Dell's Business Intelligence architecture.

Responsibilities:

  • Performed requirement analysis and prepared context and architecture diagrams for the proposed system.
  • Developed a Kafka producer-based REST API to collect order and product payload data and send it to the Spark Streaming application.
  • Designed and built Spark jobs to consume order and product data from Kafka and read Parquet (see the sketch after this list).
  • Worked on the Spark SQL and Spark-HBase connector APIs to read and write HBase data.
  • Efficiently organized, managed, and analyzed 100 TB of data and simplified data access to support reporting, predictive analysis, and statistical modelling.
  • Designed and implemented a REST API to invoke the Spark job, along with the HBase data model.
  • Loaded structured data into Hive tables and wrote Hive scripts to analyze the data.
  • Developed Sqoop jobs to transfer data from several data sources, such as Oracle, to HDFS.
  • Wrote Hive scripts and UDFs that efficiently performed batch processing on HDFS to analyze and aggregate roughly 100 TB of data.
  • Created RESTful web services to serve the data processed by Hadoop to external clients.
  • Configured Oozie workflows to pull log data from FTP servers and transfer it to HDFS for analysis.
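
A minimal sketch of the Kafka-to-Spark consumption path described above. It uses today's Structured Streaming API purely for illustration; the original implementation relied on the classic Spark Streaming (DStream) API and wrote to HBase through the Spark-HBase connector, which is not shown. Topic names, the schema, broker addresses, and paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

```python
# Illustrative streaming consumer for order events (hypothetical topic, schema,
# brokers, and paths); the actual project used DStreams and an HBase sink.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("order-stream").getOrCreate()

order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
])

# Consume JSON order payloads produced by the Kafka-backed REST API.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "orders")
       .load())

orders = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", order_schema).alias("o"))
          .select("o.*"))

# Write parsed orders to Parquet; in the original design the sink was HBase.
query = (orders.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/orders")
         .option("checkpointLocation", "hdfs:///checkpoints/orders")
         .outputMode("append")
         .start())

query.awaitTermination()
```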

Environment: Apache Spark, Kafka, Java, Scala, Zookeeper, Hadoop, HDFS, Hive, Pig, Map Reduce, Flume, Sqoop, HBase, Maven, J2EE, Web Services, JSON, XML, Oracle BPEL, TestNG, WebLogic, Linux, TFS.

Client: Dell Nov 2013 - Jun 2014

Role: Sr. Developer

Migration from WebLogic 11g to WebLogic 12c

Description:

Dell has various applications deployed on the WebLogic 11g application server. The migration from WebLogic 11g to WebLogic 12c started in Q4 2013; the projects below are part of this migration.

Responsibilities:

  • Involved in analysis for migrating Oracle WebLogic 11g to Oracle WebLogic 12.1.2.
  • Involved in the design, estimation, development, and unit testing phases for all the applications.
  • Involved in environment setup for the WebLogic 12c track.
  • Involved in preparing the design documents (Technical Design Specification document).
  • Involved in the new WebLogic track setup for Dev, SIT, PERF, UAT, and PROD environments.
  • Involved in configuring BPEL (Business Process Execution Language) and composite deployments using the Hudson deployment tool.
  • Played the role of a developer and technical lead.
  • Involved in deploying applications to the WebLogic 12c server and monitoring them.
  • Involved in deploying applications to various non-production environments using script and console modes on the WebLogic application server.
  • Troubleshot issues raised in Dev, SIT, UAT, and production environments.
  • Involved in installing and configuring Oracle SOA Suite 11g components in Oracle WebLogic Server domains.
  • Monitored and managed SOA components using the Oracle Enterprise Manager Fusion Middleware Control console to perform administrative tasks.

Environment: Java, J2EE, IBM MQ, Oracle SOA Suite BPEL, OSB, Web Services, Spring, Hibernate, WebLogic, Oracle DB, TestNG, TFS, Linux.

Client: Avaya System (US) Sep 2012 - Oct 2013

Role: Sr. Developer

DESCRIPTION: Avaya is a telecom-domain, web-based application. Currently, no functionality exists for providing immediate support to customers who are out of service. At this time, customers with an out-of-service condition can generate SRs using the Avaya support site. A customer in a “service down” situation navigates to the Avaya support page; once there, the signed-in user sees and presses the Big Red Button.

CONTRIBUTION:

  • Gathered requirements and updated the design document.
  • Involved in estimating upcoming projects in the same business line.
  • Performed impact analysis of new enhancements on the existing implementation.
  • Extensively used jQuery, the Struts 2 framework, the Spring framework, Hibernate, and Web Services for the presentation, control, and model layers.
  • Developed Business Delegate classes to minimize tight coupling between the presentation tier and the business tier.
  • Involved in application deployment to dev, stage, and production servers.
  • Responsible for fixing bugs reported in Mercury Quality Center.
  • Involved in monthly and quarterly production releases.
  • Moved code from dev to the stage and QC environments.

Environment: Java, J2EE, Web Services, Spring, Hibernate, WebSphere, ClearCase & ClearQuest tools, RAD

Client: Zurich North America Jul 2011 - Aug 2012

Role: Sr. Developer

DESCRIPTION:

Zurich Financial Services is an insurance-based financial services provider with a global network that focuses its activities on its key markets in North America and Europe. Founded in 1872, Zurich is headquartered in Zurich, Switzerland. Claims are processed through the Advanced Computer Claim Entry Support System.

CONTRIBUTION:

  • Gathered requirements and updated the design document.
  • Performed impact analysis of new enhancements on the existing implementation.
  • Extensively used the Struts and Hibernate frameworks and Web Services for the presentation, control, and model layers.
  • Worked with Web Services in the Care Point Customer Search project.
  • Wrote business logic in stateful session beans.
  • Involved in generating WSDL files for the exposed services.
  • Created the customer information response in an XML tree structure format.
  • Moved code into ClearCase.
  • Involved in deployment to the dev environment.
  • Supported enhancement project development.
  • Released the monthly and quarterly production EARs.
  • Unit testing.

Client: Cisco Systems Oct 2010 - May 2011

Role: Sr. Developer

DESCRIPTION: CCRM (Commitment Compliance and Revenue Manager) is the tool for managing revenue deferrals related to non-standard deals. Its primary user group is Deal Accounting Operations, yet it provides visibility to others who need insight into specific elements of the transactions, and it is Corporate Revenue’s subledger for all revenue and COGS (cost of goods sold) deferrals.

CONTRIBUTION:

  • Gathered requirements and updated the design document.
  • Performed impact analysis of new enhancements on the existing implementation.
  • Extensively used the ADF, Spring, and Hibernate frameworks for the presentation, control, and model layers.
  • Developed Business Delegate classes to minimize tight coupling between the presentation tier and the business tier.
  • Developed SQL and PL/SQL queries as required.
  • Responsible for maintaining and implementing enhancements for the Batch Job Engine.
  • Involved in application deployment to dev, stage, and production servers.
  • Responsible for fixing bugs reported in Mercury Quality Center.
  • Involved in monthly and quarterly production releases.
  • Moved code from the dev to the stage environment.

Environment: Java, J2EE, Web Services, Struts, Spring, Hibernate, WebSphere, ClearCase & ClearQuest tools, RAD

Client: AT&T Jan 2009 - Sep 2010

Role: Sr. Developer

DESCRIPTION:

The Global Customer Solution Manager (GCSM) application is a telecom-domain application.

GCSM is a web-based application that allows users to perform critical steps in the sales process. These steps are collectively known as DPPCO (Design, Price, Propose, Contract, Order). It supports IP and Managed Services for multiple user communities to perform the business functions of Design, Price, Propose, Contract Renewal and Addendum, and Ordering. The application also provides workflow functions to connect these business functions within GCSM, as well as data interfaces with other applications.

CONTRIBUTION:

  • Resolved production ticket issues daily.
  • Released the monthly and quarterly production EARs.
  • Moved code into the ST1 and ST2 environments.
  • Unit testing.
  • Gathered requirements from the System Engineer (SE).

Environment: Java, J2EE, Web Services, Struts, Hibernate, WebSphere, ClearCase & ClearQuest tools, RAD

Client: Yes OPTUS Telecom, Australia Jan 2008 - Dec 2008

Role: Java Developer

DESCRIPTION:

Issue: The root cause of the eFulfillment performance issue is the manner in which the EJBs are implemented: the way the EJBs interact across the service layers is poor. Following our investigation, the data feed to the EJBs from the GUI layer is the primary bottleneck. The implementation needs to be optimized, as the loop execution takes too long to complete when the number of objects is 4599.

CONTRIBUTION:

  • Analysis and design.
  • Prepared algorithms for the existing code.
  • Prepared algorithms for the PL/SQL code.
  • Debugged previous Java code through logs.
  • Analyzed the business logic and its implementation.
  • Coding and implementation (Action classes).
  • Unit testing.

Environment: Java, J2EE, JSP, JDBC, JSTL, EJB (Session and Entity), PL/SQL, Struts, WebLogic, Oracle DB, CVS, Toad

Client: Punjab & Sind Bank, India Jan 2007 - Dec 2007

Role: Java Developer

DESCRIPTION: Electronic Bill Presentment and Payment (EBPP) is an application designed to work with electronic bill payments. Users can make payments to all billers and payees registered with BillDesk, with easy accessibility. The project contains three modules:

  • Account Holder
  • Bank Admin
  • Bill Desk.

CONTRIBUTION:

  • Involved in the View Biller and Delete Biller modules.
  • Implementation and testing.
  • Involved in URD and prototype development.
  • Code review.

Environment: Java, J2EE, JSP, JDBC, JSTL, EJB (Session and Entity), PL/SQL, Struts, OC4J server, Oracle DB, CVS, Toad, ISO Messages

Job Type: Full-time

Required work authorization:

  • United States