Harry Lu
Summary
Data Scientist with strong analytical and engineering skills and
industry experience designing and developing full-stack big data
pipelines and visualizations. Solid experience using PySpark, R, and
SQL/HQL on data science and analytics projects. Experienced in leading
end-to-end projects. Skilled at designing and implementing metrics that
answer quantitative and qualitative questions and yield insights and
solutions. Deployed several Shiny dashboards for interactive
visualization and real-time web data wrangling.
Education
Cornell University
Aug 2018 - May 2019
Degree: Master of Professional Studies in Applied
Statistics
Concentration: Data Science
LaiOffer (Online)
May 2018 - Dec 2018
Area: Software Development Essentials
Topics: Data Structure, Algorithms, OO Design, System
Design
Language: Java
University of Massachusetts Amherst
Sep 2013 - May 2017
Degree: Bachelor of Science in Mathematics
Concentrations: Applied Mathematics, Statistics,
Actuarial Science
Cumulative GPA: 3.52 | Major GPA:
3.80
Minor: Chemistry
Dean’s List
Jan 2014 - Jan 2017
Director Scholarship
Sep 2013 - May 2017
Work Experiences
Citigroup, Inc. (New York, NY)
Data Scientist
Oct 2019 - Present
- Worked with global teams to improve the efficiency and quality of the
Global Surveillance Solution (TotalConduct) through data science,
analytics, and engineering. Responsible for performing research,
developing automated analytical tools for the platform, and enhancing
the data storage of the internal data lake.
- Worked with Olympus data: led a data reconciliation project between
the Olympus database and the market surveillance database, and built
ETL pipelines to automate the alerting and reporting mechanisms. Role:
Lead Data Scientist.
- Led several end-to-end projects, including defining requirements,
translating requirements into technical designs, implementation (data
engineering + software development), code review, and integrating the
final product with the production cluster.
- Designed and implemented a big data analytical pipeline using Spark
algorithms to calculate multiple metrics for different surveillance
rules, helped clients understand how the surveillance rules performed,
and productionized the PySpark application on the cluster.
Role: Data Scientist + Data Engineer + Project Owner.
- Performed ad hoc analytics to answer clients’ questions.
Role: Data Scientist.
- Led and coached other team members on project design and technical
skills.
- Developed many Python packages for the team covering data extraction
and big data calculation algorithms across multiple projects, including
ETLs for multiple tables used daily. Maintained several Bitbucket
repositories, including those for these Python packages. Role: Code
Owner + Developer.
- Led the creation of a curated dataset, built a framework with Spark,
and coached a team on the relevant skills. Role: Project Owner and Lead
(later handed over to other team members).
Internship Experiences
Crane Management Services, Inc. (Los Alamitos, CA)
Statistical Analyst of Marketing
Mar 2018 - Aug 2018
- Completed an extensive market research report for the company (74
pages) covering competitor information, logistics, engineering demands,
future projects, and labor. Collected, cleaned, and visualized
competitors’ location data with Excel and R.
- Predicted future demand based on changes in the industry.
- Visualized competitors’ information and summarized market data with
R to help project managers make decisions.
- Analyzed industry demand to help project the company’s direction in
the marketplace.
- Strengthened relationships with both domestic and international
customers.
- Rewrote and updated the SPMT User Operating Manual to support the
company’s entry into new business areas.
Fishmarket Inc. (Louisville, KY)
Website Developer
Oct 2017 - Aug 2018
- Built, maintained, and updated the company’s business website to give
customers a convenient way to make inquiries.
Marketing Analyst
Sep 2017 - Dec 2017
- Learned the company’s operations, including food production and
processing, warehouse management, purchase order and invoice handling,
client interviews, data processing, accounting, and logistics.
- Charted and visualized trends for 1,400 products from August 2015 to
August 2017 with R.
- Measured, modified, and organized Vendor Info data so the president
could understand it more easily.
- Learned to use QuickBooks and to handle purchase orders and
invoices.
- Helped organize goods for transportation, manage the warehouse, check
logistics information, and process cooperation procedures with food
retailers, schools, and restaurants.
CERNET Education Development Co. Ltd. (Beijing, China)
Marketing Director Assistant
Jul 2015 - Aug 2015
- Wrote three marketing research reports in English.
- Published five articles on the WeChat public platform about my
college application preparation experience and my life.
- Organized two successful orientations for high school students and
high school graduates.
- Prepared the project launch ceremony of the International Pathway
Program.
- Planned the interview with guests (Chinese education officials) and
the speeches to CERNET executives.
Penghua Fund Management Company Ltd. (Beijing, China)
Business Assistant, Business Department
Jun 2014 - Jul 2014
- Negotiated with bank finance department managers and explained our
product’s benefits, resulting in a successful sale.
- Integrated business data more effectively in Excel.
- Created multiple types of charts to give supervisors a clearer
picture of the data.
Research Experiences
Are Customer Review Scores Influencing Sale Rankings? A Study on Amazon
Best Sellers
Assistant Researcher
Sep 2017 - Dec 2017
- Collaborated with Assistant Professor Fang Tian, Business
Administration Division, Seaver College, Pepperdine University.
- Applied nonparametric methods to study Amazon Best Sellers and
determine the impact of customer review scores on sales ranking.
- Collected and processed data including sales ranking, price, star
ratings, average customer reviews, number of reviews, and answered
questions for selected product categories, including Baby Musical Toys
and Single Room Humidifiers.
- Conducted Spearman and Kendall tests to evaluate the correlation
between sales ranking and the remaining variables, and generated
correlation charts for the relevant tests.
- Ran a Wilcoxon rank-sum test to verify the conclusions.
Forecasting Straits Times Index Stock Value
Undergraduate Researcher
May 2016 - Jul 2016
- Collaborated with Associate Professor Hongkun Zhang, Department of
Mathematics and Statistics, UMass Amherst.
- Applied a simple ARMA model to Alphabet (Google) stock and analyzed
the statistical distribution of the data.
- Studied the Markov switching, ARCH, and GARCH models.
- Applied a Markov Switching - AR(1) - GARCH(1,1) model, with
back-testing, to STI data from Yahoo Finance (1987/12/28 - 2016/6/14)
to predict future log returns of market closing values.
- Classified data into two groups by plotting multiple log return
graphs and applied AR(1) - GARCH(1,1) to each group.
- Built a reliable model that predicted almost all actual prices
within the given price range.
Study on Measurement Method of Arsenic (As) in Rice with Hach EZ test
kit, Tyson’s Lab
Undergraduate Researcher
Sep 2013 - Dec 2015
- Collaborated with Professor Julian Tyson, Department of Chemistry,
UMass Amherst.
- Conducted research for an average of eight hours per week over five
semesters starting in my freshman year, wrote experimental progress
reports, and attended weekly meetings to discuss next steps with the
professor and group members.
- Used statistical methods to analyze experimental data: for instance,
using a statistical distribution constant to describe As content in
different rice samples, using ADI Digital Image Analysis to extract the
RGB values of test strips when analyzing their As content, and using
logarithmic plots in Excel to relate the colors on test strips to the
color charts.
- Completed a full research cycle, improved my research and academic
writing skills, and learned to apply statistical methods to
experimental analysis.
Related Projects & Coursework
- Development of Interactive Web Applications using R-Shiny.
- Exploration of Different Visualization Techniques of the ‘Gapminder’
Data using R-Shiny.
- Information retrieval on the web and data wrangling (NHL Hockey
Schedule, Forbes 100, top 250 movies from IMDb).
- Statistical analysis of bird breeding pairs.
- Future Courses: Python Programming. Big Data Management
and Analysis (Hadoop). Statistics for Financial Engineering.
Activities
EducationUSA, Embassy of USA, Beijing, China
Guest Speaker
Jun 2014 - Aug 2016
- Presented to prospective students and their parents on UMass
Amherst’s strengths, college life, and study experiences.
- Represented UMass Amherst at the college fair, giving an
introduction to the college.
International Programs Office, UMass Amherst
IPO Buddy
Aug 2014 - May 2015
- Handled daily matters of the office, and offered assistance to
international students in choosing courses.
Standard Chartered Investing Competition, Young Elite Camp, Beijing,
China
Group Leader
Jul 2015
- Motivated and led group members to solve challenging financial
problems and decide the best way to invest money.
- Learned British Royal Family Etiquette, and completed mock
interviews.
Teaching Experiences
UMass Amherst
Private Tutor
Sep 2013 - May 2017
- Subjects: Calculus, Linear Algebra, Ordinary Differential Equations,
Probability, Statistics, Chemistry, R programming.
- Tutored students who needed help in these subjects, generally two
hours per week, for 3-4 students each year.
- Tutored a student in Calculus I in Spring 2017, helping her improve
her grade from F to A-.
East Asian Languages and Cultures Program, UMass Amherst
Chinese Tutor
Jan 2015 - Dec 2015
- Met with the head tutor every week to learn Chinese grammar and
teaching skills.
- Proposed new ideas to the dean of the Chinese Program on how to
improve the program.
- Tutored Chinese major students in pronunciation, grammar, writing,
and speaking.
Skills
- Computer Languages: R, R-Shiny, SAS, SQL, Python,
Markdown, HTML, CSS, Java.
- Computer Software: RStudio, Eclipse Java, Microsoft
PowerPoint, Microsoft Excel, Microsoft Word, Tableau, Hadoop (future),
Adobe Photoshop.
- Languages: Fluent in both Mandarin and
English.
- Data Analysis: Descriptive Analysis, Statistical
Modeling, Time Series Analysis (ARIMAX, SARIMA, GARCH), Data
Visualization, Optimization