Harry Lu
New York, NY | LinkedIn

Summary


Data Scientist with strong analytical and engineering skills and industry experience in designing and developing full-stack big data pipelines and visualizations. Solid experience using PySpark, R, and SQL/HQL on data science and analytics projects. Experienced in leading end-to-end projects. Skilled at designing and implementing metrics that answer quantitative and qualitative questions and provide insights and solutions. Deployed several Shiny dashboards for interactive visualization and real-time web data wrangling.

Education


Cornell University
Aug 2018 - May 2019
Degree: Master of Professional Studies in Applied Statistics
Concentration: Data Science
LaiOffer (Online)
May 2018 - Dec 2018
Area: Software Development Essentials
Topics: Data Structures, Algorithms, OO Design, System Design
Language: Java
University of Massachusetts Amherst
Sep 2013 - May 2017
Degree: Bachelor of Science in Mathematics
Concentrations: Applied Mathematics, Statistics, Actuary
Cumulative GPA: 3.52 | Major GPA: 3.80
Minor: Chemistry
Dean’s List
Jan 2014 - Jan 2017
Director Scholarship
Sep 2013 - May 2017

Work Experience


Citigroup, Inc. (New York, NY)
Data Scientist
Oct 2019 - Present
  • Worked with global teams to improve both the efficiency and quality of the Global Surveillance Solution (TotalConduct) through data science, analytics, and engineering. Responsible for performing research, developing automated analytical tools for the platform, and enhancing the data storage of the internal data lake.
  • Worked with Olympus data: led a data reconciliation project between the Olympus database and the market surveillance database, and built data ETL pipelines to automate the alerting and reporting mechanisms. Role: Lead Data Scientist.
  • Led several end-to-end projects, including defining requirements, transforming requirements into technical designs, implementation (data engineering and software development), code review, and integration of the final product with the production cluster.
  • Designed and implemented a big data analytical pipeline using Spark algorithms to calculate multiple metrics for different surveillance rules, helped clients understand how the surveillance rules performed, and productionized the PySpark application on the cluster. Role: Data Scientist + Data Engineer + Project Owner.
  • Performed ad hoc analytics to answer client questions. Role: Data Scientist.
  • Led and coached other members on project design and technical skills.
  • Developed many Python packages for the team for data extraction and big data calculation algorithms across multiple projects, including ETLs for multiple tables used daily. Maintained several Bitbucket repositories containing these Python packages. Role: Code Owner + Developer.
  • Led the creation of a curated dataset, built a framework with Spark, and coached a team on the relevant skills. Role: Project Owner and Lead; later handed over to other team members.

Internship Experience


Crane Management Services, Inc. (Los Alamitos, CA)
Statistical Analyst of Marketing
Mar 2018 - Aug 2018
  • Completed an extensive 74-page market research report for the company, covering competitor information, logistics, engineering demand, future projects and labor. Collected, cleaned and visualized competitor location data with Excel and R.
  • Predicted future demand based on changes in the industry.
  • Visualized competitors’ information and summarized market data with R to help project managers make decisions.
  • Analyzed industry demand to help project the company’s direction in the marketplace.
  • Enhanced relationships with both domestic and international customers.
  • Rewrote and updated the User Operating Manual of SPMT to support the company’s entry into new business areas.
Fishmarket Inc. (Louisville, KY)
Website Developer
Oct 2017 - Aug 2018
  • Built, maintained and updated the company’s business website to make inquiries more convenient for customers.
Marketing Analyst
Sep 2017 - Dec 2017
  • Learned the company’s operations, including food production and processing, warehouse management, purchase order and invoice handling, client interviews, data processing, accounting and logistics.
  • Built and visualized trend charts for 1,400 products from August 2015 to August 2017 with R.
  • Measured, modified and organized Vendor Info data so that the president could understand it more easily.
  • Learned to use QuickBooks and to handle purchase orders and invoices.
  • Helped organize goods for transportation, manage the warehouse, check logistics information, and process cooperation procedures with food retailers, schools and restaurants.
CERNET Education Development Co. Ltd. (Beijing, China)
Marketing Director Assistant
Jul 2015 - Aug 2015
  • Wrote three marketing research reports in English.
  • Published five articles on the WeChat public platform, sharing my college application preparation experience and my life.
  • Organized two successful orientations for high school students and high school graduates.
  • Prepared the project launch ceremony of the International Pathway Program.
  • Planned the interview with the guests (Chinese education officials) and the speeches to CERNET executives.
Penghua Fund Management Company Ltd. (Beijing, China)
Business Assistant, Business Department
Jun 2014 - Jul 2014
  • Negotiated with bank finance department managers and explained our product’s benefits, resulting in a successful sale.
  • Reorganized and integrated business data in Excel.
  • Created multiple types of charts to give supervisors a clearer picture of the data.

Research Experience


Are Customer Review Scores Influencing Sale Rankings? A Study on Amazon Best Sellers
Assistant Researcher
Sep 2017 - Dec 2017
  • Collaborated with Assistant Professor Fang Tian, Business Administration Division, Seaver College, Pepperdine University.
  • Applied nonparametric methods to Amazon Best Sellers data to determine the impact of customer review scores on sales ranking.
  • Collected and processed data for selected commodities, including Baby Musical Toys and Single Room Humidifiers: sales ranking, price, star points, customer average reviews, review numbers and answered questions.
  • Conducted Spearman and Kendall tests to evaluate the correlation between sales ranking and the remaining variables, and generated correlation charts for these tests.
  • Executed the Wilcoxon rank-sum test to verify the derived conclusions.
Forecasting Straits Times Index Stock Value
Undergraduate Researcher
May 2016 - Jul 2016
  • Collaborated with Associate Professor Hongkun Zhang, Department of Mathematics and Statistics, UMass Amherst.
  • Applied a simple ARMA model to Alphabet (Google) stock and statistically analyzed the distribution of the data.
  • Studied the Markov Switching, ARCH and GARCH models.
  • Applied a Markov Switching - AR(1) - GARCH(1,1) model, together with back-testing, to STI data from Yahoo Finance (1987/12/28 - 2016/6/14) to predict future log returns of market close values.
  • Classified the data into two groups by plotting multiple log return graphs and applied AR(1) - GARCH(1,1) to each group.
  • Built a reliable model that predicted almost all of the actual prices within the given price range.
Study on Measurement Method of Arsenic (As) in Rice with Hach EZ test kit, Tyson’s Lab
Undergraduate Researcher
Sep 2013 - Dec 2015
  • Collaborated with Professor Julian Tyson, Department of Chemistry, UMass Amherst.
  • Conducted research for an average of eight hours per week over five semesters, starting in my freshman year; wrote experimental progress reports and attended weekly meetings to discuss next steps with the professor and group members.
  • Used statistical methods to process experimental data: for instance, using a statistical distribution constant to describe As content in different rice samples, using ADI Digital Image Analysis to extract RGB values of test strips when analyzing their As content, and using logarithmic plots in Excel to relate the colors on test strips to those on color charts.
  • Completed a full research process, improved my research and academic writing skills, and learned to apply statistical methods to experimental analysis.

Related Projects & Coursework


  • Development of Interactive Web Applications using R-Shiny.
  • Exploration of Different Visualization Techniques of the ‘Gapminder’ Data using R-Shiny.
  • Information retrieval on the web and data wrangling (NHL Hockey Schedule, Forbes 100, top 250 movies from IMDb).
  • Statistical analysis of bird breeding pairs.
  • Future Courses: Python Programming. Big Data Management and Analysis (Hadoop). Statistics for Financial Engineering.

Activities


EducationUSA, Embassy of USA, Beijing, China
Guest Speaker
Jun 2014 - Aug 2016
  • Presented to prospective students and their parents on the advantages of UMass Amherst, college life lessons and study experiences.
  • Represented UMass Amherst at the college fair and gave an introduction to the university.
International Programs Office, UMass Amherst
IPO Buddy
Aug 2014 - May 2015
  • Handled daily matters of the office, and offered assistance to international students in choosing courses.
Standard Chartered Investing Competition, Young Elite Camp, Beijing, China
Group Leader
Jul 2015
  • Motivated and led group members to solve challenging financial problems and decide on the best way to invest money.
  • Learned British Royal Family Etiquette, and completed mock interviews.

Teaching Experience


UMass Amherst
Private Tutor
Sep 2013 - May 2017
  • Subjects: Calculus, Linear Algebra, Ordinary Differential Equations, Probability, Statistics, Chemistry, R programming.
  • Tutored 3-4 students each year who needed help in these subjects, generally for two hours per week.
  • Tutored a student in Calculus I in Spring 2017, helping her improve her grade from F to A-.
East Asian Languages and Cultures Program, UMass Amherst
Chinese Tutor
Jan 2015 - Dec 2015
  • Met with the head tutor every week to learn Chinese grammar and teaching skills.
  • Proposed new ideas to the dean of the Chinese Program on how to improve the program.
  • Tutored Chinese-major students in pronunciation, grammar, writing and speaking.

Skills


  • Computer Languages: R, R-Shiny, SAS, SQL, Python, Markdown, HTML, CSS, Java.
  • Computer Software: RStudio, Eclipse Java, Microsoft PowerPoint, Microsoft Excel, Microsoft Word, Tableau, Hadoop (future), Adobe Photoshop.
  • Languages: Fluent in both Mandarin and English.
  • Data Analysis: Descriptive Analysis, Statistical Modeling, Time Series Analysis (ARIMAX, SARIMA, GARCH), Data Visualization, Optimization.