
Portfolio of Works
Hi, I'm Giovanni – welcome to my Portfolio of Works, showcasing the projects I've led and contributed to with a product-driven mindset and a focus on impact.
If you are here, you may be interested in how I have been working with data in both business and research contexts.
If you would like to know more about my contribution to these projects, please feel free to contact me:
- Mobile: +39 3497039946
Personal Projects (GitHub) 👨‍💻
Product
Data Engineering
dbt Proof of Concept - (project reference here)
This project demonstrates the integration of dbt (Data Build Tool) with MySQL for data transformation and analytics. The project consists of SQL models and transformations, leveraging dbt to run and manage data pipelines inside a dockerized environment.
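For a flavour of how such a setup can be driven, here is a minimal sketch using dbt's programmatic invocation API (assuming dbt-core 1.5+ with a MySQL adapter configured in profiles.yml; the model name is a placeholder):

```python
# Minimal sketch: running a dbt model programmatically.
# Assumes dbt-core >= 1.5 and a configured dbt-mysql profile.
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# "stg_orders" is a hypothetical model name.
result = runner.invoke(["run", "--select", "stg_orders"])

if result.success:
    print("dbt run completed successfully")
else:
    print("dbt run failed:", result.exception)
```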
Data Analytics
Framework for Repeated Customers Analysis - (project reference here)
This project focuses on analysing the behaviour of repeat customers (customers present in two consecutive years), using matrix visualisations to compare spending patterns across product segments and customer segments over multiple years.
The matrix visualisation helps identify trends and areas that may require action, such as:
- Developing targeted marketing campaigns for high-growth segments.
- Offering promotions to reverse declining trends in certain product segments.
- Shifting resources toward high-performing products or customer groups.
This allows businesses to base decisions on actual data rather than assumptions, resulting in more effective strategies.
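As an illustration of the approach, the sketch below builds such a matrix with pandas and renders it as a colour-coded heatmap (segment names and growth figures are hypothetical):

```python
# Sketch of the year-over-year spending matrix; data is invented.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "customer_segment": ["SMB", "SMB", "Enterprise", "Enterprise"],
    "product_segment": ["A", "B", "A", "B"],
    "spend_growth_pct": [12.0, -5.0, 30.0, 8.0],  # YoY change for repeat customers
})

# Pivot into a customer-segment x product-segment matrix.
matrix = df.pivot(index="customer_segment", columns="product_segment",
                  values="spend_growth_pct")

# Colour-coded view: red (negative) cells flag segments needing action.
plt.imshow(matrix, cmap="RdYlGn")
plt.xticks(range(len(matrix.columns)), matrix.columns)
plt.yticks(range(len(matrix.index)), matrix.index)
plt.colorbar(label="YoY spend growth (%)")
plt.show()
```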
Work Experience 💼
Analytics Engineer Associate (6 months) - Groover
Data Analytics
Attribution Model
Project description
Defined and implemented a robust dbt model architecture to support a comprehensive reporting layer focused on acquisition and retention performance across user acquisition channels. This enabled the business to accurately measure the effectiveness of each channel and make data-driven decisions to optimise business strategy.
Outcome: New core insights into how well the ideal customer profile is represented in acquisition volumes, and into the acquisition and retention power of each channel.
Lesson learned: managerial business users prefer metrics expressed in terms of revenue rather than other volumes.
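The production models were written as dbt SQL, but the underlying idea can be sketched in pandas. This example uses a first-touch rule purely for illustration; the schema and column names are hypothetical:

```python
# Illustrative first-touch attribution sketch (hypothetical schema).
import pandas as pd

touches = pd.DataFrame({
    "user_id": [1, 1, 2],
    "channel": ["paid_search", "email", "organic"],
    "touched_at": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-02"]),
})

# Attribute each user to the first channel that reached them.
first_touch = (touches.sort_values("touched_at")
                      .groupby("user_id", as_index=False)
                      .first()
                      .rename(columns={"channel": "acquisition_channel"}))

# Acquisition volume per channel; joining retention events would extend
# this to the retention side of the reporting layer.
print(first_touch.groupby("acquisition_channel").size())
```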
Ads Anomaly Detection
Project description
Implemented a robust data quality and alerting framework using dbt tests in combination with the Elementary package to monitor critical drops in ad conversion metrics. Designed and deployed a forecasting function to proactively detect and alert on projected increases in monthly cost per acquisition. Integrated the alerting system with Slack to ensure timely, targeted notifications to relevant stakeholders, enabling faster response and mitigation.
Outcome: Faster reaction to conversion-tracking issues and reduced ad-budget waste.
Lesson learned: An alerting process needs constant fine-tuning and monitoring.
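To make the mechanism concrete, here is a heavily simplified sketch of the projection-and-alert idea (the real implementation used dbt tests and the Elementary package; the webhook URL, threshold, and figures below are placeholders):

```python
# Simplified sketch: project month-end cost per acquisition (CPA) from
# month-to-date figures and post a Slack alert if it breaches a threshold.
import requests

def projected_monthly_cpa(spend, conversions, days_elapsed, days_in_month):
    # Naive linear projection; the production forecasting function was richer.
    projected_spend = spend / days_elapsed * days_in_month
    projected_conversions = conversions / days_elapsed * days_in_month
    return projected_spend / max(projected_conversions, 1)

cpa = projected_monthly_cpa(spend=12_000, conversions=300,
                            days_elapsed=10, days_in_month=30)

CPA_THRESHOLD = 45.0  # hypothetical guardrail
if cpa > CPA_THRESHOLD:
    requests.post("https://hooks.slack.com/services/XXX",  # placeholder URL
                  json={"text": f"Projected monthly CPA {cpa:.2f} "
                                f"exceeds threshold {CPA_THRESHOLD}"})
```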
CRM Email Automations Analytics
Project description
Modelled high-volume email interaction data to enable dynamic analysis of user journeys, allowing grouping and comparison of engagement metrics based on flexible time windows and platform-specific user milestones. This framework surfaced previously unknown insights into which email sequences were most effective in driving key user actions, directly informing lifecycle marketing strategies and user onboarding optimisation.
Outcome: Unlocked actionable insights into email journey performance, enabling the marketing team to optimise communication flows and significantly improve user activation and retention.
Lesson learned: High-quality data models are those designed for flexibility—supporting broad parameterisation to empower end users with a small set of modular, reusable views that can answer a wide spectrum of business questions efficiently.
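The sketch below shows the parameterised-window idea in pandas (the schema and data are invented): the same function answers many different questions simply by changing the window or the milestone table.

```python
# Sketch: email engagement within a flexible window after a user milestone.
import pandas as pd

emails = pd.DataFrame({
    "user_id": [1, 1, 2],
    "sequence_name": ["onboarding", "onboarding", "reactivation"],
    "opened_at": pd.to_datetime(["2024-03-02", "2024-03-20", "2024-03-04"]),
})
milestones = pd.DataFrame({
    "user_id": [1, 2],
    "milestone_at": pd.to_datetime(["2024-03-01", "2024-03-03"]),
})

def engagement_within_window(emails, milestones, window_days=7):
    # Join opens to each user's milestone, keep those inside the window.
    merged = emails.merge(milestones, on="user_id")
    delta = merged["opened_at"] - merged["milestone_at"]
    mask = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(days=window_days))
    return merged[mask].groupby("sequence_name").size()

print(engagement_within_window(emails, milestones, window_days=7))
```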
Data Engineering
Amazon DMS with Terraform
Project description
Provisioned and configured Amazon DMS using Terraform to automate and manage data replication from the source database to the target environment. This setup ensured scalability, repeatability, and alignment with infrastructure-as-code best practices.
Outcome: Established a reliable and continuous data stream from source to target databases, enabling near real-time access to raw data for downstream processing and analytics.
Lesson learned: Terraform is a powerful tool, but it demands great care in its usage.
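The infrastructure code itself was Terraform; to keep these examples in a single language, here is a hedged boto3 sketch of the kind of check that complements such a setup, polling the status of the provisioned replication tasks (region is a placeholder):

```python
# Sketch: checking the status of DMS replication tasks with boto3.
# Assumes AWS credentials are configured; region is a placeholder.
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

for task in dms.describe_replication_tasks()["ReplicationTasks"]:
    print(task["ReplicationTaskIdentifier"], "->", task["Status"])
```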
CICD optimisation
Project description
Refactored the CICD pipeline by modifying the GitHub Actions workflow to migrate dbt model execution from default GitHub-hosted runners to a dedicated Amazon EC2 instance.
Outcome: Reduced pipeline execution time and increased deployment consistency, resulting in faster feedback cycles and more efficient dbt development workflows.
Lesson learned: Well-structured and highly customisable GitHub workflows are critical to building robust CICD processes—they mark the difference between a production-grade data stack and a fragile one.
dbt and Airflow debugging
Working with dbt and Airflow to orchestrate and deploy analytics models requires a deep understanding of both tools’ configurations and runtime behaviour. Through hands-on experience, I developed the ability to diagnose and resolve pipeline failures, whether stemming from dbt model logic, dependency management, or Airflow DAG misconfigurations.
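As a minimal illustration of that orchestration pattern, here is a sketch of an Airflow 2.x DAG running dbt through BashOperator; the project path, target, and schedule are hypothetical:

```python
# Sketch: a daily Airflow DAG that runs and then tests a dbt project.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_run",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_project && dbt test --target prod",
    )
    # Run models before testing them; dependency mistakes here are a
    # classic source of DAG-level failures worth checking first.
    dbt_run >> dbt_test
```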
Data Science Associate (1 year) - Weeztix
Data Analytics
Business Growth Dashboard
Project description
Designed and implemented a dynamic dashboard using dbt and Redash to visualise company growth across KPIs.
The tool allows users to compare different time periods, slice data by key business dimensions, and assess individual customer performance.
By highlighting underperforming accounts and growth trends, this solution empowered the Weeztix team to make data-driven decisions, enhance customer success strategies, and unlock new business opportunities.
Outcome: Enabled strategic decision-making, customer success initiatives, and improved forecasting.
Lesson learned: well-done, business-oriented documentation for a data product is as important as the product itself, since it greatly impacts adoption across the organisation.
Shop Conversion Rates and Funnel Analysis for Customer Success
Project description
Developed a conversion rate calculation pipeline for ticket shops, leveraging customer tracking data. The results are displayed on a multi-dimensional dashboard used by the Sales and Customer Success departments. This tool enables those departments to continuously explore and compare how effectively different clients use Weeztix, providing insights into their ticket shops' conversion patterns through interactive display and filtering options.
Outcome: Uncovered key conversion metrics and enabled feature-driven improvements across customer shops.
Lesson learned: simplifying the problem can still allow for actionable quality solutions. When tracking large data volumes, simplicity is key. In these cases, customers’ sessions and funnels should be defined with a balance between simplicity and accurately representing customers' behaviour.
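A stripped-down version of the funnel logic looks like this (step names and session events are invented):

```python
# Sketch: per-step funnel conversion from tracked session events.
import pandas as pd

events = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3],
    "step": ["shop_view", "ticket_selected", "purchase",
             "shop_view", "ticket_selected", "shop_view"],
})

FUNNEL = ["shop_view", "ticket_selected", "purchase"]

# Number of distinct sessions that reached each step, in funnel order.
reached = [events.loc[events["step"] == s, "session_id"].nunique()
           for s in FUNNEL]

for step, count in zip(FUNNEL, reached):
    print(f"{step}: {count} sessions ({count / reached[0]:.0%} of shop views)")
```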
Gold Customers Tailored Analysis
Project description
Led a series of custom-made analyses for Weeztix's top-tier customers using dbt and Python, focused on uncovering best practices in ticket launches and identifying optimal sales targets within the sales window.
Outcome: Strengthened strategic relationships with key clients and improved their ticket sales performance through actionable insights.
Lesson learned: delivering quality work at high pace makes you stand out.
Cohort Analysis for Churn and LTV
Project description
Built a cohort analysis dashboard using dbt, enabling business users to explore customer lifecycle trends.
The dashboard tracks customer retention, churn rates, and lifetime value over time, allowing the business to better understand usage patterns and take targeted actions to reduce churn and increase customer value.
Outcome: Provided visibility into customer retention patterns and long-term value.
Lesson learned: sometimes one graph speaks more than ten.
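The core of such a dashboard is a cohort retention matrix; here is a pandas sketch with invented orders (the production logic lived in dbt):

```python
# Sketch: cohort = month of first order; track active customers per offset.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "order_month": pd.PeriodIndex(
        ["2024-01", "2024-02", "2024-01", "2024-03", "2024-02"], freq="M"),
})

orders["cohort"] = orders.groupby("customer_id")["order_month"].transform("min")
orders["offset"] = (orders["order_month"] - orders["cohort"]).apply(lambda d: d.n)

# Rows: cohorts; columns: months since first purchase; values: active customers.
retention = orders.pivot_table(index="cohort", columns="offset",
                               values="customer_id", aggfunc="nunique").fillna(0)
print(retention)
```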
Data Science
Master’s thesis: Optimal Ticket Release Strategies for Club Nights and Music Festivals (abstract here) - collaboration between Jheronimus Academy of Data Science & Weeztix
Project description
Conducted an in-depth analysis of ticket sales strategies for club nights and music festivals. The research identified optimal strategic approaches that event organisers should adopt based on their brand strength, with the aim of boosting ticket sales for emerging brands and maximising revenue for established ones.
Tools, Techniques, and Platforms: dbt, Jupyter Notebooks, RStudio, APIs, NLP, clustering, regression analysis, hypothesis testing, LaTeX.
Outcome: Findings were directly implemented by Weeztix in their customer support guidelines. Grade: 8.5/10
Lesson learned: Proactiveness should not result in overestimating project feasibility within set timelines. Sometimes it is better to narrow the project scope ex-ante, while allowing for the future expansion of relevant outcomes.
Ticket Name Classification
Project description
A natural language processing, rule-based Python algorithm designed to classify event tickets and related services. It utilises a keyword dictionary related to club nights, festivals, and complementary goods/services to categorise tickets based on their names. The model can differentiate between entry, transportation, storage, accommodation, merchandise, and bundled offers. It supports both English and Dutch ticket names.
Outcome: Unlocked a new dimension of analysis, ‘ticket type’, for the whole organisation.
Lesson learned: Rule-based algorithms are highly effective when specific keywords carry a significant weight in the classification process. Machine learning approaches become more suitable when human labelling is feasible and can provide sufficient data for training.
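An illustrative cut-down version of the rule-based approach (the real keyword dictionary was far larger and bilingual; these entries are examples only):

```python
# Sketch: keyword-dictionary ticket classification.
KEYWORDS = {
    "transportation": ["bus", "shuttle", "travel"],
    "storage": ["locker", "garderobe"],      # "garderobe" = cloakroom (Dutch)
    "accommodation": ["camping", "hotel"],
    "merchandise": ["t-shirt", "hoodie"],
}

def classify_ticket(name: str) -> str:
    lowered = name.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "entry"  # default: plain event admission

print(classify_ticket("Weekend ticket + camping"))  # -> accommodation
print(classify_ticket("Early bird entree"))         # -> entry
```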
Music Events Segmentation
Project description
Developed and evaluated a clustering algorithm in Python for segmenting music events. First, the solution leverages the K-means technique by categorising events as small, medium, or large. Then, it labels them as basic club nights, premium club nights, or festivals. This classification is based on key features such as median ticket price, sales window, and event size. The segmentation provides valuable insights into sales performance patterns across different organising companies, enabling more targeted business analysis.
Outcome: Unlocked a new dimension of analysis, ‘event type’, for the whole organisation.
Lesson learned: Too many features can result in clusters that are far from reality, even when using weighted features. The best practice is to leverage domain knowledge to identify a core set of variables that enable clear segment distinctions.
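A condensed sketch of the two-stage approach (feature values and label thresholds are invented, not the production ones):

```python
# Sketch: K-means on event size, then rule-based event-type labelling.
import numpy as np
from sklearn.cluster import KMeans

# Columns: median ticket price (EUR), sales window (days), tickets sold.
events = np.array([
    [12.0,  14,   300],
    [25.0,  30,   900],
    [55.0, 120, 20000],
    [15.0,  10,   450],
])

# Stage 1: cluster on size and name clusters small/medium/large by centre.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(events[:, [2]])
order = np.argsort(km.cluster_centers_.ravel())
size_names = dict(zip(order, ["small", "medium", "large"]))

# Stage 2: rule-based label from price and sales window.
def label(price, window):
    if window > 60:
        return "festival"
    return "premium club night" if price > 20 else "basic club night"

for (price, window, _), cluster in zip(events, km.labels_):
    print(size_names[cluster], "-", label(price, window))
```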
Business Intelligence Consultant (1 year) - Artexe
Business Intelligence
Patient Journey Solution (project reference here)
Project description
A business intelligence solution built around Artexe's patient journey platform, giving healthcare providers visibility into patient flows and waiting times across their facilities.
Outcome: Major Italian healthcare firms were able to reduce waiting times and better support the work of medical, organisational, and administrative staff.
Lesson learned: an optimal product is simple and intuitive as well as pedagogic. When social interactions play a significant role in the data collection process, the best practice is to explain potential system misuse and to avoid displaying incomplete data to human actors, retaining only consistent data for the final output.
Education 📚
MSc Data Science in Business and Entrepreneurship (2 years)
Sales Forecasting Pipeline for Payout Fund Allocation (project reference here) - collaboration with CM.com
Project description
Trained, evaluated, and deployed two models, one feature-based (Random Forest) and one time-series (Prophet), for weekly merchant sales forecasting. For each merchant, the model with the lower mean absolute percentage error was chosen to decide the amount to allocate before the next payout. Model retraining happens automatically in a CICD environment on Google Vertex AI pipelines.
Outcome: Improved UX with faster payouts to CM clients and better resource management for CM.com itself.
Lesson learned: Time-series models are more suitable when data is highly consistent, whereas feature-based algorithms perform better when the data lacks strong consistency.
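The per-merchant selection rule itself is small; here is a sketch with invented numbers:

```python
# Sketch: per-merchant champion selection by mean absolute percentage error.
import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

actual_sales = [100, 120, 90, 110]        # recent weekly sales for one merchant
rf_forecast = [95, 125, 100, 105]         # Random Forest predictions
prophet_forecast = [110, 118, 70, 140]    # Prophet predictions

scores = {"random_forest": mape(actual_sales, rf_forecast),
          "prophet": mape(actual_sales, prophet_forecast)}
chosen = min(scores, key=scores.get)
print(f"Use {chosen} (MAPE {scores[chosen]:.1f}%) for the next payout allocation")
```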
Music Discovery Product (project reference here) - Data Entrepreneurship course project
Project description
Led in-depth research and validation of an innovative music discovery tool, aimed at users who want to explore new music, such as DJs. The product lets users create distinct, customised profiles linked to personalised recommendation models. Users can adjust their own algorithm configurations and switch between them, breaking free from the recommendation bubbles typical of mainstream streaming services. Additionally, the tool supports independent artists by connecting them with listeners actively seeking music in their specific genre and style.
Outcome: learned product management practices such as user research, backlog structuring, and orchestration.
Lesson learned: platform reliance can be a problem when developing an envisioned product, due to limitations, restrictions, or dependencies involving third-party stakeholders.
Personality Trait Prediction (project reference here) - NLP course project
Project description
Developed a model to predict OCEAN/BIG5 personality traits using 3,000 records of human-written texts in response to questions related to life’s meaning and purpose. The project involved three key tasks: comparing machine learning and deep learning binary classifiers, comparing machine learning and deep learning probabilistic classifiers, and incorporating Dutch responses into the modelling process to evaluate and compare performances across languages.
Outcome: a comparison of the predictive capabilities of machine learning and deep learning models, with machine learning providing slightly higher accuracy, at around 60%.
Lesson learned: proper data preprocessing is at the core of any NLP task.
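A minimal sketch of the machine-learning side of that comparison, one binary classifier per trait (toy texts and labels, not the study data):

```python
# Sketch: TF-IDF + logistic regression for one OCEAN trait (extraversion).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love meeting new people and trying new things",
         "I prefer quiet evenings and careful planning",
         "Parties energise me",
         "Routine keeps me calm"]
extraversion = [1, 0, 1, 0]   # toy binary labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, extraversion)
print(model.predict(["I enjoy crowded concerts"]))
```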
Cross-sell Solution (project reference here) - collaboration with Exsell
Project description
Trained and evaluated two reinforcement learning models (epsilon-greedy and linUCB) to optimise cross-selling item selection based on user context. The models are designed to select actions that maximise user rewards, defined as adding the recommended item to the cart. Notably, the linUCB model achieved an average payoff of 0.96, earning compliments from both professors and industry stakeholders.
Outcome: a deployable recommender system for item cross-selling in a CICD environment.
Lesson learned: To better evaluate model performance, it is useful to compare it against a benchmark model, such as a random-prediction baseline.
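For reference, the disjoint LinUCB update fits in a few lines; this sketch (dimensions, context vector, and reward loop invented) shows the mean-plus-confidence-bonus selection the model relies on:

```python
# Sketch: disjoint LinUCB, one ridge-regression estimate per arm (item).
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T r per arm

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Mean estimate plus upper-confidence exploration bonus.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

bandit = LinUCB(n_arms=3, dim=4)
x = np.array([1.0, 0.2, 0.0, 0.5])   # hypothetical user-context vector
arm = bandit.select(x)
bandit.update(arm, x, reward=1.0)    # reward 1 = item added to cart
```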
Bomb Detection Model (project reference here) - collaboration with Datacation
Project description
Developed a logistic regression model using data collected by a drone equipped with a magnetic field sensor to detect buried unexploded bombs. The model analyses 10x10 metre squares and flags anomalies for further inspection. Heatmaps are employed to assess these anomalies, visually identifying patterns across the detected images. This process informs feature engineering to enhance a random forest model's ability to differentiate between bombs and other anomalies, aiming to reduce the costs of false positives and mitigate the risks associated with false negatives.
Lesson learned: Actively trying to disprove initial assumptions helps spot mistakes and strengthens the validity of the results.
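To make the first stage concrete, here is a hedged sketch with synthetic data: a logistic regression over summary features of each 10x10 metre square (the real feature set came from the magnetic-field readings):

```python
# Sketch: flagging grid squares via logistic regression on synthetic features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# One row per 10x10 m square: mean field strength, std, peak amplitude.
X = rng.normal(size=(200, 3))
y = (X[:, 2] > 1.0).astype(int)   # toy label: strong peak -> flagged square

clf = LogisticRegression().fit(X, y)
new_square = np.array([[0.1, 0.4, 1.6]])
print("flag for inspection:", bool(clf.predict(new_square)[0]))
```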