EricBorba/README.md
Eric Borba — AI Engineer & Cloud Infrastructure

Eric Borba

I work where AI meets infrastructure — because great models need great plumbing.


LinkedIn   Email   GitHub followers


The Stack I'm Building

AI Research (PhD)  →  ML Engineering  →  Cloud Infrastructure for AI
                                         (AWS · Terraform · Observability)

I started in academia studying how systems fail: modeling SSD and HDD reliability at HPC scale, in work published at ARCS 2024 and funded by the EU Horizon 2020 IO-SEA project. That work pushed me toward engineering resilient systems at scale.

Today I bridge research and deployment: building reliable, observable, scalable systems that turn cutting-edge ML into production-grade infrastructure.

The goal: speak fluent ML and fluent AWS — own the full path from research to production.


Tech Stack

AI & Machine Learning

Python Jupyter scikit-learn XGBoost TensorFlow CrewAI

Cloud & Infrastructure

AWS Terraform GitHub Actions ECS Fargate Lambda CloudWatch RDS VPC IAM S3

Languages, Data & Tooling

Docker FastAPI PostgreSQL MongoDB Git Linux


Featured Projects

Production-grade observability stack on AWS. An order-processing API (FastAPI · ECS Fargate · RDS) with structured JSON logging, 8 custom CloudWatch metrics, Golden-Signal dashboards, tiered SNS alerting, Lambda auto-remediation, FinOps cost monitoring, and AI-powered incident analysis via CrewAI — all infrastructure-as-code with Terraform and shipped through GitHub Actions.

Highlight: three injected failure scenarios (error flood, high latency, CPU spike), each diagnosed from the CloudWatch correlation view and remediated automatically by Lambda — closing the alert loop with no human in it.

FastAPI ECS Fargate RDS CloudWatch Lambda SNS Terraform GitHub Actions CrewAI
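
The structured JSON logging listed for this project could look roughly like the sketch below. It is a minimal illustration using only Python's standard `logging` module, not the repository's actual code; the logger name `orders` and the fields `order_id` and `latency_ms` are hypothetical and chosen only to match the order-processing theme.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical context fields; a real service would choose its own.
    CONTEXT_FIELDS = ("order_id", "latency_ms")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Write to an in-memory buffer here; a container would typically use stdout.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("order accepted", extra={"order_id": "A-1001", "latency_ms": 42})
print(buffer.getvalue().strip())
```

One JSON object per line keeps individual fields such as `latency_ms` queryable by a log aggregator, which is the usual motivation for structured logging over free-form messages.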


Production-grade AWS 3-tier architecture: internet-facing ALB → 6 Node.js EC2 instances across 2 AZs with Auto Scaling → isolated data tier. Custom VPC with network segmentation, security-group chaining, and CloudWatch monitoring.

Highlight: fully automated scaling and high availability across multiple availability zones.

AWS VPC EC2 Auto Scaling ALB CloudWatch Terraform


ML-driven reliability analysis of SSD and HDD failure in HPC burst buffers. Uses SMART telemetry from ~1M Alibaba SSDs and Backblaze HDDs to predict Mean Time to Failure with Random Forest and LSTM models.

Highlight: 94% prediction accuracy — published at ARCS 2024, funded by EU Horizon 2020 IO-SEA.

Python MongoDB scikit-learn XGBoost LSTM


End-to-end ML application that forecasts monthly road-accident occurrences from Munich open traffic data: trained, serialized, served via a Flask REST API, containerized with Docker, and deployed to Heroku.

Highlight: the full ML pipeline in one repo — preprocessing → training → API serving → deployment.

Python Flask Docker scikit-learn


Currently

  • 📜 Pursuing the HashiCorp Terraform Associate certification
  • 🏗️ Building production-ready AI services on AWS — containerized, observable, infrastructure-as-code
  • 🔍 Going deeper on MLOps: model serving, cost optimization, and distributed training in the cloud

Open to ML Engineering · MLOps · Cloud Infrastructure roles

Last updated June 2026 · View all repositories →

Pinned

  1. StorageFailurePredictor (Public)

    ML-driven SSD/HDD failure prediction for HPC burst buffers using SMART telemetry, GSPN/RBD models, and Random Forest · LSTM · XGBoost. Published at ARCS 2024.

    Jupyter Notebook 2 1

  2. AccidentPredictorApp (Public)

    End-to-end ML app forecasting road accident occurrences from Munich traffic data — Flask REST API, Docker, deployed to Heroku.

    Jupyter Notebook 1

  3. three-tier-architecture-aws (Public)

    AWS 3-tier architecture: internet-facing ALB → 6 Node.js EC2 instances (Auto Scaling, 2 AZs) → data tier — custom VPC with network segmentation, security group chaining, CloudWatch monitoring, and …

    HTML

  4. ce-project-2-instrumented-monitored-service (Public)

    Production-grade observability on AWS — structured logging, custom CloudWatch metrics, Golden Signal dashboards, tiered alerting, Lambda auto-remediation, FinOps cost monitoring, and AI-powered inc…

    Python