Akash M S — Data Science Student & ML Engineer

01 About

I'm a Computer Science undergraduate specialising in Data Science, focused on turning raw data into actionable insight and practical solutions. I work with Python, SQL, Pandas, NumPy, Power BI, and ML frameworks.

My experience spans AI-based speech systems, data preprocessing pipelines, analytics workflows, and ML model deployment. I focus on understanding data deeply and solving real business problems — not just collecting certifications.

Actively preparing for Data Analyst, Data Science, and ML roles. Open to internships and entry-level opportunities. Also learning German 🇩🇪 on the side.

📧 ms29akash@gmail.com

Core Tech Stack

Python · SQL · Pandas · NumPy · Scikit-learn · TensorFlow · Power BI · Streamlit · FastAPI · Flask · AWS · Oracle Cloud · MongoDB · MySQL · R · HTML/CSS · SHAP · NLP

Honors & Education

🏆 Optum STEM Scholar 2025–26

Optum | NASSCOM Foundation · Feb 2026

₹30,000 scholarship — national selection for academic performance and STEM potential.

🌍 4th Innovative Project Expo 2024 — Top 70

Presidency University · Jun 2024

World's Largest Innovative Project Expo finalist.

B.Tech — Data Science

Presidency University, Bangalore · 2023–2027

First Class with Distinction

PU Science (PCMC)

RR Institute of Technology · 2021–2023

Distinction and Honors

02 Experience

Jul 2025 – Sep 2025 · 3 months · Bangalore, On-site

Hostingsignal

AI & Data Intern — Speech-to-Text Systems

Internship

Situation: Organisation needed multilingual ASR pipelines for Indian language transcription at scale.
Processed, cleaned, and annotated large audio datasets to improve transcription accuracy.
Evaluated ASR model performance using error analysis and test-set validation across multiple languages.
Result: Contributed structured annotation workflows that improved data quality pipeline-wide.

Feb 2025 – Jun 2025 · 5 months · Bengaluru, On-site

CampusX

Logistics & Operations Intern

Internship

Situation: CampusX ran 5+ large-scale student programs requiring coordinated logistics.
Tracked operational workflows using structured checklists and reporting formats.
Collaborated cross-functionally to resolve on-ground issues in real time.
Result: Optimised logistics processes, reducing delays across all programs.

03 Live Deployed Projects

● LIVE · ML · Finance

Cost-Sensitive Fraud Detection System

Cost-sensitive ML pipeline weighting false negatives far higher than false positives. Real-time dashboard for live transaction scoring with SHAP explainability.

XAI Explainable

Live Real-time

SHAP Attribution

Python · Scikit-learn · Streamlit · SHAP · XAI

⬡ Live Demo ↗ GitHub

● LIVE · NLP · RecSys

Domain-Agnostic Explainable Recommendation Engine

TF-IDF + semantic similarity engine accepting any dataset. Handles cold-start users and produces human-readable explanations per recommendation.

Any Domain

Cold Start Ready

XAI Per Item

NLP · TF-IDF · Semantic Sim · Streamlit

⬡ Live Demo ↗ GitHub
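A minimal sketch of the TF-IDF half of such an engine, assuming whitespace tokenisation, the unsmoothed `log(N / df)` idf variant, and a made-up three-item catalog. The names `recommend`, `build_idf`, and `catalog` are hypothetical and not taken from the project; the semantic-similarity layer is not covered. Because the query is free text, no user history is needed, which is one common way to handle cold start. The shared terms serve as a simple human-readable explanation.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """idf(t) = log(N / df(t)); one common variant, libraries often smooth it."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(text, idf):
    tokens = tokenize(text)
    tf = Counter(tokens)
    # Terms never seen in the corpus have no idf entry and get weight 0.
    return {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(query, docs):
    """Return (best_doc, score, shared_terms) for a free-text query."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(d, idf)), d) for d in docs]
    score, best = max(scored)
    shared = sorted(set(tokenize(query)) & set(tokenize(best)))
    return best, score, shared

# Illustrative catalog only (not project data).
catalog = [
    "wireless noise cancelling headphones",
    "stainless steel water bottle",
    "bluetooth wireless speaker",
]
best, score, shared = recommend("wireless headphones for travel", catalog)
print(f"Recommended: {best} (matched on: {', '.join(shared)})")
```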

● LIVE · ML · HR Analytics

Explainable HR Attrition Risk System

ML classification with SHAP-based explainability identifying at-risk employees. Interactive Streamlit dashboard bridging ML output and business decisions.

SHAP Explainable

HR Analytics

Live Dashboard

ML · SHAP · XAI · Streamlit · HR

⬡ Live Demo ↗ GitHub

● LIVE · NLP · Safety

Explainable Job Scam Risk Detection System

NLP + ML pipeline classifying job postings as legitimate or fraudulent. Paste any JD — get instant fraud risk score with SHAP feature attribution.

NLP Text ML

SHAP Attribution

Live Public App

NLP · Classification · SHAP · Python

⬡ Live Demo ↗ GitHub

NLP · ASR · Internship

Speech-to-Text — Indian Languages (Flask)

Flask-based web app using SpeechRecognition API supporting multiple Indian languages with real-time audio processing. Direct production deliverable.

Multi Language

Flask Web App

Real Production

Flask · Python · NLP · ASR · HTML/CSS

↗ GitHub

● LIVE · NLP · XAI · AgriTech

FarmVoice AI — Explainable Crop Advisory

Explainable AI-powered crop advisory system using NLP + ML that converts natural language into accurate, transparent crop recommendations with SHAP attribution per suggestion.

NLP Natural Lang

XAI Explainable

Live Deployed

Python · NLP · Explainable AI · Streamlit

⬡ Live Demo ↗ GitHub

● LIVE · ML · LLM · Business Intelligence

DecisionIQ — AI Business Intelligence Platform

AI-powered decision intelligence system combining ML + LLM to forecast trends, detect anomalies, and generate actionable business insights. XGBoost + Random Forest + LLM reasoning layer.

LLM Insights

ML Forecasting

Live Streamlit

XGBoost · LLM · Streamlit · Random Forest

⬡ Live Demo ↗ GitHub

Statistics · Analytics

A/B Testing — Statistical Conversion Analysis

Rigorous A/B testing analysis using statistical hypothesis testing, confidence intervals, and p-value evaluation to drive data-backed business decisions on conversion uplift.

Stats Rigorous

CI 95%

Data Driven

Python · Statistics · Hypothesis Testing · Pandas

↗ GitHub
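A minimal sketch of the kind of test such an analysis typically rests on: a two-sided two-proportion z-test with a 95% confidence interval on the uplift. The conversion counts are hypothetical, the function name `two_proportion_test` is an assumption, and the project itself may use a different test.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Return (z, two-sided p-value, 95% CI for the difference p_b - p_a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error under the null hypothesis p_a == p_b.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the uplift.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci

# Hypothetical counts: control converts 200/2000, variant converts 250/2000.
z, p_value, ci = two_proportion_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

If the interval excludes zero and the p-value falls below the chosen significance level, the uplift is unlikely to be noise at that level; whether it is large enough to matter is a separate business judgment.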

NLP · Chatbot · AgriTech

JeevanMitra AI — Smart Farming Chatbot

Rule-based NLP chatbot serving as a smart farming companion. Provides crop guidance, weather advisory, and farming tips through an intuitive conversational interface.

NLP Rule-Based

JS Frontend

Smart Advisory

JavaScript · Rule-based NLP · HTML/CSS · Chatbot

↗ GitHub

RECRUITER SUMMARY

✔ Python & ML ✔ Explainable AI ✔ NLP ✔ 4 Live Apps ✔ 18+ Certs ✔ Internship XP ✔ Streamlit · FastAPI 🟢 Open to Work

04 Project Deep Dive: Fraud Detection System

My most production-realistic ML build — cost-sensitive, explainable, live-deployed

BUSINESS PROBLEM

Banks lose billions to fraud yearly — but standard classifiers optimize for accuracy, not cost.

A model with 99% accuracy that flags no fraud is useless in production. Standard ML treats a missed fraud (false negative) the same as a false alarm (false positive) — but in banking, a missed ₹50,000 fraud costs far more than wrongly flagging a transaction. The real objective isn't accuracy. It's minimizing cost-weighted error.

KEY INSIGHT: Built a cost-sensitive classifier where FN penalty = 5× FP penalty, mirroring real banking loss ratios.
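One way to read the FN = 5× FP rule is as a cost function over prediction errors. The sketch below assumes a unit cost for a false alarm, made-up scores, and a hypothetical helper `best_threshold`; it illustrates cost-weighted threshold selection and is not the project's actual cost matrix code.

```python
FN_COST = 5.0  # from the text: a missed fraud costs 5x a false alarm
FP_COST = 1.0  # assumed unit cost for a false alarm

def total_cost(y_true, scores, threshold):
    """Sum the business cost of all errors at a given decision threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 0:
            cost += FN_COST   # missed fraud
        elif label == 0 and predicted == 1:
            cost += FP_COST   # false alarm
    return cost

def best_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t))

# Illustrative data only (not from the project): 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.90, 0.15]
candidates = [0.3, 0.5, 0.7]
chosen = best_threshold(y_true, scores, candidates)
```

On this toy data the 0.5 threshold makes fewer mistakes (one miss versus two false alarms), yet the lower threshold wins once the miss is priced at five times a false alarm.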

DATASET & PREPROCESSING

284,807 transactions · 492 fraud cases · 0.17% fraud rate

Extreme class imbalance required deliberate handling — not just oversampling. Applied SMOTE for minority class augmentation combined with class_weight='balanced' in the model. Features were PCA-transformed (V1–V28) plus Amount and Time. Amount required log-scaling; Time required cyclical encoding to capture daily patterns.

1. Log-scale Amount (removes right skew)

2. Cyclical encode Time (sin/cos of hour)

3. SMOTE oversampling on training split only

4. StandardScaler: fit on train, transform both
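Steps 1, 2, and 4 can be sketched with the standard library alone. This assumes Time is measured in seconds and that second 0 falls at midnight, which the page does not state; the sample amounts are made up, and the SMOTE step is omitted here.

```python
import math
from statistics import mean, pstdev

SECONDS_PER_DAY = 24 * 60 * 60

def log_scale(amount):
    """log1p compresses the right tail and is defined at amount = 0."""
    return math.log1p(amount)

def cyclical_time(seconds):
    """Map a timestamp onto a 24-hour circle as (sin, cos)."""
    angle = 2 * math.pi * (seconds % SECONDS_PER_DAY) / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)

def fit_scaler(train_values):
    """Learn mean and population std from the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against zero variance
    return mu, sigma

def transform(values, mu, sigma):
    """Apply previously fitted statistics to any split."""
    return [(v - mu) / sigma for v in values]

# Illustrative amounts only (not project data).
train_amounts = [log_scale(a) for a in [1.0, 20.0, 150.0, 2500.0]]
test_amounts = [log_scale(a) for a in [5.0, 900.0]]

mu, sigma = fit_scaler(train_amounts)             # statistics from train only
train_scaled = transform(train_amounts, mu, sigma)
test_scaled = transform(test_amounts, mu, sigma)  # reuses train statistics

midnight = cyclical_time(0)
just_before = cyclical_time(SECONDS_PER_DAY - 60)  # 23:59
```

Fitting the scaler on the training split and reusing its statistics on the test split keeps test-set information out of training, which is the point of step 4.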

SYSTEM ARCHITECTURE

End-to-end pipeline: raw transaction → fraud verdict + explanation in <20ms

Raw CSV (Input) → Feature Eng. (Log · Cyclical) → XGBoost (Cost-sensitive) → SHAP (TreeExplainer) → FastAPI (/predict) → Streamlit (Dashboard)

Prediction and explanation computed in a single forward pass. SHAP values cached per request. FastAPI handles async inference; Streamlit polls the API every 2s for live demo.
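The caching idea can be sketched with `functools.lru_cache`. The scorer below is a hypothetical stand-in with made-up weights, not the project's XGBoost or SHAP code; the cache key is assumed to be the transaction's feature tuple.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive path actually runs

def _score_and_explain(features):
    """Stand-in for model + explainer; NOT the project's XGBoost/SHAP code."""
    calls["count"] += 1
    weights = (0.5, -0.2, 0.1)  # hypothetical per-feature weights
    contributions = tuple(w * x for w, x in zip(weights, features))
    return sum(contributions), contributions

@lru_cache(maxsize=1024)
def predict_cached(features):
    """Cache by feature tuple so repeated polls reuse the earlier result."""
    return _score_and_explain(features)

tx = (1.0, 2.0, 3.0)          # must be hashable, hence a tuple
first = predict_cached(tx)
second = predict_cached(tx)   # served from cache; the stand-in is not re-run
```

With a dashboard polling every 2s, identical transactions would hit the cache rather than recomputing the explanation each time.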

MODEL SELECTION RATIONALE

XGBoost chosen over Random Forest and Logistic Regression — here's exactly why.

Model | Recall (Fraud) | Latency | SHAP Support | Verdict

Logistic Regression | 61% | 2ms | Partial | ✗ Too simple

Random Forest | 79% | 35ms | ✓ Full | ∼ Good baseline

XGBoost ★ | 94% | 18ms | ✓ Full | ✓ Chosen

Neural Network | 92% | 210ms | ✗ None | ✗ Black box

XGBoost + cost_matrix gave 94% recall on the fraud class: 6 in 100 fraud cases missed, versus 39 in 100 with Logistic Regression.
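The "missed per 100" figures follow directly from recall. A short check, using the recall values from the comparison above (the helper name is hypothetical):

```python
# Recall on the fraud class, as reported in the model comparison.
recall_by_model = {
    "Logistic Regression": 0.61,
    "Random Forest": 0.79,
    "XGBoost": 0.94,
    "Neural Network": 0.92,
}

def missed_per_100(recall):
    """Fraud cases missed per 100 actual frauds: 100 * (1 - recall)."""
    return round(100 * (1 - recall))

missed = {name: missed_per_100(r) for name, r in recall_by_model.items()}
for name, count in missed.items():
    print(f"{name}: {count} missed per 100 fraud cases")
```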

EXPLAINABILITY — WHY SHAP

A fraud alert without a reason is useless. SHAP turns every prediction into a story.

Used TreeExplainer (SHAP) — exact Shapley values for tree models, 10× faster than KernelExplainer. Each prediction outputs: which features pushed toward fraud, by how much, and the baseline fraud rate. This is what regulators and compliance teams actually need.

Example Output — Transaction flagged FRAUD (confidence: 92.3%)

High Amount: +0.41 ↑ fraud

Unknown Merchant: +0.28 ↑ fraud

Late Night (2AM): +0.19 ↑ fraud

Known Location: −0.09 ↓ fraud
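Shapley values are additive: summed over all features, they equal the model output minus the explainer's base value. The sketch below ranks the four contributions from the example and sums them. It assumes these four are only the features shown, so the sum is their net push rather than the full model output, and the function names are hypothetical.

```python
# Contributions copied from the example output above.
contributions = {
    "High Amount": 0.41,
    "Unknown Merchant": 0.28,
    "Late Night (2AM)": 0.19,
    "Known Location": -0.09,
}

def rank_contributions(contribs):
    """Sort features by absolute contribution, largest first."""
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)

def net_push(contribs):
    """Net effect of the listed features relative to the base value."""
    return sum(contribs.values())

ranked = rank_contributions(contributions)
for name, value in ranked:
    direction = "toward fraud" if value > 0 else "away from fraud"
    print(f"{name}: {value:+.2f} ({direction})")

total = net_push(contributions)
```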

ENGINEERING CHALLENGES & DECISIONS

Every obstacle became an architecture decision. Here's my thinking.

Problem Encountered | Decision Made | Why This Works

0.17% fraud, model predicts all Legit | SMOTE + class_weight | Synthetic minority samples balance gradient updates

Equal FP/FN penalty → poor recall | Custom cost matrix (FN = 5× FP) | Aligns loss function with real business cost

SHAP too slow for real-time (>200ms) | TreeExplainer + value caching | Exact Shapley in <5ms for tree models

Amount scale dwarfs other features | Log1p transform + StandardScaler | Prevents single feature dominating gradients

Time is circular (23:59 ≈ 00:01) | Cyclical encoding (sin/cos) | Preserves continuity at day boundaries
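The two halves of the imbalance fix can be sketched separately. The weight formula is scikit-learn's "balanced" heuristic, `n_samples / (n_classes * count)`, applied to the dataset counts quoted earlier. The interpolation shows only the core SMOTE idea on one hand-picked pair of points; real SMOTE chooses the neighbour from the k nearest minority samples, and the function names here are hypothetical.

```python
import random

def balanced_class_weights(counts):
    """scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

def interpolate(sample, neighbour, rng):
    """SMOTE-style synthetic point on the segment between two minority samples."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]

# Counts from the dataset section: 284,807 transactions, 492 of them fraud.
weights = balanced_class_weights({"legit": 284807 - 492, "fraud": 492})

rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic = interpolate([1.0, 4.0], [3.0, 8.0], rng)  # made-up points

print(f"fraud weight = {weights['fraud']:.1f}, legit weight = {weights['legit']:.3f}")
```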

RESULTS & METRICS

94% fraud recall · 18ms inference · live on Streamlit Cloud

94% Fraud Recall: catches 94 in 100 real fraud cases

92.3% Avg Confidence: on correctly flagged fraud

18ms P95 Latency: predict + SHAP explain

0.87 F1 Score: fraud class (minority)

920 True Positives: out of 975 fraud cases

55 Missed Frauds: false negatives, 975 - 920 (target: <60)
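The headline numbers can be cross-checked against each other. The sketch below recomputes recall from the true-positive and fraud-case counts, then solves the F1 formula for the precision those figures imply. That precision is a derived value, not one reported on this page, and the helper names are hypothetical.

```python
def recall(tp, fn):
    """Share of actual fraud cases that were caught."""
    return tp / (tp + fn)

def precision_from_f1(f1, rec):
    """Solve F1 = 2PR / (P + R) for precision P."""
    return f1 * rec / (2 * rec - f1)

tp, fraud_cases = 920, 975      # from the cards above
fn = fraud_cases - tp           # missed frauds implied by those two numbers
rec = recall(tp, fn)
implied_precision = precision_from_f1(0.87, rec)  # derived, not reported

print(f"false negatives = {fn}, recall = {rec:.3f}")
```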

Deployed on Streamlit Cloud · accepts any transaction CSV · full SHAP waterfall chart per prediction · production-realistic pipeline from day one.

04.1 Live Demo Gallery

🛡️

Fraud Detection

Upload any CSV · get fraud scores + SHAP explanations per transaction in real time

XGBoost · SHAP · FastAPI

⬡ Launch App

🔍

SCAMGUARD AI

Paste any job description · instant fraud risk score with NLP feature attribution

NLP · scikit-learn · SHAP

⬡ Launch App

👥

HR Attrition Risk

Upload HR data · see who will leave + exactly why, ranked by SHAP importance

Random Forest · SHAP · Streamlit

⬡ Launch App

🎯

Recommendation Engine

Drop any CSV · domain-agnostic recommender with cold-start handling + explanations

TF-IDF · NLP · Streamlit

⬡ Launch App

04.2 ML Workflow: How I Build Every Project

📥 Data Ingestion (CSV · API · DB)
→ 🧹 Cleaning (Nulls · Outliers · Types)
→ ⚙️ Feature Eng. (Scale · Encode · Select)
→ 🤖 Train + Tune (CV · Cost-aware)
→ 🧠 SHAP Explain (WHY > WHAT)
→ 🚀 Deploy (Streamlit · FastAPI)

04.3 GitHub Activity

14+ repos · active development


06 Certifications

Oracle

OCI 2025 Certified Data Science Professional

Oct 2025 · Expires Oct 2027 · Active

View Badge ↗

Oracle

OCI 2025 Certified Generative AI Professional

Oct 2025 · Expires Oct 2027 · Active

View Badge ↗

Oracle

OCI 2025 Certified AI Foundations Associate

Oct 2025 · Expires Oct 2027 · Active

View Badge ↗

Google

Google Data Analytics Professional Certificate

Nov 2025 · Google

Verify ↗
