Agentic AI · Course Project

One command.
Every application.

An autonomous CLI agent that scrapes job boards, ranks matches to your profile, and generates tailored CVs & cover letters — fully local, no cloud required.

$ kpk run --role "ML Engineer" --location Berlin --generate
● Scraping 3 sources in parallel...
● Deduplicating & extracting skills...
● Ranked 47 jobs → 12 matches (≥70%)
✓ Generated 12 CVs + 12 cover letters
✓ Tracked in local database. Done in 34s.
$

01 — Problem & Motivation

Job hunting is manual, repetitive,
and perfectly suited for an agent

Every step of the job search workflow — scraping, evaluating, tailoring, tracking — is a task an autonomous agent can own with better consistency and zero fatigue.

🔍

Fragmented Sources

Jobs scattered across LinkedIn, Indeed, and government portals. No unified view, no ranking — just noise across browser tabs.

📄

Repetitive Tailoring

Each application demands a customised CV and cover letter. Same structure, different emphasis. Exactly what LLMs do best.

📊

No Feedback Loop

Which skills got interviews? Which phrasing landed offers? Without a data pipeline, candidates can't learn from outcomes.

The agent opportunity: every step of this workflow — scraping, parsing, ranking, generating, tracking — is modelled as an agent tool. The orchestrator chains them in a single command. The human only decides: what role, what location, go.

02 — The Agent Pipeline

Seven stages, one run command

A central orchestrator runs 7 stages in sequence from a single command. Each stage is isolated: if one fails, its error is recorded and the remaining stages still run on whatever data is available.

› Agent Pipeline triggered by a single command

STEP 1

Scrape

3 sources
in parallel

STEP 2

Deduplicate

Fuzzy match
85% threshold

STEP 3

Extract

Skill parsing
NLP tagging

STEP 4

Rank

Weighted score
vs. profile

STEP 5

Market

Trends & salary
analysis

STEP 6

Track

SQLite persist
status history

STEP 7

Generate

LLM → .docx
CV + letter

● Tool invocation ● Analysis ● Intelligence ● Persistence ● LLM generation
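As an illustration of Step 2, here is a minimal sketch of fuzzy deduplication at the 85% threshold. It assumes similarity is measured on a normalised "title + company" string with Python's standard `difflib.SequenceMatcher`; the `Job` fields and the choice of matcher are assumptions for this example, not the project's confirmed implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Job:
    # Hypothetical minimal listing; real listings carry more fields.
    title: str
    company: str
    source: str


def _key(job: Job) -> str:
    # Normalise case and whitespace so trivial differences do not matter.
    return " ".join(f"{job.title} {job.company}".lower().split())


def similarity(a: Job, b: Job) -> float:
    # Ratio in [0, 1]; 1.0 means the normalised strings are identical.
    return SequenceMatcher(None, _key(a), _key(b)).ratio()


def deduplicate(jobs: list[Job], threshold: float = 0.85) -> list[Job]:
    # Keep the first occurrence; drop later listings that look like repeats.
    unique: list[Job] = []
    for job in jobs:
        if not any(similarity(job, kept) >= threshold for kept in unique):
            unique.append(job)
    return unique


jobs = [
    Job("ML Engineer", "Acme GmbH", "linkedin"),
    Job("ML Engineer", "ACME GmbH", "indeed"),
    Job("Data Analyst", "Globex AG", "arbeitsagentur"),
]
unique_jobs = deduplicate(jobs)
```

The pairwise comparison is quadratic in the number of listings, which is acceptable for a few hundred jobs per run; a larger crawl would need blocking or hashing first.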

Agentic pattern: Plan & Execute. The pipeline is a fixed plan and each step is a tool invocation. Errors are collected rather than raised, so partial results are always returned. This mirrors how production agents handle tool failures gracefully.
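The error-collecting loop can be sketched in a few lines. The names `RunResult`, `run_pipeline`, and the three stub stages are hypothetical; the sketch only assumes that every stage reads a shared state dictionary and returns the keys it wants to add.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A stage takes the current state and returns new keys to merge into it.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class RunResult:
    state: dict[str, Any] = field(default_factory=dict)
    errors: list[tuple[str, str]] = field(default_factory=list)


def run_pipeline(stages: list[tuple[str, Stage]], state: dict[str, Any]) -> RunResult:
    result = RunResult(state=dict(state))
    for name, stage in stages:
        try:
            result.state.update(stage(result.state))
        except Exception as exc:
            # Collect the failure and keep going with the state we have.
            result.errors.append((name, str(exc)))
    return result


def scrape(state: dict[str, Any]) -> dict[str, Any]:
    return {"jobs": ["ML Engineer @ Acme", "ML Engineer @ Globex"]}


def market(state: dict[str, Any]) -> dict[str, Any]:
    raise RuntimeError("salary source unavailable")


def track(state: dict[str, Any]) -> dict[str, Any]:
    return {"tracked": len(state["jobs"])}


result = run_pipeline([("scrape", scrape), ("market", market), ("track", track)], {})
```

Here the market stage fails, yet tracking still runs and the caller receives both the partial state and the list of errors.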

03 — Agentic Components

Tools, memory, LLM, and the orchestration loop

Each agentic capability maps to a concrete module. Tool abstraction, structured LLM output, persistent state, and feedback collection.

⚙️

Tool Use — Scrapers as Agent Tools

3 sources run in parallel, each as an independent tool with its own failure handling. Two strategies: REST API for Arbeitsagentur (official public API, no auth needed), and browser automation via Playwright for Indeed (embeds data in JS and blocks HTTP) and LinkedIn (tries the guest API first, falls back to Playwright if blocked).
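A sketch of the parallel tool calls with the LinkedIn fallback, using stub scrapers in place of real network access. All function names and the `Blocked` exception are hypothetical; the real scrapers would call the Arbeitsagentur API or drive Playwright where the stubs return canned strings.

```python
from concurrent.futures import ThreadPoolExecutor


class Blocked(Exception):
    """Raised by a (hypothetical) scraper when the site refuses the request."""


def scrape_arbeitsagentur(role: str, location: str) -> list[str]:
    return [f"{role} ({location}) via REST API"]


def linkedin_guest_api(role: str, location: str) -> list[str]:
    raise Blocked("guest API refused the request")


def linkedin_playwright(role: str, location: str) -> list[str]:
    return [f"{role} ({location}) via browser"]


def scrape_linkedin(role: str, location: str) -> list[str]:
    # Try the cheap path first, fall back to browser automation if blocked.
    try:
        return linkedin_guest_api(role, location)
    except Blocked:
        return linkedin_playwright(role, location)


def scrape_indeed(role: str, location: str) -> list[str]:
    raise Blocked("bot detection page")


def scrape_all(role: str, location: str) -> tuple[list[str], dict[str, str]]:
    tools = {
        "arbeitsagentur": scrape_arbeitsagentur,
        "linkedin": scrape_linkedin,
        "indeed": scrape_indeed,
    }
    jobs: list[str] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = {name: pool.submit(fn, role, location) for name, fn in tools.items()}
        for name, future in futures.items():
            try:
                jobs.extend(future.result())
            except Exception as exc:
                # One failing source never hides results from the others.
                errors[name] = str(exc)
    return jobs, errors


jobs, errors = scrape_all("ML Engineer", "Berlin")
```

In this run Indeed is blocked and LinkedIn falls back to the browser path, so two of the three sources still contribute listings.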

🧠

LLM — Structured Output

The agent instructs the LLM to return a strict JSON structure rather than free text. The LLM reorders your experience, rewrites bullet points, and generates a company-specific cover letter. Responses are cached to avoid redundant calls. Supports Ollama, OpenRouter, and KISSKI.
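A minimal sketch of the "strict JSON plus cache" idea. The required keys, the in-memory cache, and `fake_llm` (a stand-in for a real model call) are assumptions made for this example; the project's actual schema and cache backend are not specified here.

```python
import hashlib
import json

REQUIRED_KEYS = {"experience", "bullet_points", "cover_letter"}  # assumed schema
_cache: dict[str, dict] = {}
llm_calls = 0


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a JSON string."""
    global llm_calls
    llm_calls += 1
    return json.dumps({
        "experience": ["ML internship", "Teaching assistant"],
        "bullet_points": ["Trained and evaluated PyTorch models"],
        "cover_letter": "Dear hiring team, ...",
    })


def generate_documents(prompt: str, llm=fake_llm) -> dict:
    # Identical prompts hash to the same key, so repeats skip the model.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]
    data = json.loads(llm(prompt))  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("LLM response must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing keys: {sorted(missing)}")
    _cache[key] = data
    return data


first = generate_documents("Tailor CV for ML Engineer at Acme")
second = generate_documents("Tailor CV for ML Engineer at Acme")
```

The second call is served from the cache, so the model is invoked only once for the same prompt.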

💾

Memory — Application Tracker

The agent maintains a persistent record of every application: status (Wishlist → Applied → Interview → Offer), key dates, recruiter contacts, and generated documents. This is the agent's long-term memory across sessions.
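One way the tracker could persist status and history in SQLite is sketched below. The table and column names are assumptions, and key dates beyond the change timestamp, recruiter contacts, and document paths are left out to keep the example short.

```python
import sqlite3

STATUSES = ["Wishlist", "Applied", "Interview", "Offer"]

SCHEMA = """
CREATE TABLE IF NOT EXISTS applications (
    id INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    role TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS status_history (
    application_id INTEGER NOT NULL REFERENCES applications(id),
    status TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def _record(conn: sqlite3.Connection, app_id: int, status: str) -> None:
    conn.execute(
        "INSERT INTO status_history (application_id, status) VALUES (?, ?)",
        (app_id, status),
    )


def add_application(conn: sqlite3.Connection, company: str, role: str) -> int:
    cur = conn.execute(
        "INSERT INTO applications (company, role, status) VALUES (?, ?, ?)",
        (company, role, STATUSES[0]),
    )
    _record(conn, cur.lastrowid, STATUSES[0])
    conn.commit()
    return cur.lastrowid


def advance(conn: sqlite3.Connection, app_id: int) -> str:
    # Move one step along Wishlist -> Applied -> Interview -> Offer.
    (current,) = conn.execute(
        "SELECT status FROM applications WHERE id = ?", (app_id,)
    ).fetchone()
    index = STATUSES.index(current)
    if index == len(STATUSES) - 1:
        return current  # already at the final status
    new_status = STATUSES[index + 1]
    conn.execute("UPDATE applications SET status = ? WHERE id = ?", (new_status, app_id))
    _record(conn, app_id, new_status)
    conn.commit()
    return new_status


conn = open_tracker()
app_id = add_application(conn, "Acme GmbH", "ML Engineer")
advance(conn, app_id)
advance(conn, app_id)
history = [
    row[0]
    for row in conn.execute(
        "SELECT status FROM status_history WHERE application_id = ? ORDER BY rowid",
        (app_id,),
    )
]
```

Passing a file path instead of `:memory:` keeps the record across sessions, which is what makes it long-term memory.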

🔄

Feedback — Outcome Tracking

After each application, the user rates the generated documents and records the outcome. Over time this builds a dataset of what worked, which is the foundation for the prompt refinement and outcome-tuned re-ranking planned for Sprint 4.
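To show what such a dataset could answer, here is a small sketch that computes the interview rate per skill. The `Feedback` record, its 1 to 5 rating scale, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    skills: tuple[str, ...]   # skills emphasised in the generated documents
    rating: int               # user rating of the documents (assumed 1-5 scale)
    got_interview: bool       # recorded outcome


def interview_rate_by_skill(records: list[Feedback]) -> dict[str, float]:
    seen: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for record in records:
        for skill in record.skills:
            seen[skill] += 1
            hits[skill] += int(record.got_interview)
    return {skill: hits[skill] / seen[skill] for skill in seen}


records = [
    Feedback(("python", "pytorch"), 4, True),
    Feedback(("python", "sql"), 3, False),
    Feedback(("pytorch",), 5, True),
]
rates = interview_rate_by_skill(records)
```

With only a handful of applications these rates are noisy, so they are better treated as hints than as ranking signals.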

Local-first · Ollama · OpenRouter · KISSKI · Browser automation · REST scraping · Local DB · Document export · Response caching · Docker · Web dashboard (optional)

04 — Architecture

Three layers, CLI-first

The agent runs entirely from the terminal. The web UI is an optional dashboard. All state is local. LLM calls go through OpenRouter, KISSKI (academic API), or a local Ollama instance.

Interface

Command Line: Primary interface

REST API: Optional

Web Dashboard: Optional

→

Agent Core

Orchestrator: Scrape → Deduplicate → Extract → Rank → Market → Track → Generate

→

Storage & External

SQLite: Applications + history

YAML Profile: Candidate config

Filesystem: .docx output

OpenRouter / KISSKI / Ollama: LLM gateways

Job Board APIs: 3 scraped sources

Local-first design: No cloud account needed. Candidate profile = YAML file. Jobs stored in SQLite. Documents are .docx on disk. LLM can point at local Ollama. With a local model, the agent is fully functional offline after the initial scrape.

05 — Sprint Plan

Four sprints, each ships a working agent

Usable after Sprint 1. Each sprint adds agentic capabilities — from basic tool use to closed-loop learning.

Sprint 1

Scraping & Search

Playwright browser automation setup
3 scrapers: Arbeitsagentur, LinkedIn, Indeed
Initial LLM connection (KISSKI / Ollama)
Basic search config: role + location input
CLI command to trigger the pipeline

✓ Agent collects real job data

Sprint 2

Matching & Profile

Candidate profile configuration
Fuzzy deduplication across sources
Skill extraction from listings
Weighted ranking against your profile
Application memory & status tracking

✓ Agent ranks jobs for you specifically
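A sketch of what weighted ranking against a profile might look like, using the 70% match cut-off from the demo output. The three criteria and their weights are assumptions chosen for illustration; the project's real scoring formula is not given here.

```python
# Assumed criteria and weights; they sum to 1 so scores fall in [0, 1].
WEIGHTS = {"skills": 0.6, "location": 0.25, "seniority": 0.15}


def score(profile: dict, job: dict) -> float:
    required = set(job["skills"])
    # Share of the job's listed skills that the candidate has.
    skill_fit = len(required & set(profile["skills"])) / len(required) if required else 0.0
    location_fit = 1.0 if job["location"] == profile["location"] else 0.0
    seniority_fit = 1.0 if job["seniority"] == profile["seniority"] else 0.0
    return (
        WEIGHTS["skills"] * skill_fit
        + WEIGHTS["location"] * location_fit
        + WEIGHTS["seniority"] * seniority_fit
    )


def rank(profile: dict, jobs: list[dict], threshold: float = 0.70) -> list[dict]:
    # Keep jobs at or above the threshold, best match first.
    scored = [(score(profile, job), job) for job in jobs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [job for value, job in scored if value >= threshold]


profile = {"skills": ["python", "pytorch", "sql"], "location": "Berlin", "seniority": "mid"}
jobs = [
    {"title": "Backend Developer", "skills": ["python", "java", "go", "rust"],
     "location": "Munich", "seniority": "mid"},
    {"title": "ML Engineer", "skills": ["python", "pytorch"],
     "location": "Berlin", "seniority": "mid"},
]
matches = rank(profile, jobs)
```

The ML Engineer listing covers all three criteria and passes the cut-off, while the Backend Developer listing matches one of four skills and the wrong city, so it is filtered out.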

Sprint 3

Generation & Apply

Structured LLM output for CV rewriting
Cover letter generation per job
Export to .docx format
Auto-apply to compatible job boards
Market trend report per search

✓ Agent applies on your behalf

Sprint 4

Fine-tuning & Polish

Feedback collection on outcomes
Prompt refinement based on results
Re-ranking tuned by past successes
Edge case handling & error recovery
Performance & cost optimisation

✓ Agent improves with each run

06 — Open Questions & Risks

Where we need input

Honest trade-offs and active design decisions — not blockers.

Open Question

Fixed pipeline vs. dynamic planning

Current orchestrator follows a fixed 7-step plan. Should the agent dynamically decide which steps to run? E.g. skip generation if no high-match jobs. Trade-off: predictability vs. autonomy.
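The smallest form of dynamic planning would be a conditional on the ranking result. The sketch below assumes the agent decides after Step 4 which later steps to run; the function name and the reuse of the 70% threshold are assumptions.

```python
FIXED_PLAN = ["scrape", "deduplicate", "extract", "rank", "market", "track", "generate"]


def plan_after_ranking(scores: list[float], threshold: float = 0.70) -> list[str]:
    # Market analysis and tracking always run; generation only if it can pay off.
    steps = ["market", "track"]
    if any(value >= threshold for value in scores):
        steps.append("generate")
    return steps


no_matches = plan_after_ranking([0.40, 0.65])
with_match = plan_after_ranking([0.40, 0.82])
```

This keeps the plan predictable (it can only drop a step, never invent one) while saving LLM calls on runs with no strong matches.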

Risk

Scraper fragility

Indeed and LinkedIn actively block bots. Mitigation: Arbeitsagentur uses the official REST API, so it is the most stable of the three sources. LinkedIn tries the public guest API first and falls back to Playwright. Indeed uses browser automation only. All failures are isolated, so partial results are always returned.

Open Question

Document quality evaluation

LLM-as-judge is circular. Human eval is slow. Application-to-interview conversion is the real metric but needs weeks of data. Should we use rubric-based scoring as an interim proxy?
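As a starting point for discussion, a rubric-based proxy could be as simple as a few deterministic checks. The three criteria below (company mentioned, skill coverage, length limit) and the equal weighting are assumptions, not a validated rubric.

```python
def rubric_score(letter: str, company: str, job_skills: list[str],
                 max_words: int = 400) -> dict[str, float]:
    text = letter.lower()
    words = letter.split()
    covered = [skill for skill in job_skills if skill.lower() in text]
    checks = {
        "mentions_company": 1.0 if company.lower() in text else 0.0,
        "skill_coverage": len(covered) / len(job_skills) if job_skills else 0.0,
        "within_length": 1.0 if 0 < len(words) <= max_words else 0.0,
    }
    # Equal-weight average of the three checks above.
    checks["total"] = sum(checks.values()) / 3
    return checks


letter = "Dear Acme team, I have built PyTorch models in Python for three years."
result = rubric_score(letter, "Acme", ["Python", "PyTorch", "Kubernetes"])
```

Checks like these are cheap and reproducible, but they measure form rather than persuasiveness, so they complement human review instead of replacing it.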

Risk

LLM quality on free tier

Llama 3.1 8B (via OpenRouter or KISSKI) produces adequate but not excellent documents. Architecture is provider-agnostic — users can switch between Ollama, OpenRouter, or KISSKI with a config change.
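A provider switch driven by config could look like the sketch below. The registry layout, the config keys, and the `*_API_KEY` environment variable names are assumptions; the KISSKI URL is a placeholder, and the other endpoints should be checked against each provider's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str
    needs_api_key: bool


PROVIDERS = {
    "ollama": Provider("ollama", "http://localhost:11434", False),
    "openrouter": Provider("openrouter", "https://openrouter.ai/api/v1", True),
    # Placeholder URL: the real KISSKI endpoint is not given in this document.
    "kisski": Provider("kisski", "https://kisski.example.invalid/v1", True),
}


def resolve_provider(config: dict, env: dict) -> Provider:
    # Default to the local provider when the config does not name one.
    name = config.get("llm", {}).get("provider", "ollama")
    if name not in PROVIDERS:
        raise ValueError(f"unknown LLM provider: {name}")
    provider = PROVIDERS[name]
    if provider.needs_api_key and not env.get(f"{name.upper()}_API_KEY"):
        raise RuntimeError(f"{name} requires {name.upper()}_API_KEY to be set")
    return provider


default_provider = resolve_provider({}, {})
remote_provider = resolve_provider(
    {"llm": {"provider": "openrouter"}}, {"OPENROUTER_API_KEY": "test-key"}
)
```

Switching models then means changing one config value, with the rest of the pipeline unaware of which gateway is in use.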

Open Question

Closing the feedback loop

Feedback is collected but doesn't yet influence ranking or generation. Sprint 4 is dedicated to closing this loop — the open question is whether prompt refinement based on past outcomes is more effective than re-ranking.

Risk

Privacy — PII in LLM calls

Candidate profiles contain name, email, salary. LLM API calls send this to third parties. Mitigation: local-only mode (Ollama), a clear consent flow, and no provider-side persistence where the provider offers that option.
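The local-only mode and consent flow could reduce to a single gate in front of every LLM call. This sketch assumes Ollama always runs on the user's machine; the function and flag names are hypothetical.

```python
LOCAL_PROVIDERS = {"ollama"}  # assumed to run on the user's own machine


def may_send_profile(provider: str, local_only: bool, consent_given: bool) -> bool:
    if provider in LOCAL_PROVIDERS:
        return True           # data never leaves the machine
    if local_only:
        return False          # local-only mode forbids any remote call
    return consent_given      # remote calls need explicit consent


local_call = may_send_profile("ollama", local_only=True, consent_given=False)
blocked_call = may_send_profile("openrouter", local_only=True, consent_given=True)
consented_call = may_send_profile("kisski", local_only=False, consent_given=True)
```

Putting the check in one place makes it easy to audit that no code path sends a profile to a remote provider without consent.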
