Agentic AI · Course Project

One command.
Every application.

An autonomous CLI agent that scrapes job boards, ranks matches to your profile, and generates tailored CVs & cover letters — fully local, no cloud required.

keinplankarriere — zsh
$ kpk run --role "ML Engineer" --location Berlin --generate
● Scraping 3 sources in parallel...
● Deduplicating & extracting skills...
● Ranked 47 jobs → 12 matches (≥70%)
✓ Generated 12 CVs + 12 cover letters
✓ Tracked in local database. Done in 34s.
$
7-step
Pipeline stages
3 tools
Concurrent scrapers
3 LLMs
Provider support
100%
Local execution
Job hunting is manual, repetitive, and perfectly suited for an agent

Every step of the job search workflow — scraping, evaluating, tailoring, tracking — is a task an autonomous agent can own with better consistency and zero fatigue.

🔍

Fragmented Sources

Jobs scattered across LinkedIn, Indeed, and government portals. No unified view, no ranking — just noise across browser tabs.

📄

Repetitive Tailoring

Each application demands a customised CV and cover letter. Same structure, different emphasis. Exactly what LLMs do best.

📊

No Feedback Loop

Which skills got interviews? Which phrasing landed offers? Without a data pipeline, candidates can't learn from outcomes.

The agent opportunity: every step of this workflow — scraping, parsing, ranking, generating, tracking — is modelled as an agent tool. The orchestrator chains them in a single command. The human only decides: what role, what location, go.
Seven stages, one run command

A central orchestrator coordinates 7 stages in sequence from a single command. Each stage is independent — it can fail without blocking the rest.

Agent Pipeline triggered by a single command
STEP 1 · Scrape: 3 sources in parallel
STEP 2 · Deduplicate: fuzzy match, 85% threshold
STEP 3 · Extract: skill parsing, NLP tagging
STEP 4 · Rank: weighted score vs. profile
STEP 5 · Market: trends & salary analysis
STEP 6 · Track: SQLite persistence, status history
STEP 7 · Generate: LLM → .docx, CV + letter
● Tool invocation ● Analysis ● Intelligence ● Persistence ● LLM generation
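The deduplication stage can be illustrated with a minimal sketch. It assumes that "fuzzy match" means a string-similarity ratio over title and company; the use of Python's difflib, the job_key format, and the field names are assumptions made for illustration, not the project's confirmed implementation. Only the 85% threshold comes from the pipeline description.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def job_key(job: dict) -> str:
    # Hypothetical comparison key: title and company joined into one string.
    return f"{job['title']} | {job['company']}"


def deduplicate(jobs: list[dict], threshold: float = 0.85) -> list[dict]:
    """Keep the first listing of every group of near-identical jobs."""
    unique: list[dict] = []
    for job in jobs:
        is_duplicate = any(
            similarity(job_key(job), job_key(kept)) >= threshold for kept in unique
        )
        if not is_duplicate:
            unique.append(job)
    return unique


# Made-up listings: the first two describe the same job on different boards.
jobs = [
    {"title": "Machine Learning Engineer", "company": "Acme GmbH", "source": "linkedin"},
    {"title": "Machine Learning Engineer (m/w/d)", "company": "Acme GmbH", "source": "indeed"},
    {"title": "Data Analyst", "company": "Other AG", "source": "arbeitsagentur"},
]
unique = deduplicate(jobs)
```

The pairwise comparison is quadratic in the number of listings, which is acceptable for a few dozen jobs per run; a larger crawl would likely need blocking by company or location first.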
Agentic pattern — Plan & Execute: The pipeline is a fixed plan, each step is a tool invocation. Errors are collected, never thrown — partial results always returned. This mirrors how production agents handle tool failures gracefully.
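A minimal sketch of the "errors are collected, never thrown" idea, assuming each stage is a plain function that reads a shared state dictionary and returns the keys it wants to add. The names RunResult and run_pipeline and the stub stages are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stage signature: takes the shared state, returns new or updated keys.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class RunResult:
    state: dict[str, Any] = field(default_factory=dict)
    errors: list[tuple[str, str]] = field(default_factory=list)


def run_pipeline(stages: list[tuple[str, Stage]], state: dict[str, Any]) -> RunResult:
    """Run the stages in order; record failures instead of raising them."""
    result = RunResult(state=dict(state))
    for name, stage in stages:
        try:
            result.state.update(stage(result.state))
        except Exception as exc:
            # The failure is stored and the next stage still runs.
            result.errors.append((name, str(exc)))
    return result


# Stub stages standing in for the real tools.
def scrape(state: dict[str, Any]) -> dict[str, Any]:
    return {"jobs": ["ML Engineer @ A", "ML Engineer @ B"]}


def market(state: dict[str, Any]) -> dict[str, Any]:
    raise RuntimeError("salary source unavailable")


def track(state: dict[str, Any]) -> dict[str, Any]:
    return {"tracked": len(state.get("jobs", []))}


result = run_pipeline([("scrape", scrape), ("market", market), ("track", track)], {})
```

Here the market stage fails, yet tracking still runs on the scraped jobs, and the caller receives both the partial state and the list of errors.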
Tools, memory, LLM, and the orchestration loop

Each agentic capability maps to a concrete module. Tool abstraction, structured LLM output, persistent state, and feedback collection.

⚙️

Tool Use — Scrapers as Agent Tools

3 sources run in parallel, each as an independent tool with its own failure handling. Two strategies: REST API for Arbeitsagentur (official public API, no auth needed), and browser automation via Playwright for Indeed (embeds data in JS and blocks HTTP) and LinkedIn (tries the guest API first, falls back to Playwright if blocked).
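The parallel-with-fallback behaviour can be sketched as follows. All scraper functions are stubs with made-up results and no network access; the function names and the Blocked exception are hypothetical. The sketch assumes asyncio is used for concurrency, which the source does not confirm.

```python
import asyncio


class Blocked(Exception):
    """Raised by a stub when a source refuses the request."""


async def scrape_arbeitsagentur(role: str, location: str) -> list[dict]:
    return [{"title": role, "source": "arbeitsagentur"}]


async def linkedin_guest_api(role: str, location: str) -> list[dict]:
    raise Blocked("guest API blocked")


async def linkedin_playwright(role: str, location: str) -> list[dict]:
    return [{"title": role, "source": "linkedin"}]


async def scrape_linkedin(role: str, location: str) -> list[dict]:
    # Try the cheap path first, fall back to browser automation if blocked.
    try:
        return await linkedin_guest_api(role, location)
    except Blocked:
        return await linkedin_playwright(role, location)


async def scrape_indeed(role: str, location: str) -> list[dict]:
    raise Blocked("bot check")


async def scrape_all(role: str, location: str) -> tuple[list[dict], dict[str, str]]:
    scrapers = {
        "arbeitsagentur": scrape_arbeitsagentur,
        "linkedin": scrape_linkedin,
        "indeed": scrape_indeed,
    }
    # return_exceptions=True keeps one failing source from cancelling the others.
    outcomes = await asyncio.gather(
        *(fn(role, location) for fn in scrapers.values()), return_exceptions=True
    )
    jobs: list[dict] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(scrapers, outcomes):
        if isinstance(outcome, Exception):
            errors[name] = str(outcome)
        else:
            jobs.extend(outcome)
    return jobs, errors


jobs, errors = asyncio.run(scrape_all("ML Engineer", "Berlin"))
```

In this run the LinkedIn stub recovers through its fallback, the Indeed stub fails, and the caller still gets the jobs from the two working sources plus a per-source error map.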

🧠

LLM — Structured Output

The agent instructs the LLM to return a strict JSON structure — not free text. It reorders your experience, rewrites bullet points, and generates a company-specific cover letter. Responses are cached to avoid redundant calls. Supports Ollama, OpenRouter, and KISSKI.
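A minimal sketch of structured output with response caching. The required keys, the in-memory cache, and the call_llm callable are assumptions for illustration; the real schema and cache backend are not described in the source.

```python
import hashlib
import json
from typing import Callable

# Hypothetical schema: the keys a generated document is expected to contain.
REQUIRED_KEYS = {"summary", "experience", "cover_letter"}

_cache: dict[str, dict] = {}


def cache_key(model: str, prompt: str) -> str:
    """Stable key so the same model and prompt pair is only sent once."""
    return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()


def parse_structured(raw: str) -> dict:
    """Reject free text and incomplete objects before they reach the exporter."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def generate(model: str, prompt: str, call_llm: Callable[[str, str], str]) -> dict:
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = parse_structured(call_llm(model, prompt))
    return _cache[key]


# Stand-in for a real provider call; it records how often it was invoked.
calls: list[str] = []


def fake_llm(model: str, prompt: str) -> str:
    calls.append(prompt)
    return json.dumps(
        {"summary": "ML engineer", "experience": ["Built pipelines"], "cover_letter": "Dear team"}
    )


first = generate("llama3.1:8b", "Tailor CV for Acme", fake_llm)
second = generate("llama3.1:8b", "Tailor CV for Acme", fake_llm)
```

The second call is served from the cache, so the provider is contacted once. A persistent cache (for example a table in the local database) would follow the same keying scheme.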

💾

Memory — Application Tracker

The agent maintains a persistent record of every application: status (Wishlist → Applied → Interview → Offer), key dates, recruiter contacts, and generated documents. This is the agent's long-term memory across sessions.
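One way such a tracker could be laid out in SQLite is sketched below. The table and column names are hypothetical; only the four statuses and the idea of a status history come from the description above.

```python
import sqlite3

STATUSES = ("Wishlist", "Applied", "Interview", "Offer")

# Hypothetical schema: one row per application plus an append-only history.
SCHEMA = """
CREATE TABLE IF NOT EXISTS applications (
    id      INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    title   TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'Wishlist'
);
CREATE TABLE IF NOT EXISTS status_history (
    application_id INTEGER NOT NULL REFERENCES applications(id),
    status         TEXT NOT NULL,
    changed_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_application(conn: sqlite3.Connection, company: str, title: str) -> int:
    cur = conn.execute(
        "INSERT INTO applications (company, title) VALUES (?, ?)", (company, title)
    )
    app_id = cur.lastrowid
    conn.execute(
        "INSERT INTO status_history (application_id, status) VALUES (?, ?)",
        (app_id, "Wishlist"),
    )
    conn.commit()
    return app_id


def set_status(conn: sqlite3.Connection, app_id: int, status: str) -> None:
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    conn.execute("UPDATE applications SET status = ? WHERE id = ?", (status, app_id))
    conn.execute(
        "INSERT INTO status_history (application_id, status) VALUES (?, ?)",
        (app_id, status),
    )
    conn.commit()


conn = open_tracker()
app_id = add_application(conn, "Acme GmbH", "ML Engineer")
set_status(conn, app_id, "Applied")
history = [
    row[0]
    for row in conn.execute(
        "SELECT status FROM status_history WHERE application_id = ? ORDER BY rowid",
        (app_id,),
    )
]
```

Passing a file path instead of ":memory:" makes the record persist across sessions, which is what turns the table into long-term memory.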

🔄

Feedback — Outcome Tracking

After each application, the user rates the generated documents and records the outcome. Over time this builds a dataset of what worked, the foundation for the prompt refinement and re-ranking planned for Sprint 4.

Local-first · Ollama · OpenRouter · KISSKI · Browser automation · REST scraping · Local DB · Document export · Response caching · Docker · Web dashboard (optional)
Three layers, CLI-first

The agent runs entirely from the terminal. The web UI is an optional dashboard. All state is local. LLM calls go through OpenRouter, KISSKI (academic API), or a local Ollama instance.

Interface
Command Line: Primary interface
REST API: Optional
Web Dashboard: Optional
Agent Core
Orchestrator: Scrape → Deduplicate → Extract → Rank → Market → Track → Generate
Storage & External
SQLite: Applications + history
YAML Profile: Candidate config
Filesystem: .docx output
OpenRouter / KISSKI / Ollama: LLM gateways
Job Board APIs: 3 scraped sources
Local-first design: No cloud account needed. Candidate profile = YAML file. Jobs stored in SQLite. Documents are .docx on disk. LLM can point at local Ollama. With a local model, the agent is fully functional offline after the initial scrape.
Four sprints, each ships a working agent

Usable after Sprint 1. Each sprint adds agentic capabilities — from basic tool use to closed-loop learning.

Sprint 1

Scraping & Search

  • Playwright browser automation setup
  • 3 scrapers: Arbeitsagentur, LinkedIn, Indeed
  • Initial LLM connection (KISSKI / Ollama)
  • Basic search config: role + location input
  • CLI command to trigger the pipeline
✓ Agent collects real job data
Sprint 2

Matching & Profile

  • Candidate profile configuration
  • Fuzzy deduplication across sources
  • Skill extraction from listings
  • Weighted ranking against your profile
  • Application memory & status tracking
✓ Agent ranks jobs for you specifically
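Weighted ranking against a profile could look roughly like the sketch below. The scoring formula (share of weighted profile skills found in a listing), the skill weights, and the sample jobs are assumptions for illustration; the 70% cut-off mirrors the match threshold shown in the demo terminal output.

```python
def match_score(job_skills: set[str], profile: dict[str, float]) -> float:
    """Share of the profile's total skill weight that the listing asks for."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    matched = sum(weight for skill, weight in profile.items() if skill in job_skills)
    return matched / total


def rank(jobs: list[dict], profile: dict[str, float], threshold: float = 0.70) -> list[tuple[float, str]]:
    """Return (score, title) pairs at or above the threshold, best first."""
    scored = [(match_score(set(job["skills"]), profile), job["title"]) for job in jobs]
    return sorted((item for item in scored if item[0] >= threshold), reverse=True)


# Hypothetical profile: higher weight means the skill matters more to the candidate.
profile = {"python": 3.0, "pytorch": 2.0, "docker": 1.0}
jobs = [
    {"title": "ML Engineer", "skills": ["python", "pytorch", "kubernetes"]},
    {"title": "DevOps Engineer", "skills": ["docker", "kubernetes"]},
    {"title": "Research Engineer", "skills": ["python", "pytorch", "docker"]},
]
matches = rank(jobs, profile)
```

With these weights the Research Engineer listing covers the whole profile, the ML Engineer listing covers five sixths of it, and the DevOps listing falls below the threshold and is dropped.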
Sprint 3

Generation & Apply

  • Structured LLM output for CV rewriting
  • Cover letter generation per job
  • Export to .docx format
  • Auto-apply to compatible job boards
  • Market trend report per search
✓ Agent applies on your behalf
Sprint 4

Fine-tuning & Polish

  • Feedback collection on outcomes
  • Prompt refinement based on results
  • Re-ranking tuned by past successes
  • Edge case handling & error recovery
  • Performance & cost optimisation
✓ Agent improves with each run
Where we need input

Honest trade-offs and active design decisions — not blockers.

Open Question

Fixed pipeline vs. dynamic planning

Current orchestrator follows a fixed 7-step plan. Should the agent dynamically decide which steps to run? E.g. skip generation if no high-match jobs. Trade-off: predictability vs. autonomy.
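The smallest step toward dynamic planning is a conditional plan: decide the remaining steps once ranking has produced scores. The sketch below is hypothetical (function name, step names as strings, and the 0.70 threshold are assumptions) and only illustrates the "skip generation" example.

```python
def remaining_steps(match_scores: list[float], threshold: float = 0.70) -> list[str]:
    """Plan the steps after ranking; skip generation when nothing matches well."""
    steps = ["market", "track"]
    if any(score >= threshold for score in match_scores):
        steps.append("generate")
    return steps


with_matches = remaining_steps([0.82, 0.40])
without_matches = remaining_steps([0.55, 0.40])
```

This keeps the plan predictable, since the set of possible plans is small and enumerable, while still avoiding LLM calls that cannot produce a useful document.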

Risk

Scraper fragility

Indeed and LinkedIn actively block bots. Mitigation: Arbeitsagentur uses the official REST API and is fully stable. LinkedIn tries the public guest API first and falls back to Playwright. Indeed uses browser automation only. All failures are isolated — partial results always returned.

Open Question

Document quality evaluation

LLM-as-judge is circular. Human eval is slow. Application-to-interview conversion is the real metric but needs weeks of data. Should we use rubric-based scoring as an interim proxy?
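A rubric-based proxy could be as simple as a handful of deterministic checks. The three criteria below (company mentioned, length limit, at least two listed skills covered) are hypothetical examples, not a validated rubric, and they say nothing about writing quality.

```python
def rubric_score(letter: str, company: str, skills: list[str], max_words: int = 400) -> tuple[float, dict[str, bool]]:
    """Score a cover letter as the fraction of rubric checks it passes."""
    lowered = letter.lower()
    word_count = len(letter.split())
    skills_found = sum(skill.lower() in lowered for skill in skills)
    checks = {
        "mentions_company": company.lower() in lowered,
        "within_length": 0 < word_count <= max_words,
        "covers_skills": skills_found >= min(2, len(skills)),
    }
    return sum(checks.values()) / len(checks), checks


skills = ["Python", "PyTorch", "Docker"]
good_score, good_checks = rubric_score(
    "Dear Acme GmbH team, I build Python services and train PyTorch models.",
    "Acme GmbH",
    skills,
)
weak_score, weak_checks = rubric_score("Hello, I like jobs.", "Acme GmbH", skills)
```

Because the checks are deterministic, scores are reproducible across runs and can later be compared against real application outcomes once that data exists.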

Risk

LLM quality on free tier

Llama 3.1 8B (via OpenRouter or KISSKI) produces adequate but not excellent documents. Architecture is provider-agnostic — users can switch between Ollama, OpenRouter, or KISSKI with a config change.

Open Question

Closing the feedback loop

Feedback is collected but doesn't yet influence ranking or generation. Sprint 4 is dedicated to closing this loop — the open question is whether prompt refinement based on past outcomes is more effective than re-ranking.

Risk

Privacy — PII in LLM calls

Candidate profiles contain name, email, salary. LLM API calls send this to third parties. Mitigation: local-only mode (Ollama), clear consent flow, no provider-side persistence.
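The local-only mode and consent flow can be enforced at the point where a provider is chosen. The sketch below is an assumption about how that gate might look; the Provider type, the is_local flag, and select_provider are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    is_local: bool  # True when profile data never leaves the machine


# The three providers named in this document, tagged by where they run.
PROVIDERS = {
    "ollama": Provider("ollama", is_local=True),
    "openrouter": Provider("openrouter", is_local=False),
    "kisski": Provider("kisski", is_local=False),
}


def select_provider(name: str, consent_given: bool) -> Provider:
    """Refuse remote providers unless the user has explicitly consented."""
    provider = PROVIDERS[name]
    if not provider.is_local and not consent_given:
        raise PermissionError(
            f"{name} is a remote provider; sending profile data requires consent"
        )
    return provider


local = select_provider("ollama", consent_given=False)
remote = select_provider("openrouter", consent_given=True)
```

Putting the check in one function means no call site can reach a remote provider with profile data unless consent was recorded first.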