open to ML / SWE / LLM roles · Sydney, Australia

anyesh

Anish Shrestha

Machine learning & software engineer. Building LLM infrastructure.

focus: ML engineering · SWE · LLM systems · Python · agent infrastructure
open to: ML / SWE / LLM-engineering roles · Sydney AU and remote
reach: sir.anishshrestha@gmail.com

projects

budget=1024  n_ctx=16384

fact turn:   66/1024  ev=0   rec=0
filler  0:  154/1024  ev=0   rec=0
filler  1:  300/1024  ev=0   rec=0
filler  2:  422/1024  ev=0   rec=0
filler  3:  549/1024  ev=0   rec=0
filler  4:  706/1024  ev=0   rec=0
filler  5:  844/1024  ev=0   rec=0
filler  6:  621/1024  ev=11  rec=0   <- eviction fires
filler  7:  714/1024  ev=16  rec=4   <- kv_restore fires
filler  8:  913/1024  ev=21  rec=4
filler  9:  692/1024  ev=30  rec=5
filler 10:  712/1024  ev=41  rec=8
filler 11:  687/1024  ev=55  rec=10
PROBE:      595/1024  ev=68  rec=10

"favorite number?" -> "4242"   PASS

fig. 01 · real transcript, 14-turn session under a 1024-token budget on Qwen 2.5 7B; the fact planted at turn 1 survives 68 evictions and is recalled at turn 14.

EVOKE

Long-running agent sessions outgrow the physical KV cache within a few turns. EVOKE evicts low-relevance blocks under budget pressure and recovers them recompute-free through a custom save/restore primitive in a forked llama.cpp.
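
As a rough illustration of the evict-then-restore idea only (not EVOKE's code, and not the llama.cpp fork's API), the toy below keeps blocks under a token budget, moves the lowest-relevance block to a side store when the budget is exceeded, and brings it back on demand. `Block`, `BudgetedCache`, and the relevance scores are hypothetical names and values; the side store is a plain dict standing in for saved KV state.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    tokens: int        # how many tokens of budget the block occupies
    relevance: float   # hypothetical score; higher means keep longer


class BudgetedCache:
    """Toy model: resident blocks live under a budget, evicted ones are saved."""

    def __init__(self, budget: int):
        self.budget = budget
        self.resident = {}   # block_id -> Block currently in the cache
        self.saved = {}      # block_id -> Block (stand-in for saved KV state)
        self.evictions = 0
        self.recoveries = 0

    def used(self) -> int:
        return sum(b.tokens for b in self.resident.values())

    def _make_room(self, needed: int) -> None:
        # Evict lowest-relevance blocks until `needed` tokens fit.
        while self.used() + needed > self.budget:
            if not self.resident:
                raise MemoryError("block does not fit in the budget")
            victim = min(self.resident.values(), key=lambda b: b.relevance)
            self.saved[victim.block_id] = self.resident.pop(victim.block_id)
            self.evictions += 1

    def append(self, block: Block) -> None:
        self._make_room(block.tokens)
        self.resident[block.block_id] = block

    def restore(self, block_id: int) -> None:
        # Bring an evicted block back without recomputing it.
        if block_id in self.resident:
            return
        block = self.saved[block_id]
        self._make_room(block.tokens)
        del self.saved[block_id]
        self.resident[block_id] = block
        self.recoveries += 1
```

The counters mirror the `ev=` and `rec=` columns in the transcript above; a real implementation would score relevance from the model and save actual KV tensors rather than metadata.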

evictions survived: 68
recoveries: 10
vs re-prefilling: 20–32× faster

writeup github ↗

```
stale  y:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
emitted y': h  h  h  h  h  h  D  S  .  h  h  h  h  h  h  h  h  h
```
held 15/17 · diverge at 6 (edit's effect enters here) · serial burst at 7 · position 8 skipped, re-anchored at 9
fig. 02 · one divergence, resolved: a rejected draft token triggers a short serial burst, then an n-gram match re-anchors past the changed span so agreement downstream is salvaged too.
redraft

When you edit an LLM's context, redraft recomputes only what the edit actually changed: the stale answer is replayed as a self-speculative draft, held where the model still agrees, and reserialized only past the divergence. Implemented as a streaming mode inside llama.cpp.
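
A toy sketch of the replay loop, under loud assumptions: `next_token` is a hypothetical stand-in for the model's greedy next-token choice, `redraft_replay` and the trace letters are illustrative names, and the n-gram re-anchoring rule is a simplification of what the figure shows. The real streaming mode would verify held draft tokens in batched forward passes, which is where the speedup comes from; this version calls the model once per token and only shows the control flow.

```python
from typing import Callable, List, Sequence, Tuple


def redraft_replay(
    stale: Sequence[str],
    next_token: Callable[[List[str]], str],
    eos: str = "<eos>",
    ngram: int = 2,
    max_tokens: int = 64,
) -> Tuple[List[str], str]:
    """Replay `stale` as a draft; return (output, trace).

    Trace letters: h = draft token held, D = divergence, S = serial decode.
    """
    out: List[str] = []
    trace: List[str] = []
    j = 0             # index of the next draft token in `stale`
    anchored = True   # True while the draft is aligned with the output
    while len(out) < max_tokens:
        tok = next_token(out)   # the model always has the final say
        if tok == eos:
            break
        if anchored and j < len(stale) and tok == stale[j]:
            out.append(tok)
            trace.append("h")
            j += 1
            continue
        # Draft rejected (or exhausted): emit the model's token instead.
        trace.append("D" if anchored and j < len(stale) else "S")
        out.append(tok)
        anchored = False
        # Try to re-anchor: find the last `ngram` emitted tokens in the
        # remaining draft and resume drafting right after that match.
        tail = out[-ngram:]
        if len(tail) == ngram:
            for k in range(j, len(stale) - ngram + 1):
                if list(stale[k:k + ngram]) == tail:
                    j = k + ngram
                    anchored = True
                    break
    return out, "".join(trace)
```

Because every emitted token comes from `next_token`, the output matches a from-scratch greedy decode in this sketch; the draft only decides which positions count as held.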

document summaries
2.31× median

factual answers
2.76× median

code review
1.66× median

writeup github ↗
incremental inserts vs from-scratch

100–236× faster

propagation through function DAGs

~135 ns/node

collection operators

9

fig. 02 · incr benchmarks, measured with criterion.

incr

A Rust library that tracks dependencies between computations automatically and reruns only what a change actually affects. One engine, two surface crates: single-threaded with zero atomic-fence cost, or Send + Sync with lock-free reads. Same API, one-line swap.
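
To make "tracks dependencies automatically" concrete, here is a minimal Python sketch of the general technique (reads are recorded while a computation runs, and a write dirties only its dependents). It is not incr's API: `Var`, `Derived`, and the `runs` counter are hypothetical, and the sketch is single-threaded and never prunes stale dependency edges.

```python
_active = []  # stack of Derived nodes currently being computed


class Var:
    """An input cell. Reading it inside a Derived records a dependency."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self):
        if _active:
            self._dependents.add(_active[-1])
        return self._value

    def set(self, value):
        if value != self._value:   # unchanged writes invalidate nothing
            self._value = value
            for dep in list(self._dependents):
                dep.invalidate()


class Derived:
    """A cached computation that reruns only after an input it read changes."""

    def __init__(self, fn):
        self._fn = fn
        self._value = None
        self._dirty = True
        self._dependents = set()
        self.runs = 0   # how many times fn actually executed

    def get(self):
        if _active:
            self._dependents.add(_active[-1])
        if self._dirty:
            _active.append(self)
            try:
                self._value = self._fn()
            finally:
                _active.pop()
            self._dirty = False
            self.runs += 1
        return self._value

    def invalidate(self):
        if not self._dirty:
            self._dirty = True
            for dep in list(self._dependents):
                dep.invalidate()


a, b = Var(1), Var(2)
total = Derived(lambda: a.get() + b.get())
scaled = Derived(lambda: b.get() * 10)
total.get(), scaled.get()
a.set(5)   # dirties `total` only; `scaled` never read `a`
```

After `a.set(5)`, reading `total` recomputes it while `scaled` is served from its cached value.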

writeup github ↗
claude code · cursor → memory store (SQLite · usearch · tantivy) → MCP → any agent

reinforces · contradicts · supersedes

fig. 03 · a single memory store ingesting every tool's history, served back over MCP.

second-brain

Claude Code and Cursor each keep conversation history locked in their own format. second-brain ingests them all into one graph-backed store, embeds everything for semantic search, and serves it over MCP so any agent can recall what was discussed, decided, and built.

writeup github ↗
```
key = blake3(
    command  + cwd
  + hash(src/lib.rs) + hash(src/main.rs) + hash(Cargo.lock) + ...
  + env(RUSTFLAGS, CARGO_TARGET_DIR)
)
```
fig. 04 · a hit is a proof (inputs identical), not a gamble: 23.3% tool-time saved on real agent runs, 99.97% LLM latency saved on hits.
verdant

Like make: if the inputs haven't changed, the output can't have changed. Verdant keys agent tool calls and LLM completions on actual content and returns exact bytes from prior executions.
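
A minimal sketch of the key construction in fig. 04, with assumptions called out: Python's standard library has no blake3, so `hashlib.blake2b` stands in for it here, and `cache_key`, the `inputs` mapping of path to file bytes, and the length-prefixed framing are illustrative choices rather than verdant's actual layout.

```python
import hashlib


def _digest(data: bytes) -> bytes:
    # Stand-in for blake3: blake2b is the closest hash in the stdlib.
    return hashlib.blake2b(data, digest_size=32).digest()


def cache_key(command, cwd, inputs, env,
              env_names=("RUSTFLAGS", "CARGO_TARGET_DIR")):
    """Derive a key from the command, cwd, input contents, and selected env.

    `inputs` maps a file path to that file's bytes (hypothetical shape).
    """
    h = hashlib.blake2b(digest_size=32)

    def feed(label: str, payload: bytes) -> None:
        # Length-prefix every field so adjacent fields cannot run together.
        for part in (label.encode(), payload):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)

    feed("command", command.encode())
    feed("cwd", cwd.encode())
    for path in sorted(inputs):          # sorted: order must not matter
        feed("file:" + path, _digest(inputs[path]))
    for name in env_names:               # only the listed variables count
        feed("env:" + name, env.get(name, "").encode())
    return h.hexdigest()
```

Hashing file contents rather than timestamps is what makes a hit safe to reuse: touching a file without changing it leaves the key unchanged, while a one-byte edit produces a different key.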

writeup github ↗
snap → AI tags → daily outfits

self-hosted · any OpenAI-compatible API · OIDC

fig. 05 · works great on a Raspberry Pi 5.

wardrowbe

Self-hosted wardrobe management with AI-powered outfit recommendations: photos in, AI tagging, daily suggestions by weather and occasion. Your data stays on your hardware.

writeup github ↗

all projects, indexed

name	kind	started	status	stack
EVOKE	oss	2025	in-progress	Python · C++ · llama.cpp fork · CUDA · Qwen · Jacobian lens
redraft	oss	2026	shipped	Python · C++ · llama.cpp · FastAPI · pytest
incr	oss	2023	shipped	Rust · Cargo · proptest · criterion
second-brain	oss	2025	active	Rust · SQLite · usearch · tantivy · BGE-small · MCP
verdant	oss	2025	active	Rust · blake3 · MCP
cognitive-cache	oss	2024	shipped	Python · scikit-learn · networkx · Hypothesis
wardrowbe	oss	2024	shipped	Next.js · TypeScript · FastAPI · Python · PostgreSQL · Redis · Docker · Ollama
memories-for-llms	oss	2025	in-progress	Python · SQLite · QLoRA · unsloth · Qwen
wardrowbe.com	proprietary	2024	active	Next.js · TypeScript · FastAPI · Python · PostgreSQL · iOS · Android · Stable Diffusion
Certus	oss	2024	shipped	Python · Qwen 2.5 Coder · QLoRA · Hypothesis · unsloth
skillprobe	oss	2025	active	Python · Claude Code · Cursor
eon	side	2024	active	Rust · Python · NEAT · Godot · WebSocket
art_gan	side	2020	archived	Python · GAN
dreamery	proprietary	2023	shipped	SvelteKit · ComfyUI · RunPod · DeepFace · Stripe · PostgreSQL · microservices

My old stuff lives here → browse the archive

work history

2022-05 – now
Senior Software Engineer
AlayaCare · active

Backend and product engineering on a cloud-based SaaS for home care. Architected the Support at Home reconciliation pipeline, built a third-party-API testing harness, and now lead AI coding-agent adoption as the Australian engineering team's AI champion.
- Support at Home (2025–2026) — architected and built the entire reconciliation engine and pipeline for Services Australia's new Support at Home funding model (the successor to Home Care Packages).
- Testing harness (2025–2026) — Services Australia doesn't provide sandbox or test endpoints, so I architected and built our own: a mock / monkey-patching layer that suppresses third-party API calls in-app and lets us run the rest of the system end-to-end against fixtures.
- AI champion, Australian engineering team (2026) — proposed and built an end-to-end Cypress test-generation framework on top of Cursor's hooks, skills, and commands, then drove its adoption across the team. Given a PRD or Jira ticket, the LLM explores the web app through browser tools, finds useful selectors, maintains memories and navigation indexes, and writes Cypress specs nearly one-shot.
- Engineered HCP invoicing, claiming, smart reconciliation, and budget management. 30% reduction in customer support tickets.
- Shipped the auto travel time feature in the scheduling and payroll workflow. 25% efficiency gain for clients.
- Pioneered a core product delivery team driving technical design and strategic value for Australian clients. 100% compliance with Australian standards.
Python · TypeScript · Vue.js · PostgreSQL · Celery · Redis · MySQL · Cursor · Cypress · New Relic
2020-02 – 2021-11
Machine Learning Engineer
Fusemachines

ML engineer across three Fusemachines clients: TIME Magazine, Hospital for Special Surgery, and Fuse AI. Pipelines, OLAP consolidation, viral-content prediction, implant supply chain optimization, and ML curriculum.
- TIME Magazine — streamlined data pipelines for brand audits, insights, and strategic recommendations. Consolidated scattered data into a single OLAP system.
- TIME Magazine — collaborated with the Data VP and a Ph.D. on a viral-content prediction ML pipeline. 90% accuracy.
- TIME Magazine — automated reporting systems. 75% reduction in report generation time.
- Hospital for Special Surgery — ML-driven implant supply chain optimization. Analyzed 15,000+ cases, predicted patient implant needs at over 90% accuracy.
- Fuse AI — restructured and rebuilt the best ML and deep learning academic courses taught in Fuse Classroom to tens of thousands of students worldwide. Implemented core ML and NLP algorithms from the underlying maths to explain their mechanics.
Python · PyTorch · TensorFlow · Sklearn · GCP · Vertex AI · Linear Algebra · Probability · Calculus

credentials

education

Bachelor of Science in Computer Science

Lord Buddha Education Foundation (APU) · Kathmandu, Nepal

2016 – 2019

certifications

TensorFlow Developer Certification ↗

TensorFlow / Google
Rasa Developer Certification ↗

Rasa
Rasa Advanced Deployment Certification ↗

Rasa
Sequence Models ↗

deeplearning.ai
Mathematics for Machine Learning ↗

Imperial College London / Coursera
Machine Learning ↗

Stanford Online
Getting Started with Machine Learning ↗

AWS

speaking

2024-06

Google I/O Extended

"What does the future of AI hold for us"

Panelist · NSW Teachers Federation Conference Centre, Sydney
2024-05

Google Developer Student Clubs — Google Labs

"Leveraging open-source text-to-image generative AI for practical applications"

Speaker · Google HQ, Sydney
2024-04

Google Developer Group — Google Cloud

"Dreamery: generative-AI on GCP. Distributed services, serverless GPU, 90% cost reduction"

Speaker · Google Developer Group, Sydney
2020-11

Fuse AI Training

"End-to-end ML pipeline development and deployment workshop"

AI Instructor · Fusemachines
