task-rag.md

The team needs an internal workflow that ingests messy legal-style documents, pulls usable information out of them, and turns that information into grounded draft outputs an operator can edit.

The inputs will not be clean. Expect scanned pages, low-resolution PDFs, handwritten notes, partially illegible records, and inconsistently formatted files. Your system has to cope with that.

At a high level, the system you build should:

Ingest and process the source documents.
Extract usable text and structured fields.
Retrieve relevant evidence from those documents.
Generate grounded draft responses or legal-style drafts.
Get better over time by learning from how operators edit the default drafts.

The kind of draft output you choose to generate is up to you. Reasonable picks include:

A title review summary.
A case fact summary.
A notice-related summary.
A document checklist.
A first-pass internal memo.

Whatever you choose, the output must be grounded in the underlying documents. We are not looking for confident-sounding text built on unsupported assumptions.

Task Requirements

1. Document Processing

Accept messy legal-style documents and pull useful content out of them. Concretely:

Perform OCR or text extraction over scanned or noisy files.
Reasonably handle partially unclear inputs.
Produce extracted text plus structured data that downstream steps can actually use.

The output of this stage should be ready to feed into retrieval and drafting without further cleanup.
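
As one possible illustration of "extracted text plus structured data", the sketch below shows a per-page record that a processing stage might emit. All names here (ExtractedPage, build_page, the min_confidence threshold, the date pattern) are hypothetical and not part of the task. The OCR step is assumed to have already run and reported a confidence score.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: matches dates written like 03/14/2019.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


@dataclass
class ExtractedPage:
    doc_id: str
    page: int
    text: str                      # whitespace-normalized extracted text
    ocr_confidence: float          # 0.0-1.0, as reported by the OCR engine (assumed)
    fields: dict[str, str] = field(default_factory=dict)
    needs_review: bool = False     # True when the page is too unclear to trust


def build_page(doc_id: str, page: int, raw_text: str,
               ocr_confidence: float, min_confidence: float = 0.6) -> ExtractedPage:
    """Normalize raw OCR text and attach simple structured fields."""
    text = " ".join(raw_text.split())          # collapse newlines and repeated spaces
    fields: dict[str, str] = {}
    match = DATE_RE.search(text)
    if match:
        fields["date"] = match.group(0)
    return ExtractedPage(
        doc_id=doc_id,
        page=page,
        text=text,
        ocr_confidence=ocr_confidence,
        fields=fields,
        # Flag unclear pages instead of silently passing them downstream.
        needs_review=ocr_confidence < min_confidence,
    )
```

Keeping the confidence score and a review flag on every record is one way to let retrieval and drafting treat partially illegible pages differently from clean ones.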

2. Grounded Retrieval

Build a retrieval layer over the processed documents so that generation is anchored to actual source material. The retrieval layer should:

Surface the relevant passages for a given drafting task.
Feed that evidence into the generation step.
Make it possible to inspect which evidence supported which part of the output.

The goal is grounded answering, not generic generation.
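
The retrieval requirements above can be sketched with a deliberately simple scorer. This is a minimal sketch that assumes token overlap as the relevance signal; a real system would more likely use BM25 or embeddings. Passage, tokenize, and retrieve are hypothetical names. The point is that every returned passage keeps a stable ID, so the evidence behind a draft can be inspected later.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    passage_id: str   # e.g. "<doc_id>:p<page>", a hypothetical ID scheme
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
    """Return up to top_k passages sharing at least one token with the query."""
    query_tokens = tokenize(query)
    scored = []
    for passage in passages:
        overlap = len(query_tokens & tokenize(passage.text))
        if overlap > 0:                      # drop passages with no shared tokens
            scored.append((overlap, passage))
    # Highest overlap first; ties broken by ID so results are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1].passage_id))
    return [passage for _, passage in scored[:top_k]]
```

Returning nothing when no passage overlaps is intentional: an empty result gives the generation step a clear signal that no supporting evidence was found.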

3. Draft Generation

Using the processed content and retrieved evidence, generate a draft response or draft legal-style output. A good draft is:

Relevant to the provided documents.
Grounded in retrieved evidence.
Structured well enough to be useful as a first pass.

We are not evaluating legal correctness. We are evaluating whether the output is well supported by the source material and whether the system around it is designed well.
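
One way to keep a draft grounded is to carry evidence IDs alongside every statement and make unsupported statements visible. The sketch below assumes the generation step has already produced statements with their supporting passage IDs. DraftLine and render_draft are hypothetical names, and the bracketed citation format is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class DraftLine:
    text: str
    evidence_ids: list[str]   # IDs of the retrieved passages supporting this line


def render_draft(title: str, lines: list[DraftLine]) -> str:
    """Render a draft where each line cites its evidence or is flagged."""
    out = [title]
    for line in lines:
        if line.evidence_ids:
            cites = ", ".join(line.evidence_ids)
            out.append(f"- {line.text} [{cites}]")
        else:
            # Surface unsupported statements rather than presenting them as fact.
            out.append(f"- UNSUPPORTED (needs operator review): {line.text}")
    return "\n".join(out)
```

Flagging a line rather than dropping it is a design choice: the operator still sees what the system wanted to say and can supply or reject the claim.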

4. Improvement from Operator Edits

Assume an operator reviews the default draft and edits it. Your system should:

Capture those edits.
Extract something reusable from them.
Use that signal to make future drafts better.

We are looking for a real improvement loop, not a side-by-side version diff.
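
The capture, extract, and reuse steps above can be sketched as follows. This is a minimal sketch, assuming the reusable signal is a recurring wording substitution. Real operator edits also include structural and factual changes that this does not cover. extract_replacements, learn_rules, apply_rules, and the min_count threshold are hypothetical; difflib.SequenceMatcher is from the Python standard library.

```python
import difflib
from collections import Counter


def extract_replacements(draft: str, edited: str) -> list[tuple[str, str]]:
    """Return (old, new) word-level substitutions the operator made."""
    a, b = draft.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


def learn_rules(edit_history: list[tuple[str, str]], min_count: int = 2) -> dict[str, str]:
    """Keep only substitutions seen at least min_count times across edits."""
    counts: Counter = Counter()
    for draft, edited in edit_history:
        counts.update(extract_replacements(draft, edited))
    return {old: new for (old, new), n in counts.items() if n >= min_count}


def apply_rules(draft: str, rules: dict[str, str]) -> str:
    """Apply learned substitutions to a new draft (naive substring replace)."""
    for old, new in rules.items():
        draft = draft.replace(old, new)
    return draft
```

Requiring a substitution to recur before it becomes a rule is what separates a learned pattern from a one-off diff. The substring replace is intentionally naive and would need word-boundary handling in practice.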

Required

Source code.
README with setup and run instructions.
Short architecture overview.
Brief write-up of assumptions and tradeoffs.
Sample inputs and outputs.
Evaluation approach and results.

Optional

Nice to have, but not required:

API endpoints.
Simple UI.
Tests.
Docker setup.

Evaluation Rubric

Total: 100 points

1. Document Processing — 25 points

Handling of messy inputs.
OCR / extraction quality.
Usefulness of extracted and structured outputs.
Whether the extracted output is genuinely usable downstream.

2. Retrieval and Grounding — 25 points

Retrieval quality.
Relevance of retrieved context.
Whether generated outputs are grounded in source material.
Whether supporting evidence can be inspected.
How well unsupported generation is controlled.

3. Draft Quality — 10 points

Usefulness of the generated draft.
Clarity and structure.
Consistency with the source documents.
Overall quality as a first-pass output.

4. Improvement from Edits — 25 points

How edits are captured.
Whether reusable patterns are learned.
Whether future outputs improve meaningfully.

5. Code Quality and System Design — 10 points

Code organization.
Maintainability.
Modularity.
Error handling.
Scalability of the overall design.

6. Documentation and Clarity — 5 points

Ease of understanding.
Setup clarity.
Quality of explanation and reviewer experience.

Notes

You may use mock or synthetic sample documents.
You may simulate operator edits if needed.
Keep the scope practical.
We care more about engineering quality, grounding, and thoughtful design than visual polish.
Your README, architecture notes, and sample inputs/outputs matter as much as the code. Do not skip them.
Scope down where needed. Pick the parts you can do well and ship those cleanly.

Reviewer Focus

How messy documents are processed.
How retrieval supports the generated output.
How the draft stays grounded in source evidence.
How operator edits improve future results.

For models, use whichever APIs work best here, for example PDF processing from Gemini, OpenAI, Anthropic, or Google's AI services. Whenever you need API keys, let me know and I will export them as environment variables; you can build the complete pipeline before that point. Do extensive research to find the best options available today. Quality matters a lot to me, both code quality and the quality of the code's output. Ask me questions whenever anything is unclear.
