朱 / SHU · A LENS ON JAPANESE FILINGS

N° 047 / TOKYO · NEW YORK

VOL. II · MMXXVI


Vol. II, Edition 047 · Annual securities reports (有価証券報告書), in English translation

Japan’s annual reports. Read in English. With span-cited receipts.

Yūhō are Japan’s annual securities reports: the equivalent of US 10-Ks, ~88,000 pages filed each year by listed companies. YuhoLens reads them in English with every claim, currency, margin, and segment linked back to a page and span in the source. Open weights, GGUF on AMD silicon.


§ 01 · The reading problem · Unread

Every Japanese annual report. Read in English. With receipts.

Japan publishes 88,000 pages of rigorous financial disclosure each year. Almost no one outside Japan reads them.

0.000 · Citation rate

3.88 · KG-2 coherence

14B · nekomata-qfin

1× MI300X

~$80 · Total compute

Beat 01 · The wall

Eighty-eight thousand pages of Japanese regulatory filings. Published annually. Mostly unread outside Japan.

Beat 02 · The translation gap

Machine translation loses the meaning. Professional translation takes weeks and costs thousands.

Beat 03 · The lens

YuhoLens reads the source. Translates with context. Refuses when the source doesn't say so.


§ 02 · How it works · Mechanism

A four-stage pipeline. Span-grounded. Refuses when uncertain.

Section-split → translate-with-context → citation-grounder → judge. Every claim ties to a verbatim Japanese span; sentences without grounding are replaced with [evidence insufficient].
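The refusal rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's grounder: the `Claim` type, the `ground` function, and the sample sentences are hypothetical, and "grounded" is assumed here to mean that the cited Japanese span occurs verbatim in the source text.

```python
from dataclasses import dataclass
from typing import List, Optional

REFUSAL = "[evidence insufficient]"  # placeholder text quoted on this page


@dataclass
class Claim:
    """Hypothetical record: one English sentence plus the span it cites."""
    sentence: str
    span: Optional[str]  # verbatim Japanese span, or None if nothing is cited


def ground(claims: List[Claim], source_text: str) -> List[str]:
    """Keep a sentence only if its cited span appears verbatim in the source."""
    memo = []
    for claim in claims:
        if claim.span and claim.span in source_text:
            memo.append(claim.sentence)
        else:
            # No span, or the span is not in the filing: refuse instead of guessing.
            memo.append(REFUSAL)
    return memo


# Illustrative data only, not taken from a real filing.
source = "当期の売上高は1,200百万円となりました。"
claims = [
    Claim("Net sales for the period were 1,200 million yen.", "売上高は1,200百万円"),
    Claim("Operating margin improved to 12%.", None),
]
memo = ground(claims, source)
```

The first sentence survives because its span is a substring of the source; the second is replaced with the refusal marker because it cites nothing.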

Step 01 / Ingest

Paste any EDINET row or ticker.

Pull a row from EDINET-Bench, or upload your own filing. The pipeline runs section-split and span-grounding in one query.

Step 02 / Fetch

We fetch the source.

Section-split, regex-bounded, page-aligned. Every claim will trace back to a specific span.
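A minimal sketch of what a regex-bounded, page-aligned split could look like. It rests on assumptions not stated on this page: that pages in the extracted text are separated by form-feed characters, and that top-level sections open with headings of the form 第1【…】. The `split_sections` helper and the sample text are hypothetical.

```python
import re
from bisect import bisect_right

# Assumed heading shape: "第" + number + "【title】" at the start of a line,
# optionally preceded by the form feed that opens a new page.
HEADING = re.compile(r"^\f?第[0-9０-９]+【(.+?)】", re.MULTILINE)


def split_sections(text):
    """Return one dict per section: title, 1-based start page, char offsets."""
    page_breaks = [m.start() for m in re.finditer("\f", text)]
    heads = list(HEADING.finditer(text))
    sections = []
    for i, head in enumerate(heads):
        # A section runs until the next heading, or to the end of the text.
        end = heads[i + 1].start() if i + 1 < len(heads) else len(text)
        sections.append({
            "title": head.group(1),
            # Count the page breaks at or before the heading to get its page.
            "page": bisect_right(page_breaks, head.start()) + 1,
            "start": head.start(),
            "end": end,
        })
    return sections


# Illustrative two-page sample, not a real filing.
sample = "第1【企業の概況】\n主要な経営指標等の推移\n\f第2【事業の状況】\n経営方針\n"
sections = split_sections(sample)
```

Keeping character offsets alongside the page number is what lets a later stage point a claim back at an exact span rather than just a page.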

Step 03 / Read

Read it in English. With receipts.

Span-cited memo. Hover any number to see the original Japanese and page reference.

§ 03 · The receipt · Receipt

Open weights. Open eval. Every row maps to a script in the public repo.

The whole pipeline (corpus build, SFT, ORPO, KG-2 eval, GGUF export) reproduces in one MI300X-day, for ~$80 of compute. No private data, no held-out tricks; each row links to the script that produced it.

RECEIPT · 7 ROWS

BF16 weights · MIT · HuggingFace

GGUF Q3–Q8 quants · Five sizes · 7.18–14.03 GiB

KG-2 eval scripts · 50-prompt set · graders

DPO + ORPO logs · Full training run history

§ 04 · Hardware · Physics

Trained on AMD Instinct MI300X. 192 GB HBM3. ROCm 7.0.

Full-parameter SFT of a 14B model at sequence length 8,192 needs ~140 GB peak VRAM. The MI300X has 192 GB of HBM3 in a single accelerator; an 80 GB H100 cannot fit this run. We trained on a single MI300X for 23 hours at ~$3.50/hour (~$80 total), then exported five GGUF quantizations so the same model fits on consumer 8 GB laptops.
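The figures quoted on this page can be cross-checked with simple arithmetic. The sketch below takes the ~$80 total, the ~$3.50/hour rate, the ~140 GB peak, and the two memory sizes at face value; the variable names are illustrative, and the single-device fit check ignores sharding or offloading, which the page does not discuss.

```python
# All inputs are the approximate figures quoted on this page.
HOURLY_RATE_USD = 3.50     # ~$3.50/hour for one MI300X
TOTAL_COMPUTE_USD = 80.0   # ~$80 total compute
PEAK_VRAM_GB = 140         # ~140 GB peak for full-parameter SFT at seq_len 8,192
MI300X_GB = 192            # HBM3 on one MI300X
H100_GB = 80               # memory on an 80 GB H100

# Implied run length: about 22.9 hours, i.e. roughly one MI300X-day.
run_hours = TOTAL_COMPUTE_USD / HOURLY_RATE_USD

# Single-device fit check: peak memory must not exceed device memory.
fits_mi300x = PEAK_VRAM_GB <= MI300X_GB
fits_h100 = PEAK_VRAM_GB <= H100_GB

# Memory left over on the MI300X at peak.
headroom_gb = MI300X_GB - PEAK_VRAM_GB

print(f"run length ≈ {run_hours:.1f} h, MI300X headroom = {headroom_gb} GB")
```

On these numbers the run fits on one MI300X with about 52 GB to spare and does not fit on a single 80 GB device.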

192 GB

HBM3 in one accelerator

Largest single-GPU memory in production. An 80 GB H100 cannot fit this run.

ROCm 7.0

Full-stack open

Same toolchain in dev and prod. PyTorch, FlashAttention, vLLM, all upstream.

5.3 TB/s

HBM3 bandwidth

Keeps long-context Japanese filings streaming through SFT at seq_len 8,192; the 192 GB capacity is what avoids OOM.

Same weights, five sizes, 7.18 → 14.03 GiB

Five GGUF quantizations ship with the model; sizes are compared against the Q4_K_M baseline.

10.06 tok/s on an 8 GB consumer laptop (Q3_K_M)

Q3_K_M · Q4_K_M · Q5_K_M · Q6_K · Q8_0

Train on MI300X. Run on a MacBook. Same weights.

§ 05 · Get it · Disclosure

Open weights. Open eval. Open ledger.

BF16 weights for the lab, GGUF Q3–Q8 for the laptop, and the full eval pipeline for the auditor. MIT-licensed today.

01 · Read a memo · BF16 weights for the lab · HUGGINGFACE · BF16 →

02 · Run it locally · GGUF Q4_K_M for the laptop · HUGGINGFACE · GGUF →

03 · Reproduce the eval · Full pipeline for the auditor · GITHUB →


YuhoLens v2.5 · MIT (code) · pfnet/nekomata-14b-pfn-qfin (base · Qwen 1) · 1,910 EDINET-Bench rows · Built for the AMD Developer Hackathon
