Companies
3 of 10 CAC 40 companies in the manifest have a full FY2024 ESG profile in the warehouse. Each profile groups extracted ESRS metrics by topic and shows the confidence score, model attribution, and source-page citation for every value.
Need the bigger picture first? Read the project landing page, or jump to the portfolio rollup.
Profiles available
3 of 10
What does the confidence score mean?
A composite of four internal signals — not a claim of factual correctness. The source citation on every row is the actual verification handle.
The model rates its own certainty in [0, 1]. Useful but weak alone — models hallucinate confidently.
Output parsed into a valid Pydantic ESRSMetric shape. Hard fail → score is 0.
The verbatim source text we returned literally contains the extracted value. Strong circumstantial evidence; failure halves the score.
Manifest-claimed language matches detected language (cross-check via langdetect — placeholder in v1, always passing).
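The four signals above could be combined roughly as follows. This is a minimal sketch, not the project's actual scoring code: the names Signals, composite_confidence and LANGUAGE_MISMATCH_FACTOR are hypothetical, and the language-mismatch penalty is an assumption, since the text only says that check always passes in v1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical container for the four signals described above."""
    model_confidence: float      # self-rated certainty in [0, 1]
    schema_valid: bool           # output parsed into the expected metric shape
    value_in_snippet: bool       # source snippet literally contains the value
    language_match: bool = True  # placeholder in v1: always passing


# Assumed penalty for a language mismatch; the source does not state one.
LANGUAGE_MISMATCH_FACTOR = 0.5


def composite_confidence(s: Signals) -> float:
    """Combine the signals into one score in [0, 1]."""
    if not s.schema_valid:
        return 0.0  # hard fail: an unparseable output scores 0
    # Start from the model's own rating, clamped to [0, 1].
    score = min(max(s.model_confidence, 0.0), 1.0)
    if not s.value_in_snippet:
        score *= 0.5  # a failed snippet check halves the score
    if not s.language_match:
        score *= LANGUAGE_MISMATCH_FACTOR  # assumption, see above
    return score
```

Under these assumptions, a self-rating of 0.8 with a failed snippet check gives 0.4, and any schema failure gives 0 regardless of the other signals.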
What the score doesn't catch: column-confusion in tables (extracted FY2023 instead of FY2024), unit mistakes (kt read as tonnes), or values picked from chart captions instead of the disclosure proper. The custom dbt test metric_value_in_source_text catches LLM normalisations (e.g. “129 million” → 129000000); it currently flags 14 rows in the warehouse for review.
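To illustrate the kind of check a test like metric_value_in_source_text performs, here is a small Python sketch. The real test is a dbt test and its logic is not shown in the source, so the function names and the set of accepted number formats below are assumptions.

```python
def literal_forms(value: int) -> list[str]:
    """Plain-digit renderings of an integer that a PDF might print.

    The chosen formats (no separator, comma, space, dot) are an assumption.
    """
    plain = str(value)         # e.g. "129000000"
    grouped = f"{value:,}"     # e.g. "129,000,000"
    return [
        plain,
        grouped,
        grouped.replace(",", " "),  # e.g. "129 000 000"
        grouped.replace(",", "."),  # e.g. "129.000.000"
    ]


def value_in_source_text(value: int, snippet: str) -> bool:
    """True if any literal rendering of the value appears in the snippet."""
    return any(form in snippet for form in literal_forms(value))
```

With this sketch, a snippet reading "129,000,000 euros" matches the value 129000000, while "129 million euros" does not. That second case is the normalisation the test is meant to surface for review.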
How correctness is actually verified: every row carries (source_page, source_snippet). A human can open the source PDF at the cited page and verify any value in seconds. The 800-datapoint hand-verified gold-set (see README, planned v1.1) is what would let us claim a percentage accuracy — until then, treat published-mart values as system-validated, not human-validated.
Manifest scope · pending ingestion
The full manifest at src/csrd_lake/ingestion/data/cac40.toml also includes 7 companies whose sustainability PDFs have not yet been ingested:
Once a known PDF URL is set, ingesting a company takes a one-line manifest update plus roughly $0.50 in LLM cost.