CSRD-Lake
CAC 40 — FY2024 — extracted 2026-05-03

Companies

3 of 10 CAC 40 companies in the manifest have a full FY2024 ESG profile in the warehouse. Each profile groups extracted ESRS metrics by topic and shows the confidence score, model attribution, and source-page citation for every value.

Need the bigger picture first? Read the project landing page · or jump to the portfolio rollup.

Profiles available

3 of 10

What does the confidence score mean?

A composite of four internal signals — not a claim of factual correctness. The source citation on every row is the actual verification handle.

1. LLM self-rating

The model rates its own certainty in [0, 1]. Useful but weak alone — models hallucinate confidently.

2. Structural pass

Output parsed into a valid Pydantic ESRSMetric shape. Hard fail → score is 0.

3. Snippet contains value

The verbatim source text we returned literally contains the extracted value. Strong circumstantial evidence; failure halves the score.

4. Language match

Manifest-claimed language matches detected language (cross-check via langdetect — placeholder in v1, always passing).

≥ 0.80 · published mart
< 0.80 · human review queue
Routing is automatic, never silent.
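The four signals and the routing rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scoring code: the names `Signals`, `confidence_score`, and `route` are hypothetical, and using the LLM self-rating as the base value is an assumption. Only the hard-fail-to-zero rule, the halving on a missing snippet match, and the 0.80 threshold come from this page.

```python
from dataclasses import dataclass

PUBLISH_THRESHOLD = 0.80  # routing threshold stated on this page


@dataclass
class Signals:
    """Hypothetical container for the four signals described above."""
    llm_self_rating: float          # model's own certainty in [0, 1]
    structural_pass: bool           # output parsed into a valid ESRSMetric shape
    snippet_contains_value: bool    # source snippet literally contains the value
    language_match: bool = True     # placeholder in v1: always passing


def confidence_score(s: Signals) -> float:
    """Combine the signals into one score (assumed composition)."""
    if not s.structural_pass:
        return 0.0  # hard fail: score is 0
    # Assumption: the self-rating, clamped to [0, 1], is the starting point.
    score = min(max(s.llm_self_rating, 0.0), 1.0)
    if not s.snippet_contains_value:
        score *= 0.5  # failure halves the score
    # The penalty for a language mismatch is not stated on this page,
    # so this sketch applies none.
    return score


def route(score: float) -> str:
    """Route a row by score; every row goes to exactly one destination."""
    return "published_mart" if score >= PUBLISH_THRESHOLD else "human_review_queue"


# A confident self-rating without a snippet match still lands in review.
print(route(confidence_score(Signals(0.9, True, False))))  # human_review_queue
```

Under these assumptions, a value can reach the published mart only if it parses cleanly and has a self-rating of at least 0.80 with a matching snippet.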

What the score doesn't catch: column confusion in tables (extracted FY2023 instead of FY2024), unit mistakes (kt read as tonnes), or values picked from chart captions instead of the disclosure proper. The custom dbt test metric_value_in_source_text catches LLM normalisations (e.g. “129 million” → 129000000) and currently flags 14 rows in the warehouse for review.
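To see why a normalised value gets flagged, consider a literal containment check. The sketch below is an illustration in Python, not the dbt test itself (which is not shown on this page); the function name `value_in_snippet` and the example sentence are invented for the purpose.

```python
def value_in_snippet(value: str, snippet: str) -> bool:
    """Literal check: does the snippet contain the value string verbatim?"""
    return value in snippet


# Invented example sentence, not taken from any company report.
snippet = "Total for the year: 129 million."

print(value_in_snippet("129 million", snippet))  # True: value appears verbatim
print(value_in_snippet("129000000", snippet))    # False: normalised form is absent
```

A row storing 129000000 would fail such a check even though the number is arguably correct, which is why flagged rows go to review rather than being rejected outright.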

How correctness is actually verified: every row carries (source_page, source_snippet). A human can open the source PDF at the cited page and verify any value in seconds. The 800-datapoint hand-verified gold-set (see README, planned v1.1) is what would let us claim a percentage accuracy — until then, treat published-mart values as system-validated, not human-validated.

Manifest scope · pending ingestion

The full manifest at src/csrd_lake/ingestion/data/cac40.toml also includes 7 companies whose sustainability PDFs have not yet been ingested:

AI.PA · Air Liquide
BNP.PA · BNP Paribas
CS.PA · AXA
OR.PA · L'Oreal
ORA.PA · Orange
SAN.PA · Sanofi
SGO.PA · Saint-Gobain

Ingestion is a one-line manifest update + ~$0.50 LLM cost per company once a known PDF URL is set.