๐Ÿ”— private-foundry-iq-task.vercel.app
PHASE 3 ยท FABRIC + FOUNDRY IQ

๐Ÿ“Š Flight Data Profile

US DOT Flight Operations 2015 โ€” Fabric Lakehouse: airline_flights โ†—

5,819,079
Total Flights
14
Airlines
9.0
Avg Delay (min)
89,884
Cancelled

Try Semantic JOIN

Select a preset question, then click Search to see the demo in action.

Try:
Fabric IQ โ€” structured
flights table
5.8M rows โ€” US airline operations 2015
Waiting for query...
+
Semantic
JOIN
Foundry IQ โ€” unstructured
DOT policy PDFs
4 documents โ€” passenger rights & regulations
Waiting for query...
Semantic JOIN โ€” unified answer
AI Search โ€” the router behind Semantic JOIN
Your question POST /retrieve unified-airline-kb (Knowledge Base) AI Search router Planning + fan-out KS-1 Fabric indexedOneLake KS-2 Foundry searchIndex LLM synthesis Merge + cite sources Hybrid search (indexed) Vector + keyword
1
Single API call
Your app sends one question to one KB. That's the only call.
2
AI Search plans the route
Agentic Retrieval analyzes the query and decides it needs both KS-1 (flights data) and KS-2 (policy docs). Automatic โ€” no code needed.
3
Fan-out parallel search
AI Search queries both sources simultaneously. Hybrid search (vector+keyword) for OneLake indexed JSON, vector+keyword hybrid for documents.
4
LLM (AI) synthesizes the answer
LLM merges numbers from Fabric with policies from Foundry into one answer with citations from each source.
unified-airline-kb
  โ”œโ”€โ”€ KS-1: flights-onelake (indexedOneLake) โ† Fabric
  โ”œโ”€โ”€ KS-2: sp-airline-policies (searchIndex) โ† SharePoint
  โ””โ”€โ”€ LLM answerSynthesis โ† AI reasoning

Data lives everywhere โ€” Foundry IQ searches it as one.

SharePoint, Fabric OneLake, Azure Blob Storage, the web โ€” wherever your enterprise data lives, one Knowledge Base unifies it all. One API call. Multiple sources. Cited answers.

๐Ÿ“„ What is Foundry IQ? โ€” Microsoft Learn โ†’ ๐Ÿ“ Foundry IQ Deep Dive โ€” Tech Community โ†’

One Question, Two Brains

"Numbers come from Fabric, reasons come from Foundry IQ."

Phase 3 demonstrates Semantic JOIN โ€” a single question answered by combining structured data (this flight dataset from Fabric) with unstructured documents (DOT regulation PDFs indexed from SharePoint via Foundry IQ).

This is not a SQL JOIN. The LLM reasons across both sources and synthesizes a unified answer with citations from each.

Fabric IQ โ€” Structured
flights table (5.8M rows)
Delay counts, cancellation stats, route data
โ†’ "2,566 flights delayed 2+ hrs at JFK"
โšก
Semantic
JOIN
Foundry IQ โ€” Unstructured
DOT Policy PDFs (4 docs)
Passenger rights, compensation rules, regulations
โ†’ "Full refund for 3+ hr delays per DOT"

How this data gets to AI Search

AI Search cannot directly index 5.8M raw flight records from Fabric Lakehouse (CSV/Delta Parquet is not supported by OneLake Indexer). Instead, we use a two-step pipeline:

โ‘  Spark SQL Aggregation (Fabric Notebook)
    5.8M rows โ†’ SQL GROUP BY โ†’ 5 summary JSON documents

โ‘ก OneLake Indexer (AI Search)
    JSON files โ†’ chunk + vectorize โ†’ searchable index

This is the same pattern for any large-scale Fabric data: aggregate first, then index the summaries for AI Search retrieval.

Aggregated JSON Files (Fabric OneLake โ†’ AI Search)

FileRecordsDescription
airline_delay_stats.json14Per-airline delay/cancellation statistics
top_airport_stats.json30Top 30 airports delay/cancellation stats
monthly_trend.json12Monthly operations trend
cancellation_reasons.json4Cancellation reason distribution
jfk_detailed_analysis.json9JFK airport detailed analysis (demo questions)

DOT Policy Documents (SharePoint โ†’ AI Search)

These PDF documents are stored in SharePoint and indexed by AI Search with: SplitSkill (2000 chars, 500 overlap) + EmbeddingSkill (text-embedding-3-large, 3072 dim).

DocumentSize
ANPRM_Airline_Passenger_Rights_2105_AF20.pdf419 KB
Airline_Passenger_Right_CRS_Report_R43078.pdf751 KB
Bumping_Oversales_US_Department_of_Transportation.pdf178 KB
Fly_Rights_US_Department_of_Transportation.pdf352 KB

What is this data?

This dataset contains every US domestic flight in 2015 โ€” 5.8 million records from 14 airlines across 322 airports. Each row represents a single flight with departure/arrival delays, cancellation status, and routing information.

The data is sourced from the US Department of Transportation Bureau of Transportation Statistics and hosted on Kaggle under CC0 Public Domain license.

It is loaded into a Microsoft Fabric Lakehouse as Delta Tables, enabling Spark SQL queries and OneLake indexing for AI Search integration.

Key Schema

ColumnTypeExampleDescription
AIRLINEstrAAIATA airline code
ORIGIN_AIRPORTstrJFKDeparture airport
DESTINATION_AIRPORTstrLAXArrival airport
DEPARTURE_DELAYint45Minutes delayed (negative = early)
ARRIVAL_DELAYint42Minutes delayed at arrival
CANCELLEDint0 / 10 = operated, 1 = cancelled
CANCELLATION_REASONstrBA=Airline, B=Weather, C=NAS, D=Security
DISTANCEint2475Flight distance in miles

EDA Visualizations โ€” Aggregated Data Used in Demo

These charts are generated from the same aggregated JSON files that power the Semantic JOIN demo above.

Fabric Notebook (PySpark + Seaborn) runs SQL aggregation on 5.8M flight records, producing summary statistics stored as JSON in OneLake โ†’ indexed by AI Search โ†’ searchable via Foundry IQ Knowledge Base.

๐Ÿ““ View Fabric Notebook โ†’

Flight Data Profile Charts
๐Ÿ“Š Average Delay by Airline
airline_delay_stats.json (14 airlines)
๐Ÿ“Š Delay Pattern by Hour
hourly aggregation from flights table
๐Ÿ“Š Cancellation Reasons
cancellation_reasons.json (4 reasons)
๐Ÿ“Š Cancellation Rate by Airline
airline_delay_stats.json (14 airlines)