US DOT Flight Operations 2015 โ Fabric Lakehouse: airline_flights โ
Select a preset question, then click Search to see the demo in action.
SharePoint, Fabric OneLake, Azure Blob Storage, the web โ wherever your enterprise data lives, one Knowledge Base unifies it all. One API call. Multiple sources. Cited answers.
Phase 3 demonstrates Semantic JOIN โ a single question answered by combining structured data (this flight dataset from Fabric) with unstructured documents (DOT regulation PDFs indexed from SharePoint via Foundry IQ).
This is not a SQL JOIN. The LLM reasons across both sources and synthesizes a unified answer with citations from each.
AI Search cannot directly index 5.8M raw flight records from Fabric Lakehouse (CSV/Delta Parquet is not supported by OneLake Indexer). Instead, we use a two-step pipeline:
This is the same pattern for any large-scale Fabric data: aggregate first, then index the summaries for AI Search retrieval.
| File | Records | Description |
|---|---|---|
| airline_delay_stats.json | 14 | Per-airline delay/cancellation statistics |
| top_airport_stats.json | 30 | Top 30 airports delay/cancellation stats |
| monthly_trend.json | 12 | Monthly operations trend |
| cancellation_reasons.json | 4 | Cancellation reason distribution |
| jfk_detailed_analysis.json | 9 | JFK airport detailed analysis (demo questions) |
These PDF documents are stored in SharePoint and indexed by AI Search with: SplitSkill (2000 chars, 500 overlap) + EmbeddingSkill (text-embedding-3-large, 3072 dim).
| Document | Size |
|---|---|
| ANPRM_Airline_Passenger_Rights_2105_AF20.pdf | 419 KB |
| Airline_Passenger_Right_CRS_Report_R43078.pdf | 751 KB |
| Bumping_Oversales_US_Department_of_Transportation.pdf | 178 KB |
| Fly_Rights_US_Department_of_Transportation.pdf | 352 KB |
This dataset contains every US domestic flight in 2015 โ 5.8 million records from 14 airlines across 322 airports. Each row represents a single flight with departure/arrival delays, cancellation status, and routing information.
The data is sourced from the US Department of Transportation Bureau of Transportation Statistics and hosted on Kaggle under CC0 Public Domain license.
It is loaded into a Microsoft Fabric Lakehouse as Delta Tables, enabling Spark SQL queries and OneLake indexing for AI Search integration.
| Column | Type | Example | Description |
|---|---|---|---|
| AIRLINE | str | AA | IATA airline code |
| ORIGIN_AIRPORT | str | JFK | Departure airport |
| DESTINATION_AIRPORT | str | LAX | Arrival airport |
| DEPARTURE_DELAY | int | 45 | Minutes delayed (negative = early) |
| ARRIVAL_DELAY | int | 42 | Minutes delayed at arrival |
| CANCELLED | int | 0 / 1 | 0 = operated, 1 = cancelled |
| CANCELLATION_REASON | str | B | A=Airline, B=Weather, C=NAS, D=Security |
| DISTANCE | int | 2475 | Flight distance in miles |
These charts are generated from the same aggregated JSON files that power the Semantic JOIN demo above.
Fabric Notebook (PySpark + Seaborn) runs SQL aggregation on 5.8M flight records,
producing summary statistics stored as JSON in OneLake โ indexed by AI Search
โ searchable via Foundry IQ Knowledge Base.
๐ View Fabric Notebook โ