: Improving the robustness of distinct aggregations and adaptive query execution.
: Ensuring that technical tutorials evolve alongside the libraries they describe.
: In-depth technical articles on Spark internals, such as Unified Memory Management and performance optimizations for Parquet (benfradet.github.io).
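As a rough illustration of what Unified Memory Management governs, the sketch below estimates how an executor heap is split into regions. It assumes the documented Spark 2.x+ defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5) and the fixed 300 MiB reservation. The function name `unified_memory_regions` is hypothetical, and the figures are approximations: Spark starts from the JVM's reported maximum memory, which is slightly below `-Xmx`.

```python
# Approximate region sizes under Spark's unified memory model.
# Assumption: Spark 2.x+ defaults; real executors start from
# Runtime.maxMemory, which is a little smaller than the configured heap.

RESERVED_BYTES = 300 * 1024 * 1024  # fixed reservation for Spark internals


def unified_memory_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Return estimated byte sizes of the unified, storage, execution and user regions."""
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reservation")

    unified = round(usable * memory_fraction)    # shared by execution and storage
    storage = round(unified * storage_fraction)  # storage share protected from eviction
    execution = unified - storage                # initial execution share (can borrow)
    user = usable - unified                      # user data structures, metadata

    return {
        "unified": unified,
        "storage": storage,
        "execution": execution,
        "user": user,
    }


if __name__ == "__main__":
    regions = unified_memory_regions(4 * 1024**3)  # illustrative 4 GiB heap
    for name, size in regions.items():
        print(f"{name:>9}: {size / 1024**2:,.1f} MiB")
```

The boundary between storage and execution is soft: either side can borrow unused space from the other, so the `storage` figure is the amount of cached data protected from eviction, not a hard cap.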
The site primarily documents Fradet's work in data engineering and distributed systems. Key themes include:
: Enhancements to the Data Source API to prevent double-filtering and speed up complex type caching.
: Strategies for avoiding GC overhead using off-heap memory.
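One common way to move Spark's managed memory outside the garbage-collected heap is through the `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size` settings. The sketch below builds those two properties and renders them as `spark-submit` flags. The helper names `off_heap_conf` and `to_submit_args` and the `2g` size are illustrative assumptions, not values taken from the site.

```python
# Build the Spark properties that enable off-heap memory and render them
# as spark-submit arguments. Helper names and the default size are
# hypothetical; only the two property keys come from Spark itself.


def off_heap_conf(size="2g"):
    """Return Spark properties enabling off-heap memory of the given size."""
    return {
        "spark.memory.offHeap.enabled": "true",
        # Must be a positive size whenever off-heap memory is enabled.
        "spark.memory.offHeap.size": size,
    }


def to_submit_args(conf):
    """Render a property dict as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(off_heap_conf())))
```

Off-heap memory is allocated in addition to the JVM heap, so the container or machine memory limit needs to account for both.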
The site is built using GitHub Pages, reflecting a commitment to the "documentation-as-code" philosophy. This allows for:
Fradet's work often bridges the gap between high-level data processing and low-level system performance. His write-ups typically cover: