DE int

18 flashcards    guest3164346
Question (English) / Answer (English)
ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
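A minimal sketch of the three stages, using only Python's standard library. The in-memory CSV source, the `orders` table, and the column names are assumptions made for illustration, and SQLite stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read a file or an API.
RAW_CSV = "order_id,amount\n1, 10.50 \n2,\n3,4.25\n"

def extract(text):
    """Extract: read rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount and convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append((int(row["order_id"]), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount FROM orders ORDER BY order_id"
).fetchall()
```

Only cleaned rows ever reach the destination, which is the defining trait of ETL.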
ELT (Extract, Load, Transform)
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
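A small sketch of the same idea with the order of steps swapped. SQLite stands in for the warehouse here (an assumption; the card mentions BigQuery), and the `raw_orders` and `orders` table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw values go into the destination untouched, as text.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("1", " 10.50 "), ("2", ""), ("3", "4.25")],
)

# Transform: cleaning happens afterwards, in SQL, inside the destination.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_orders
    WHERE TRIM(amount) <> ''
    """
)
clean = conn.execute(
    "SELECT order_id, amount FROM orders ORDER BY order_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

The raw table is kept alongside the cleaned one, so the transformation can be rerun or changed later without reloading.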
DAG (Directed Acyclic Graph – Airflow)
A structure used in Airflow to define workflows. It represents tasks and the dependencies between them as a graph with no cycles, so each task runs only after the tasks it depends on have finished.
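To illustrate the "directed" and "acyclic" parts, here is a sketch that uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A valid DAG always has at least one execution order.
order = list(TopologicalSorter(dependencies).static_order())

# Adding a circular dependency makes the graph invalid as a DAG.
cyclic = dict(dependencies, extract={"report"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

A scheduler such as Airflow relies on the same property: without cycles, it can always decide which task is safe to run next.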
Partitioning (BigQuery)
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
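The effect of scanning only relevant partitions can be imitated in plain Python. This is an analogy, not BigQuery code; the rows and the `users_on` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows: (event_date, user_id).
rows = [
    (date(2024, 1, 1), "a"),
    (date(2024, 1, 1), "b"),
    (date(2024, 1, 2), "c"),
    (date(2024, 1, 3), "d"),
]

# Partition the table by date: one bucket per day.
partitions = defaultdict(list)
for event_date, user_id in rows:
    partitions[event_date].append(user_id)

def users_on(day):
    """Read only the partition for `day`; return its rows and the scan size."""
    scanned = partitions.get(day, [])
    return list(scanned), len(scanned)

users, rows_scanned = users_on(date(2024, 1, 1))
```

The query touches 2 of the 4 rows instead of all of them, which is where the speed and cost savings come from.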
JOIN (SQL)
A way to combine data from two or more tables based on a related column (e.g. user_id).
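A short sketch of an inner and a left join on `user_id`, run through Python's built-in `sqlite3` module. The `users` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0);
    """
)

# INNER JOIN keeps only rows whose user_id appears in both tables.
inner = conn.execute(
    """
    SELECT u.name, o.order_id
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY o.order_id
    """
).fetchall()

# LEFT JOIN keeps every user, with NULL (None) where no order matches.
left = conn.execute(
    """
    SELECT u.name, o.order_id
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY u.user_id, o.order_id
    """
).fetchall()
```

Order 12 belongs to a user that does not exist in `users`, so it appears in neither result; Ben has no orders, so he appears only in the left join.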
HAVING vs WHERE (SQL)
WHERE filters individual rows before aggregation; HAVING filters groups after aggregation. Example: HAVING COUNT(*) > 100.
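Both clauses can appear in one query, which makes the difference easy to see. The sketch below uses Python's built-in `sqlite3` module; the `orders` table and the thresholds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 5.0), (1, 50.0), (1, 60.0), (2, 70.0), (2, 1.0);
    """
)

# WHERE drops individual rows (amount <= 10) before grouping;
# HAVING then drops whole groups with fewer than 2 remaining rows.
result = conn.execute(
    """
    SELECT user_id, COUNT(*) AS big_orders
    FROM orders
    WHERE amount > 10
    GROUP BY user_id
    HAVING COUNT(*) >= 2
    ORDER BY user_id
    """
).fetchall()
```

User 2 has two orders in total, but only one survives the WHERE filter, so the HAVING condition removes that group.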
PySpark
Python API for Apache Spark. It’s used to process very large datasets in a distributed, parallelized way.
BigQuery
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
Data Lake
A storage system that holds data in its raw form, whether structured, semi-structured, or unstructured; often used for flexible analytics or staging.
Data Warehouse
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
Airflow Operator
A template for a single task in an Airflow DAG; it defines what the task does (e.g. PythonOperator, BashOperator).
Kafka Topic
A named data stream in Apache Kafka: producers publish messages to it and consumers read messages from it.
IAM (Identity and Access Management – GCP)
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
KPI (Key Performance Indicator)
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
Lazy Evaluation (Spark)
Transformations are not executed until an action (like .count() or .collect()) is called; this lets Spark optimize the whole plan before running it.
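The same behaviour can be observed with Python's built-in lazy `map`. This is only an analogy and does not use Spark; `double` and the `calls` log are hypothetical helpers for showing when work actually happens.

```python
calls = []

def double(x):
    calls.append(x)  # record when the work actually happens
    return x * 2

# "Transformation": map() builds a lazy pipeline; nothing runs yet.
pipeline = map(double, [1, 2, 3])
calls_before_action = list(calls)

# "Action": consuming the pipeline forces the computation.
total = sum(pipeline)
calls_after_action = list(calls)
```

No call to `double` is recorded until `sum` consumes the pipeline, much as a Spark transformation does no work until an action requests a result.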
Retry (Airflow)
A setting that allows a task to be automatically retried after failure, helpful for unstable operations.
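In Airflow this is configured through task arguments such as `retries` and `retry_delay`. The sketch below reproduces the idea in plain Python rather than Airflow; `run_with_retries` and `flaky_task` are hypothetical names.

```python
import time

def run_with_retries(task, retries=3, retry_delay=0.0):
    """Call `task`; on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            time.sleep(retry_delay)

attempts = []

def flaky_task():
    """Hypothetical unstable operation: fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = run_with_retries(flaky_task, retries=3)
```

With `retries=3` the task may run up to four times in total; here it succeeds on the third run.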
Data Validation
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
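A minimal sketch of the three checks named above (missing values, duplicates, wrong formats). The field names `user_id` and `signup_date` and the YYYY-MM-DD date format are assumptions chosen for the example.

```python
import re

DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed format: YYYY-MM-DD

def validate(records):
    """Return a list of (row_index, problem) tuples for the given records."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        if rec.get("user_id") in (None, ""):
            problems.append((i, "missing user_id"))
        elif rec["user_id"] in seen_ids:
            problems.append((i, "duplicate user_id"))
        else:
            seen_ids.add(rec["user_id"])
        if not DATE_FORMAT.fullmatch(rec.get("signup_date") or ""):
            problems.append((i, "bad signup_date format"))
    return problems

records = [
    {"user_id": "u1", "signup_date": "2024-01-05"},
    {"user_id": "u1", "signup_date": "2024-01-06"},
    {"user_id": "", "signup_date": "05/01/2024"},
]
issues = validate(records)
```

Returning a list of problems, rather than stopping at the first one, lets a pipeline decide whether to reject the batch or only the bad rows.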
Window Function (SQL)
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
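A short sketch of both functions mentioned above, run through Python's built-in `sqlite3` module. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    """
)

# Every input row is kept; each gets a row number and its user's average.
rows = conn.execute(
    """
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           AVG(amount) OVER (PARTITION BY user_id) AS user_avg
    FROM orders
    ORDER BY user_id, amount
    """
).fetchall()
```

Three rows go in and three rows come out, whereas a GROUP BY on `user_id` would return only two.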