DE int

18 cards    guest3164346
Term (English)    Definition (English)
ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
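As a rough illustration of the three ETL steps, here is a minimal sketch in plain Python. It uses the standard-library `sqlite3` module as a stand-in for the target database. The `raw_orders` data, the `user_totals` table, and the function names are all made up for this example.

```python
import sqlite3

# Hypothetical raw source records (made up for illustration).
raw_orders = [
    {"user_id": "1", "amount": " 10.50 "},
    {"user_id": "2", "amount": "n/a"},   # dirty row, dropped during transform
    {"user_id": "1", "amount": "4.50"},
]

def extract():
    """Extract: read records from the source (here, an in-memory list)."""
    return list(raw_orders)

def transform(rows):
    """Transform: clean values and aggregate the amount per user."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        user_id = int(row["user_id"])
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return sorted(totals.items())

def load(conn, rows):
    """Load: write the already-transformed rows into the target table."""
    conn.execute("CREATE TABLE user_totals (user_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
etl_result = conn.execute("SELECT user_id, total FROM user_totals").fetchall()
print(etl_result)
```

The point to notice is the order: the data is cleaned and aggregated before it ever reaches the database.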
ELT (Extract, Load, Transform)
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
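The load-first ordering of ELT can be sketched the same way. Here the standard-library `sqlite3` module stands in for a warehouse such as BigQuery, which is an assumption of this example and not how BigQuery itself is accessed. The table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw, untransformed strings go straight into the destination.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("1", "10.50"), ("2", "n/a"), ("1", "4.50")],
)

# Transform: done afterwards, with SQL, inside the "warehouse".
conn.execute(
    """
    CREATE TABLE user_totals AS
    SELECT CAST(user_id AS INTEGER)  AS uid,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_events
    WHERE amount GLOB '[0-9]*'       -- keep only rows that start with a digit
    GROUP BY CAST(user_id AS INTEGER)
    """
)
elt_result = conn.execute(
    "SELECT uid, total FROM user_totals ORDER BY uid"
).fetchall()
print(elt_result)
```

The raw table stays available after the transformation, so the cleaning rules can be changed and re-run later without extracting the data again.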
DAG (Directed Acyclic Graph – Airflow)
A structure used in Airflow to define workflows. It represents a set of tasks and the dependencies between them; because the graph has no cycles, the tasks can always be run in a valid order.
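The idea of a non-circular task order can be tried out without Airflow. The sketch below uses Python's standard-library `graphlib` module (Python 3.9+), not the Airflow API, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# A circular dependency makes the graph invalid: no run order exists.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```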
Partitioning (BigQuery)
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
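To see why partitioning cuts the amount of data scanned, here is a toy model in plain Python. It only illustrates the principle and is not BigQuery's implementation. The rows and the `users_on` helper are made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows: (event_date, user_id).
rows = [
    (date(2024, 1, 1), 1),
    (date(2024, 1, 1), 2),
    (date(2024, 1, 2), 3),
    (date(2024, 1, 3), 4),
]

# "Partition" the table by date: one bucket of rows per day.
partitions = defaultdict(list)
for event_date, user_id in rows:
    partitions[event_date].append(user_id)

def users_on(day):
    """Return (matching users, rows scanned), reading only one partition."""
    scanned = partitions.get(day, [])
    return list(scanned), len(scanned)

users, rows_scanned = users_on(date(2024, 1, 2))
print(users, rows_scanned, "of", len(rows), "rows")
```

A query for one day touches one bucket and skips the rest of the table.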
JOIN (SQL)
A way to combine data from two or more tables based on a related column (e.g. user_id).
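A small example of a join on `user_id`, run through the standard-library `sqlite3` module. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 3, 7.0);
    """
)

# Inner join: keep only rows whose user_id appears in both tables.
joined = conn.execute(
    """
    SELECT u.name, o.order_id
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY o.order_id
    """
).fetchall()
print(joined)
```

Ben has no orders and order 12 has no matching user, so neither appears in the inner join. A `LEFT JOIN` from `users` would keep Ben with a NULL order.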
HAVING vs WHERE (SQL)
WHERE filters rows before aggregation; HAVING filters after. Example: HAVING COUNT(*) > 100.
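Both filters can appear in one query. The sketch below uses `sqlite3` with a hypothetical `events` table, and a threshold of 1 instead of 100 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id INTEGER, status TEXT);
    INSERT INTO events VALUES
        (1, 'ok'), (1, 'ok'), (1, 'error'),
        (2, 'ok'),
        (3, 'ok'), (3, 'ok');
    """
)
busy_users = conn.execute(
    """
    SELECT user_id, COUNT(*) AS ok_events
    FROM events
    WHERE status = 'ok'      -- row filter, applied before grouping
    GROUP BY user_id
    HAVING COUNT(*) > 1      -- group filter, applied after aggregation
    ORDER BY user_id
    """
).fetchall()
print(busy_users)
```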
PySpark
Python API for Apache Spark. It’s used to process very large datasets in a distributed, parallelized way.
BigQuery
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
Data Lake
A storage system that holds raw data in its original form, whether structured, semi-structured, or unstructured. It is often used for flexible analytics or as a staging area.
Data Warehouse
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
Airflow Operator
A unit of work in Airflow DAGs – defines what each task does (e.g. PythonOperator, BashOperator).
Kafka Topic
A named data stream in Apache Kafka where producers send and consumers receive messages.
IAM (Identity and Access Management – GCP)
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
KPI (Key Performance Indicator)
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
Lazy Evaluation (Spark)
Transformations are not executed until an action (like .count() or .collect()) is called. This lets Spark plan and optimize the whole chain of transformations before running it.
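Python generators behave in a loosely similar way, which makes them a handy analogy for practice. This is plain Python, not PySpark, and `double` is a made-up helper that records when it actually runs.

```python
calls = []

def double(x):
    calls.append(x)  # record when the work actually happens
    return x * 2

# "Transformations": building the pipeline does not call double() yet.
pipeline = (double(x) for x in range(5) if x % 2 == 0)
calls_before_action = list(calls)

# "Action": consuming the pipeline triggers the computation.
result = sum(pipeline)
print(calls_before_action, calls, result)
```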
Retry (Airflow)
A setting that allows a task to be automatically retried after failure, helpful for unstable operations.
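In Airflow this is configured per task (for example with the `retries` and `retry_delay` task arguments). The mechanism itself can be sketched in plain Python. `run_with_retries` and `flaky_task` below are hypothetical helpers and not part of Airflow.

```python
import time

def run_with_retries(task, retries=3, retry_delay=0.0):
    """Run task(); on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            time.sleep(retry_delay)

attempts = []

def flaky_task():
    """Hypothetical unstable operation: fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "done"

outcome = run_with_retries(flaky_task, retries=3)
print(outcome, "after", len(attempts), "attempts")
```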
Data Validation
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
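The three checks named above (missing values, duplicates, wrong formats) might look like this in plain Python. The records, field names, and the expected `YYYY-MM-DD` date format are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical records to validate.
records = [
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05"},
    {"id": 2, "email": None, "signup": "2024-01-06"},             # missing value
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05"},  # duplicate id
    {"id": 3, "email": "c@example.com", "signup": "05/01/2024"},  # wrong format
]

def validate(rows):
    """Return a list of (row_index, problem) pairs."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        if any(value is None for value in row.values()):
            problems.append((index, "missing value"))
        if row["id"] in seen_ids:
            problems.append((index, "duplicate id"))
        seen_ids.add(row["id"])
        try:
            datetime.strptime(row["signup"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append((index, "wrong date format"))
    return problems

issues = validate(records)
print(issues)
```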
Window Function (SQL)
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
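A short example with both functions mentioned above, run through `sqlite3`. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    """
)
windowed = conn.execute(
    """
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           AVG(amount)  OVER (PARTITION BY user_id)                 AS user_avg
    FROM orders
    ORDER BY user_id, amount
    """
).fetchall()
for row in windowed:
    print(row)
```

All three input rows are still present in the output; each one simply gains its rank and its user's average. A `GROUP BY user_id` would instead return one row per user.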
