What is a Population in statistics?
The set of units (usually people, objects, transactions, events) that we are interested in studying.

What is a Sample in statistics?
An observed subset of the population.

What is the difference between a Parameter and a Statistic?
A Parameter is a numerical characteristic of a population; a Statistic is a numerical characteristic of a sample. In other words, values calculated from population data are parameters (e.g., the population mean \mu), while values computed from sample data are statistics (e.g., the sample mean \bar{x}).

What are the two main types of data?
Qualitative data and Quantitative data.

Describe Qualitative Data.
Consists of attributes, labels or non-numerical entries. It deals with descriptions and can be observed but not measured (e.g., colors, textures, smells, tastes, appearance, beauty).

Describe Quantitative Data.
Consists of numerical measures or counts. It deals with numbers and can be measured (e.g., length, height, area, weight, speed, time, temperature, sound levels, cost, ages).

What are the three main measures of central tendency for quantitative data?
The mode, the median, and the mean.

What is the Mean (Arithmetic Mean), and what is its main disadvantage?
The arithmetic mean is the sum of the measurements divided by the number of observations in the data set. Its main disadvantage is that it is particularly susceptible to the influence of outliers (values that are unusually small or large compared with the rest of the data).
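
A minimal sketch of the outlier effect, using made-up (hypothetical) numbers: adding one unusually large value drags the mean far away from where most of the data lie.

```python
def arithmetic_mean(values):
    # Sum of the measurements divided by the number of observations
    return sum(values) / len(values)

data = [12, 14, 15, 13, 16]          # hypothetical measurements
with_outlier = data + [200]          # same data plus one extreme value

print(arithmetic_mean(data))          # 14.0
print(arithmetic_mean(with_outlier))  # 45.0, no longer "typical" of the data
```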

What is the Median?
The median of n values, sorted from smallest to largest, is the middle value. It is the value ranked in position (n+1)/2 if n is odd, and the average of the two middle observations, ranked n/2 and (n/2)+1, if n is even.
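
The position rule above can be sketched directly in Python. Positions in the definition are 1-based, so each one is shifted by 1 when indexing the sorted list; the function name is illustrative only.

```python
def median(values):
    """Median using the (n+1)/2 rule for odd n and the average of
    positions n/2 and n/2 + 1 for even n (1-based positions)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]            # position (n+1)/2, 0-based index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2    # positions n/2 and n/2 + 1

print(median([7, 1, 5]))     # 5   (odd n: middle value)
print(median([4, 1, 3, 2]))  # 2.5 (even n: average of 2 and 3)
```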

What is the Mode?
The mode is the most frequent value in the data set. On a histogram it represents the highest bar. The mode is also used for categorical data where we wish to know which is the most common category.
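
A small sketch that counts frequencies and returns every value tied for the highest count (a data set can have more than one mode). It works the same way for numbers and for categories.

```python
from collections import Counter

def modes(values):
    # Count how often each value (or category) occurs
    counts = Counter(values)
    top = max(counts.values())
    # Keep every value whose frequency equals the highest one ("highest bar")
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 3, 2]))              # [3]
print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 1, 2, 2]))                    # [1, 2] (two modes)
```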

What is the Coefficient of Determination and what does it indicate?
The Coefficient of Determination is a measure of goodness of fit. It measures how well the model fits the observed data, essentially indicating the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
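
A minimal sketch, assuming the common definition R^2 = 1 - SS_res / SS_tot for a least-squares line with an intercept. The data are hypothetical; the fitted line y_hat = 2.2 + 0.6x is the least-squares line for these five points, worked out by hand.

```python
def r_squared(y, y_hat):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                  # hypothetical observations
y_hat = [2.2 + 0.6 * xi for xi in x]  # least-squares fit for this data

print(round(r_squared(y, y_hat), 3))  # 0.6: about 60% of the variance in y is explained by x
```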

What does the 'b' coefficient (slope) represent in a simple linear trend model for time series?
The 'b' coefficient represents the estimated average change in the variable per time period: an increase if b > 0, a decrease if b < 0.
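
A sketch of estimating the trend line y_t = a + b*t by ordinary least squares, assuming the periods are numbered t = 1, 2, ..., n. The sales figures are hypothetical.

```python
def linear_trend(y):
    """Least-squares fit of y_t = a + b*t with t = 1, 2, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Slope: covariance of (t, y) divided by variance of t (common 1/n factors cancel)
    b = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
         / sum((ti - t_bar) ** 2 for ti in t))
    a = y_bar - b * t_bar
    return a, b

sales = [100, 104, 107, 113, 116]  # hypothetical data, one value per period
a, b = linear_trend(sales)
print(round(a, 2), round(b, 2))    # 95.7 4.1
```

Here b of about 4.1 reads as: sales grow on average by roughly 4.1 units per period.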

Define Cyclical Component in a time series.
The cyclical component (C) refers to long-term, wave-like fluctuations whose length may vary; the cycles last longer than 1 year.

Define Seasonal Component in a time series.
Seasonal component (S) refers to short-term regular wave-like patterns, with cycles not longer than 1 year.

Define Trend in a time series.
Trend (T) is a long-run increase or decrease over time.

What are the main components of a classical time series model?
The irregular component (random fluctuations, 'noise') and the regular components, which include the constant level, trend, seasonal component, and cyclical component.

Explain the decision rule in hypothesis testing.
If the test statistic is in the rejection region, reject H_0. Otherwise, do not reject H_0.

What is the Rejection Region (or Critical Region)?
The rejection region is the set of values of the test statistic for which the null hypothesis is rejected.
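
A sketch of the decision rule with an explicit rejection region, assuming a two-sided z-test whose test statistic is standard normal under H_0. The rejection region is then |z| > z_{1-\alpha/2}.

```python
from statistics import NormalDist

def two_sided_z_decision(z, alpha=0.05):
    # Critical value: the (1 - alpha/2) quantile of N(0, 1), about 1.96 for alpha = 0.05
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Rejection region: values of the test statistic with |z| > z_crit
    if abs(z) > z_crit:
        return "reject H_0"
    return "do not reject H_0"

print(two_sided_z_decision(2.5))  # reject H_0
print(two_sided_z_decision(1.0))  # do not reject H_0
```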

What is the p-value?
The probability of obtaining a test statistic at least as extreme as the one actually observed (i.e., at least as far from the value specified by H_0, in the direction of H_1), assuming that the null hypothesis is true.
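
A minimal sketch for a two-sided z-test of a population mean, assuming the population standard deviation \sigma is known. The numbers (x_bar = 52, mu_0 = 50, sigma = 10, n = 100) are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(x_bar, mu_0, sigma, n):
    # Test statistic: distance of the sample mean from mu_0 in standard-error units
    z = (x_bar - mu_0) / (sigma / n ** 0.5)
    # Two-sided p-value: P(|Z| >= |z|) when H_0 is true, Z ~ N(0, 1)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test_p_value(x_bar=52, mu_0=50, sigma=10, n=100)
print(z, round(p, 4))  # z = 2.0, p is about 0.0455, so reject H_0 at alpha = 0.05
```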

What is a Test Statistic?
A test statistic is a sample statistic used to decide whether or not to reject H_0.

What is the Level of Significance (\alpha)?
The Level of Significance (\alpha) is the probability of making a Type I error. It is the criterion used for rejecting the null hypothesis.

What is a Type II error?
Type II error is not rejecting H_0 when it is false.

What is a Type I error?
Type I error is rejecting H_0 when it is true.

What is the Alternative Hypothesis (H_1)?
It is the opposite of the null hypothesis. It never contains the "=", "\leq" or "\geq" sign. It is generally the hypothesis that the researcher is trying to support.

What is the Null Hypothesis (H_0)?
It states the assumption to be tested. It is always about a population parameter, not about a sample statistic. It always contains an "=", "\leq" or "\geq" sign. It may or may not be rejected.

What is a Statistical Hypothesis?
A statistical hypothesis is a claim (assumption) about a population made without a full census (based on a sample).

How do you interpret a 95% Confidence Interval for the population mean?
If we were to take many samples and construct a 95% confidence interval from each, approximately 95% of those intervals would contain the true population mean. It does not mean that there is a 95% probability that the true mean lies within one specific, already calculated interval.
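
This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with hypothetical parameters (mu = 50, sigma = 10), samples of size n = 25, and a known sigma so that the z-interval applies; the printed coverage should be close to, not exactly, 0.95.

```python
import random
from statistics import NormalDist

def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=2000, level=0.95, seed=1):
    rng = random.Random(seed)                       # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% level
    half_width = z * sigma / n ** 0.5               # sigma assumed known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Does this particular interval contain the true mean?
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage_rate())  # a value near 0.95
```

Each individual interval either contains mu or it does not; the 95% describes the long-run share of intervals that do.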

How is the Confidence Level related to the Confidence Coefficient?
The Confidence Level is a confidence coefficient expressed as a percentage.

What is the Confidence Coefficient?
The Confidence Coefficient is the probability that a randomly selected confidence interval encloses the population parameter.

What is a Confidence Interval?
A Confidence Interval is an interval, computed from sample data, that is intended to contain a population parameter. It is a range of plausible values for the parameter, produced by a method that captures the true parameter with a stated high probability (the confidence coefficient).
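
A sketch of one common case, a z confidence interval for a population mean, assuming the population standard deviation \sigma is known: x_bar ± z_{\alpha/2} \cdot \sigma/\sqrt{n}. The input values are hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level=0.95):
    # Confidence coefficient 1 - alpha -> critical value z_{alpha/2}
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5   # margin of error
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=52.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))  # 50.04 53.96
```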

Define Consistency for an estimator.
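An estimator (\hat{\theta_n}) is a consistent estimator of \theta if the difference between the estimator and \theta tends to zero as the sample size n increases; formally, P(|\hat{\theta_n} - \theta| > \epsilon) \to 0 as n \to \infty for every \epsilon > 0.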

Define Efficiency for an estimator.
The most efficient estimator of \theta is the unbiased estimator with the smallest variance.

Define Unbiasedness for an estimator.
An estimator (\hat{\theta}) is an unbiased estimator of the parameter (\theta), if the expected value of the sampling distribution of the estimator is the parameter itself (E(\hat{\theta}) = \theta).
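
Unbiasedness can be checked exactly on a toy example. The sketch assumes a hypothetical population {1, 2, 3} and enumerates every equally likely sample of size 2 drawn with replacement, then averages each estimator over all samples to get its expected value. The sample mean and the sample variance with divisor n - 1 come out unbiased; the variance with divisor n does not.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]                      # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # population variance

n = 2
samples = list(product(population, repeat=n))  # all samples with replacement, equally likely

def sample_mean(s):
    return Fraction(sum(s), len(s))

def var_n_minus_1(s):
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

def var_n(s):
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / len(s)

def expected(estimator):
    # E(estimator) = average over the whole sampling distribution
    return sum(estimator(s) for s in samples) / len(samples)

print(mu, expected(sample_mean))        # 2 2
print(sigma2, expected(var_n_minus_1))  # 2/3 2/3
print(expected(var_n))                  # 1/3 (biased: smaller than 2/3)
```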

List the three properties of a good estimator.
Unbiasedness, efficiency, and consistency.

What is an Estimate?
An Estimate (\hat{\theta}) is a specific value of an estimator.

What is an Estimator?
An Estimator (\hat{\theta}) is a statistic that we use to estimate an unknown population parameter (\theta).

What is Point Estimation?
Point estimation uses a single number, a point estimate, to estimate (approximate) the true value of a population parameter.

What is Standardization in the context of Normal distribution?
Standardization is the transformation of a random variable into a variable with zero mean and unit variance: Z = (X - \mu)/\sigma. If X \sim N(\mu, \sigma), then Z \sim N(0, 1), i.e., Z has the standard normal distribution.
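
A sketch showing that standardizing does not change probabilities: P(X \leq x) for X \sim N(\mu, \sigma) equals P(Z \leq z) for the standardized value z. The parameters mu = 100 and sigma = 15 are hypothetical; note that `NormalDist(mu, sigma)` takes the standard deviation, matching the N(\mu, \sigma) notation above.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    # Z = (X - mu) / sigma has mean 0 and variance 1
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0                  # hypothetical parameters
z = standardize(130.0, mu, sigma)        # 130 is 2 standard deviations above mu
p_direct = NormalDist(mu, sigma).cdf(130.0)
p_standard = NormalDist(0, 1).cdf(z)

print(z)                                   # 2.0
print(round(p_direct, 4), round(p_standard, 4))  # both about 0.9772
```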

When can a Poisson distribution approximate a Binomial distribution?
For large n and small p, using \lambda = np (rule of thumb: n \geq 20 and p \leq 0.05, or n \geq 100 and np \leq 10).
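
A sketch comparing exact Binomial probabilities with their Poisson approximation, using hypothetical values n = 100 and p = 0.02 (so n \geq 100 and np = 2 \leq 10) and \lambda = np.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

n, p = 100, 0.02   # hypothetical: large n, small p
lam = n * p        # lambda = np = 2

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))

max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print(round(max_gap, 4))  # the two distributions differ by less than 0.01 here
```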

What conditions define a Binomial distribution?
A Binomial distribution describes the number of successes in a sequence of trials where the number of trials (n) is fixed; each trial has two possible outcomes, success (probability p, the same on every trial) or failure (probability q = 1 - p); and the trials are independent.
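
Under these conditions, the probability of exactly k successes is P(X = k) = C(n, k) p^k q^(n-k). A minimal sketch, with a hypothetical example of n = 4 trials and p = 0.5:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    q = 1 - p  # probability of failure on each trial
    return comb(n, k) * p ** k * q ** (n - k)

probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```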

What is a Bernoulli Trial?
In a Bernoulli trial there are exactly 2 possible outcomes (such as black or red, sweet or sour); one of these outcomes is called a success and the other a failure, and each experiment and its result are completely independent of all others.