Gr. 12 Data Management

Data Visualization

Students should be familiar with bar charts, pie charts, scatter plots, stem-and-leaf plots, box-and-whisker plots and histograms, both how to produce them and how to extract data from them. They should also know when to use each, and when not to. Shodor interactives has a number of activities on various graphs and charts.

Scaling data for histograms

There should be between 5 and 15 bins. Taking the square root of the number of data points is a good rule of thumb; round to the nearest integer.

Bin width is (max - min) / number of bins.

Round to reasonable numbers.

Check that bin width × number of bins covers the data.

Start the first bin at the minimum value less half the bin width.

Data Visualization from Wikipedia

Plots from Wikipedia

Create-a-Graph

Every picture tells a story

Data Visualization: Modern Approaches from Smashing Magazine

50 Great Examples of Data Visualization

Graphing Overview

David McCandless: The beauty of data visualization

Hans Rosling shows the best stats you've ever seen
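The histogram bin-sizing rules of thumb in this section can be sketched in Python. This is only an illustration: the function name `histogram_bins`, the sample data, and the choice to round the bin width up to a whole number (one reading of "reasonable numbers") are assumptions, not part of the course material.

```python
import math

def histogram_bins(data, min_bins=5, max_bins=15):
    """Return (start, width, n_bins) using the rules of thumb for histograms."""
    # Rule of thumb: about sqrt(n) bins, kept between 5 and 15.
    n_bins = round(math.sqrt(len(data)))
    n_bins = max(min_bins, min(max_bins, n_bins))

    lo, hi = min(data), max(data)
    # Assumption: "reasonable" means rounding the width up to a whole number.
    width = max(1, math.ceil((hi - lo) / n_bins))

    # Start the first bin half a bin width below the minimum.
    start = lo - width / 2

    # Check coverage: add bins until the maximum falls inside the last bin.
    while start + width * n_bins <= hi:
        n_bins += 1
    return start, width, n_bins

data = list(range(10, 35))      # hypothetical data: 25 values, 10 to 34
start, width, n_bins = histogram_bins(data)
edges = [start + i * width for i in range(n_bins + 1)]
print(start, width, n_bins)     # 7.5 5 6
print(edges)
```

Here the coverage check matters: starting half a bin below the minimum shifts every bin down, so five bins of width 5 stop at 32.5 and a sixth bin is needed to include 34.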

Data Analysis

Mean, median, mode, quartiles, percentiles, z-score, variance, standard deviation, interquartile range (IQR), and outliers (values more than 1.5 × IQR below the first quartile or above the third quartile).
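As a rough illustration of these measures, here is a minimal Python sketch using the standard `statistics` module. The function name `summarize` and the data set are made up for the example. It assumes population (not sample) variance and the "inclusive" quartile method; textbooks differ on how quartiles are computed, so classroom answers may not match exactly.

```python
import statistics

def summarize(data):
    """Return common summary statistics for a list of numbers."""
    # Quartiles: assumption is the "inclusive" interpolation method.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)   # population standard deviation
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),
        "std_dev": std_dev,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Outliers lie more than 1.5 * IQR outside the quartiles.
        "outliers": [x for x in data if x < low_fence or x > high_fence],
        # z-score: how many standard deviations each value is from the mean.
        "z_scores": [(x - mean) / std_dev for x in data],
    }

summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data
for name, value in summary.items():
    print(name, value)
```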

Counting and Probability

Probability, sample space, sets and subsets, dependent and independent events. Permutations and combinations, Pascal's triangle, selections with and without replacement.

Consider how many ways 4 people can line up. The first spot can be taken by any of the 4 people, the second by any of the 3 remaining, the third by either of the 2 remaining, and the final spot by the one person left. This gives 4x3x2x1, or 4!, permutations (ways to arrange the 4 people).

Now suppose there are 4 spots in line and 6 people. Following the same logic, there are 6x5x4x3 ways of arranging these people. We write this as $P_{6,4}$ and calculate it as ${{6!}\over{2!}}$, or in general $P_{n,m} = {{n!}\over{(n-m)!}}$.

There is one more step, which applies if we don't care about the order. Let's say we are picking 4 people from 6. We start by counting the ordered picks as above, then divide by the number of ways we can rearrange those 4 people, which is 4!. Essentially, we divide the second result by the first. This means there are $P_{6,4}\div{4!}$ ways to choose 4 people from 6. We write this as $C_{6,4}$ or ${6\choose 4}$, and in general we calculate it as ${n \choose m} = {{n!}\over{m!(n-m)!}}$.
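The counts in the line-up example can be checked with a short Python sketch. It uses `math.factorial`, `math.perm` and `math.comb` (the last two need Python 3.8 or later), then confirms the answers by brute-force listing with `itertools`. The labels A to F for the six people are made up for the example.

```python
import itertools
import math

# 4 people in 4 spots: 4! arrangements.
line_up_4 = math.factorial(4)

# 6 people, 4 spots, order matters: P(6,4) = 6! / 2!.
p_6_4 = math.perm(6, 4)

# 6 people, choose 4, order ignored: C(6,4) = P(6,4) / 4!.
c_6_4 = math.comb(6, 4)

print(line_up_4, p_6_4, c_6_4)   # 24 360 15

# Brute-force check: list every ordered and unordered selection.
people = "ABCDEF"                # hypothetical labels for the 6 people
ordered = list(itertools.permutations(people, 4))
unordered = list(itertools.combinations(people, 4))
print(len(ordered), len(unordered))   # 360 15
```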

Combining Probabilities

The probability of A or B is $P(A\cup B) = P(A)+P(B)-P(A\cap B)$.

The probability of A given B is $P(A|B) = {{P(A\cap B)}\over{P(B)}}$, provided $P(B) > 0$.
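One way to see both formulas at work is to count outcomes for a single fair die. The events below are chosen only for illustration: A is "the roll is even" and B is "the roll is greater than 3". The helper `prob` is a hypothetical name, and the sketch assumes all six outcomes are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # one fair die
A = {2, 4, 6}                        # hypothetical event: roll is even
B = {4, 5, 6}                        # hypothetical event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# P(A or B) by direct counting and by the addition formula.
p_union_counted = prob(A | B)
p_union_formula = prob(A) + prob(B) - prob(A & B)

# P(A given B) from the conditional probability formula.
p_a_given_b = prob(A & B) / prob(B)

print(p_union_counted, p_union_formula)   # 2/3 2/3
print(p_a_given_b)                        # 2/3
```

Subtracting $P(A\cap B)$ matters here because the outcomes 4 and 6 belong to both events and would otherwise be counted twice.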

Probability Distributions

Binomial Distributions

Binomial distributions occur in any situation where there are multiple independent trials, each with two possible outcomes. The obvious example is tossing a coin several times in a row. If we let the probability of heads be p and the probability of tails be q, then we know that p + q = 1.

Let's consider tossing a coin 3 times in a row. The probability of getting 3 heads in a row is $p^3$. Now, what about getting 2 heads and a tail? You might think that the probability would be $p^2q$, but nothing was said about order. There are in fact three arrangements of 2 heads and 1 tail (HHT, HTH, THH), so the actual probability of getting 2 heads and a tail is $3p^2q$.

Because we are counting arrangements, we use combinatorics: the chance of getting 2 heads and a tail is more properly written ${3\choose 2}p^2q$, or more generally, the chance of getting n heads in m tosses is ${m\choose n}p^nq^{m-n}$.

If you've played with Pascal's Triangle, that expression should look familiar to you. These are the terms of the binomial expansion $(p + q)^m$.
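The coin-toss formula can be tried out with a small Python sketch. The function name `binomial_probability` is made up for the example, and a fair coin (p = 1/2) is assumed for the sample run.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n_heads, m_tosses, p):
    """Chance of exactly n_heads heads in m_tosses tosses, with P(heads) = p."""
    q = 1 - p
    return comb(m_tosses, n_heads) * p**n_heads * q**(m_tosses - n_heads)

p = Fraction(1, 2)   # assumption: a fair coin
distribution = [binomial_probability(n, 3, p) for n in range(4)]
print(distribution)        # probabilities of 0, 1, 2 and 3 heads
print(sum(distribution))   # 1, since the terms come from expanding (p + q)^3
```

For 3 tosses of a fair coin the four probabilities are 1/8, 3/8, 3/8 and 1/8; the coefficients 1, 3, 3, 1 are the corresponding row of Pascal's Triangle.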

Geometric Distribution

Hypergeometric Distribution

Expected values

Organization of Data for Analysis

Statistical Analysis

Markov chains and transition matrices, setting up initial probability vector and finding probability after N iterations
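A minimal sketch of a Markov chain calculation follows. The two-state "sunny/rainy" model and its numbers are invented for illustration, as are the function names `step` and `after_n`. The sketch assumes the row-vector convention: each row of the transition matrix sums to 1, and the probability vector is multiplied on the left of the matrix. Some texts use the transposed (column) convention instead.

```python
from fractions import Fraction as F

# Hypothetical transition matrix: rows are "from", columns are "to".
# State 0 = sunny, state 1 = rainy; each row sums to 1.
P = [
    [F(9, 10), F(1, 10)],
    [F(1, 2), F(1, 2)],
]

# Initial probability vector: start on a sunny day for certain.
v0 = [F(1), F(0)]

def step(v, matrix):
    """One iteration: multiply the row vector v by the transition matrix."""
    n = len(v)
    return [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def after_n(v, matrix, n):
    """Probability vector after n iterations."""
    for _ in range(n):
        v = step(v, matrix)
    return v

print(after_n(v0, P, 1))    # [9/10, 1/10] as Fractions
print(after_n(v0, P, 2))    # [43/50, 7/50] as Fractions
```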

Culminating Investigation

Resources

Introduction to Probability - a complete text from Dartmouth U, free to download

Worksheets

Range, Variance and Standard Deviation Worksheets
