Time: 20 hours Level: Introductory
Introduction Resource
- This unit will introduce you to a number of ways of representing data graphically and of summarising data numerically. You will learn the uses for pie charts, bar charts, histograms and scatterplots. You...
Introducing data Resource
- Chambers English Dictionary defines the word data as follows.
1.1: Introduction Resource
- The data sets you will meet in this section are very different from each other, both in structure and character. By the time you reach the end of the unit, you will have carried out a preliminary investigation...
1.2: Nuclear power stations Resource
- The first data set is a very simple one. Table 1 shows the number of nuclear power stations in various countries throughout the world before the end of the Cold War (that is, prior to 1989). The names...
1.3: USA workforce Resource
- The data set in Table 2 comprises the figures published by the US Labor Department for the composition of its workforce in 1986. It shows the average numbers over the year of male and female workers in...
1.4: Infants with SIRDS Resource
- The data in Table 3 are the recorded birth weights of 50 infants who displayed severe idiopathic respiratory distress syndrome (SIRDS). This is a serious condition which can result in death.
1.5: Runners Resource
- The next data set relates to 22 of the competitors in an annual championship run, the Tyneside Great North Run. Blood samples were taken from eleven runners before and after the run, and also from another...
1.6: Cirrhosis and alcoholism Resource
- The data in Table 5, which are given for several countries in Europe and elsewhere, are the average annual alcohol consumption in litres per person and the death rate per 100 000 of the population from...
1.7: Body weights and brain weights for animals Resource
- The next data set comprises average body and brain weights for 28 kinds of animal, some of them extinct. The data are given in Table 6.
1.8: Surgical removal of tattoos Resource
- The final data set in this section is different from the others in that the data are not numerical. So far you have only seen numerical data in the form of measurements or counts. However, there is no...
1.9: Data and questions: summary Resource
- In this section you have met some real data sets and briefly considered some of the questions you might ask of them. They will be referred to and investigated in the remaining sections of this unit. Some...
2: Pie charts and bar charts
2.1: Introduction Resource
- The data set in Table 7 (section 1.8) comprised non-numerical or categorical data. Such data often appear in newspaper reports and are usually represented as one or other of two types of graphical display,...
2.2: Pie charts: surgical removal of tattoos Resource
- Suppose we count the numbers of large, medium and small tattoos from the data in Table 7: there were 30 large tattoos, 16 of medium size and 9 small tattoos. These data are represented in Figure 1. This...
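As a minimal sketch of how the slices of such a pie chart could be sized, the snippet below converts the three counts quoted above into angles at the centre, each proportional to its count. The function name `pie_angles` is an illustrative choice, not something taken from the unit.

```python
# Counts quoted in the summary above: large, medium and small tattoos (Table 7).
counts = {"large": 30, "medium": 16, "small": 9}


def pie_angles(counts):
    """Return the angle in degrees of each slice, proportional to its count."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}


angles = pie_angles(counts)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
# large: 196.4 degrees
# medium: 104.7 degrees
# small: 58.9 degrees
```

The angles sum to 360 degrees by construction, so a quick check on the total guards against arithmetic slips.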
2.3: Pie charts: Nuclear power stations Resource
- Figure 2 shows a pie chart of the number of nuclear power stations in countries where nuclear power is used, based on the data from Table 1.
2.4: Bar charts: nuclear power stations Resource
- A better way of displaying the data on nuclear power stations is by constructing a rectangular bar for each country, the length of which is proportional to the count. Bars are drawn separated from each...
2.5: Bar charts: Surgical removal of tattoos Resource
- Figure 4 shows a bar chart for the data in Table 7 on the effectiveness of tattoo removal.
2.6: Problems with graphics Resource
- In this subsection we consider, briefly, some problems that can arise with certain ways of drawing bar charts and pie charts.
2.7: Problems with graphics: USA workforce Resource
- The danger of using three-dimensional effects is really brought home when two data sets are displayed on the same bar chart. Table 2 may be thought of as consisting of two data sets, one for male workers...
2.8: Problems with graphics: nuclear power stations Resource
- Figure 8 shows a pie chart of the data on nuclear power stations from Table 1. This diagram is similar to Figure 2, except that the data for all countries apart from the five with the largest numbers...
2.9: Pie charts and bar charts: summary Resource
- Two common display methods for data relating to a set of categories have been introduced in this section. In a pie chart, the number in each category is proportional to the angle subtended at the centre...
3: Histograms and scatterplots
3.1: Introduction Resource
- In this section, two more kinds of graphical display are introduced – histograms in section 3.2 and scatterplots in section 3.3. Both are most commonly used with data that do not relate to separate categories,...
3.2: Histograms Resource
- It is a fundamental principle in modern practical data analysis that all investigations should begin, wherever possible, with one or more suitable diagrams of the data. Such displays should certainly show...
3.3: Scatterplots Resource
- In recent years, graphical displays have come into prominence because computers have made them quick and easy to produce. Techniques of data exploration have been developed which have revolutionised the...
3.4: Scatterplots: body weights and brain weights for animals Resource
- In our discussion of the data on body weights and brain weights for animals in section 1.7, we conjectured a strong relationship between these weights on the grounds that a large body might well need a...
3.5: Histograms and scatterplots: summary Resource
- Two common graphical displays, most frequently used for continuous data (arising from measurements), have been introduced in this section. A histogram is in a sense a development of the idea of a bar chart....
4: Numerical summaries
4.1: Introduction Resource
- Histograms provide a quick way of looking at data sets, but they lose sight of individual observations and they tend to play down ‘intuitive feel’ for the magnitude of the numbers themselves. We may often...
4.2: Measures of location Resource
- Everyone professes to understand what is meant by the term ‘average’, in that it should be representative of a group of objects. The objects may well be numbers from, say, a batch or sample of measurements,...
4.3: The median Resource
- The median describes the central value of a set of data. Here, to be precise, we are discussing the sample median, in contrast to the population median.
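A sample median can be sketched in a few lines. The snippet assumes the usual convention: sort the values, take the middle one when the sample size is odd, and average the two middle ones when it is even. The data values are made up for illustration and are not from the unit's tables.

```python
def sample_median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical samples, for illustration only.
odd_sample = [3, 1, 4, 1, 5]   # sorted: 1, 1, 3, 4, 5
even_sample = [7, 2, 9, 4]     # sorted: 2, 4, 7, 9

print(sample_median(odd_sample))   # 3
print(sample_median(even_sample))  # 5.5
```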
4.4: The mean Resource
- The second measure of location defined in this course for a collection of data is the mean. Again, to be precise, we are discussing the sample mean, as opposed to the population mean. This is what most...
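For comparison, the sketch below computes a sample mean with Python's standard `statistics` module and sets it beside the median of the same data. The numbers are hypothetical; the extra value 100 is included only to show how one extreme observation moves the mean far more than the median.

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [2, 4, 7, 9]
with_outlier = sample + [100]

mean_before = statistics.mean(sample)        # (2 + 4 + 7 + 9) / 4
mean_after = statistics.mean(with_outlier)   # (2 + 4 + 7 + 9 + 100) / 5
median_before = statistics.median(sample)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 5.5 24.4
print(median_before, median_after)  # 5.5 7
```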
4.5: The mode Resource
- The USA workforce data in Table 2 were usefully summarised in Figure 6, which is reproduced below as Figure 18.
4.6: Measures of dispersion Resource
- During the above discussion of suitable numerical summaries for a typical value (measures of location), you may have noticed that it was not possible to make any kind of decision about the relative merits...
4.7: Quartiles and the interquartile range Resource
- The first alternative measure of dispersion we shall discuss is the interquartile range: this is the difference between summary measures known as the lower and upper quartiles. The quartiles are simple...
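The interquartile range can be sketched with `statistics.quantiles` from the Python standard library. Several conventions exist for locating quartiles, and the summary above does not show which one the unit adopts, so the library's default method is an assumption here. The data are hypothetical.

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 5, 7, 8, 10]

# quantiles(..., n=4) returns the three cut points: lower quartile, median, upper quartile.
# The default method is "exclusive"; other conventions can give slightly different quartiles.
lower_quartile, median, upper_quartile = statistics.quantiles(sample, n=4)

iqr = upper_quartile - lower_quartile
print(lower_quartile, upper_quartile, iqr)  # 4.0 8.0 4.0
```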
4.8: The standard deviation Resource
- The interquartile range is a useful measure of dispersion in the data and it has the excellent property of not being too sensitive to outlying data values. (That is, it is a resistant measure.) However,...
4.9: Sample variance Resource
- It is worth noting that a special term is reserved for the square of the sample standard deviation: it is known as the sample variance.
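The relationship between the two quantities can be checked numerically. The sketch below assumes the common definition of the sample variance with divisor n - 1 (the formula itself is not shown in the summary above) and uses hypothetical data.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed convention)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

variance = sample_variance(sample)
std_dev = math.sqrt(variance)

# The sample variance is the square of the sample standard deviation.
print(math.isclose(std_dev ** 2, variance))                      # True
# The standard library uses the same n - 1 divisor for its sample statistics.
print(math.isclose(variance, statistics.variance(sample)))       # True
```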
4.10: A note on accuracy Resource
- To what accuracy should you give the results of calculations? If you look through the examples in this section, you will find that, in general, results have been given either to the same accuracy as the...
4.11: Symmetry and skewness Resource
- For many purposes the location and dispersion of a set of data are the main features of its distribution that we might wish to summarise, numerically or otherwise. But for some purposes it can be important...
4.12: Numerical summaries: summary Resource
- In this section, various ways of summarising certain aspects of a data set by a single number have been discussed. You have been introduced to two pairs of statistics for assessing location and dispersion....
5: Summary Resource
- In this unit, you have been introduced to a number of ways of representing data graphically and of summarising data numerically. We began by looking at some data sets and considering informally the kinds...
References and Acknowledgements