Michael Baker - Thesis - Problems in Longterm Forecasting and Planning
About | Contents
Previous: 5. Forecasting | This: 6. Criticisms of Forecasting | Next: 7. Revised View of Forecasting
Following the outline of forecasting methods in the last chapter, I shall go over some of my criticisms of forecasting in this chapter. Most of my criticisms are due to forecasting models not living up to my ideal. My ideal forecasting model would be one:
In the process of setting out my criticisms I shall use examples of forecasts mainly in the transport field. My criticisms are in three areas, which are to do with forecasting models, the data input to those models and other criticisms. (The main forecasting exercise which I shall use for examples of criticisms is Tanner (1974). This is not because I think that it is a particularly good or bad example of forecasting, but because its very explicit explanation of what was done illustrates most of my criticisms.) [1]
I shall make criticisms of forecasting models in three broad areas. These are the level of aggregation used, the relationships used in the models and the data used to calibrate the models.
I shall show that one of the ultimate points in the forecasting process is the extrapolation of trends in variables. If a composite variable is not homogeneous it may well be that its several components display different trends over time. However, by using the aggregate of these components, this variation will not be seen.
An example of the problems caused by using an aggregate measure model is the
tonne-kilometre/GDP relationship used by Tanner (1974)
to forecast freight transport.
Having said that "... goods vehicles ... contribute substantially to traffic
and ... give rise to particular policy issues, and must therefore be
carefully studied"
(Tanner 1974, p9), he
develops a very simple model.
He justifies this by saying "... it would have been preferable to use
a method that gave forecasts disaggregated by vehicle weight or load.
However understanding of how factors which influence lorry size will operate
in the future is insufficient for this to be done with any confidence"
(Tanner 1974, p24).
He also recognises that the tonne-km/GDP relationship may change due to increasing
services as a proportion of GDP, environmental restrictions on Road Transport
leading to shorter hauls and technological improvements leading to lighter
weight per unit of GDP.
However he concludes "analysis of these possibilities in terms of particular
industries or commodities is beyond the scope of this report"
(Tanner 1974, p25).
Having made forecasts of the tonne-km of freight carried by road he goes on to use this in conjunction with a projection of average carrying capacity of vehicles to get a forecast of road freight vehicle kilometres. However due to the aggregate nature of his forecasts he takes no account of changes in the distribution of vehicle size within the vehicle stock. He further goes on to make a projection of increases in kilometres travelled by each vehicle without considering the relationship between vehicle size and kilometres travelled.
The advantages of using an aggregate model are that the model is easy to construct and calibrate, and that few inputs are required. However aggregate models suffer from the disadvantage that they obscure relationships within the system under study. Consequently it is not possible for the forecaster (or others) to be sure if the model will continue to be reliable in the future, since the underlying relationships may change.
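The way an aggregate can hide diverging component trends can be illustrated with a minimal sketch. All figures are invented for illustration and are not taken from Tanner (1974): one hypothetical class of traffic grows by 10 per cent per year while another falls by 5 per cent per year. The sketch compares a straight-line extrapolation of the total with an extrapolation of each component's own trend.

```python
import math


def linear_fit(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented history for two components of a composite variable (years 0 to 9).
years = list(range(10))
growing = [50 * 1.10 ** t for t in years]     # hypothetical class growing 10% a year
declining = [150 * 0.95 ** t for t in years]  # hypothetical class falling 5% a year
total = [g + d for g, d in zip(growing, declining)]


def aggregate_forecast(target_year):
    """Extrapolate a straight-line trend fitted to the total only."""
    slope, intercept = linear_fit(years, total)
    return intercept + slope * target_year


def component_forecast(target_year):
    """Extrapolate each component's own constant-percentage trend, then add them."""
    result = 0.0
    for series in (growing, declining):
        slope, intercept = linear_fit(years, [math.log(v) for v in series])
        result += math.exp(intercept + slope * target_year)
    return result


print(round(aggregate_forecast(20)), round(component_forecast(20)))
```

Over the observed years the total barely moves, so the aggregate trend gives no warning of the growth that the faster-growing component eventually produces. Neither extrapolation should be read as correct; the point is only that the aggregate conceals the difference.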
There are at least four ways in which relationships used in models can be criticised. These include having no causal explanation for a relationship, the use of cross-sectional relationships, the exclusion of relevant variables and lack of understanding.
If there is no causal explanation for an observed relationship there can be no way
of knowing if the relationship will continue in the future.
As an example of a relationship used without causal explanation,
Tanner (1974, p9) assumes that the total
of all ... elements of cost [for cars, other than fuel] will, in real terms, decrease by
1 per cent per year from 1972 onwards
on the basis of observed trends over the period 1952-1972.
A further example is the extrapolation of a trend in goods vehicle load factors by Tanner (1974, p31) without any understanding of the underlying causes of the historic trend.
In many cases where data on a relationship between variables is not available over a period
of time (longitudinal) the relationship is defined for subgroups (such as income bands and
geographical areas) at one point in time (cross-sectional).
For example as a justification for using a car kilometre/average income relationship
Tanner (1974, p16) uses relationships found
from the 1972-73 National Travel Survey (Department of Energy 1974b).
He finds a positive relationship between income and kilometres per car.
However the relationship found may be influenced by factors such as residential location and life
style and may therefore overestimate the true effect of income
(Tanner 1974, p16).
In this example there is no guarantee that the cross-sectional relationship will be the
same as the longitudinal relationship because it would imply increasing
average income leading to, or at least being associated with, changed
average residential location and life style.
In general cross-sectional and longitudinal relationships cannot be assumed to be the same.
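A minimal sketch of why the two kinds of relationship can differ follows. The numbers are invented, and the assumption that suburban households both have higher incomes and drive further for reasons of location is made purely for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


TRUE_INCOME_EFFECT = 0.2  # assumed km per unit of income, location held fixed
LOCATION_EFFECT = 4.0     # assumed extra km for a suburban household


def km_per_car(income, suburban):
    """Hypothetical behaviour: car use depends on income and on location."""
    return 5.0 + TRUE_INCOME_EFFECT * income + LOCATION_EFFECT * suburban


# (income, suburban?) for six invented households at one point in time.
households = [(10, 0), (12, 0), (14, 0), (20, 1), (22, 1), (24, 1)]
incomes = [income for income, _ in households]
kms = [km_per_car(income, suburban) for income, suburban in households]

# Relationship seen across households in a single survey year.
cross_sectional_slope = slope(incomes, kms)

# Change over time: every income rises by 5 units and nobody moves house.
rise = 5
later_kms = [km_per_car(income + rise, suburban) for income, suburban in households]
longitudinal_slope = (sum(later_kms) - sum(kms)) / (len(kms) * rise)

print(cross_sectional_slope, longitudinal_slope)
```

In this toy case the cross-sectional slope is well above the effect of income alone, because part of the difference between households is due to where they live. A forecast that applied the cross-sectional slope to rising incomes would overstate the growth in car use unless location changed in step with income.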
For simplicity in a forecasting model variables which have an effect on the variables being forecast are often excluded. Examples of variables which Tanner (1974) did not consider in his car ownership forecasts were population density and public transport provision.
Cross (1975, p41) makes the following comment on
forecasts of car ownership in the London Traffic Survey
(Greater London Council 1966):
"A real relationship has been found for the year 1962 between residential density and car
ownership and it is independent of income.
The forecasters reject this in favour of a model which is much more
sensitive to net household income and thus capable of giving any result
desired by relatively small adjustments in the rate of increase of the G.N.P.
Do the forecasters imply that they do not believe the relationship including
residential density or that it is insignificant?"
There would appear to be many instances in which there is a basic lack of understanding
of the processes which influence variables being forecast.
Several examples of lack of understanding are displayed by Tanner (1974).
For example he relates car use to cost.
"Although it is usually argued that travel behaviour is influenced by generalised cost,
including travel time, it is not plausible to suppose that increases in generalised cost
due to higher incomes and therefore higher valuations of car travel times will lead to lower
car ownership levels; one expects the reverse to apply ... In this report calculations
are made in two ways, with and without a valuation of travel time; fortunately, because
of the calibration process employed, the alternative methods give rise to very similar
forecasts"
(Tanner 1974, p10) (emphasis added).
In his forecasts of bus transport he does not consider the effects of variations in
incomes or fuel prices because "it is not clear how the figures would be affected by alternative
assumptions about trends in incomes or fuel prices"
(Tanner 1974, p23).
He also finds difficulty with motor cycles.
Having commented that "there is no obvious basis for estimating an ultimate
or saturation level in the distant future"
(Tanner 1974, p19) he uses a continuation of the current
ownership level and annual mileage.
He further comments "no suggestions are made as to how the figures might respond
to alternative cost or income assumptions"
(Tanner 1974, p22).
The "accuracy" of a forecasting model depends upon the reliability of the data used to calibrate it. (This is assuming that the relationships in the model are "accurate".)
There are several ways in which the data used for calibration may be inadequate. These include the absence of data on a variable (in which case a surrogate is often used), incompatible data classifications and differing values between sources, insufficient data for calibration, and data which do not fit the model. Examples of all these problems can be found in Tanner (1974). Perhaps a more serious problem is the accuracy of the data used.
Tanner (1974) relates car ownership to GDP but comments
"It can be argued that disposable personal income or consumer's expenditure would be
more relevant to car ownership and use, but there is some doubt about this and
the use of a quantity that included more than the personal sector has been preferred"
(Tanner 1974, p4).
In considering annual mileages of cars Tanner (1974, p18) considers the effects of improvements to the road system but uses the length of motorway open as a "road quality factor".
The data Tanner (1974, pp2,3) uses on vehicle populations and vehicle mileage are not compatible due to differences in the classifications of vehicles used in the two sources of this data. He also notes differences in data on numbers of cars per household from two different sources.
An example of insufficient data for the calibration of a model is Tanner's exclusion
of changes in travel speed from the calculation of the time component of generalised
cost because insufficient data exists to allow this to be done
(Tanner 1974, p10).
If there is empirical data which will not fit into a forecaster's analytic model he may be tempted to ignore it. For example, in a regression analysis of increase in car ownership per year against car ownership, Tanner (1974, p61) excludes data for Scotland. This is probably because its inclusion would give a meaningless intercept, and greatly increase the standard error.
Consideration must also be given to the accuracy of the data used to calibrate a model, since the model cannot be more accurate than the data. The following outlines some of the ways errors can creep into statistics. There are many others!
"From the time when figures are first entered on a form in a local Government or business office, until the statistics are published in statistical volumes and reports, data processing is highly sensitive to many mundane sources of error - misunderstood instructions on forms, misreading and hastily written figures, misplacing a decimal point, losing one's place in copying, accidental 'corruption' of data in computer files, or printing errors. It is quite possible for a mistake anywhere along the line to go undetected and work its way through into published figures. Once in a while such a case emerges into the glare of publicity, giving newspaper cartoonists a chance to re-use their civil servant caricatures.
One example was when, following the accidental omission of a zero by an Olivetti employee reporting the firm's exports, an underestimate of national exports (and thus an overestimate of the excess of imports over exports) generated a phoney balance of payments crisis. Another was when the trade figures went haywire over a period of many months because a clerk at one point copied two lines of figures onto a coding sheet in the wrong order. (The first assumption, as reported in the press, was that it was the fault of an excessively complex computer programme for carrying out seasonal adjustments on the figures.) A major error in Home Office migration figures resulted from accidentally counting the same set of movements twice" (Government Statisticians' Collective 1979, p144).
Criticisms of data input to models in forecasting can be made in the following areas: the reliability of the inputs, the use of other forecasts, the ultimate sources of input data, and the limited ranges used when varying input variables.
The reliability of a forecast is closely related to the reliability of the inputs to the forecasting model used. Just as with any other type of model the well known phrase "Garbage In - Garbage Out" applies to forecasting models.
As explained in the previous chapter, other forecasts are among the inputs to forecasting models.
They are often used with little examination of the original forecasts.
For example Tanner (1974) uses population projections made by the
Office of Population Census and Surveys (1974) as
population forecasts, with no consideration of them, other than to note that
between projections made in 1971 and 1974 there was a fall of 8% for the 2010
population.
Another example is that, having said that in forecasting Gross Domestic Product
he "makes what can be little more than guesses about the future",
Tanner (1974, p4) uses the Organisation
for Economic Cooperation and Development (1972) Expenditure trends in OECD countries
1960-1980 (which he calls long term forecasts) as the basis for the GDP forecasts he uses.
There are several dangers in using other forecasts. There are three possible events which could affect the result of the forecast being made. The first is that there can be a circularity of inputs. For example a series of economic forecasts could be used as inputs to a series of transport forecasts and vice versa. If this does happen the exact nature of the interaction between the two sets of forecasts may not be apparent. The second is that all of the assumptions of the forecast used as an input will implicitly be included in the forecast being made (as will any errors). The third is related to the second and is that assumptions made in the input forecast may be inconsistent with those used in the output forecast.
Finally there is a danger in using targets as forecasts.
For example, on sources of error in forecasts of car ownership in the London
Traffic Survey (Greater London Council 1966),
Cross (1975, p40) comments:
"Probably the most important sources of error is the assumed value [sic]
of the Gross National Product at 4% for the period 1962-1981 (Vol. II, p26).
(This appears to have been an acceptance of the National Plan figure
which was an overestimate if it is accepted as a forecast not a target.)
This charge is of major importance since car ownership is shown to be a
sensitive function of mean household income."
The other independent input to forecasting is the extrapolation of past trends.
Often these extrapolations are no more than guesses.
In some instances variables are assumed to remain constant in the future.
An example of this is that Tanner (1974, p18) considers that his road
quality factor (the amount of motorway per 1000 cars) will remain at the
1972 level.
In other cases past trends are extrapolated.
For example Tanner (1974) uses extrapolations of modal splits
between road, rail, water and pipeline freight distributions without any
consideration of the underlying causes.
Often trend projections are acknowledged to be guesses.
We have already seen that in considering GDP Tanner (1974, p4) "... makes what can be little
more than guesses about the future".
However guesses are often obscured.
For example "... price levels [for liquid fuels] have been chosen ... to give
some indication of the effects of the future price levels at present being
suggested [by others]"
(Tanner 1974, p9).
One way of acknowledging the basic uncertainty of the future is to use probabilistic
models.
However these are difficult to construct.
A simple way of overcoming this problem is to use a range of input values in a
deterministic model.
This is the approach adopted by Tanner (1974).
He uses a range of values for "elasticity", GDP and fuel costs but
not for others such as population and saturation level for car ownership.
"There are considerable uncertainties about various other aspects of the methods and data;
these include the concept of a car saturation level and the value used for it,
the population forecasts, the implicit assumptions that certain past trends due
to unquantified factors will continue, and the policies that future governments
will adopt towards road building, restraint and public transport.
No attempt is made in this report to give a range of forecasts to cover
alternative assumptions about such matters"
(Tanner 1974, p35).
The effect of this is to give a narrower range of forecasts than would result if all input variables were varied.
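The narrowing effect can be sketched with a toy deterministic model. The model form, the input names and all the ranges below are hypothetical and bear no relation to Tanner's actual equations; the sketch only compares the spread of results when one input is varied with the spread when all three are.

```python
from itertools import product


def cars_forecast(gdp_index, population, saturation):
    """Toy model: cars = population x saturation x a share that rises with GDP."""
    share_of_saturation = gdp_index / (gdp_index + 1.0)
    return population * saturation * share_of_saturation


# Hypothetical (low, central, high) values for each input.
GDP_INDEX = (1.5, 2.0, 2.5)
POPULATION = (55.0, 57.5, 60.0)  # millions
SATURATION = (0.40, 0.45, 0.50)  # cars per person


def forecast_range(gdp_values, population_values, saturation_values):
    """Width of the band of forecasts over every combination of the inputs given."""
    results = [
        cars_forecast(g, p, s)
        for g, p, s in product(gdp_values, population_values, saturation_values)
    ]
    return max(results) - min(results)


# Only GDP varied; population and saturation held at their central values.
partial_width = forecast_range(GDP_INDEX, POPULATION[1:2], SATURATION[1:2])
# All three inputs varied.
full_width = forecast_range(GDP_INDEX, POPULATION, SATURATION)

print(partial_width, full_width)
```

With these invented ranges the band of forecasts is more than twice as wide once population and saturation level are allowed to vary, which is the sense in which a published range based on only some inputs understates the uncertainty.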
Other than criticisms of forecasting models and of the input data used, I have an assortment of other criticisms. These include criticisms of assumptions made about the continuity of relationships over time, other assumptions which are made, the self-fulfilling nature of some forecasts and the lax use of terminology.
The most common assumption found in forecasting models is that the relationships built into the forecasting model (which are usually based upon past behaviour) will continue in the future, or will change in some known or predictable way. However it is precisely because the future is conceived of as being different from the past that forecasts are needed. The only justification for making this type of assumption is that the forecaster perceives there being less chance of change in the relationship he holds constant than in what he is forecasting. The consequence of this is that any forecast is the product of the subjective judgement of the forecaster. There can be no objective forecasts.
A further problem with relationships used in models is that they are often assumed to be valid outside of the range within which they have been observed.
"A model can only be as good as its initial assumptions - a model based
upon unrealistic assumptions will produce unrealistic answers ... [It is
always necessary to] ... ask ... to what extent ... assumptions are justified.
For example is it reasonable to extrapolate an estimated relationship into
situations which have not been observed (That is, to use the relationship
when the independent variables take values much greater or smaller than
those found in the sample which was used to estimate it)"
(Robinson 1972b, p159).
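The danger of extrapolating outside the observed range can be sketched with a toy example. It assumes, purely for illustration, that car ownership follows an S-shaped curve towards a saturation level of 0.45 cars per person, and that a straight line is fitted to the first eleven years only. None of the figures come from Tanner (1974) or Robinson (1972b).

```python
import math

SATURATION = 0.45  # assumed cars per person at saturation


def true_ownership(year):
    """Hypothetical S-shaped (logistic) path of cars per person."""
    return SATURATION / (1.0 + math.exp(-(year - 10) / 3.0))


def linear_fit(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The relationship is estimated from years 0 to 10 only.
observed_years = list(range(11))
observed = [true_ownership(t) for t in observed_years]
slope, intercept = linear_fit(observed_years, observed)


def linear_forecast(year):
    """Straight-line extrapolation of the fitted trend."""
    return intercept + slope * year


print(linear_forecast(40), true_ownership(40))
```

Within the observed years the straight line stays reasonably close to the assumed curve, but by year 40 it is well above the saturation level that the curve can never exceed. Nothing in the observed sample signals this.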
As an example of a relationship which was assumed to continue in the future it is interesting to look at that between tonne-km and GDP. To make forecasts of freight transport Tulpule (1972) finds and uses an approximately constant ratio between tonne-km and GDP. Only two years later Tanner (1974, p24) claims to have found a fall in the ratio between 1970 and 1972. This he uses as justification for assuming a proportionality between increases in tonne-km and increases in GDP, with tonne-km growing at 2/3 the rate of GDP. Tulpule's assumption is that:
dTkm/dt = dGDP/dt
whereas Tanner's is:
dTkm/dt = (2/3) dGDP/dt
This illustrates the fact that trends between variables need not continue in the future.
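How far the two assumptions diverge can be sketched numerically. The sketch reads both equations as statements about annual percentage growth rates, which is an interpretation, and the 3 per cent GDP growth rate and 20-year horizon are assumed for illustration rather than taken from either report.

```python
def index_after(years, annual_growth):
    """Index (base year = 1.0) after compound growth at a constant annual rate."""
    return (1.0 + annual_growth) ** years


GDP_GROWTH = 0.03  # assumed annual GDP growth
HORIZON = 20       # assumed forecast horizon in years

# Tulpule (1972): constant tonne-km/GDP ratio, so tonne-km grows as fast as GDP.
tulpule_index = index_after(HORIZON, GDP_GROWTH)
# Tanner (1974): tonne-km grows at two thirds of the GDP growth rate.
tanner_index = index_after(HORIZON, GDP_GROWTH * 2 / 3)

# Proportion by which the Tulpule-style forecast exceeds the Tanner-style one.
difference = tulpule_index / tanner_index - 1.0
print(f"Tulpule {tulpule_index:.2f}, Tanner {tanner_index:.2f}, gap {difference:.0%}")
```

Under these assumptions the two forecasts of tonne-km differ by roughly a fifth after twenty years, although they were published only two years apart and rest on largely the same historical data.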
Another example of the assumption of continuity of relationships is the assumption that past trends will continue, such as the previously mentioned trend in falling costs of owning and running cars.
Apart from the assumption of continuity of relationships many other assumptions are usually made in forecasts. Many of these are very arbitrary. Some examples from Tanner (1974) are given below (emphasis has been added).
"... some rather arbitrary assumptions will be made about the future
[with respect to motorcycles]"
(Tanner 1974, p19).
"In view of busses and taxis forming a small proportion of road traffic
and the various uncertainties very simple assumptions will be made"
(Tanner 1974, p23).
"While it is difficult to see what is likely in the future, this growth
[of freight transport] seems perhaps a little high [reference to Tulpule (1972)], and it is felt
that lower rates of increase are appropriate ..."
(Tanner 1974, p29).
"In the absence of any clear indication from the [previous] analyses,
this report follows Tulpule's [1972]
arbitrary assumption that the present ratio of light vans to
lorries will be maintained"
(Tanner 1974, p33).
There are many instances in which forecasts are self-fulfilling. For example the announcement of difficulties in, say, sugar supplies and the prediction that there will be shortages in the shops will at least exacerbate the situation, if not actually cause the shortage.
Another example of the self-fulfilling nature of forecasts is that,
having noted a relationship between the quality of the road system (measured
in terms of length of motorway) and car use, Tanner (1974, p18) assumes that improvements
in the road system will keep pace with increasing car ownership.
In his introduction Tanner (1974, p2) says that his forecasts reflect
"a view of how the future will develop. Others may
disagree with this view, and no special authority is claimed for it."
However his forecasts were used as the basis of forecasts made by the Department of the Environment (1975c).
These forecasts were used to justify the building of motorways, so helping
to bring about the very thing which was forecast (increasing traffic).
In this chapter I have outlined some of my criticisms of the forecasting process.
These were about forecasting models, their inputs and some more general criticisms of
forecasting. Perhaps the majority of these criticisms can be summed up by saying
"... the future, in fact, cannot be predicted, we cannot know 'the truth' about
the future until it has occurred but we act as if we can foretell the future to some
extent"
(Chadwick 1978, p155).
However I believe that this extent is limited and to pretend otherwise is to limit
the range of futures open to us.
[1] One of my problems when writing this chapter was in saying what the effects of my criticisms are. I have since realised that this was because I had not erected any criteria upon which to judge forecasts. It now appears to me that forecasts cannot be judged independently of the purpose for which they are made. To me (now, as opposed to when I wrote this chapter) the only criterion on which it is sensible to judge a forecast is: How well does it serve its purpose?
Unfortunately it is now too late to incorporate this criterion into this chapter. Apart from anything else it would first be necessary to specify which of many possible purposes I was judging Tanner's forecasts against.
Copyright © Michael Baker 1981,2005. All Rights Reserved.