Introduction

Let us discuss the sources of data and the methods of data collection. A clear solution to a problem is possible only with proper evidence. The purpose of collecting data is to provide such evidence.

In economics, you often come across various statements. Given below is one such statement.

“After many fluctuations, the output of food grains rose to 108 million tonnes in 1970-71 and 132 million tonnes in 1978-79 but fell to 108 million tonnes in 1979-80. Afterwards, production rose continuously to 252 million tonnes in 2015-16 and touched 272 million tonnes in 2016-17”.

This statement is drawn from Table 2.1, which shows the production of food grains from 1970-71 to 2016-17.

Table 2.1 Production of food grains in India

(in Million Tonnes)

Year (X)     Production (Y)
1970-71      108
1978-79      132
1979-80      108
1990-91      176
1997-98      194
2001-02      212
2015-16      252
2016-17      272

From this statement we see that the production of food grains varies, or changes, from year to year. As these values vary, we call them variables. A characteristic whose change can be measured numerically is known as a variable. Variables are denoted by X, Y and so on. The numerical value of a variable is known as an observation. In the above statement the characteristic that varies is production. Production varies as follows:

Whenever a change in year takes place, there is a change in production. Here, we can say that whenever there is a change in X, there will be a change in Y. When X takes the value of 1970 - 71, Y takes the value of 108.

Here the values of the variables X and Y are the data, from which we can obtain information about the production of food grains in India. The data give information about fluctuations in food grain production in India over a period of time. Thus data are a tool that helps in understanding a problem or a real situation by providing information.
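As a small illustration, the observations in Table 2.1 can be stored as pairs of X and Y values, and the change in Y from one listed year to the next can be computed. The Python sketch below is only one possible way of doing this; the names `food_grains` and `changes` are chosen here for illustration and do not come from the text.

```python
# Observations from Table 2.1: X = year, Y = production in million tonnes.
food_grains = [
    ("1970-71", 108),
    ("1978-79", 132),
    ("1979-80", 108),
    ("1990-91", 176),
    ("1997-98", 194),
    ("2001-02", 212),
    ("2015-16", 252),
    ("2016-17", 272),
]

# Change in Y between each listed year and the previous listed year.
changes = [
    (later_year, later_y - earlier_y)
    for (_, earlier_y), (later_year, later_y) in zip(food_grains, food_grains[1:])
]

for year, change in changes:
    print(f"{year}: {change:+d} million tonnes")
```

The first two results, +24 for 1978-79 and -24 for 1979-80, correspond to the rise and fall described in the quoted statement.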

What are the Sources of Data?

Statistical data can be obtained from two sources. They are

  • Primary data and

  • Secondary data

Primary Data

Data collected directly by the enumerator are primary data. As they are based on first-hand information, such data are original. Thus data collected for the first time are primary data. An example is a survey conducted among the students of a college, asking them relevant questions to find out the popularity of a film star.

Secondary Data

When we use data collected and processed (scrutinised and tabulated) by some other agency, they are called secondary data. For example, if one uses the data collected by a publishing company regarding the reading habits of high school students, such data are secondary data. The data are primary for the publishing company, which originally collected them, but secondary for anyone else who uses them.

Secondary data can be obtained from published sources such as reports of central, state and local governments and publications of the IMF, RBI, UNO, etc., and from unpublished sources such as data maintained by private offices, research institutions, websites, etc.

Choice between Primary and Secondary Data

The choice between the two depends on the following considerations:

  • 1) Nature and scope of enquiry
  • 2) Availability of financial resources
  • 3) Availability of time
  • 4) Degree of accuracy desired
  • 5) The status of the investigator, i.e., individual, government, corporate entity, etc.

Table 2.2 Examples of Primary Sources of Data
Letters            Manuscripts
Diaries            Journals
Maps*              Video footage*
Oral histories     Speeches
Interviews         Newspapers*
Research data      Audio recordings
Photographs*       Objects or artifacts

*Usually, but not always, a primary source.

Table 2.3 Difference Between Primary and Secondary Data
PRIMARY DATA                                       SECONDARY DATA
Original data                                      Second-hand data
Expensive                                          Less expensive
Time consuming                                     Less time consuming
Raw item                                           Finished material
More care needed at the time of data collection    More care needed while using

How do we Collect Primary Data?

Did you ever think how a manufacturer decides about a product or how a political party chooses a candidate? For this purpose, they conduct a survey by asking a large group of people questions about the product or candidate. The acceptability of a product depends on characteristics like price, quality, usefulness, etc. The choice of a candidate depends on his popularity, honesty, loyalty, etc. The method of gathering information from individuals is known as a survey.

Personal Interview

In a personal interview, the investigator himself goes to the place of the informants and collects the information by asking questions or through observation. The questions asked may be unstructured, or may come from structured questionnaires or schedules.

Structured Questionnaires and Schedules:

After studying the problem under investigation, the investigator prepares a list of questions that he feels are relevant and useful. This list is called a structured questionnaire. Appropriate spaces are provided in it for recording the answers of the informants; such a sheet is called a schedule. When the printed form of the questionnaire is given to the respondent, he is asked to record his answers himself.

Unstructured Questions:

Suppose an investigator is collecting data from HIV+ people. An informant may be reluctant to reveal his situation or the facts that led to it, and so may not give proper answers to the questions that the investigator has already framed. In such circumstances it would be better to establish a rapport with the informant by asking him questions appropriate to the situation. Answers to some of these questions may provide relevant data to the investigator. Questions of this type are called unstructured questions.

Observation Method:

Suppose an investigator is studying the living conditions of certain tribal communities. Data for this study cannot be collected by asking them questions, structured or unstructured. Sometimes a long period of stay is essential to build an adequate rapport with them. The data are collected through observation. The observation method is useful in anthropological studies.

Merits of Personal Interview

  • Original data are collected
  • Likely to be more reliable and accurate
  • Response will be more encouraging
  • Uniformity and homogeneity
  • Possible to collect supplementary information
  • Misinterpretation can be avoided

Disadvantages of Personal Interview

  • Most expensive method
  • More time consuming
  • Possibility of influencing the respondents
  • Chances of personal bias

Indirect Oral Investigation

Indirect oral investigation is the method of collecting data through indirect sources. Persons who are likely to have information about the problem are questioned, and factual data are compiled on the basis of their answers.

Merits of Indirect Oral Investigation

  • Less time consuming
  • Less expensive
  • Less effort
  • Confidential information can be collected.
  • Information is likely to be unbiased and reliable.
  • This method is relatively simple to understand.

Demerits of Indirect Oral Investigation

  • The degree of accuracy of information is less.
  • This method may lead to doubtful conclusions.
  • The witnesses may be careless in their answers.

Information from correspondents

Here, the investigator appoints local agents or correspondents in different places to obtain information. They collect the required information and supply it to the investigator. Newspaper agencies generally follow this method. This method is more suitable in those cases where information is to be obtained at regular intervals from a wide area.

Telephonic Interview

In this method, the investigator collects data from the informants through telephone conversations. In earlier times this method was not widely applicable in India, since most people had no access to a telephone. But now, thanks to the giant leap in information technology, the number of people with access to a telephone is high. Hence the telephone interview has become a more important method of primary data collection.

Merits of Telephonic Interview

  • Less time consuming
  • Less expensive
  • Less effort
  • Can cover a wide geographic area
  • Easy to conduct
  • More personal in nature
  • Fast data collection

Limitations of Telephonic Interview

  • Questions cannot be of a complex nature.
  • Respondents may refuse to participate.
  • The investigator cannot see the respondent’s body language.
  • Respondents may provide incorrect data.

Mail Questionnaire

In this method, the prepared questionnaire is sent to the informants. A self-addressed envelope and a covering letter requesting them to furnish the necessary information are sent along with the questionnaire. This method is useful in cases where the informants are spread over a wide area.

Merits of Mail Questionnaire

  • Useful where the field of investigation is vast.
  • Needs less effort.
  • Less expensive.
  • It is free from the bias of the interviewer.
  • Respondents have adequate time to give answers.
  • Results can be made more dependable and reliable.
  • Respondents, who are not easily approachable, can also be reached conveniently.

Demerits of Mail Questionnaire

  • It cannot be used for illiterate or uneducated respondents.
  • The rate of non-response is high compared with other methods.
  • Any confusion about the questions cannot be cleared up.
  • The control over questionnaire may be lost once it is sent.
  • It is difficult to verify the accuracy of the answers given.
  • There is no scope for asking supplementary questions.
  • Filled-in questionnaires may be incomplete as well as inaccurate.

Construction of a Questionnaire

The success or failure of an investigation depends on the questionnaire. So it should be designed scientifically and carefully, so that the data it yields are reliable and accurate. The construction or drafting of a questionnaire is an art. Only an expert with sound intelligence and rich experience can design a meaningful questionnaire.

The questionnaire should contain two parts. The aims and objectives of the investigation should be mentioned in the first part, together with a request seeking the help and co-operation of the informants. If the information to be furnished by the informants needs secrecy, an assurance should be given in the first part that it will be kept confidential. The questionnaire should also provide the necessary instructions, such as the time within which, and the place to which, the completed questionnaire is to be returned. The questions themselves are included in the second part.

Major Problems:

  1. Selection of Type of Questions
  2. Order of Questions
  3. Question wording and form of response

(1) Selection of type of Questions

Selection of the type of questions is very important while preparing schedules or questionnaires. We may choose two-way, multiple-choice or free-answer questions. Each type has its own advantages. A two-way question requires the informant to choose between two alternatives such as “Yes” or “No”, “Good” or “Bad” and so on (example of a two-way question: Do you own a car? Yes/No). Multiple-choice questions give the informant a wider range of possible answers. In the case of free-answer or open-ended questions, the informant is at liberty to answer in his own words. The interviewer should select one of the above types according to the purpose for which the enquiry is conducted.

(2) Order of questions

Questions should be arranged logically. The first few questions should generally deal with the identification of informant and his family members. The other questions should be arranged in such a way that one question leads to the next and so on.

(3) Question wording

Questions should be worded in such a manner that they convey the same meaning to all. Terms and concepts used in the questionnaire should be clearly defined so as to make them unambiguous. Simple and direct questions should be prepared. Questions about motives, achievements, etc., should be asked in an indirect manner; in some cases, avoiding direct questions gives good results. Similarly, questions which offend the feelings of the informant should be avoided.

A good questionnaire should have the following qualities:

  • The questions should be clear, simple and easy to understand.
  • The questionnaire should be brief.
  • The questions should not use double negatives.
  • The questions should not be leading.
  • The questions should be arranged in a logical order.
  • Personal questions should be avoided.
  • The questionnaire should look attractive.
  • The questionnaire may consist of closed-ended questions.
  • The questionnaire should be pre-tested.

  • Questionnaire should not be too long: The number of questions should be kept as small as possible. Lengthy questionnaires discourage respondents from completing them.
  • Questions should move from general to specific: The order of questions should be such that it moves from general to specific ones. This makes the respondents feel comfortable.
  • Questions should be precise and clear: The questions should be to the point and with clarity.
  • Questions should not be ambiguous: The questions should enable the respondents to answer quickly, correctly and clearly.
  • Questions should not have double negatives: Do not begin questions with ‘Wouldn’t you...’ or ‘Don’t you...’, as they may result in biased responses.
  • Questions should not lead to the answer: A question which gives a clue about how the respondent should answer is to be avoided. Example: How do you like the flavour of this top quality coffee?
  • Questions should not indicate alternatives to the answer: e.g. What would you like to become after studies, go for a job or be a house-wife?

A questionnaire may include close-ended (structured) questions or open-ended (unstructured) questions.

Close-ended questions: These can be two-way questions or multiple-choice questions. If there are only two possible answers, like ‘yes’ or ‘no’, it is a two-way question. When there are more than two options, say four, it is a multiple-choice question.

Open-ended questions: Such questions allow for increased individualised responses. But the problem is that they are difficult to interpret because of variations in responses. Example: What is your view about privatisation?

Pilot Survey

When the questionnaire is ready, it is always advisable to try out the questionnaire or schedule on a limited number of informants before the actual survey is undertaken. This is known as a pilot survey or pre-testing of the questionnaire. It brings out problems which can be attended to before the large-scale survey begins.

Advantages of Pilot Survey

  1. Helps in assessing the suitability of questions.
  2. It can check clarity of instructions.
  3. It helps to check performance of enumerators.
  4. The time and cost of actual survey can be estimated exactly.
  5. The questions to be asked are being tested through pilot survey.

Collection of Secondary Data

Secondary data are those which are available in published or unpublished records. Once a decision is taken to collect secondary data, the question of sources of data arises.

There are two sources for the collection of secondary data, namely, published sources and unpublished sources.

Published Sources

  • Reports and publications of central and state governments
  • Official publications of international bodies
  • Financial and economic bodies
  • Publications of research scholars
  • Annual reports of firms and companies
  • Reports of committees

Unpublished Sources

Data maintained by

  • Government departments
  • Private offices
  • Research institutions (in the course of their studies)
  • Scholars
  • Individuals

Data from Secondary Sources - Precautions

Before data collection, the investigator has to acquaint himself with the work already done on the problem. He should see whether data have already been generated and what other information relevant to the problem is available. This will give him an idea about the essential aspects which have not been properly studied in detail. For this purpose, he can make use of secondary sources and decide upon a fresh survey.

Government, business firms, research organisations, etc., collect a large mass of data either in the process of administration or as a part of statistical surveys. These are the sources of secondary data (published or unpublished) which can be tapped by the investigator.

Though data from both primary and secondary sources are needed, one should be very cautious in using secondary data, because they have a number of limitations. The definitions, concepts and data collection methods used in the primary source may have become outdated by the time the secondary data are processed. There may also be personal bias and prejudice on the part of the person who collected the primary data (which are now secondary data).

Population, Census, Sample Survey

POPULATION

The aggregate from which data are to be collected in a statistical enquiry is called the population. It consists of all the items under consideration in a field of enquiry.

CENSUS

A complete enumeration of all the items in the population is known as a census survey. It is also called complete enumeration.

SAMPLE SURVEY

A survey conducted by taking a sample to represent the characteristics of the population under study is called a sample survey.

Advantages of Census Method

  1. Free from sampling errors.
  2. Results will be highly accurate.
  3. Useful for further studies.
  4. Can study each unit in detail.
  5. All the characteristics of the population are maintained in original form.
  6. Suitable for heterogeneous units.

Disadvantages of Census Method

  1. Large number of enumerators required.
  2. It is time consuming.
  3. Highly expensive.
  4. Inconvenient.
  5. Possible in limited circumstances.
  6. Not applicable for infinite population.
  7. Labour consuming.
  8. More statistical errors

Sample

A sample is that part of the population which represents the entire population in terms of the characteristics under study.

Sample Survey

In the census method we collect information from each unit in the population. But sometimes this is not possible, for many reasons. Suppose that a mobile phone company wants to know about the popularity and usage habits of mobile phones among college students. Is it advisable to collect information from each college student, that is, from the whole population of this statistical survey? Definitely not. Then how will the company conduct the study? It will collect information from only some of the students and draw its conclusion from these data. The students selected in this investigation are known as the sample. That is, a sample is that part of the population which represents the entire population in terms of the characteristics under study. The result obtained through this sample will reasonably match the result that could be obtained through the complete enumeration method. Thus, a survey conducted by taking a sample to represent the characteristics of the population under study is referred to as a sample survey. The process of selecting a sample out of a given population is called sampling. The number of units in the sample is called the sample size.

The sample method is an important method of statistical investigation. In most statistical investigations we rely on the sample survey method. In our daily life also we rely on sampling to obtain results. A physician makes inferences about a patient’s blood by testing a few drops of it. In certain cases, such as the testing of blood, it is the only method that can be used. Sample surveys are performed for the following reasons:

  1. Less expensive: The cost of conducting a sample survey is much less than the cost of conducting the investigation through the census method.
  2. Less time consuming: Since we consider only a part of the population, it saves considerable time and labour in collecting data. Since fewer data are collected than in the census method, classification and processing are also quicker.
  3. More reliable: The results obtained through a sample survey can be more accurate and reliable, because they are less exposed to errors arising from inaccuracy of information or incompleteness of returns.
  4. Detailed study of the selected units is possible: Since only a part of the population is put under study, we can collect more detailed information from all the selected units.

The sample in a sample survey should be selected very carefully. The selection may be made either deliberately or randomly. But it should have the following essential characteristics:

  • A good sample should be representative of the population.
  • It should be homogeneous.
  • It should be adequate.

Merits of Sample Survey

  1. Less expensive
  2. Less time consuming
  3. More reliable
  4. More detailed information
  5. Organisational Convenience
  6. More Scientific
  7. Indispensable Method in certain cases

Demerits of Sample Survey

  1. Absence of being a representative
  2. Likely to arrive at wrong conclusions
  3. Small universe
  4. Specialised knowledge
  5. Inherent defects
  6. Sampling Error
  7. Personal bias

Methods of Sampling

There are various methods of selecting samples from a population. These are called sampling techniques. The two types of sampling techniques are random sampling and non-random sampling.

Random Sampling Method

Random sampling is a technique of drawing a sample from the population in which each and every unit of the population has an equal chance of being included in the sample. It is further divided into simple random sampling and restricted random sampling.

1. SIMPLE RANDOM SAMPLING

In this method, the sample is taken from the population without making any division or classification of the population. Hence every unit of the population has an equal chance of being selected in the sample. Simple random sampling may be done either by the lottery method or by using a table of random numbers.

  1. Lottery method: This is the most popular and simple method of sampling. Under this method, all the items of the population are numbered on separate slips of paper of the same size, shape and colour. The slips are folded in a uniform manner and mixed up in a container. A blindfold selection is then made of the number of slips required to form the desired sample size. The selection of items depends purely on chance.

  2. Table of random numbers: Several standard random number tables are available, and the most popular one is Tippett’s random number table. The table constructed by Tippett consists of 10400 four-digit numbers, giving a total of 41600 digits (10400 × 4). The technique of selecting a random sample with the help of this table is as follows: if we have to select a sample of 100 from a population of 5000, we first number the population from 1 to 5000. Then we open any page of Tippett’s table and select the first 100 numbers which lie between 1 and 5000.
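The two procedures for simple random sampling can be sketched in Python as follows. This is only an illustrative sketch: the function names `lottery_sample` and `table_sample` are made up here, the list `stand_in_table` is generated by the computer and is not Tippett’s actual table, and skipping a number that has already been chosen is an assumption that the text does not spell out.

```python
import random

def lottery_sample(population, sample_size, rng):
    """Lottery method: mix the slips and draw the required number blindly."""
    slips = list(population)      # one slip per unit of the population
    rng.shuffle(slips)            # mix the slips in the container
    return slips[:sample_size]    # draw without replacement

def table_sample(four_digit_numbers, population_size, sample_size):
    """Random number table method: read the numbers in order and keep those
    that match a numbered unit (1..population_size).
    Assumption: a number that has already been chosen is skipped."""
    chosen = []
    for number in four_digit_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

rng = random.Random(2024)         # fixed seed only so that a run can be repeated

units = range(1, 5001)            # population numbered from 1 to 5000
sample_a = lottery_sample(units, 100, rng)

# Stand-in for a page of a printed table; NOT Tippett's actual numbers.
stand_in_table = [rng.randint(0, 9999) for _ in range(1000)]
sample_b = table_sample(stand_in_table, 5000, 100)

print(len(sample_a), len(sample_b))
```

In both cases the choice of units is left to chance and not to the investigator, which is the defining feature of simple random sampling.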

2. RESTRICTED RANDOM SAMPLING

Restricted random sampling is mainly of three types: stratified sampling, systematic sampling and cluster sampling.

  1. Stratified Sampling: When the population is heterogeneous, the stratified sampling method is used. Under this method the whole population is divided into various groups or strata, such that the units in each stratum possess similar characteristics. For example, suppose you are studying the consumption pattern of students in your school. The population comprises all the students studying in the various standards of your school. A student in standard 5 and a student in standard 9 may have different consumption patterns. That is, for this characteristic, the population is heterogeneous. Hence, the different standards can be taken as different groups, or strata. A sample is then drawn from each stratum at random.

  2. Systematic Sampling: A systematic sample is formed by selecting one unit at random and then selecting the rest at evenly spaced intervals until the sample size has been reached. Suppose that in the nature club of your school there are 100 members and you want to form a core group of 10. First you number the 100 members from 1 to 100. By the lottery method or the random number table method you select one student from the first ten. Let it be the 7th student. Then take an appropriate interval and select the remaining 9 students. If the interval is 10, the second student in the sample is the 17th, the third is the 27th, and so on.

  3. Cluster Sampling: This type of sampling is carried out in several stages. Suppose we are studying the employment of households in Kerala. In the first stage, Kerala is divided into three or four zones. Then each zone is divided into districts, and each district into villages. From each district, a sample of villages may be taken at random. From each selected village, the required number of households is also taken at random. Since several stages are involved in cluster sampling, it is also known as multi-stage sampling.
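The three restricted methods described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a prescribed procedure: the function names, the strata of 40 students each, the two districts and their villages and households are all hypothetical, and the zone stage of the Kerala example is left out to keep the sketch short.

```python
import random

def systematic_sample(units, interval, rng):
    """Pick one unit at random from the first `interval` units,
    then take every `interval`-th unit after it."""
    start = rng.randrange(interval)           # 0-based position of the first pick
    return units[start::interval]

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def multistage_sample(districts, n_villages, n_households, rng):
    """Choose villages at random in each district,
    then households at random in each chosen village."""
    chosen = {}
    for district, villages in districts.items():
        for village in rng.sample(sorted(villages), n_villages):
            chosen[(district, village)] = rng.sample(villages[village], n_households)
    return chosen

rng = random.Random(7)                        # fixed seed, for repeatability only

# Systematic: a core group of 10 from 100 numbered club members.
members = list(range(1, 101))
core_group = systematic_sample(members, 10, rng)

# Stratified: hypothetical strata of 40 students in each of two standards.
strata = {
    "standard-5": [f"5-{i}" for i in range(1, 41)],
    "standard-9": [f"9-{i}" for i in range(1, 41)],
}
by_standard = stratified_sample(strata, 5, rng)

# Multi-stage: hypothetical districts with 4 villages of 20 households each.
districts = {
    district: {
        f"Village {v}": [f"{district}, village {v}, household {h}" for h in range(1, 21)]
        for v in range(1, 5)
    }
    for district in ("District A", "District B")
}
households = multistage_sample(districts, n_villages=2, n_households=3, rng=rng)

print(core_group)
```

If the random start happens to be the 7th member, `core_group` is 7, 17, 27 and so on up to 97, as in the nature club example.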

Non-Random Sampling Method

In this method of sampling the investigator himself chooses the sample from the population at his own discretion, selecting the units he thinks best. Here all the units in the population do not have an equal chance of being selected into the sample. Since the investigator is free to include or leave out a particular unit, he can collect data smoothly. But the method has shortcomings, such as:

  • The bias and prejudices of the investigator influence the selection of the sample. Sometimes this sample cannot be considered as a true representative of the population.
  • The degree of accuracy may not be assured.

Exit Polls

You must have seen that when an election takes place, the television networks provide election coverage. They also try to predict the results. This is done through exit polls, wherein a random sample of voters who exit the polling booths are asked whom they voted for. From the data of the sample of voters, the prediction is made.

The chart given below shows the different sampling methods used for collecting data.

Sampling and Non-sampling errors

Statistical investigations are carried out to draw inferences about the population. Errors may occur at any stage of the investigation: while framing the sample, while collecting the data, or in any other process of the investigation. We mainly consider two types of errors, namely, sampling errors and non-sampling errors.

1. SAMPLING ERRORS

In the sampling method we collect data from only a small fraction of the population. After drawing an inference from these data, we generalize the result to the whole population. That is, we draw inferences about the population on the basis of a few observations (the sample). Errors may occur in this process. The sample value may differ from the population value, that is, the value we would have obtained from a complete enumeration. The error arising from drawing inferences about the population on the basis of a sample is known as sampling error. Let us see an example:

Suppose that there are 30 students in your class and that their marks in Economics are as given below. The problem is to assess the performance of the students in the Economics examination, which we do by calculating the average mark.

Table-2.4
43 48 65 87 32 17
43 50 80 32 48 82
73 67 62 70 39 12
43 45 61 76 32 75
71 63 32 18 60 42

If we consider the whole population (i.e. all the 30 students), the average mark is about 52. Now let the problem be solved by the sampling method. Let the sample size be 5. At random we select the marks of five students. Let them be 70, 32, 39, 82 and 17. The average of these five marks is 48. That is, the population value is in fact 52 but the sample value is only 48. This difference is the sampling error: the sample we have chosen is not a perfect representative of the population. Here the sampling error is 52 − 48 = 4.
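The arithmetic of this example can be checked with a few lines of Python. The sketch below simply repeats the calculation on the marks in Table 2.4; the variable names are chosen here for illustration.

```python
# Marks of all 30 students (Table 2.4): the population.
marks = [
    43, 48, 65, 87, 32, 17,
    43, 50, 80, 32, 48, 82,
    73, 67, 62, 70, 39, 12,
    43, 45, 61, 76, 32, 75,
    71, 63, 32, 18, 60, 42,
]

# The five marks selected in the example: the sample.
sample = [70, 32, 39, 82, 17]

population_mean = sum(marks) / len(marks)     # 1568 / 30
sample_mean = sum(sample) / len(sample)       # 240 / 5
sampling_error = population_mean - sample_mean

print(round(population_mean, 2), sample_mean, round(sampling_error, 2))
# prints: 52.27 48.0 4.27
```

The exact population average is 52.27, which the example rounds to 52, so the sampling error of 4 is likewise a rounded figure.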

As the size of the sample increases, the error in sampling decreases. Hence, sampling errors may be minimized by taking large samples. For minimizing sampling error, the following precautions may be taken:

  1. Sample size should be made large
  2. Sample should be taken with care
  3. Enumerators should be well trained
  4. Scientific methods of data collection should be employed
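The effect of sample size on sampling error can be explored with a small simulation on the marks in Table 2.4. The Python sketch below is an illustration under assumptions, not part of the original example: the function `average_sampling_error`, the choice of 2000 repeated samples and the sample sizes tried are all made up here.

```python
import random

# Marks of all 30 students (Table 2.4): the population.
marks = [
    43, 48, 65, 87, 32, 17,
    43, 50, 80, 32, 48, 82,
    73, 67, 62, 70, 39, 12,
    43, 45, 61, 76, 32, 75,
    71, 63, 32, 18, 60, 42,
]
population_mean = sum(marks) / len(marks)

def average_sampling_error(sample_size, trials, rng):
    """Average absolute gap between the sample mean and the population mean
    over many simple random samples of the given size."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(marks, sample_size)   # drawn without replacement
        total += abs(sum(sample) / sample_size - population_mean)
    return total / trials

rng = random.Random(1)    # fixed seed, for repeatability only
errors = {n: average_sampling_error(n, 2000, rng) for n in (5, 15, 25, 30)}

for n, err in errors.items():
    print(f"sample size {n:2d}: average sampling error {err:.2f}")
```

On average the gap shrinks as the sample grows, and it disappears when the sample is the whole class of 30, which is simply the census.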

2. NON-SAMPLING ERRORS

Will the result of the investigation be free from errors if we use the census method? No. Errors may creep in because of many factors. The vast size of the population, the inability of certain enumerators and the tabulation of huge amounts of data all introduce errors. Errors arising in this manner are called non-sampling errors. That is, non-sampling errors are those that creep in due to human factors. Some of the non-sampling errors are errors in data acquisition, non-response errors and sampling bias. More specifically, non-sampling errors may arise from one or more of the following factors:

  1. Data specification may be inadequate with respect to the objectives of the survey
  2. Statistical units prescribed may be inappropriate
  3. Methods of interview may be unsuitable
  4. Investigators may be inexperienced or untrained
  5. No response or incorrect response from respondents
  6. Errors in data processing
  7. Errors during presentation and tabulation of data

Non-sampling errors are more serious than sampling errors, because sampling errors can be minimized by taking larger samples. Non-sampling errors, however, cannot be minimized by making the sample larger or smaller.