Information on Census of Agriculture
The 1997 Census of Agriculture featured a pre-census screening phase that surveyed selected records, by mail or telephone, for the presence or absence of agricultural activity. Records selected for screening had a low probability of qualifying as farms. All records responding to the screener and reporting no agricultural activity were removed from the census mail list. Eliminating nonfarm records from the mail list reduced respondent burden and data collection costs. The screening phase included nearly 500,000 records. Records were selected for screening using one of the following criteria:
A mail list model predicted the probability that an addressee on the 1997 preliminary census mail list operated a farm. The model defined groups based on combinations of characteristics such as source(s) of the mail list record, expected value of agricultural production, and geographic location. Farm proportions were estimated for these groups by calculating, among the 1992 census respondent records that exhibited the characteristics defined by the group, the proportion that were farms. This proportion, also called the in-scope rate, provided an estimate of the probability that an addressee in the group operated a farm. Each address record on the 1997 preliminary census mail list was assigned to a model group by matching record characteristics to model group characteristics. Records belonging to the groups with the highest farm probability were those more likely to be farms. Records with a farm probability of approximately 30 percent or less were selected for screening, along with records included on selected agriculture specialty lists as noted above.
Before screening, the preliminary census mail list consisted of 3,314,790 records. There were 478,298 records selected for screening. Of these, 125,570 records were determined to be nonfarms as a result of the screening phase and were removed from the final census mail list. The remaining 3,189,220 records received census report forms.
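The in-scope rate and the screening cutoff described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual mail list model: the record layout, the group labels, the function names, and the treatment of unknown groups are all hypothetical, and the 0.30 constant stands in for the "approximately 30 percent" threshold.

```python
from collections import defaultdict

SCREENING_CUTOFF = 0.30  # stands in for "approximately 30 percent or less"


def in_scope_rates(records_1992):
    """Estimate the farm proportion (in-scope rate) for each model group.

    records_1992: iterable of (group, is_farm) pairs for 1992 respondents.
    """
    totals = defaultdict(int)
    farms = defaultdict(int)
    for group, is_farm in records_1992:
        totals[group] += 1
        farms[group] += int(is_farm)
    return {group: farms[group] / totals[group] for group in totals}


def select_for_screening(mail_list_1997, rates, cutoff=SCREENING_CUTOFF):
    """Return ids of records whose group's in-scope rate is at or below the cutoff.

    Records in a group with no 1992 history are not screened here
    (an assumption made only for this sketch).
    """
    return [rec_id for rec_id, group in mail_list_1997
            if rates.get(group, 1.0) <= cutoff]


# Hypothetical 1992 respondents: (model group, was the record a farm?)
history = [("A", True), ("A", True), ("A", False), ("A", True),
           ("B", False), ("B", False), ("B", False), ("B", True)]
rates = in_scope_rates(history)
screened = select_for_screening([(1, "A"), (2, "B"), (3, "B")], rates)
```

In this toy data, group A has an in-scope rate of 0.75 and group B a rate of 0.25, so only the two group B records are sent to screening.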
All name and address records on the final census mail list were designated to receive a 1997 Census of Agriculture report form. Two different types of census report forms, sample and nonsample, were used to collect data. Sections 1 through 20 and 28 through 32 of the sample form were identical to sections on the nonsample census form. Sample form sections 21 through 27 contained additional questions on usage of fertilizers and chemicals, farm production expenditures, value of machinery and equipment, value of land and buildings, farm-related income, and hired workers. There were 11 regional versions of the nonsample form and 13 regional versions of the sample form, with listings of crops varying by region. These different forms were used to reduce the response burden of the census while providing reliable information on a large number of data items.
The sample form was mailed to all mail list records in Alaska, Hawaii, and Rhode Island and to a sample of records in other States selected from the final mail list. Mail list records were selected into the sample with certainty if they (1) were expected to have large total value of agricultural products sold or large acreage, (2) were multi-unit operations (i.e., separate farms producing under one company organization), (3) were in a county with fewer than 100 farms in 1992, or (4) had other special characteristics. Farms with special characteristics were abnormal farms, such as institutional farms, experimental and research farms, and Indian reservations. Mail list records in counties containing 100 to 199 farms in 1992 were systematically sampled at a rate of 1 in 2; records in counties containing 200 to 299 farms in 1992 were systematically sampled at a rate of 1 in 4; and records in counties containing 300 or more farms in 1992 were systematically sampled at a rate of 1 in 6. The remaining mail list records not chosen to receive the sample form received the nonsample census form.
This differential sampling scheme was used to provide reliable data for the sample sections of the report form for all counties.
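The county-level sampling rates can be expressed as a small lookup followed by a 1-in-k systematic selection. The sketch below is an illustration only: the function names are hypothetical, the starting position would normally be chosen at random, and the other certainty criteria (large operations, multi-unit operations, special characteristics) are not modeled.

```python
def sampling_interval(farms_1992):
    """Return k for a 1-in-k systematic sample, given the county's 1992 farm count."""
    if farms_1992 < 100:
        return 1  # every record in the county is a certainty selection
    if farms_1992 < 200:
        return 2
    if farms_1992 < 300:
        return 4
    return 6


def systematic_sample(records, k, start=0):
    """Take every k-th record beginning at index `start` (0 <= start < k)."""
    if not 0 <= start < k:
        raise ValueError("start must satisfy 0 <= start < k")
    return records[start::k]


# Hypothetical county with 250 farms in 1992 and 10 noncertainty records.
k = sampling_interval(250)
chosen = systematic_sample(list(range(10)), k, start=1)
```

Records not returned by `systematic_sample` would receive the nonsample form.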
The Census of Agriculture Complex Edit and Imputation System was an automated system that performed the following functions:
The system performed these and similar functions for more than 900 data keycodes for sample records and approximately 850 data keycodes for nonsample records.
For the 1997 Census of Agriculture, as in previous censuses, all reported data were keyed and then edited by computer. The edits were used to determine whether the reports met the minimum criteria to be counted as farms in the census. The complex edit and imputation system provided the basis for deciding to accept, impute (supply), delete, or alter the reported value for each data record item.
Whenever possible, edit imputations, deletions, and changes were based on component or related data on the respondent's report form. For some items, such as operator characteristics, data for that record from the previous census were used when available. Values for other missing or unacceptable reported data items were calculated based on reported quantities and known fixed price parameters. When these and similar methods were not available and values had to be supplied, the imputation process used information reported for another farm operation in a geographically adjacent area with characteristics similar to those of the farm operation with incomplete data. For example, a farm operation that reported acres of corn harvested, but did not report quantity of corn harvested, was assigned the same bushels of corn per acre harvested as that of the last nearby farm with similar characteristics that reported acceptable yields during that particular execution of the computer edit.
The imputation for missing items in each section of the report form was conducted separately; thus, assigned values for one operation could come from more than one respondent. Prior to the imputation operation, a set of default values and relationships was assigned to the possible imputation variables.
The relationships and values varied depending on the item being imputed. For example, different default values were assigned for several Standard Industrial Classifications and total value of sales categories when imputing hired farm labor expenses. These values and item relationships for the possible imputation variables were stored in the computer in a series of matrices. Each execution of the computer edit consisted of records from only one State, sorted by reported State and county. For a given execution of the edit, the stored entries in the various matrices were retained in memory only until a succeeding record having acceptable characteristics for the same sections of the report form was processed by the computer. Then the acceptable responses of the succeeding operation replaced those previously stored. When a record processed through the edit had unreported or unacceptable data, the record was assigned the last acceptable ratio or response from an operation with a similar set of characteristics. Once each execution of the computer edit for a State was completed, the possible imputation variables were reset to the default values and relationships for subsequent executions. An edit run usually consisted of 10,000 or more records.
After the initial computer edit, all keyed reports not meeting the census farm definition were reviewed to ensure that the data had been keyed correctly. Edit referrals were generated for 17 percent of the reports included as farms; they were reviewed for keying accuracy and to ensure that the computer edit actions were correct. If the results of the computer edit were not acceptable, corrections were made and the record was reedited.
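The donor mechanism described above resembles what is commonly called a sequential hot deck: a stored value per group starts at a default, is overwritten by each acceptable report, and is applied to the next record with missing data. The following Python sketch illustrates the idea for the corn yield example only. The record fields, the group labels, and the default yield are hypothetical, and the real edit's acceptability checks are not modeled.

```python
def impute_quantities(records, default_yield):
    """Sequential hot-deck imputation of corn quantity from acres harvested.

    records: list of dicts with keys 'group', 'acres', and 'bushels'
             ('bushels' is None when unreported), in processing order.
    default_yield: dict mapping group -> bushels per acre, used until an
                   acceptable donor record in that group has been processed.
    """
    last_yield = dict(default_yield)  # defaults are restored for every run
    edited = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is not modified
        group = rec["group"]
        if rec["bushels"] is None:
            # Missing quantity: apply the most recently stored yield.
            rec["bushels"] = rec["acres"] * last_yield[group]
            rec["imputed"] = True
        else:
            # Reported quantity: this record becomes the stored donor.
            if rec["acres"] > 0:
                last_yield[group] = rec["bushels"] / rec["acres"]
            rec["imputed"] = False
        edited.append(rec)
    return edited


# Hypothetical run: one similarity group with a default of 100 bushels per acre.
run = impute_quantities(
    [{"group": "small", "acres": 10, "bushels": None},
     {"group": "small", "acres": 20, "bushels": 2400},
     {"group": "small", "acres": 5, "bushels": None}],
    default_yield={"small": 100.0},
)
```

The first record falls back to the default yield, while the third inherits 120 bushels per acre from the second record, which is the last acceptable donor at that point in the run.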
The 1997 Census of Agriculture used two types of statistical estimation procedures to account for whole farm nonresponse and sample data collection. The procedures were necessary because some farm operators did not respond to the census despite numerous attempts to contact them, and estimates for certain data items were based on a sample of farm operators rather than a full enumeration.
Whole Farm Nonresponse Estimation
During mail list development, the State Statistical Offices (SSOs), in an effort to reduce respondent burden, identified records that participated in multiple NASS surveys and/or situations where there were special reporting relationships between an enumerator and a respondent. These records were referred to as tagged records. The SSOs had full responsibility for the data collection for these records, including imputation of data for the record if a response was not obtainable.
Whole farm nonresponse that occurred within the remaining universe of records was accounted for by a statistical weighting procedure. The weights of the responding farms were adjusted to account for farms that did not respond. The information needed for this process was obtained from the 1997 Nonresponse Survey. The SSOs conducted the nonresponse survey using computer-assisted telephone interviewing (Blaise-CATI) or personal enumeration when telephone contact was not possible. Alaska and Rhode Island were not eligible for the survey because all nonrespondents were subject to extensive follow-up. In these cases, data were collected by telephone or other methods.
The nonresponse survey collected information from a sample of census nonrespondents to determine farm status and estimate the proportion of farms in the nonresponse universe. The information was then used to estimate the number of nonresponding farm operations by State and county. The 1997 Nonresponse Survey consisted of a stratified systematic sample of the nonresponse records within each State. The sample was selected near the end of the census follow-up operations. Five strata were defined to be homogeneous on probability of farm status and were based on screener status, total value produced, and list source(s) of the mail list record. Based on survey results, estimates of the proportion of census nonrespondents operating farms were made for each stratum in the State.
The estimates were applied to the total number of census nonrespondents in that stratum, providing a State estimate of the number of census nonrespondents that operated farms. The number of census nonrespondents that operated farms was then derived for each county by stratum. This estimation procedure assumed that the distribution of farms in a stratum by county was the same for census nonrespondents as for census respondents. Within each stratum in a county, a noninteger nonresponse weight was calculated and assigned to each eligible respondent farm record. Census respondent farms that were designated as large farms or tagged records or as farms that exhibited "rare" commodities were ineligible to represent nonrespondent farms and were excluded from the nonresponse weighting procedure. These records were assigned nonresponse weights of 1.0. The noninteger nonresponse weight was the ratio of (a) the sum of the estimated number of nonrespondent farms from the nonresponse survey and the number of eligible census respondent farms to (b) the number of eligible census respondent farms. Stratum controls were established to ensure that this weight never exceeded 2.0. For the published tabulations of the complete count items, the noninteger nonresponse weight was randomly rounded to an integer weight of either 1 or 2 for each record. For the sample count items, the noninteger nonresponse weight was used in the calculation of the final sample weight.
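The stratum estimate and the nonresponse weight can be written out as a short calculation. This is a sketch under assumptions: the function names and the example counts are hypothetical, and the `min` cap only stands in for the stratum controls, whose actual mechanism is not described here.

```python
def estimated_nonrespondent_farms(sample_farms, sample_size, stratum_nonrespondents):
    """Apply the surveyed farm proportion to all nonrespondents in a stratum."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_farms * stratum_nonrespondents / sample_size


def nonresponse_weight(est_nonrespondent_farms, eligible_respondent_farms, cap=2.0):
    """Noninteger weight: (nonrespondent farms + eligible respondents) / eligible respondents.

    The cap mimics the rule that the weight never exceeded 2.0; applying it
    with min() is an assumption of this sketch.
    """
    if eligible_respondent_farms <= 0:
        raise ValueError("eligible_respondent_farms must be positive")
    weight = (est_nonrespondent_farms + eligible_respondent_farms) / eligible_respondent_farms
    return min(weight, cap)


# Hypothetical stratum: 40 of 100 surveyed nonrespondents were farms,
# 300 nonrespondents in total, and 480 eligible respondent farms.
missing_farms = estimated_nonrespondent_farms(40, 100, 300)
weight = nonresponse_weight(missing_farms, 480)
```

Here the stratum is estimated to contain 120 nonrespondent farms, so each eligible respondent farm carries a weight of 1.25.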
Sample Estimation
Each respondent sample farm was assigned a sample weight for use in producing estimates for all sample items. For example, if the weight given to a sample farm had the value 6, all sample data items reported by that farm were multiplied by 6.
The noninteger sample weight was calculated for each respondent sample farm by multiplying the noninteger nonresponse weight by the sampling factor. For published tabulations of the sample count items, the noninteger sample weight was randomly rounded to an integer weight for each record. For certainty farms, the sampling factor equaled 1, so the sample weight was equal to the nonresponse weight. The sampling factor calculation for noncertainty farms is described below.
Within a county, the weighting procedure for noncertainty farms was performed in three steps using three variables. The first variable contained eight 1997 total value of agricultural production (TVP) groups. The second and third variables, Standard Industrial Classification (SIC) code and farm acreage, each contained two groups. The three sets of groups were:
The first step in the estimation procedure classified the sample records into 32 mutually exclusive initial strata formed by the three variable groups. The total and sample farm counts were expanded to account for nonresponse. Each cell containing sample farm records was assigned an initial sample factor equal to the ratio of the total farm count to the sample farm count. This factor was approximately equal to the inverse of the probability of selecting a farm for the census sample.
The second step in the estimation procedure combined, when necessary, the 32 initial strata to increase the reliability of the weighting procedure. Any stratum that contained fewer than 10 sample farms or had a factor greater than twice the mail sample rate was collapsed with another stratum. The mail sample rate was either 2, 4, or 6, depending on whether the county had a 1 in 2, 1 in 4, or 1 in 6 sample selection rate. The collapsing occurred within the 32 initial strata according to a specified collapsing pattern. After the collapsing process was completed, new total farm counts and sample farm counts were computed for each final stratum and used to calculate final sample factors.
The final step calculated the noninteger sample weight as the product of the final sampling factor and the noninteger nonresponse weight. As described previously, the noninteger sample weight for each record was randomly rounded to an integer weight, which was used in published tabulations. For example, if the final weight for a farm was 7.2, then the weight would be rounded to either 7 or 8.
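The three steps can be illustrated with a short Python sketch. Two parts of it are assumptions rather than the documented procedure: the collapsing pattern is replaced by a simple rule that merges a failing stratum with its neighbor in list order, and the random rounding rounds up with probability equal to the fractional part, which keeps the expected integer weight equal to the noninteger weight. All names and counts are hypothetical.

```python
import math
import random


def collapse_strata(strata, mail_sample_rate, min_sample=10):
    """Merge adjacent strata until each has enough sample farms and a bounded factor.

    strata: ordered list of (total_farms, sample_farms) for the initial strata.
    Returns a list of (total_farms, sample_farms, factor) for the final strata.
    """
    if sum(sample for _, sample in strata) <= 0:
        raise ValueError("at least one sample farm is required")
    final, total_acc, sample_acc = [], 0, 0
    for total, sample in strata:
        total_acc += total
        sample_acc += sample
        if sample_acc >= min_sample and total_acc / sample_acc <= 2 * mail_sample_rate:
            final.append([total_acc, sample_acc])
            total_acc, sample_acc = 0, 0
    if total_acc or sample_acc:  # leftover strata that never met the criteria
        if final:
            final[-1][0] += total_acc
            final[-1][1] += sample_acc
        else:
            final.append([total_acc, sample_acc])
    return [(t, s, t / s) for t, s in final]


def random_round(weight, rng=random):
    """Round down or up at random; rounds up with probability equal to the fraction."""
    lower = math.floor(weight)
    return lower + (1 if rng.random() < weight - lower else 0)


# Hypothetical county sampled at 1 in 4, so factors above 8 trigger collapsing.
final_strata = collapse_strata([(40, 12), (30, 3), (50, 12), (90, 10)], mail_sample_rate=4)

# Final noninteger sample weight = final sampling factor x nonresponse weight.
sample_weight = 6.0 * 1.25
integer_weight = random_round(sample_weight, random.Random(1997))
```

In this example the second stratum has too few sample farms and the fourth has too large a factor, so the four initial strata collapse into two final strata with factors of about 3.33 and 6.8.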
The sample for the 1997 Census of Agriculture was only one of a large number of possible samples of the same size that could have been selected using the same sample design. In this context, sample refers to the sample for both the nonresponse survey and the selection of farms to receive sample forms.
If all possible samples were selected, each of the samples surveyed under essentially the same conditions, and an estimate and its standard error calculated from each sample, then:
The following example illustrates the computations necessary to produce a confidence statement for an estimate. Assume that the estimate of number of farms for a State is 94,382 and the relative standard error of the estimate is 0.1 percent (0.001). Multiplying 94,382 by 0.001 yields 94, the standard error; therefore, a 90-percent confidence interval is 94,227 to 94,537 (i.e., 94,382 plus or minus 1.65 x 94). If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 90 percent of these intervals would contain the true population parameter. Similarly, a 95-percent confidence interval is 94,198 to 94,566 (i.e., 94,382 plus or minus 1.96 x 94).
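The arithmetic in this example can be reproduced with a few lines of Python. The function name is hypothetical; the standard error is rounded to a whole number of farms before the interval is formed, following the worked example.

```python
def confidence_interval(estimate, relative_std_error, z):
    """Return (lower, upper) bounds as estimate +/- z * standard error."""
    std_error = round(estimate * relative_std_error)  # 94,382 * 0.001 rounds to 94
    half_width = z * std_error
    return round(estimate - half_width), round(estimate + half_width)


ci_90 = confidence_interval(94_382, 0.001, 1.65)  # (94227, 94537)
ci_95 = confidence_interval(94_382, 0.001, 1.96)  # (94198, 94566)
```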
Census items were classified as either complete count or sample count items. All farm operators were asked the complete count items. Examples of complete count items were: land in farms, harvested cropland, livestock inventory and sales, crop acreage, quantities harvested and crop sales, land use, irrigation, government loans and payments, conservation acreage, type of organization, and operator characteristics.
Only a sample of farm operators were asked the sample count items. These items appeared only in sections 21 through 27 of the sample form. Sample count items were included under the following section headings: commercial fertilizers, chemicals, production expenses, farm machinery and equipment, value of land and buildings, farm-related income, and hired workers.
Variability in the estimates of complete count items was due only to the nonresponse survey estimation procedure. With regard to the estimates of sample count items, variability was due to both the nonresponse survey estimation procedure and the census sample selection and estimation procedure. Therefore, variability in the sample count item estimates tends to be larger than the variability in the complete count item estimates. Percent relative standard error is a common measure of variability.
The accuracy of the census counts is affected jointly by sampling errors (described in the previous section) and nonsampling errors. Extensive efforts were made to compile a complete and accurate mail list for the census, to design an understandable report form with instructions, and to minimize processing errors through the use of quality control measures. Nonsampling errors arise from many sources, including incorrect data reporting or incorrect data keying, editing, or imputing for missing data. These nonsampling errors are further discussed in this section. Nonsampling error due to mail list incompleteness and duplication, as well as the misclassification of records on the mail list, is called coverage error. The next section discusses the evaluation studies conducted to measure the extent of this error in the census.
Respondent and Enumerator Error
Item Nonresponse
Processing Error
Developing processing methods free of errors is complicated by the complex structure of agriculture. Among the complexities are the many places to be included; the variety of arrangements under which farms are operated; the continuing changes in the relationship of operators to the farm operated; the expiration of leases and the initiation or renewal of leases; the problem of obtaining a complete list of agriculture operations; the difficulty of contacting and identifying some types of contractor/contractee relationships; the operator's absence from the farm during the data collection period; and the operator's opinion that part or all of the operation does not qualify and should not be included in the census. During processing, operations underwent a number of quality control checks to ensure as accurate an application as possible, yet some errors were not detected and corrected.
Coverage Overview
According to coverage evaluation results, the past five censuses of agriculture included an average of 92 percent of U.S. farms and 98 percent of agricultural production. Complete enumeration of agricultural operations satisfying the farm definition of $1,000 or more in agricultural sales is complicated by the variety of arrangements under which farms are operated, the multiplicity of names used for an operation, the number of operations in which an operator participates, and the difficulty in classifying those operations just around the $1,000 sales range. In 1997, extensive efforts were made to compile as complete and accurate a mail list as possible, while reducing the duplication and number of nonfarm operations on the list.
The 1997 coverage evaluation program was designed to measure four components of error in the census farm counts. These components include:
Mail list undercount is by far the largest component of coverage error. Duplication, though occurring far less frequently, can involve larger farms and have a larger impact on commodity estimates. The last two components involve the misclassification of farms and nonfarms. Classification error can arise from either reporting or processing errors.
Table G illustrates the effect of coverage adjustments on census farm counts. The coverage total is defined as the difference between undercounted and overcounted farms. Coverage adjusted totals are shown for total farms by demographic characteristics, land in farms, and total value of sales. The relative standard error is shown for the final census coverage adjusted number. The coverage adjustment percentage shows the coverage total as a percentage of total census adjusted farms for that characteristic.
Area Frame Surveys to Measure Mail List Undercoverage
The percentage of farms missed in the census varies considerably by State. In general, farms not on the mail list tended to be small in acreage, production, and sales of agricultural products. Farm operations could be missed for various reasons, including the possibility that the operation started after the mail list was developed, the operation may be so small as not to appear in any agriculture-related source lists, or the operation may have been falsely classified as a nonfarm prior to mailout.
Classification Error Survey to Measure Three Types of Coverage Error
In general, the classification error rate is higher for small farms close to the $1,000 agricultural sales requirement. The misclassification rate is also higher for tenant farms than for full- or part-owner farms, livestock farms than crop farms, and farms with small acreage or sales.
Coverage Estimation
T = C + (ICU + NML) - (ICO + DUP).
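The adjustment formula can be evaluated directly. In the sketch below, the parameter names are assumed readings of the symbols, inferred from the components discussed in this section rather than taken from stated definitions: C as the census farm count, ICU and ICO as farms undercounted and overcounted through incorrect classification, NML as farms not on the mail list, and DUP as duplicated farms. The counts are invented for illustration.

```python
def coverage_adjusted_total(census_count, icu, nml, ico, dup):
    """Evaluate T = C + (ICU + NML) - (ICO + DUP).

    census_count: C, the census farm count (assumed meaning)
    icu, nml:     undercount components (assumed meanings)
    ico, dup:     overcount components (assumed meanings)
    """
    undercount = icu + nml
    overcount = ico + dup
    return census_count + undercount - overcount


# Invented counts: 170 farms undercounted and 40 overcounted.
adjusted = coverage_adjusted_total(census_count=1_000, icu=20, nml=150, ico=30, dup=10)
```

With these invented counts, the net coverage total is 130 farms and the adjusted total is 1,130.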
Data from the coverage evaluation and area frame surveys were used not only to estimate errors in farm counts, but also to make adjustments in estimates for commodities and farm demographics. Estimates of misclassification for farms producing less common commodities or owned by operators having rare demographic characteristics were based on particularly small sample sizes. Where such small sample sizes occurred, a form of small area estimation was used in which data from similar States contributed to State estimates. These estimates are termed "indirect" estimates, while the traditional estimates, using only survey data from the State in question, are termed "direct" estimates. The published estimates are composite estimates of the indirect and direct estimates. Direct estimates were used (weighted) to the extent possible. This weighting was based on the amount of survey data available for the particular item being estimated.
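A composite estimate of this kind is often written as a weighted average of the direct and indirect estimates. The sketch below shows only that general form. How the weight was actually derived from the amount of survey data is not specified here, so the weight is taken as an input, and the function name and values are hypothetical.

```python
def composite_estimate(direct, indirect, direct_weight):
    """Blend a direct and an indirect estimate.

    direct_weight: share given to the direct estimate, between 0 and 1;
                   larger values would correspond to more State survey data.
    """
    if not 0.0 <= direct_weight <= 1.0:
        raise ValueError("direct_weight must be between 0 and 1")
    return direct_weight * direct + (1.0 - direct_weight) * indirect


# Hypothetical item with a fairly large State sample behind the direct estimate.
blended = composite_estimate(direct=100.0, indirect=80.0, direct_weight=0.75)
```

A weight of 1 reproduces the direct estimate, and a weight of 0 relies entirely on data from similar States.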
Mann Library, Cornell University, Ithaca, NY 14853 Phone: 607-255-5406 Fax: 607-255-0318 Email: mann_ref@cornell.edu | Cornell University Library © Copyright 2004