An interesting read on how pollsters have to adjust the data they collect because of sources of uncertainty and low response rates. From the NYT/Siena poll.
Really, the final result is completely up in the air because of all the adjustments being made. We get the answer Tuesday night.
From the bottom of the page:
Weighting (registered voters)
The survey was weighted by The Times using the survey package in R in multiple steps.
First, the sample was adjusted for unequal probability of selection by stratum.
Second, the sample was split by party (party registration if available in the state, else classification based on participation in partisan primaries if available in the state, else classification based on a model of vote choice in prior Times/Siena polls) and each subgroup was weighted to match voter file-based parameters for the characteristics of registered voters for that group.
The following targets were used:
• Race or ethnicity (L2 model)
• Age (self-reported age, or voter-file age if the respondent refused) by gender (L2 data)
• Education (four categories of self-reported education level, weighted to match NYT-based targets derived from Times/Siena polls, census data and the L2 voter file)
• White/nonwhite race by college or noncollege educational attainment (L2 model of race weighted to match NYT-based targets for self-reported education)
• Marital status (L2 model)
• Homeownership (L2 model)
• Turnout history (NYT classifications based on L2 data)
• Method of voting in the 2020 elections (NYT classifications based on L2 data)
• Metropolitan status (2013 NCHS Urban-Rural Classification Scheme for Counties)
• Census tract educational attainment
• National region
• History of participation in party primaries (NYT classifications based on L2 data), if part of the Democratic or Republican group
Third, the sums of the weights were balanced so that each group represented the proper proportion of the poll.
Finally, the two split groups for the sample of respondents who completed all questions in the survey were weighted identically as well as to the result for the general-election horse-race question (including voters leaning a certain way) on the full sample, and the sums of the weights of the split groups were balanced so that each group represented the proper proportion of the sample of respondents who completed the questionnaire.
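To make the "weighted to match ... parameters" step more concrete: the text later calls the result a "rake weight", and raking (iterative proportional fitting) can be sketched in a few lines. This is only a minimal illustration in plain Python, not the R survey package The Times says it used. The function name `rake`, the two variables and the target shares are all made up for the example.

```python
def rake(rows, targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting over categorical margins.

    rows    -- one dict per respondent, e.g. {"age": "young", "educ": "college"}
    targets -- {variable: {category: target share}}, shares sum to 1 per variable
    Returns one weight per respondent; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale every respondent so the category share hits its target.
            for i, row in enumerate(rows):
                factor = shares[row[var]] * total / current[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match, stop early
            break
    return weights


# Hypothetical sample of five respondents and hypothetical population targets.
rows = [
    {"age": "young", "educ": "college"},
    {"age": "young", "educ": "noncollege"},
    {"age": "old", "educ": "college"},
    {"age": "old", "educ": "college"},
    {"age": "old", "educ": "noncollege"},
]
targets = {
    "age": {"young": 0.5, "old": 0.5},
    "educ": {"college": 0.4, "noncollege": 0.6},
}
weights = rake(rows, targets)


def weighted_share(var, category):
    hit = sum(w for r, w in zip(rows, weights) if r[var] == category)
    return hit / sum(weights)
```

After raking, `weighted_share("age", "young")` and `weighted_share("educ", "college")` should land on the targets (0.5 and 0.4) even though the raw sample has 40% young and 60% college. The real procedure does this over the twelve target dimensions listed above, separately per party group.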
Weighting (likely electorate)
The survey was weighted by The Times using the R survey package in multiple steps.
First, the samples were adjusted for unequal probability of selection by stratum.
Second, the first-stage weight was adjusted to account for the probability that a registrant would vote in the 2024 election, based on a model of turnout in the 2020 election.
Third, the sample was weighted to match targets for the composition of the likely electorate, using the process and weighting categories described above. The targets for the composition of the likely electorate were derived by aggregating the individual-level turnout estimates described in the previous step for registrants on the L2 voter file.
Fourth, the initial likely electorate weight was adjusted to incorporate self-reported intention to vote. Four-fifths of the final probability that a registrant would vote in the 2024 election was based on the registrant’s ex ante modeled turnout score, and one-fifth was based on self-reported intentions, based on prior Times/Siena polls, including a penalty to account for the tendency of survey respondents to turn out at higher rates than nonrespondents. The final likely electorate weight was equal to the modeled electorate rake weight, multiplied by the final turnout probability and divided by the ex ante modeled turnout probability.
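The fourth step is really just arithmetic, so here is a small sketch of it. The 80/20 split and the final formula come straight from the paragraph above. The function names and the example numbers are my own, and how the self-reported probability is derived (including the penalty) is not spelled out in the text, so it is simply taken as an input here.

```python
def final_turnout_probability(modeled, self_reported, model_share=0.8):
    """Blend the ex ante modeled turnout score with the self-report-based one.

    Four-fifths of the final probability comes from the model and
    one-fifth from self-reported intention, as described in the text.
    """
    return model_share * modeled + (1.0 - model_share) * self_reported


def likely_electorate_weight(rake_weight, modeled, self_reported):
    """Rake weight times final turnout probability, divided by the modeled one."""
    final = final_turnout_probability(modeled, self_reported)
    return rake_weight * final / modeled


# Hypothetical respondent: modeled turnout 0.6, self-report-based probability 0.9.
example = likely_electorate_weight(rake_weight=1.2, modeled=0.6, self_reported=0.9)
```

For this made-up respondent the blended probability is about 0.66, so the weight goes up by a factor of roughly 1.1, from 1.2 to about 1.32. Someone who says they are less likely to vote than the model expects would be scaled down instead.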
Finally, the sample of respondents who completed all questions in the survey was weighted identically as well as to the result for the general election horse-race question (including leaners) on the full sample.
The margin of error accounts for the survey’s design effect, a measure of the loss of statistical power due to survey design and weighting.
The design effect is 1.32 for the likely electorate and 1.23 for registered voters. The margin of error for the sample of respondents who completed the entire survey is plus or minus 2.5 percentage points for the likely electorate, including a design effect of 1.41, and plus or minus 2.5 percentage points for registered voters, including a design effect of 1.35.
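To get a feel for what a design effect does to the margin of error, here is a rough sketch using the standard textbook formulas: Kish's approximation for the design effect from the weights, and a 95% margin of error for a proportion of 50%. The quoted text does not say exactly how The Times computes either number, so treat this as an assumption. The sample size of 1,000 below is hypothetical, since the excerpt does not give one.

```python
import math


def kish_design_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2. Equals 1 for equal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def margin_of_error(n, deff=1.0, p=0.5, z=1.96):
    """Margin of error for a proportion p, inflated by the design effect."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical sample size, combined with the design effect quoted for the likely electorate.
moe_simple = margin_of_error(1000)               # no weighting penalty
moe_weighted = margin_of_error(1000, deff=1.32)  # with design effect 1.32
```

With these made-up inputs the margin goes from roughly 3.1 to roughly 3.6 percentage points. Put differently, a design effect of 1.32 means the weighted sample has about as much statistical power as an unweighted sample of n / 1.32 respondents.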
Historically, The Times/Siena Poll’s error at the 95th percentile has been plus or minus 5.1 percentage points in surveys taken over the final three weeks before an election. Real-world error includes sources of error beyond sampling error, such as nonresponse bias, coverage error, late shifts among undecided voters and error in estimating the composition of the electorate.