Data Preliminaries

We begin by cleaning our data, removing responses where users chose not to include a generalization. A power analysis based on pilot data suggested we would need 90 participants to detect differences with 80% power. Following our pre-registered analysis plan, we iteratively collected data and excluded datasets based on poor performance. Overall, we recruited 97 participants. Per our pre-registered exclusion criteria, one participant was excluded for reporting the same confidence on all trials, and six were excluded because the majority of their generalizations were not about the data they saw, leaving 90 participants.

We subset the data from 1941 to 1743 generalizations, removing 198. The removed rows are trials where a participant did not provide a generalization for the presented stimuli, and generalizations that misinterpreted the presented stimuli.

d <- read.csv("./data/[E1]N=1000-Full-Cleaned.tsv", sep="\t")

# Coerce to numeric; non-numeric entries become NA (coercion warnings suppressed)
d$confidence <- suppressWarnings(as.numeric(as.character(d$confidence)))
d$initSliderValue <- suppressWarnings(as.numeric(as.character(d$initSliderValue)))
sapply(d, class)

# Keep only trials that were coded as correct or incorrect (drops NA rows)
df <- subset(d, !is.na(correct))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##

## [1] NA

Descriptive Statistics

We look at overall summaries of the aggregation strategy. In addition, we coded each generalization into a class as specified by our pre-registration, and summarize the results.

There are fewer generalizations made in the aggregation (mean) condition. In addition, the most common generalizations were categorized into the mean or shape classes, with the next most frequent being correlation and rank. There were extremely few variance generalizations, likely because of the strict nature of the coding for this generalization class: participants had to explicitly mention the variance of data in a view. On average per participant we coded 3.14 ± 1.76 correlation, 6.92 ± 4.23 mean, 2.44 ± 2.13 rank, 6.81 ± 5.12 shape, and 0.04 ± 0 variance generalizations.

To see this better, we plot the distribution of generalizations across generalization classes. We observe that participants made the most shape class generalizations in the disaggregation condition, the most rank class generalizations in the mean condition, and the most mean class generalizations in the disaggregation with mean condition.

summary(df$aggStrat)

##      disagg disagg+mean        mean 
##         607         608         528

summary(df$insightClass)

##       correlation              mean misinterpretation              rank 
##               283               623                 0               220 
##             shape          variance 
##               613                 4

suppressWarnings(ddply(df, c("aggStrat"), summarise,
         n=nrow(df),
         k=sum(df$aggStrat == aggStrat),
         pbar = k/n,
         se = sqrt(pbar*(1 - pbar)/n)))

##      aggStrat    n   k      pbar         se
## 1      disagg 1743 607 0.3482501 0.01141136
## 2 disagg+mean 1743 608 0.3488239 0.01141573
## 3        mean 1743 528 0.3029260 0.01100675

suppressWarnings(ddply(df, c("insightClass", "workerId"), summarise,
         n=nrow(df),
         k=sum(df$insightClass == insightClass & df$workerId == workerId))) %>%
   ddply(c("insightClass"), summarise,
         generalizationClassTotal=sum(k),
         percentTotal=sum(k)/nrow(df),
         avgNumberPerParticipant=sum(k) / 90,
         sd=sd(k))

##   insightClass generalizationClassTotal percentTotal avgNumberPerParticipant
## 1  correlation                      283  0.162363741              3.14444444
## 2         mean                      623  0.357429719              6.92222222
## 3         rank                      220  0.126219162              2.44444444
## 4        shape                      613  0.351692484              6.81111111
## 5     variance                        4  0.002294894              0.04444444
##         sd
## 1 1.755742
## 2 4.235293
## 3 2.130628
## 4 5.122532
## 5 0.000000

suppressWarnings(ddply(df, c("aggStrat", "insightClass"), summarise,
         n=nrow(df[df$aggStrat == aggStrat,]),
         k=sum(df$aggStrat == aggStrat & df$insightClass == insightClass),
         pbar = k/n,
         se = sqrt(pbar*(1 - pbar)/n),
         min=pbar-1.96*se,
         max=pbar+1.96*se))

##       aggStrat insightClass   n   k        pbar          se          min
## 1       disagg  correlation 607  80 0.131795717 0.013729897  0.104885119
## 2       disagg         mean 607 152 0.250411862 0.017585084  0.215945096
## 3       disagg         rank 607  57 0.093904448 0.011839565  0.070698901
## 4       disagg        shape 607 317 0.522240527 0.020274287  0.482502924
## 5       disagg     variance 607   1 0.001647446 0.001646089 -0.001578888
## 6  disagg+mean  correlation 608  96 0.157894737 0.014788197  0.128909871
## 7  disagg+mean         mean 608 239 0.393092105 0.019808736  0.354266983
## 8  disagg+mean         rank 608  69 0.113486842 0.012863631  0.088274126
## 9  disagg+mean        shape 608 202 0.332236842 0.019102198  0.294796535
## 10 disagg+mean     variance 608   2 0.003289474 0.002322180 -0.001262000
## 11        mean  correlation 528 107 0.202651515 0.017493715  0.168363833
## 12        mean         mean 528 232 0.439393939 0.021599265  0.397059381
## 13        mean         rank 528  94 0.178030303 0.016647841  0.145400536
## 14        mean        shape 528  94 0.178030303 0.016647841  0.145400536
## 15        mean     variance 528   1 0.001893939 0.001892145 -0.001814665
##            max
## 1  0.158706314
## 2  0.284878627
## 3  0.117109995
## 4  0.561978130
## 5  0.004873781
## 6  0.186879603
## 7  0.431917228
## 8  0.138699558
## 9  0.369677149
## 10 0.007840947
## 11 0.236939197
## 12 0.481728498
## 13 0.210660071
## 14 0.210660071
## 15 0.005602544
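The interval in the table above is the normal-approximation (Wald) interval, pbar ± 1.96·se, which produces negative lower bounds for the rare variance class. As a hedged alternative, not part of the original analysis, the sketch below computes Wilson score intervals for the three variance rows using base R's `prop.test` with the continuity correction turned off. The counts are copied from the table; the variable names are illustrative.

```r
# Counts for the variance class, copied from the table above
k <- c(disagg = 1, `disagg+mean` = 2, mean = 1)        # variance generalizations
n <- c(disagg = 607, `disagg+mean` = 608, mean = 528)  # generalizations per condition

wilson <- t(mapply(function(k_i, n_i) {
  # With correct = FALSE, prop.test returns the Wilson score interval
  ci <- prop.test(k_i, n_i, correct = FALSE)$conf.int
  c(pbar = k_i / n_i, lower = ci[1], upper = ci[2])
}, k, n))

wilson
```

Unlike the Wald interval, the Wilson interval stays inside [0, 1], so it may be a more readable choice when a class has only a handful of observations.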

Accuracy

Accuracy by Aggregation Strategy

We calculate summary statistics for accuracy for each of our aggregation strategies to get a better sense of our data. Although the aggregate (mean) condition produced fewer generalizations, raw accuracy is similar across aggregation strategies.

df %>%
  ddply(~aggStrat, summarise,
        Correct=sum(correct==TRUE),
        Incorrect=sum(correct==FALSE),
        Accuracy=Correct/(Incorrect+Correct),
        Total=Incorrect+Correct)

##      aggStrat Correct Incorrect  Accuracy Total
## 1      disagg     399       208 0.6573311   607
## 2 disagg+mean     405       203 0.6661184   608
## 3        mean     354       174 0.6704545   528

# Accounting for differences in workers
df_agg_accuracy <- df %>%
   ddply(.(aggStrat, workerId), summarise,
         Correct=sum(correct==TRUE),
         Incorrect=sum(correct==FALSE),
         PercCorrect=Correct/(Incorrect+Correct),
         Total=Incorrect+Correct) %>%
   ddply(~aggStrat, summarise,
         N = sum((Total)),
         meanAcc = mean(PercCorrect),
         sd = sd(PercCorrect),
         se = sd / sqrt(N))
df_agg_accuracy

##      aggStrat   N   meanAcc        sd          se
## 1      disagg 607 0.6563772 0.2049842 0.008320052
## 2 disagg+mean 608 0.6591138 0.2238778 0.009079443
## 3        mean 528 0.7116326 0.2779547 0.012096425
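The chunk above first averages accuracy within each worker and then summarises across workers, with the standard error divided by the square root of N, the total number of trials. A common alternative, shown here only as a sketch on hypothetical toy data, is to divide by the square root of the number of workers, since the per-worker accuracies are the units being averaged. The data frame `toy` and the helper names are assumptions for illustration, and base R is used in place of `ddply` to keep the example dependency-free.

```r
# Hypothetical toy data (NOT the study data): 12 workers, 2 trials per strategy
set.seed(1)
toy <- data.frame(
  workerId = rep(paste0("w", 1:12), each = 6),
  aggStrat = rep(c("disagg", "disagg+mean", "mean"), times = 24),
  correct  = rbinom(72, 1, 0.66) == 1
)

# Step 1: accuracy per worker within each aggregation strategy
per_worker <- aggregate(correct ~ aggStrat + workerId, data = toy, FUN = mean)

# Step 2: summarise across workers; the SE uses the number of workers
by_strategy <- do.call(rbind, lapply(
  split(per_worker$correct, per_worker$aggStrat),
  function(acc) data.frame(
    nWorkers = length(acc),
    meanAcc  = mean(acc),
    sd       = sd(acc),
    se       = sd(acc) / sqrt(length(acc))
  )
))

by_strategy
```

Because there are far fewer workers than trials, this convention yields wider standard errors than dividing by the trial count.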

Accuracy by Aggregation Strategy, Faceted by Generalization Class

Next, we look at the accuracy of each aggregation strategy, faceted by generalization class. We also look at the share of each aggregation condition's generalizations that falls into each class (PercTotalofAggStrat). We plot the results as a visual aid.

##       aggStrat insightClass Correct Incorrect Total  Accuracy
## 1       disagg  correlation      56        24    80 0.7000000
## 2       disagg         mean      99        53   152 0.6513158
## 3       disagg         rank      20        37    57 0.3508772
## 4       disagg        shape     223        94   317 0.7034700
## 5       disagg     variance       1         0     1 1.0000000
## 6  disagg+mean  correlation      76        20    96 0.7916667
## 7  disagg+mean         mean     173        66   239 0.7238494
## 8  disagg+mean         rank      21        48    69 0.3043478
## 9  disagg+mean        shape     135        67   202 0.6683168
## 10 disagg+mean     variance       0         2     2 0.0000000
## 11        mean  correlation      92        15   107 0.8598131
## 12        mean         mean     154        78   232 0.6637931
## 13        mean         rank      56        38    94 0.5957447
## 14        mean        shape      51        43    94 0.5425532
## 15        mean     variance       1         0     1 1.0000000
##    PercTotalofAggStrat
## 1          0.131795717
## 2          0.250411862
## 3          0.093904448
## 4          0.522240527
## 5          0.001647446
## 6          0.157894737
## 7          0.393092105
## 8          0.113486842
## 9          0.332236842
## 10         0.003289474
## 11         0.202651515
## 12         0.439393939
## 13         0.178030303
## 14         0.178030303
## 15         0.001893939

Accuracy by Data Type Combination

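We analyze accuracy with respect to data type combination (univariate, 1 quantitative x 1 nominal, 2 quantitative). We report raw accuracy first, overall and by aggregation strategy, and then the per-participant means.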

##   dataTypeCombination Correct Incorrect  Accuracy Total
## 1    nominalBivariate     293       358 0.4500768   651
## 2      quantBivariate     461       149 0.7557377   610
## 3          univariate     404        78 0.8381743   482

##   dataTypeCombination    aggStrat Correct Incorrect  Accuracy Total
## 1    nominalBivariate      disagg      80       141 0.3619910   221
## 2    nominalBivariate disagg+mean      86       137 0.3856502   223
## 3    nominalBivariate        mean     127        80 0.6135266   207
## 4      quantBivariate      disagg     171        40 0.8104265   211
## 5      quantBivariate disagg+mean     158        44 0.7821782   202
## 6      quantBivariate        mean     132        65 0.6700508   197
## 7          univariate      disagg     148        27 0.8457143   175
## 8          univariate disagg+mean     161        22 0.8797814   183
## 9          univariate        mean      95        29 0.7661290   124

##   dataTypeCombination   N   meanAcc        sd          se
## 1    nominalBivariate 651 0.4517140 0.2567745 0.010063784
## 2      quantBivariate 610 0.7966920 0.2291487 0.009277960
## 3          univariate 482 0.8289268 0.1961107 0.008932596

##   dataTypeCombination    aggStrat   N   meanAcc        sd         se
## 1    nominalBivariate      disagg 221 0.3225760 0.3456768 0.02325274
## 2    nominalBivariate disagg+mean 223 0.3554995 0.3736702 0.02502281
## 3    nominalBivariate        mean 207 0.5817370 0.4627837 0.03216569
## 4      quantBivariate      disagg 211 0.8509070 0.3012627 0.02073978
## 5      quantBivariate disagg+mean 202 0.8064374 0.3252616 0.02288533
## 6      quantBivariate        mean 197 0.7357242 0.3919885 0.02792802
## 7          univariate      disagg 175 0.8460034 0.2832715 0.02141332
## 8          univariate disagg+mean 183 0.8437500 0.3130358 0.02314027
## 9          univariate        mean 124 0.7785088 0.3583792 0.03218340

Bayesian Models Setup

We’ll run some Bayesian regressions. First let’s set up the data for the modeling.

Accuracy Bayesian Model

We run a hierarchical logistic regression model to evaluate the impact of aggregation strategy on accuracy as per our pre-registration. We report the results as the distribution of posterior mean estimates for the effects of both aggregation strategies and trial, and the standard deviation of the varying intercepts for participant ID and view ID. We find little evidence of an effect of aggregation strategy, as the intervals for both strategy coefficients are centered near 0 and include it.
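As a reading aid, the model structure can be written out as below. This is a sketch inferred from the parameter names in the output (a, bnoagg, bmean, btrial, a_worker, a_spec, sigma_worker, sigma_spec); the indicator coding of the two strategy terms and the zero-centered varying intercepts are assumptions, and the priors on the remaining parameters are not shown in this section, so they are left unspecified.

```latex
\begin{align*}
\text{correct}_i &\sim \text{Bernoulli}(p_i) \\
\text{logit}(p_i) &= a + a_{\text{worker}[w_i]} + a_{\text{spec}[s_i]}
  + b_{\text{noagg}}\,\text{noagg}_i + b_{\text{mean}}\,\text{mean}_i
  + b_{\text{trial}}\,\text{trial}_i \\
a_{\text{worker}[j]} &\sim \text{Normal}(0, \sigma_{\text{worker}}), \quad j = 1,\dots,90 \\
a_{\text{spec}[k]} &\sim \text{Normal}(0, \sigma_{\text{spec}}), \quad k = 1,\dots,15
\end{align*}
```

Here $w_i$ and $s_i$ index the participant and the view for generalization $i$. The counts of 90 participants and 15 views match the 105 vector parameters that the output reports as omitted.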

Let’s plot results

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

##               Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
## a             0.84   0.33       0.18       1.47  1111    1
## bnoagg       -0.17   0.15      -0.46       0.11  6668    1
## bmean        -0.12   0.15      -0.40       0.17  7553    1
## btrial        0.03   0.01       0.00       0.06  7535    1
## sigma_worker  0.75   0.10       0.57       0.95  2120    1
## sigma_spec    1.07   0.22       0.70       1.51  4401    1

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.
## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

##  a_worker[1]  a_worker[2]  a_worker[3]  a_worker[4]  a_worker[5]  a_worker[6] 
##    0.5254999    0.7769337    1.6731374    0.5668260    0.4996277    0.2949677 
##  a_worker[7]  a_worker[8]  a_worker[9] a_worker[10] a_worker[11] a_worker[12] 
##    1.4273466    0.6718786    0.8964898    2.4798704    0.7435681    1.3452745 
## a_worker[13] a_worker[14] a_worker[15] a_worker[16] a_worker[17] a_worker[18] 
##    1.3181030    0.4738298    1.3181567    1.8415908    1.2444036    0.6405744 
## a_worker[19] a_worker[20] a_worker[21] a_worker[22] a_worker[23] a_worker[24] 
##    1.5664999    0.2694058    1.4292765    0.6687203    5.6909185    0.9203934 
## a_worker[25] a_worker[26] a_worker[27] a_worker[28] a_worker[29] a_worker[30] 
##    2.0560940    0.6487823    0.6625672    1.9433761    0.6666385    1.1978313 
## a_worker[31] a_worker[32] a_worker[33] a_worker[34] a_worker[35] a_worker[36] 
##    2.1247549    1.1695404    1.3004574    1.1687971    1.4174942    1.1834207 
## a_worker[37] a_worker[38] a_worker[39] a_worker[40] a_worker[41] a_worker[42] 
##    0.9357123    1.9023398    2.8517602    0.8298821    1.2389077    0.3091390 
## a_worker[43] a_worker[44] a_worker[45] a_worker[46] a_worker[47] a_worker[48] 
##    0.5282128    1.1590770    0.9560430    0.7209489    0.7350940    1.9702879 
## a_worker[49] a_worker[50] a_worker[51] a_worker[52] a_worker[53] a_worker[54] 
##    1.7839562    1.1848855    1.1273089    1.7748883    1.9493087    1.3430219 
## a_worker[55] a_worker[56] a_worker[57] a_worker[58] a_worker[59] a_worker[60] 
##    1.0684683    1.4726984    1.0029128    1.0444521    2.8043671    1.0777341 
## a_worker[61] a_worker[62] a_worker[63] a_worker[64] a_worker[65] a_worker[66] 
##    1.4013273    0.6941974    2.7046391    1.0277860    0.5880429    1.0809941 
## a_worker[67] a_worker[68] a_worker[69] a_worker[70] a_worker[71] a_worker[72] 
##    0.6892810    0.7090654    0.5057780    0.9942274    0.8862223    0.4488369 
## a_worker[73] a_worker[74] a_worker[75] a_worker[76] a_worker[77] a_worker[78] 
##    0.7454162    0.8917071    0.9691866    0.4101154    0.4798583    1.7754696 
## a_worker[79] a_worker[80] a_worker[81] a_worker[82] a_worker[83] a_worker[84] 
##    0.7139392    1.7638034    0.7092945    1.5871459    0.6551029    1.7813546 
## a_worker[85] a_worker[86] a_worker[87] a_worker[88] a_worker[89] a_worker[90] 
##    0.9054145    0.2446092    0.5979112    1.5323368    0.2730194    1.8444700 
##    a_spec[1]    a_spec[2]    a_spec[3]    a_spec[4]    a_spec[5]    a_spec[6] 
##    2.4588018    3.0270586    2.1946182    2.8655694    1.3354744    0.6084346 
##    a_spec[7]    a_spec[8]    a_spec[9]   a_spec[10]   a_spec[11]   a_spec[12] 
##    0.3891960    0.1099008    0.6119618    0.1988181    0.9471993    1.6829296 
##   a_spec[13]   a_spec[14]   a_spec[15]            a       bnoagg        bmean 
##    1.4535131    1.0741174    1.9884529    2.3090839    0.8398876    0.8881068 
##       btrial sigma_worker   sigma_spec 
##    1.0315892    2.1182386    2.9074778
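The values printed above carry no label, but they appear consistent with the model coefficients after exponentiation, that is, with odds ratios rather than log-odds (for example, exp(-0.17) is close to the 0.84 shown for bnoagg). That reading is an inference from the numbers, not something the output states. A minimal sketch of the conversion, using the rounded posterior means from the summary table:

```r
# Rounded posterior means copied from the summary table (log-odds scale)
post_mean <- c(a = 0.84, bnoagg = -0.17, bmean = -0.12, btrial = 0.03)

# Exponentiating a log-odds coefficient gives an odds ratio
odds_ratio <- exp(post_mean)

# The intercept can also be mapped to a probability with the inverse logit
baseline_prob <- plogis(post_mean[["a"]])

round(odds_ratio, 2)
round(baseline_prob, 2)
```

An odds ratio below 1 means lower odds of a correct generalization relative to the reference condition, and a value near 1 means little difference. Small discrepancies from the printed values are expected because the inputs here are rounded.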

First Trial Accuracy

It is possible that there is a learning effect, or that our choice of mark type led participants to focus on one stimulus over another. We therefore analyze accuracy by aggregation strategy and generalization class for the first trial of each participant to better understand the extent of this effect. We find that there appears to be a small difference between the disaggregated and aggregation conditions, although the results are not reliably different.

df[df$trial==1,] %>%
   ddply(c("aggStrat", "insightClass"), summarise,
     Correct=sum(correct==TRUE),
     Incorrect=sum(correct==FALSE),
     Accuracy=Correct/(Incorrect+Correct),
     Total=Incorrect+Correct)

##       aggStrat insightClass Correct Incorrect  Accuracy Total
## 1       disagg  correlation       1         0 1.0000000     1
## 2       disagg         mean       2         1 0.6666667     3
## 3       disagg         rank       0         3 0.0000000     3
## 4       disagg        shape      31         7 0.8157895    38
## 5  disagg+mean  correlation       2         0 1.0000000     2
## 6  disagg+mean         mean      13         7 0.6500000    20
## 7  disagg+mean         rank       0         1 0.0000000     1
## 8  disagg+mean        shape      15         3 0.8333333    18
## 9         mean  correlation       3         4 0.4285714     7
## 10        mean         mean      11         6 0.6470588    17
## 11        mean        shape       1         0 1.0000000     1

# Gives us 80 observations, since 10 participants didn't make observations on the first trial
df_firstTrial_facetWorker_stats <- df[df$trial==1,] %>%
   ddply(.(aggStrat, workerId), summarise,
         Correct=sum(correct==TRUE),
         Incorrect=sum(correct==FALSE),
         PercCorrect=Correct/(Incorrect+Correct),
         Total=Incorrect+Correct) %>%
   ddply(~aggStrat, summarise,
         N = sum((Total)),
         meanAcc = mean(PercCorrect),
         sd = sd(PercCorrect),
         se = sd / sqrt(N))
df_firstTrial_facetWorker_stats

##      aggStrat  N   meanAcc        sd         se
## 1      disagg 45 0.7380952 0.4112076 0.06129921
## 2 disagg+mean 41 0.6964286 0.4582431 0.07156555
## 3        mean 25 0.6041667 0.4885464 0.09770927

Confidence

We summarize the confidence reported by participants. We first investigate general summary statistics for confidence, taking into account individual differences between participants. We find that between aggregation strategies, there are small differences in total mean confidence (disaggregation: 67%, disaggregation+mean: 69%, mean: 72%).

Confidence by Participant - Aggregation Strategy combination

df %>%
   ddply(.(aggStrat, workerId), summarise, ## to get confidence per participant
         N = sum(correct==TRUE, na.rm=TRUE) + sum(correct==FALSE, na.rm=TRUE),
         correct = sum(correct==TRUE),
         meanConf = mean(confidence, na.rm=TRUE),
         sd = sd(confidence),
         se = sd / sqrt(N)) %>%
   ddply(~aggStrat, summarise, ## then to average the average confidence per participant
      Total = sum(N),
      correct = sum(correct, na.rm=TRUE),
      meanTotalConf = mean(meanConf, na.rm=TRUE),
      sdTotal = sd(meanConf),
      seTotal = sdTotal / sqrt(Total))

##      aggStrat Total correct meanTotalConf  sdTotal   seTotal
## 1      disagg   607     399      67.27951 18.54913 0.7528859
## 2 disagg+mean   608     405      68.63591 20.08251 0.8144535
## 3        mean   528     354      71.76873 18.00758 0.7836794

## Warning: Removed 2 rows containing missing values (geom_errorbar).

Confidence Bayesian Model

We examine how aggregation strategy impacts confidence by again running a Bayesian hierarchical model. We find that there is an effect of the mean aggregation condition on confidence compared to the disaggregated and disaggregated with means conditions. In the summary below, the 95% interval for bnoagg excludes 0, whereas the interval for bmean includes 0.

##             Mean   StdDev lower 0.95 upper 0.95    n_eff      Rhat
## bnoagg -3.543878 1.285270  -5.878993 -0.8698803 2963.221 0.9999075
## bmean  -1.928507 1.305901  -4.550995  0.5861019 2741.627 0.9997666

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

One potential confound for the effect that we observe in confidence is the initial confidence slider value each participant saw when they were asked to record their confidence. For each trial, we randomized the initial confidence slider position, as giving a default value could influence the confidence a participant gave. However, it is possible that random sampling over-represented some range of values in a given aggregation condition, which could confound the confidence effect. We investigate this potential confound by plotting the distribution of initial confidence slider positions. We select a bin size of 4, resulting in 25 bins, to see the data at a fairly high resolution. The histograms for each aggregation strategy show some variance, but the initial slider value is generally the same across conditions, with a mean of about 50 in each.

df %>%
   ddply(.(aggStrat), summarise,
            total = sum(correct==TRUE) + sum(correct==FALSE),
            meanInitialSliderValue = mean(initSliderValue, na.rm=TRUE),
            sd = sd(initSliderValue),
            se = sd / sqrt(total))

##      aggStrat total meanInitialSliderValue       sd       se
## 1      disagg   607               50.36079 29.32739 1.190362
## 2 disagg+mean   608               50.33717 29.71545 1.205121
## 3        mean   528               50.10227 29.41093 1.279946
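The binning described above (bin width 4, 25 bins) can be sketched as follows. This assumes the slider ranges from 0 to 100, which is what a width of 4 with 25 bins implies. The `init_slider` vector is simulated, hypothetical data, not the study's `initSliderValue` column.

```r
# Hypothetical slider start positions (NOT the study data), assumed range 0-100
set.seed(42)
init_slider <- sample(0:100, 600, replace = TRUE)

# Bin edges every 4 units from 0 to 100 give 25 bins
breaks <- seq(0, 100, by = 4)

# plot = FALSE returns the counts without drawing; the lowest edge is included by default
h <- hist(init_slider, breaks = breaks, plot = FALSE)

length(h$counts)  # number of bins
sum(h$counts)     # every value falls into exactly one bin
```

Computing the counts separately for each aggregation strategy and comparing them bin by bin is one way to check that no range of starting positions is over-represented in a condition.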

Accuracy-Confidence Relationship

We perform an exploratory analysis to investigate the relationship between accuracy, confidence and aggregation strategy. We plot the average confidence of each worker per aggregation strategy to get a better understanding of the distribution of confidence. Plotting all trials and their confidence, we see that generalizations marked as correct consistently tend to have higher confidence.

Per our preregistration, we report on how overall accuracy changes with respect to confidence. We see that accuracy stays relatively consistent as we raise the confidence threshold, with the exception of generalizations where participants report a confidence of 100. However, it is unclear if this is a reliable effect, as increasing the confidence threshold reduces the sample size of generalizations (i.e. the subset of all generalizations with a reported confidence of 0 or greater will be larger than the subset of generalizations with a reported confidence of 100). We investigate this difference by calculating the point-biserial correlation between accuracy and confidence.
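The thresholding described above can be sketched as follows: for each threshold, keep the generalizations whose reported confidence is at least that value, then compute accuracy on what remains. The `toy` data frame and the threshold grid are hypothetical, chosen only to illustrate how the sample shrinks as the threshold rises.

```r
# Hypothetical trial-level data (NOT the study data)
set.seed(7)
toy <- data.frame(
  confidence = sample(0:100, 300, replace = TRUE),
  correct    = rbinom(300, 1, 0.66) == 1
)

# Illustrative thresholds; the grid used in the original analysis is not shown
thresholds <- seq(0, 100, by = 25)

acc_by_threshold <- do.call(rbind, lapply(thresholds, function(t) {
  kept <- toy[toy$confidence >= t, ]  # generalizations at or above the threshold
  data.frame(threshold = t,
             n         = nrow(kept),
             accuracy  = mean(kept$correct))  # NaN if no rows remain
}))

acc_by_threshold
```

The `n` column never increases from one threshold to the next, which is why accuracy estimates at the highest thresholds rest on the least data.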

Because there may be individual differences in how participants use confidence, we first find the point-biserial correlation between confidence and accuracy for each participant-aggregation strategy pair. Then we average these correlations across participants for each aggregation strategy. We treat cases where a worker gets all of their observations correct (biserial.cor gives NaN) as 0 correlation; the summary below filters out these zero entries. We find that the average correlation between accuracy and confidence is not reliably different across aggregation strategies.

ddply(df_worker_CorConf[df_worker_CorConf$biserial!=0,], ~aggStrat, summarize,
      avgConf=mean(conf),
      avgBiserial=mean(biserial),
      sdConf=sd(conf),
      sdCorr=sd(biserial),
      seConf=sdConf/sqrt(length(aggStrat)),
      seCorr=sdCorr/sqrt(length(aggStrat)))

##      aggStrat  avgConf avgBiserial   sdConf    sdCorr   seConf     seCorr
## 1        mean 69.74438 -0.10411885 18.50268 0.5455777 2.472524 0.07290589
## 2      disagg 66.24506 -0.03499143 18.81899 0.4774175 2.117302 0.05371367
## 3 disagg+mean 69.73120 -0.04704603 16.86296 0.5531112 1.987319 0.06518478
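For readers unfamiliar with the statistic, the point-biserial correlation is the Pearson correlation between a continuous variable and a 0/1 variable. The sketch below uses base R's `cor` in place of the `biserial.cor` call mentioned above, so the sign convention may differ from the original code. The data and the `point_biserial` helper are hypothetical. The helper returns 0 when the correlation is undefined, mirroring the convention described in the text.

```r
# Hypothetical confidence ratings and correctness for one worker (NOT study data)
confidence <- c(80, 65, 90, 40, 55, 70, 30, 95)
correct    <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Point-biserial correlation as a Pearson correlation with a 0/1 variable
r_pb <- cor(confidence, as.numeric(correct))

# The same quantity from the textbook formula (population SD, divisor n)
m1  <- mean(confidence[correct])
m0  <- mean(confidence[!correct])
p   <- mean(correct)
s_n <- sqrt(mean((confidence - mean(confidence))^2))
r_manual <- (m1 - m0) / s_n * sqrt(p * (1 - p))

# Hypothetical helper: undefined correlations (e.g. all trials correct) become 0
point_biserial <- function(conf, correct) {
  r <- suppressWarnings(cor(conf, as.numeric(correct)))
  if (is.na(r)) 0 else r
}

point_biserial(c(50, 60, 70), c(TRUE, TRUE, TRUE))
```

A positive value means higher confidence on correct generalizations. When a worker has no variation in correctness, the correlation is undefined, which is the case the helper maps to 0.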

First Trial Confidence

As per our pre-registration, we analyze first-trial confidence, recorded before participants are aware they will be asked for confidence, in order to compare confidence between aggregation strategies. We find that on the first trial, participants in the mean condition are slightly less accurate and less confident in their generalizations. We plot these results as a visual aid.

Effect Magnitude (EM) and Quantitative Prediction (QP) Summary

Descriptive Statistics

Now let’s look at the two new codes for effect magnitude (EM) estimates and quantitative predictions (QP). There are 211 EM generalizations and 991 QP generalizations. There is a greater number of EM generalizations for the aggregation by default condition. Looking at the distributions of both, we find that there are more QP generalizations for the shape class. This is likely because shape class generalizations included those where the participant noted the shape of a distribution or the size of a bin (e.g. “Ages range from 18 to 68.” or “The majority of purchases are between 75 and 175 dollars.”).

df %>%
  ddply(~aggStrat, summarise,
      es = sum(effectSizeMagnitudeRemoveNulls==TRUE, na.rm=TRUE),
      qp = sum(quantitativePrediction==TRUE, na.rm=TRUE),
      es_null = sum(effectSizeMagnitude==TRUE & effectSizeMagnitudeRemoveNulls==FALSE, na.rm=TRUE),
      total = sum(effectSizeMagnitude==TRUE | effectSizeMagnitude==FALSE, na.rm=TRUE),
      PercES = es/total,
      PercQP = qp/total,
      PercES_null = es_null/total,
      PercES_null_ofES = es_null/es)

##      aggStrat es  qp es_null total     PercES    PercQP PercES_null
## 1      disagg 18 374      22   607 0.02965404 0.6161450  0.03624382
## 2 disagg+mean 27 353      45   607 0.04448105 0.5815486  0.07413509
## 3        mean 22 264      77   528 0.04166667 0.5000000  0.14583333
##   PercES_null_ofES
## 1         1.222222
## 2         1.666667
## 3         3.500000

## Using aggStrat as id variables

Per our preregistration, we run models for both QP and EM to see if there is any effect of aggregation condition on either code. We see an effect in the effect magnitude estimates model, where the disagg condition in particular appears to result in fewer effect magnitude estimates; however, in the model with null effects removed, the 95% interval for bnoagg crosses 0, so this effect is not entirely reliable. Similarly, we find no reliable difference in aggregation condition for predicting the quantitative prediction code.

Quantitative Predictions Bayesian Model

##             Mean    StdDev lower 0.95 upper 0.95    n_eff      Rhat
## bnoagg 0.8221094 0.1851149  0.4477505  1.1569307 3063.505 0.9998044
## bmean  0.4402947 0.1816723  0.0747635  0.7822615 3684.267 1.0000511

##  a_worker[1]  a_worker[2]  a_worker[3]  a_worker[4]  a_worker[5]  a_worker[6] 
##  30.78829127   0.96019737   1.95213155   1.01920260   0.88433542  36.36157391 
##  a_worker[7]  a_worker[8]  a_worker[9] a_worker[10] a_worker[11] a_worker[12] 
##   0.48305934   0.50551285   0.30587174   1.37984098   0.64069629   3.00512664 
## a_worker[13] a_worker[14] a_worker[15] a_worker[16] a_worker[17] a_worker[18] 
##   0.36025403   0.27918094   0.29617222   0.64930723   3.05985434   1.09990007 
## a_worker[19] a_worker[20] a_worker[21] a_worker[22] a_worker[23] a_worker[24] 
##   0.46346892   2.63761826   2.23523377   3.60846655   0.94353618   0.16654778 
## a_worker[25] a_worker[26] a_worker[27] a_worker[28] a_worker[29] a_worker[30] 
##   1.65855073   0.65551287   0.59022110   0.61422105   0.38599907   0.81313153 
## a_worker[31] a_worker[32] a_worker[33] a_worker[34] a_worker[35] a_worker[36] 
##   0.22427363   0.89675095   0.75492025   0.36229353   0.22066419   0.19255487 
## a_worker[37] a_worker[38] a_worker[39] a_worker[40] a_worker[41] a_worker[42] 
##   5.15856020   1.81980332   0.41265509   0.39140692   0.51509545  76.65077420 
## a_worker[43] a_worker[44] a_worker[45] a_worker[46] a_worker[47] a_worker[48] 
##   8.55109972   1.66947701   1.59515203   0.42143869   1.91153296   0.43313010 
## a_worker[49] a_worker[50] a_worker[51] a_worker[52] a_worker[53] a_worker[54] 
##   3.69067995   2.18063619   1.05267098   0.18117429   3.73991471   0.32364518 
## a_worker[55] a_worker[56] a_worker[57] a_worker[58] a_worker[59] a_worker[60] 
##   0.31624826   1.80140514   0.85356970   0.33055260   0.93875658   1.08006222 
## a_worker[61] a_worker[62] a_worker[63] a_worker[64] a_worker[65] a_worker[66] 
##   0.80989807   2.21856457   1.52191852   0.58872443   1.82593072   0.45575077 
## a_worker[67] a_worker[68] a_worker[69] a_worker[70] a_worker[71] a_worker[72] 
##   0.52774399   1.11527696   0.49238907   1.13818393   0.79374572   1.41150616 
## a_worker[73] a_worker[74] a_worker[75] a_worker[76] a_worker[77] a_worker[78] 
##   2.69248913   0.24991897   0.22176391   1.57730855   0.82619165   0.26533352 
## a_worker[79] a_worker[80] a_worker[81] a_worker[82] a_worker[83] a_worker[84] 
##   3.60174079   0.23944439   6.20088790   1.03860351   0.11962059   0.42049225 
## a_worker[85] a_worker[86] a_worker[87] a_worker[88] a_worker[89] a_worker[90] 
##   0.32021280  44.97205923   0.42180136   2.19589181   3.67855880   2.42468668 
##    a_spec[1]    a_spec[2]    a_spec[3]    a_spec[4]    a_spec[5]    a_spec[6] 
##  60.21403035 241.25743895  58.61470721   4.82421634  13.90000351   0.02951658 
##    a_spec[7]    a_spec[8]    a_spec[9]   a_spec[10]   a_spec[11]   a_spec[12] 
##   0.08802007   0.04133725   0.07159606   0.06116567   1.77998753   0.31799608 
##   a_spec[13]   a_spec[14]   a_spec[15]            a       bnoagg        bmean 
##   0.66913714   0.53785165   0.41353185   2.03622793   2.27529424   1.55316480 
##       btrial sigma_worker   sigma_spec 
##   0.98216444   4.08243150  21.21822003

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

Effect Magnitude Bayesian Model

##             Mean    StdDev lower 0.95 upper 0.95    n_eff      Rhat
## bnoagg -1.916758 0.2726352  -2.469291 -1.4124389 3172.284 1.0001260
## bmean  -1.012913 0.2271087  -1.474882 -0.5725651 4554.330 0.9998404

##  a_worker[1]  a_worker[2]  a_worker[3]  a_worker[4]  a_worker[5]  a_worker[6] 
##   0.10721523   0.99512706   1.93761449   0.31589697   1.96322801   0.38753877 
##  a_worker[7]  a_worker[8]  a_worker[9] a_worker[10] a_worker[11] a_worker[12] 
##   4.47259449   0.90529033   3.15834227   1.10713591   0.30697576   0.33119925 
## a_worker[13] a_worker[14] a_worker[15] a_worker[16] a_worker[17] a_worker[18] 
##   0.90604338   0.71459864   1.75510620   3.33801419   2.58656870   0.14779381 
## a_worker[19] a_worker[20] a_worker[21] a_worker[22] a_worker[23] a_worker[24] 
##   3.87312664   0.19073328   2.15796172   0.19275717   1.36740082   1.02122427 
## a_worker[25] a_worker[26] a_worker[27] a_worker[28] a_worker[29] a_worker[30] 
##   2.14104380   0.81440122   1.02864986   3.05995626   0.26599871   0.70594167 
## a_worker[31] a_worker[32] a_worker[33] a_worker[34] a_worker[35] a_worker[36] 
##   2.65431465   1.15923871   0.72542791   2.23180328   4.78808983   3.30965247 
## a_worker[37] a_worker[38] a_worker[39] a_worker[40] a_worker[41] a_worker[42] 
##   0.30907558   0.44647343  16.93376309   2.89172385   1.70107775   0.11832887 
## a_worker[43] a_worker[44] a_worker[45] a_worker[46] a_worker[47] a_worker[48] 
##   0.43986727   2.60080792   0.95445962   1.32375822   0.85287681   0.19416983 
## a_worker[49] a_worker[50] a_worker[51] a_worker[52] a_worker[53] a_worker[54] 
##   0.21851091   0.16874253   0.21733452  15.53169471   0.22770791   1.95964554 
## a_worker[55] a_worker[56] a_worker[57] a_worker[58] a_worker[59] a_worker[60] 
##   3.49330827   0.70523671   0.92682892   0.82010816   1.55142042   1.42228401 
## a_worker[61] a_worker[62] a_worker[63] a_worker[64] a_worker[65] a_worker[66] 
##   0.58781296   0.19325708   1.39562222  13.15986971   0.19578397   1.46902741 
## a_worker[67] a_worker[68] a_worker[69] a_worker[70] a_worker[71] a_worker[72] 
##   1.55948454   0.93743453   1.53113519   1.19046553   0.89703989   1.80261179 
## a_worker[73] a_worker[74] a_worker[75] a_worker[76] a_worker[77] a_worker[78] 
##   3.60882459   2.45456538   2.02406362   0.18030708   1.06091358   2.44796396 
## a_worker[79] a_worker[80] a_worker[81] a_worker[82] a_worker[83] a_worker[84] 
##   0.71513935   6.38835220   0.45183420   1.07027765   2.54363855   0.65632782 
## a_worker[85] a_worker[86] a_worker[87] a_worker[88] a_worker[89] a_worker[90] 
##   1.59095977   0.21251424   5.77584845   0.67267812   0.15218842   0.38723519 
##    a_spec[1]    a_spec[2]    a_spec[3]    a_spec[4]    a_spec[5]    a_spec[6] 
##   0.02898528   0.02902509   0.03165066   0.02661199   0.03213437   0.60104659 
##    a_spec[7]    a_spec[8]    a_spec[9]   a_spec[10]   a_spec[11]   a_spec[12] 
##  77.78575033   6.94200811   3.27045207  13.43827881  10.43981490   0.71629881 
##   a_spec[13]   a_spec[14]   a_spec[15]            a       bnoagg        bmean 
##  12.16674834  24.97323128   1.02865896   0.03867311   0.14708296   0.36315944 
##       btrial sigma_worker   sigma_spec 
##   1.01935323   4.05029496  22.75104265

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

Effect Magnitude Remove Nulls Bayesian Model

Results

##               Mean    StdDev lower 0.95 upper 0.95    n_eff     Rhat
## bnoagg -0.48345160 0.3693295  -1.241617  0.2316175 4210.976 1.000095
## bmean  -0.01623524 0.3352617  -0.692010  0.6330092 4159.446 1.000071

##  a_worker[1]  a_worker[2]  a_worker[3]  a_worker[4]  a_worker[5]  a_worker[6] 
##   0.50100249   1.44086284   1.61635801   1.02495007   0.79328642   1.18890139 
##  a_worker[7]  a_worker[8]  a_worker[9] a_worker[10] a_worker[11] a_worker[12] 
##   2.12352479   0.59385158   2.08078343   0.61591522   0.75314360   1.23568505 
## a_worker[13] a_worker[14] a_worker[15] a_worker[16] a_worker[17] a_worker[18] 
##   0.63226914   1.46245002   0.72050738   2.00725339   1.21727331   0.55873673 
## a_worker[19] a_worker[20] a_worker[21] a_worker[22] a_worker[23] a_worker[24] 
##   0.97644795   0.60508587   4.07672760   0.53540105   0.96487614   0.61369462 
## a_worker[25] a_worker[26] a_worker[27] a_worker[28] a_worker[29] a_worker[30] 
##   0.64977870   1.99899764   0.73319962   0.79424685   0.57202028   1.47555333 
## a_worker[31] a_worker[32] a_worker[33] a_worker[34] a_worker[35] a_worker[36] 
##   0.79178937   1.47725603   1.25251508   0.76339749   0.62649538   0.60162545 
## a_worker[37] a_worker[38] a_worker[39] a_worker[40] a_worker[41] a_worker[42] 
##   0.49966347   1.08235997   5.16548289   2.46415650   0.56524445   0.40781018 
## a_worker[43] a_worker[44] a_worker[45] a_worker[46] a_worker[47] a_worker[48] 
##   0.53647838   3.80617028   1.63321090   0.81389465   0.64732810   0.58657883 
## a_worker[49] a_worker[50] a_worker[51] a_worker[52] a_worker[53] a_worker[54] 
##   0.54527507   0.53798643   0.65254359   1.37777698   0.55488911   1.07810943 
## a_worker[55] a_worker[56] a_worker[57] a_worker[58] a_worker[59] a_worker[60] 
##   2.37975170   1.49125014   0.62743463   0.57539195   0.62151526   2.79384795 
## a_worker[61] a_worker[62] a_worker[63] a_worker[64] a_worker[65] a_worker[66] 
##   0.62722810   0.60989141   0.46824021   3.04380184   0.65025406   3.81356283 
## a_worker[67] a_worker[68] a_worker[69] a_worker[70] a_worker[71] a_worker[72] 
##   1.43840790   0.65130152   2.91434238   0.58431946   0.59300159   1.64593718 
## a_worker[73] a_worker[74] a_worker[75] a_worker[76] a_worker[77] a_worker[78] 
##   3.17821848   3.39199590   0.58051115   0.52222142   1.23274006   0.62966732 
## a_worker[79] a_worker[80] a_worker[81] a_worker[82] a_worker[83] a_worker[84] 
##   1.36212026   0.63697347   0.76210187   1.29843804   1.75182981   0.64535902 
## a_worker[85] a_worker[86] a_worker[87] a_worker[88] a_worker[89] a_worker[90] 
##   0.59760086   0.55192005   9.09946231   0.98441798   0.54646360   0.73130402 
##    a_spec[1]    a_spec[2]    a_spec[3]    a_spec[4]    a_spec[5]    a_spec[6] 
##   0.18761129   0.17619858   0.20312334   0.15997216   0.19231273   0.71540897 
##    a_spec[7]    a_spec[8]    a_spec[9]   a_spec[10]   a_spec[11]   a_spec[12] 
##  18.44706909   4.98426814   0.69231102   4.62222992   3.83070138   1.70602665 
##   a_spec[13]   a_spec[14]   a_spec[15]            a       bnoagg        bmean 
##   0.48747877   3.32762361   1.61169149   0.01913411   0.61665128   0.98389585 
##       btrial sigma_worker   sigma_spec 
##   0.93758693   2.98347777   5.92455170

Plot

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

Dichotomous Descriptive Statistics

As an exploratory analysis, we coded generalizations that assert the presence or absence of an effect without stating its magnitude (e.g., “Number of time on sites is pretty unrelated to the number of visits” or “Ad Campaign B resulted in the greatest number of purchases”). Of the 618 generalizations coded as dichotomous, the largest share came from the mean aggregation condition (38%), followed by disaggregation with means (32%) and disaggregation alone (30%). This is consistent with our hypothesis that generalizations would be coded as dichotomous at a higher rate under the mean aggregation condition.

df %>%
  subset(dichotomous == TRUE) %>% # keep only generalizations coded as dichotomous
  ddply(.(aggStrat), summarise,
        total=sum(correct==TRUE)+sum(correct==FALSE),
        percentOfTotal=total/618, # 618 is hard coded from nrow(subset(df, dichotomous == TRUE))
        Correct=sum(correct==TRUE),
        acc=Correct/total)

##      aggStrat total percentOfTotal Correct       acc
## 1      disagg   184      0.2977346     103 0.5597826
## 2 disagg+mean   200      0.3236246     121 0.6050000
## 3        mean   234      0.3786408     180 0.7692308
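
The denominator 618 in the chunk above is hard coded. A minimal sketch of deriving it from the data instead, using base R only; the `toy` data frame, `nDich`, and `byStrat` are hypothetical names introduced for illustration and are not part of the original analysis:

```r
# Hypothetical toy data standing in for df (column names follow the chunk above)
toy <- data.frame(
  aggStrat    = c("disagg", "disagg", "disagg+mean", "mean", "mean", "mean", "mean"),
  dichotomous = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  correct     = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only generalizations coded as dichotomous;
# subset() also drops rows where dichotomous is NA
dich  <- subset(toy, dichotomous == TRUE)
nDich <- nrow(dich)  # takes the place of the hard-coded 618

# One summary row per aggregation strategy
byStrat <- do.call(rbind, lapply(split(dich, dich$aggStrat), function(g) {
  data.frame(
    aggStrat       = g$aggStrat[1],
    total          = nrow(g),
    percentOfTotal = nrow(g) / nDich,
    Correct        = sum(g$correct),
    acc            = mean(g$correct)
  )
}))
print(byStrat)
```

Computing the denominator this way keeps `percentOfTotal` correct if the exclusion criteria or the coding change and the subset size is no longer 618.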

## Mean accuracy with standard error, taking into account differences between workers
df %>%
  subset(dichotomous == TRUE) %>%
  # First pass: accuracy per worker within each aggregation strategy
  ddply(.(workerId, aggStrat), summarise,
        TotalIsEffect=sum(effectSizeMagnitude==TRUE),
        total=sum(correct==TRUE)+sum(correct==FALSE),
        correct=sum(correct==TRUE),
        accuracy=correct/total) %>%
  # Second pass: average the per-worker accuracies within each strategy
  ddply(~aggStrat, summarise,
      N = sum(total), # N counts generalizations, not workers
      meanAccuracy = mean(accuracy),
      sd = sd(accuracy),
      se = sd / sqrt(N))

##      aggStrat   N meanAccuracy        sd         se
## 1      disagg 184    0.5472944 0.3866051 0.02850091
## 2 disagg+mean 200    0.5801282 0.3823680 0.02703750
## 3        mean 234    0.7849307 0.3143897 0.02055230
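
In the summary above, `sd` is the standard deviation of per-worker accuracies, while `N` is the number of generalizations, so `se` is the between-worker spread divided by the square root of the trial count. If a standard error of the mean per-worker accuracy is wanted instead, one alternative is to divide by the square root of the number of workers. The sketch below illustrates the difference on made-up numbers; `perWorker`, `nWorkers`, `seWorkers`, and `seTrials` are hypothetical names, and this is not the pre-registered computation:

```r
# Hypothetical per-worker accuracies for a single aggregation strategy
perWorker <- data.frame(
  workerId = c("w1", "w2", "w3", "w4"),
  aggStrat = "mean",
  total    = c(5, 8, 6, 7),          # generalizations per worker
  accuracy = c(0.80, 0.75, 0.50, 1.00),
  stringsAsFactors = FALSE
)

nWorkers <- length(unique(perWorker$workerId))

# Standard error using the number of workers as the sample size
seWorkers <- sd(perWorker$accuracy) / sqrt(nWorkers)

# Standard error as computed in the chunk above (number of generalizations)
seTrials <- sd(perWorker$accuracy) / sqrt(sum(perWorker$total))

print(c(seWorkers = seWorkers, seTrials = seTrials))
```

Because there are fewer workers than generalizations, the worker-based standard error is the larger of the two.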

## Using aggStrat as id variables

Dichotomous Bayesian Model

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

##              Mean    StdDev lower 0.95 upper 0.95    n_eff     Rhat
## bnoagg -0.8566891 0.1581353 -1.1685537 -0.5562622 4608.113 1.000128
## bmean  -0.5644486 0.1577958 -0.8762769 -0.2564761 5174.422 1.000734

##             Mean   StdDev lower 0.95 upper 0.95 n_eff     Rhat
## bnoagg 0.4245654 1.171325  0.3108162  0.5733481   Inf 2.718630
## bmean  0.5686736 1.170927  0.4163301  0.7737735   Inf 2.720279
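
The second table above appears to be the first table with `exp()` applied to every column: the Mean and interval bounds become odds ratios, but the exponentiated StdDev, n_eff, and Rhat columns are no longer meaningful (for example, exp(1.000128) ≈ 2.7186, and very large n_eff values overflow to Inf). A minimal sketch that exponentiates only the columns that make sense on the odds-ratio scale, with the values copied from the first table; `logOdds` and `oddsRatio` are hypothetical names:

```r
# Posterior summaries on the log-odds scale, copied from the first table above
logOdds <- data.frame(
  Mean  = c(-0.8566891, -0.5644486),
  lower = c(-1.1685537, -0.8762769),
  upper = c(-0.5562622, -0.2564761),
  row.names = c("bnoagg", "bmean")
)

# exp() of a log-odds coefficient gives an odds ratio; because exp() is
# monotonic, the interval bounds can be transformed the same way
oddsRatio <- exp(logOdds)
print(round(oddsRatio, 3))
```

Both transformed intervals lie entirely below 1, which corresponds to the log-odds intervals lying entirely below 0 in the first table. The n_eff and Rhat diagnostics should be read from the untransformed table.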

##  a_worker[1]  a_worker[2]  a_worker[3]  a_worker[4]  a_worker[5]  a_worker[6] 
##   0.14157342   2.84402149   2.02056983   0.82161308   3.80030043   0.42449920 
##  a_worker[7]  a_worker[8]  a_worker[9] a_worker[10] a_worker[11] a_worker[12] 
##   2.21518137   1.30905073   1.90027121   0.74562103   0.79227399   0.95354742 
## a_worker[13] a_worker[14] a_worker[15] a_worker[16] a_worker[17] a_worker[18] 
##   2.33272060   3.37762828   2.29886750   1.16813066   2.67251928   5.73893486 
## a_worker[19] a_worker[20] a_worker[21] a_worker[22] a_worker[23] a_worker[24] 
##   4.96133868   0.49007779   0.88957351   0.91469013   1.03435029   1.13001261 
## a_worker[25] a_worker[26] a_worker[27] a_worker[28] a_worker[29] a_worker[30] 
##   1.09561114   1.60376669   1.48539833   1.78881737   2.78552001   0.43352650 
## a_worker[31] a_worker[32] a_worker[33] a_worker[34] a_worker[35] a_worker[36] 
##   1.33906455   1.24983534   0.72334710   1.41242162   2.09764342   2.73002110 
## a_worker[37] a_worker[38] a_worker[39] a_worker[40] a_worker[41] a_worker[42] 
##   0.60396873   0.23053052   1.13512473   1.27548046   0.82286178   0.06027365 
## a_worker[43] a_worker[44] a_worker[45] a_worker[46] a_worker[47] a_worker[48] 
##   0.14854118   0.50511196   1.77080170   0.94569199   1.77011396   1.50826733 
## a_worker[49] a_worker[50] a_worker[51] a_worker[52] a_worker[53] a_worker[54] 
##   0.67860945   0.96179969   0.43404275   2.59343987   0.75074608   1.34063041 
## a_worker[55] a_worker[56] a_worker[57] a_worker[58] a_worker[59] a_worker[60] 
##   0.82993285   0.36169230   2.91050182   0.85130740   1.38948189   0.40885941 
## a_worker[61] a_worker[62] a_worker[63] a_worker[64] a_worker[65] a_worker[66] 
##   1.25015453   0.24077538   0.88079930   1.51756259   0.72081710   0.20436293 
## a_worker[67] a_worker[68] a_worker[69] a_worker[70] a_worker[71] a_worker[72] 
##   1.71889707   0.49528387   0.62774952   1.07180450   1.10820912   0.94002904 
## a_worker[73] a_worker[74] a_worker[75] a_worker[76] a_worker[77] a_worker[78] 
##   0.41168817   1.63532014   3.09004940   0.74483042   1.79618374   3.84255086 
## a_worker[79] a_worker[80] a_worker[81] a_worker[82] a_worker[83] a_worker[84] 
##   0.55338873   2.74246807   0.33935102   0.73121980   0.84268166   3.07434946 
## a_worker[85] a_worker[86] a_worker[87] a_worker[88] a_worker[89] a_worker[90] 
##   2.32832524   0.06677319   0.53647950   1.25349825   0.37563455   0.49321930 
##    a_spec[1]    a_spec[2]    a_spec[3]    a_spec[4]    a_spec[5]    a_spec[6] 
##   0.03861059   0.03769095   0.20156934   0.10810241   0.15730483   4.48918121 
##    a_spec[7]    a_spec[8]    a_spec[9]   a_spec[10]   a_spec[11]   a_spec[12] 
##   4.33769020   6.40467838   4.66579165   4.16231966   1.04367312   4.59556069 
##   a_spec[13]   a_spec[14]   a_spec[15]            a       bnoagg        bmean 
##   2.33362494   3.10906211   3.58113876   0.30243252   0.42456544   0.56867360 
##       btrial sigma_worker   sigma_spec 
##   1.04608106   2.81358175   7.50134184

## 105 vector or matrix parameters omitted in display. Use depth=2 to show them.

Dichotomous Accuracy Rates

##      aggStrat   N   meanAcc        sd         se
## 1      disagg 184 0.5472944 0.3866051 0.02850091
## 2 disagg+mean 200 0.5801282 0.3823680 0.02703750
## 3        mean 234 0.7849307 0.3143897 0.02055230

[E1] Main Analysis, N=1000 – Effect of Aggregation on Generalizations Study Results

Francis Nguyen, Xiaoli Qiao, Jeffrey Heer, Jessica Hullman

12/22/2019
