This page documents version 0.9dev of bingwa, which is currently considered experimental.
Introduction. bingwa is a command-line tool used to carry out meta-analysis of genome-wide association studies. It is particularly suited for use with the program SNPTEST, but can also be used with other association testing programs such as MMM. Features of bingwa include:
 Direct support for SNPTEST output files.
 Support for fixed-effect frequentist meta-analysis and a flexible Bayesian meta-analysis method.
 Support for both univariate and multivariate meta-analysis (including support for meta-analysing tests performed using SNPTEST's multinomial test).
Disclaimer. This page documents version 0.9dev of bingwa which is currently considered experimental. This means we expect some features not to work, or not to work well, or to work wrongly, or to destroy your computer or sanity.
References. If you make use of bingwa, please cite the following paper, which describes the key methodology underlying the program:
 "A novel locus of resistance to severe malaria in a region of ancient balancing selection", Malaria Genomic Epidemiology Network, Nature (2015)
The core ideas underlying bingwa were originally developed and applied in these papers:
 "Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke", International Stroke Genetics Consortium (ISGC) and Wellcome Trust Case Control Consortium 2 (WTCCC2), Nat. Genetics (2012)
 "Reappraisal of known malaria resistance loci in a large multicentre study", Rockett et al, Nat. Genetics (2014)
 "Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations", Band et al, PLOS Genetics (2013)
Acknowledgements. Bingwa was developed at the Wellcome Trust Centre for Human Genetics, Oxford, UK. A full list of people who contributed to the methodology will appear here.
(This page is under construction.)
Briefly, bingwa proceeds by:
 Loading association test data from a set of cohorts (e.g. SNPTEST files).
 Matching variants between cohorts.
 Computing a frequentist meta-analysis and one or more Bayes factors.
 Writing results.
We'll update this page with more details on the method soon. In the meantime the tutorial has usage examples.
Running bingwa
The option o sqlite://<filename> can be used to force sqlite3 output format.
fixed:bf
.
fixed:bf
and independent:bf
columns.
Here, the independent model assumes that true effects are uncorrelated between the cohorts, i.e. they have a diagonal variance-covariance matrix.
In addition to the two model-specific Bayes factors, bingwa will compute an equally weighted model-averaged Bayes factor over all the models specified, named mean_bf.
Note: In the above command we use a shorthand version of the prior specification, identified because it is of the form tau=... .
Here, the first part (tau) denotes the between-cohort correlation, while the rest denotes the prior covariance matrix within each cohort. In this case there is one effect parameter in each cohort, so the prior covariance is the 1x1 matrix (0.2^2), and the correlation between cohorts is 1 (for the fixed-effect model) or 0 (for the independent-effect model).
While the differences in this example are small, this feature comes into its own when many cohorts are analysed. However, we advise always checking the screen output to debug model definitions.
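The shorthand can be illustrated with a short Python sketch. It is not bingwa's source code. It assumes one effect parameter per cohort and two cohorts, that the prior covariance has sd^2 on the diagonal and tau × sd^2 between cohorts, and that each Bayes factor compares a zero-mean normal prior on the true effects against the null of no effect, using a normal approximation to the likelihood. The function names are hypothetical. The input estimates are the per-cohort values of the example variant in the sqlite walkthrough further down this page, with the cohort 1 estimate assumed to be the negative one.

```python
import math

def shorthand_prior(tau, sd, n_cohorts=2):
    """Prior covariance for one effect per cohort: sd^2 on the diagonal and
    tau * sd^2 between cohorts (an assumed reading of 'tau=.../sd=...')."""
    return [[sd * sd * (1.0 if i == j else tau) for j in range(n_cohorts)]
            for i in range(n_cohorts)]

def log_mvn2(x, cov):
    """Log density of a zero-mean bivariate normal at x (2x2 closed form)."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    det = a * d - b * b
    quad = (d * x[0] ** 2 - 2.0 * b * x[0] * x[1] + a * x[1] ** 2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def bayes_factor(betas, ses, prior):
    """BF = N(betas; 0, V + prior) / N(betas; 0, V), with V = diag(se^2).
    Two cohorts only, to keep the linear algebra in closed form."""
    v = [[ses[0] ** 2, 0.0], [0.0, ses[1] ** 2]]
    v_plus_prior = [[v[i][j] + prior[i][j] for j in range(2)] for i in range(2)]
    return math.exp(log_mvn2(betas, v_plus_prior) - log_mvn2(betas, v))

# Per-cohort estimates; the negative sign on the first one is an assumption.
betas = [-0.0237287990748882, 0.117560997605324]
ses = [0.186789005994797, 0.181916996836662]

bf_fixed = bayes_factor(betas, ses, shorthand_prior(1.0, 0.2))
bf_independent = bayes_factor(betas, ses, shorthand_prior(0.0, 0.2))
print(bf_fixed, bf_independent)
```

Under these assumptions the two values come out close to the fixed:bf and independent:bf entries reported for that variant (about 0.573 and 0.517).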
The priors option tells bingwa to load prior specifications from a file.
Priors in the file should be specified as on the command line, but with a couple of additional features:
 Any line starting with # is ignored; this allows models to be commented.
 Line breaks (newlines) within model specifications are ignored if they follow one of the 'punctuation' characters, i.e. immediately after a colon (:), slash (/) or comma (,) character. This allows long models to be formatted across lines.
Using a priors file has the advantage of allowing priors to be stored and edited separately from the command lines.
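The two file rules can be sketched in Python as follows. This is a hypothetical reader written from the description above, not bingwa's own parser: the function name, the handling of leading whitespace on continuation lines, and the model definitions in the example text are all assumptions.

```python
CONTINUATION_CHARS = (":", "/", ",")

def read_prior_specs(text):
    """Split the contents of a priors file into one string per model."""
    specs = []
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Comment lines and blank lines are skipped.
        if not line or line.startswith("#"):
            continue
        current += line
        # A trailing ':', '/' or ',' means the model continues on the next line.
        if not current.endswith(CONTINUATION_CHARS):
            specs.append(current)
            current = ""
    if current:
        specs.append(current)  # unterminated final model, kept as-is
    return specs

example = """\
# fixed-effect model, split across two lines
fixed:tau=1/
  sd=0.2/cor=1
independent:tau=0/sd=0.2/cor=1
"""
specs = read_prior_specs(example)
print(specs)
```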
Note: By default all models are equally weighted in the model-averaged Bayes factor. Use the priorweights option to reweight them. Weights are renormalised to sum to one in the mean Bayes factor computation, i.e. the mean Bayes factor is computed as
$$\text{BF}_{\text{avg}} = \frac{\sum_i \text{weight}_i \cdot \text{BF}_i}{\sum_i \text{weight}_i}$$
where $\text{weight}_i$ and $\text{BF}_i$ are the specified weight and Bayes factor for model $i$.
If priorweights is specified, any model not specified in the argument is given weight zero in the mean Bayes factor.
The argument to priorweights can also be the name of a file, in which case bingwa will open the file and read prior weights from it (in the same format, i.e. <model name>=<weight>).
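As a minimal sketch of the weighting rule, the following Python code computes the weighted mean Bayes factor from a set of model Bayes factors and <model name>=<weight> entries. The function names are hypothetical, and the Bayes factor values are the fixed:bf and independent:bf values of the example variant shown later on this page.

```python
def parse_weights(entries):
    """Turn entries of the form '<model name>=<weight>' into a dict."""
    weights = {}
    for entry in entries:
        name, _, value = entry.partition("=")
        weights[name.strip()] = float(value)
    return weights

def mean_bayes_factor(bfs, weights=None):
    """Weighted mean of model Bayes factors; weights are renormalised.
    With no weights given, all models are weighted equally. Models missing
    from an explicit weights dict get weight zero."""
    if weights is None:
        weights = {name: 1.0 for name in bfs}
    total = sum(weights.get(name, 0.0) for name in bfs)
    return sum(weights.get(name, 0.0) * bf for name, bf in bfs.items()) / total

bfs = {"fixed": 0.573458567964815, "independent": 0.517094908522291}
equal = mean_bayes_factor(bfs)
reweighted = mean_bayes_factor(bfs, parse_weights(["fixed=3", "independent=1"]))
fixed_only = mean_bayes_factor(bfs, parse_weights(["fixed=1"]))
print(equal, reweighted, fixed_only)
```

With equal weights this reproduces the mean_bf value reported for that variant; with only the fixed model given a weight, the mean reduces to the fixed model's Bayes factor.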
The extracolumns option tells bingwa to read data from the specified columns in the input file and include them in the output.
Understanding bingwa sqlite output files
Below we've given some examples of using the shell or various programming languages to extract data from a bingwa sqlite file.
We assume the file is named bingwa.sqlite and the meta-analysis results were stored using the default prefix ('Bingwa') for table names.
Note: The sqlite3 program is installed by default on most UNIX-like systems. It's quite flexible, and can be used in interactive mode, or to run queries directly from the command line. We'll use both methods below. See the sqlite3 command shell documentation for a full list of options.
Inspecting the contents of the results file
sqlite files can be inspected using the sqlite3 command shell. Open the file by typing:
$ sqlite3 -column -header bingwa.sqlite
You should see the sqlite prompt (sqlite>). Let's see what tables are stored in this file:
sqlite> .tables
Analysis              BingwaCounts          BingwaMetaView
AnalysisProperty      BingwaCountsView      Variant
AnalysisPropertyView  BingwaDetail          VariantIdentifier
AnalysisStatus        BingwaDetailView      VariantView
AnalysisStatusView    BingwaMeta
Here, the main meta-analysis results are stored in the BingwaMeta table, and bingwa has also created the BingwaMetaView view, which provides a nicer view of the results.
The BingwaCounts and BingwaDetail tables contain, respectively, genotype counts taken from the input files, and the raw summary statistics (beta and standard error) that bingwa operated on.
The Analysis* tables are used to store metadata about the analyses you have run.
Let's see what's stored in this file:
sqlite> SELECT * FROM Analysis ;
id  name             chunk
--  ---------------  -----
1   bingwa analysis  NA
sqlite> SELECT COUNT(*) AS count FROM BingwaMetaView ;
count
-----
200
So this file stores meta-analysis results for 200 variants from a single analysis called "bingwa analysis".
In a large, real analysis, it's often useful to use the analysisname and analysischunk options to give the analysis a more descriptive name and explain what part of the data it corresponds to. It's possible to store several analyses in the same sqlite file (or even in the same table, provided analysis columns are compatible), so this information becomes important to understand the results.
Let's have a look at what the results actually look like. Because there are lots of columns we'll switch sqlite3 to line-based output:
sqlite> .mode line
sqlite> SELECT * FROM BingwaMetaView LIMIT 1 ;
rsid = RSID_2
alleleA = A
alleleB =
analysis = bingwa analysis
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:bf = 0.685505148530258
cohort 2:bf = 0.754326805029774
FixedEffectMetaAnalysis:included_betas = 11
FixedEffectMetaAnalysis:N = 998.000188827515
FixedEffectMetaAnalysis:beta_1:add/bin2=1 = 0.048782749136173
FixedEffectMetaAnalysis:se_1 = 0.130323119437487
FixedEffectMetaAnalysis:wald_pvalue_1 = 0.708165117723267
FixedEffectMetaAnalysis:pvalue = 0.708165117723267
fixed:bf = 0.573458567964815
independent:bf = 0.517094908522291
mean_bf = 0.545276738243553
max_bf_model = fixed
max_bf = 0.573458567964815
best_posterior_model = fixed
best_posterior = 0.525841767807702
2nd_best_posterior_model = independent
2nd_best_posterior = 0.474158232192298
This analysis contains the results of both a fixed-effect frequentist and a Bayesian analysis of two cohorts, by default called "cohort 1" and "cohort 2". (The cohortnames option could have been used to change these names.)
The first eight columns hold metadata that identifies the variant being tested and links to the Analysis table (which is useful if several analyses are stored in the same table).
The chromosome information here is unspecified (the assumechromosome option could have been used to fill in chromosome information during the analysis).
The next two columns contain Bayes factors (BFs) computed based on the data in each cohort independently. These are not strictly meta-analysis results, but are stored here for comparison with the meta-analysis Bayes factors below. (This might change in a future version of the software.)
The columns labelled 'FixedEffectMetaAnalysis' reflect the frequentist analysis. They include 'N' (here equal to the sum of the two cohort sample sizes), a variable ('included_betas') which here indicates that data from both cohorts informed the analysis, and the combined effect size estimate and standard error from the meta-analysis. Here, because there is only one effect size being estimated, the two p-value columns (pvalue and wald_pvalue_1) are the same.
The following columns reflect the Bayesian meta-analysis. In order, they give the BF for each model, the model-averaged BF, the model with the maximum BF and that maximum BF, and the model with the highest posterior mass together with its posterior (this model might differ from the maximum-BF model if the weighting is not uniform). Similarly, the last two columns reflect the model with the 2nd highest posterior mass.
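The relationship between the model Bayes factors and the posterior columns can be sketched in Python. The sketch assumes that each model's posterior mass is its weight times its Bayes factor, renormalised over the models, which is consistent with the values shown above; the function name is hypothetical.

```python
def model_posteriors(bfs, weights=None):
    """Posterior mass of each model given that one of the models holds:
    weight * BF, renormalised over models. Returned best-first."""
    if weights is None:
        weights = {name: 1.0 for name in bfs}
    weighted = {name: weights.get(name, 0.0) * bf for name, bf in bfs.items()}
    total = sum(weighted.values())
    posteriors = {name: value / total for name, value in weighted.items()}
    return sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

bfs = {"fixed": 0.573458567964815, "independent": 0.517094908522291}
ranked = model_posteriors(bfs)
max_bf_model = max(bfs, key=bfs.get)
print(ranked)
print(max_bf_model, bfs[max_bf_model])
```

With equal weights the ranking by posterior mass and the ranking by Bayes factor agree; passing unequal weights is what allows best_posterior_model to differ from max_bf_model.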
These particular results don't look very interesting: a p-value of P=0.7 and a Bayes factor less than one suggest very little evidence that the indel is associated. Nevertheless, if we were interested we could take a look at the data underlying this meta-analysis:
sqlite> SELECT * FROM BingwaDetail WHERE analysis_id == 1 AND variant_id == 2;
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:beta_1:add/bin2=1 = 0.0237287990748882
cohort 1:se_1 = 0.186789005994797
cohort 1:pvalue = 0.898887991905212
cohort 2:beta_1:add/bin2=1 = 0.117560997605324
cohort 2:se_1 = 0.181916996836662
cohort 2:pvalue = 0.517305016517639
Estimated effects have opposite directions in the two cohorts. Or we could look at the genotype counts that SNPTEST reported for this test:
sqlite> SELECT * FROM BingwaCounts WHERE analysis_id == 1 AND variant_id == 2;
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:A = 0.0
cohort 1:B = 0.0
cohort 1:AA = 24.1889991760254
cohort 1:AB = 149.192001342773
cohort 1:BB = 325.618988037109
cohort 1:NULL = 3.98792002025139e-13
cohort 1:N = 498.999988555908
cohort 1:B_allele_frequency = 0.802034063901899
cohort 1:trusted = 1
cohort 2:A = 0.0
cohort 2:B = 0.0
cohort 2:AA = 28.2371997833252
cohort 2:AB = 142.957000732422
cohort 2:BB = 327.805999755859
cohort 2:NULL = 5.11590986587707e-13
cohort 2:N = 499.000200271606
cohort 2:B_allele_frequency = 0.800169017777426
cohort 2:trusted = 1
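As a check on how the frequentist columns relate to the per-cohort data, the following Python sketch applies the standard inverse-variance weighted fixed-effect estimate to the beta and standard error values stored in BingwaDetail for this variant. It is an illustration rather than bingwa's implementation. The function name is hypothetical, and because the estimated effects have opposite directions, the cohort 1 estimate is assumed here to be the negative one.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted estimate, its standard error and a
    two-sided Wald p-value from the standard normal distribution."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    pvalue = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, pvalue

# Values from BingwaDetail; the negative sign on the first is an assumption.
betas = [-0.0237287990748882, 0.117560997605324]
ses = [0.186789005994797, 0.181916996836662]

beta, se, pvalue = fixed_effect_meta(betas, ses)
print(beta, se, pvalue)
```

Under that sign assumption the result agrees closely with the FixedEffectMetaAnalysis columns shown earlier (beta about 0.0488, standard error about 0.1303, p-value about 0.708).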
Processing bingwa sqlite output files
Below are some example commands that process bingwa output on the command line or in R. See above for a full description of bingwa's sqlite output.
01
, 02
, etc.
> library( RSQLite )
> db = dbConnect( dbDriver( "SQLite" ), "bingwa.sqlite" )
> D = dbGetQuery( db, "SELECT * FROM BingwaMetaView" )
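The same results can be loaded with the sqlite3 module from the Python standard library. The sketch below is one way to do it; the helper name load_meta_results is hypothetical, and the file and view names are the defaults assumed throughout this page. Note that several bingwa column names contain colons or spaces (for example "fixed:bf"), so they need double quotes when named in SQL, but can be used directly as dictionary keys once loaded.

```python
import sqlite3

def load_meta_results(path, view="BingwaMetaView"):
    """Return every row of the meta-analysis view as a list of dicts."""
    connection = sqlite3.connect(path)
    connection.row_factory = sqlite3.Row  # rows addressable by column name
    try:
        cursor = connection.execute('SELECT * FROM "{}"'.format(view))
        return [dict(row) for row in cursor.fetchall()]
    finally:
        connection.close()

# Example usage (requires an existing bingwa output file):
#   results = load_meta_results("bingwa.sqlite")
#   top = sorted(results, key=lambda r: r["mean_bf"], reverse=True)[:10]
#   for row in top:
#       print(row["rsid"], row["mean_bf"], row["fixed:bf"])
```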
Input file formats
.txt
, .csv
or .tsv
).
Output file formats
Bingwa can either output to a flat file or to a table in a sqlite database. Both have pros and cons.
Flat files are simple and easy to use. However, they can become difficult to read for very large analyses (such as genome-wide analyses of datasets imputed into recent reference panels), which may require writing special file-handling code.
Flat files are written when the flatfile option is specified.
Sqlite files are easy to use via the sqlite3 command shell, which is installed on most UNIX-like systems. They can also be read directly from programming languages, e.g. by using the RSQLite package in R or the sqlite3 module in Python. An attractive feature is that they are indexed, so that data for specific regions of the genome or specific variants can be easily found. See the examples page for code snippets for processing bingwa-produced sqlite files.
Sqlite is the default output format for bingwa.
Specifying priors
The file format used for the priors option obeys the following rules.
 Blank lines and lines starting with # (comment lines) are ignored.
 Every other line is part of a prior specification.
 Each prior must start with a name specification of the form '<name>:'.
 Two forms of prior specification are permissible: full and shorthand.

A full prior specification is of the form sd=a,b,c,.../cor=1,x,y,... . Here a,b,c,...,x,y,... are real numbers. The number of sds specified must equal the combined number of parameters in all cohorts put together, denoted d; the number of correlations specified must then be equal to d×(d+1)/2. Conceptually, the values a,b,c,... specify the diagonal entries of a diagonal matrix denoted σ, and the values 1,x,y,... specify the upper triangle of the correlation matrix Ρ (which should have 1s on the diagonal, as in the example). The prior covariance matrix used for analysis is then σ×Ρ×σ.
A shorthand prior specification is of the form tau=z/sd=a,b,c,d,.../cor=1,x,y,... . In this form, the number of sds specified must equal the number of parameters in each cohort (e.g. 1 for univariate analysis), denoted d; the number of correlations specified must then be equal to d×(d+1)/2.
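The σ×Ρ×σ construction for a full prior specification can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the upper triangle of the correlation matrix is listed row by row including the diagonal, which is one reading of the 1,x,y,... form and may not match bingwa's actual ordering.

```python
def full_prior_covariance(sds, cors):
    """Build the prior covariance sigma * P * sigma from d standard
    deviations and the d*(d+1)/2 upper-triangle correlation entries
    (assumed to be listed row by row, diagonal included)."""
    d = len(sds)
    if len(cors) != d * (d + 1) // 2:
        raise ValueError("expected %d correlations, got %d"
                         % (d * (d + 1) // 2, len(cors)))
    corr = [[0.0] * d for _ in range(d)]
    k = 0
    for i in range(d):
        for j in range(i, d):
            corr[i][j] = corr[j][i] = cors[k]  # fill both triangles
            k += 1
    # sigma is diagonal, so (sigma P sigma)[i][j] = sd_i * P[i][j] * sd_j.
    return [[sds[i] * corr[i][j] * sds[j] for j in range(d)] for i in range(d)]

# Two parameters in total: sd=0.2,0.4/cor=1,0.5,1
covariance = full_prior_covariance([0.2, 0.4], [1.0, 0.5, 1.0])
print(covariance)
```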
Bingwa is available as source code.
Binaries
Precompiled binaries are available for the following platforms. (See here for other builds.)
Version  Platform  File 

v0.9dev^{†}  Ubuntu 12.04 x8664  bingwa_v0.9devlinuxx86_64.tgz 
v0.9dev^{†}  CentOS6.5 x8664  bingwa_v0.9devCentOS6.5x86_64.tgz 
v0.9dev^{†}  Mac OS X  bingwa_v0.9devosx.tgz 
^{†}This version of bingwa is considered experimental.
To run bingwa, download the relevant file and extract it as follows.
$ tar xzf bingwa_v0.9dev[machine].tgz
$ cd bingwa_v0.9dev[machine]
$ ./bingwa_v0.9dev help
Source
The source code to bingwa can be found on the bingwa page on bitbucket.