Loading required namespace: GenomicFiles
Using local VCF.
File already tabix-indexed.
Finding empty VCF columns based on first 10,000 rows.
Dropping 1 duplicate column(s).
1 sample detected: ieu-a-1025
Constructing ScanVcfParam object.
VCF contains: 154,065 variant(s) x 1 sample(s)
Reading VCF file: multi-threaded (4 threads)
Dropping 1 duplicate column(s).
Dropping 1 duplicate column(s).
Dropping 1 duplicate column(s).
Dropping 1 duplicate column(s).
Renaming ID as SNP.
VCF file has -log10 P-values; these will be converted to unadjusted p-values in the 'P' column.
No INFO (SI) column detected.
Standardising column headers.
First line of summary statistics file:
SNP chr BP end REF ALT FILTER AF ES LP SE SS P
Summary statistics report:
   - 154,065 rows
   - 154,065 unique variants
   - 3,795 genome-wide significant variants (P<5e-8)
   - 22 chromosomes
Checking for multi-GWAS.
Checking for multiple RSIDs on one row.
Inferring genome build.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 10,000 SNPs using BSgenome::snpsById...
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
    IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
    expand.grid, I, unname
BSgenome::snpsById done in 41 seconds.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 10,000 SNPs using BSgenome::snpsById...
BSgenome::snpsById done in 59 seconds.
Inferred genome build: GRCH37
Checking SNP RSIDs.
Checking for merged allele column.
Checking A1 is uppercase
Checking A2 is uppercase
Checking for incorrect base-pair positions
Ensuring all SNPs are on the reference genome.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 154,065 SNPs using BSgenome::snpsById...
BSgenome::snpsById done in 15 seconds.
6,429 SNPs are not on the reference genome. These will be corrected from the reference genome.
Loading SNPlocs data.
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /rds/general/project/neurogenomics-lab/ephemeral/MAGMA_Files_Public/data/GWAS_munged/ieu-a-1025/logs/snp_not_found_from_chr_bp.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 147,639 SNPs using BSgenome::snpsById...
BSgenome::snpsById done in 14 seconds.
Checking for correct direction of A1 (reference) and A2 (alternative allele).
There are 2 SNPs where neither A1 nor A2 match the reference genome. These will be removed.
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /rds/general/project/neurogenomics-lab/ephemeral/MAGMA_Files_Public/data/GWAS_munged/ieu-a-1025/logs/alleles_dont_match_ref_gen.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
Reordering so first three column headers are SNP, CHR and BP in this order.
Reordering so the fourth and fifth columns are A1 and A2.
Checking for missing data.
WARNING: 1,554 rows in sumstats file are missing data and will be removed.
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /rds/general/project/neurogenomics-lab/ephemeral/MAGMA_Files_Public/data/GWAS_munged/ieu-a-1025/logs/missing_data.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
Checking for duplicate columns.
Ensuring that the N column is all integers.
The sumstats N column is not all integers, this could effect downstream analysis. These will be converted to integers.
Checking for duplicate SNPs from SNP ID.
Checking for SNPs with duplicated base-pair positions.
INFO column not available. Skipping INFO score filtering step.
Filtering SNPs, ensuring SE>0.
Ensuring all SNPs have N<5 std dev above mean.
Checking for bi-allelic SNPs.
4,295 SNPs are non-biallelic. These will be removed.
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /rds/general/project/neurogenomics-lab/ephemeral/MAGMA_Files_Public/data/GWAS_munged/ieu-a-1025/logs/snp_bi_allelic.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
Computing Z-score from P using formula: `sign(BETA)*sqrt(stats::qchisq(P,1,lower=FALSE)`
N already exists within sumstats_dt.
34,607 SNPs (24.4%) have FRQ values > 0.5. Conventionally the FRQ column is intended to show the minor/effect allele frequency.
The FRQ column was mapped from one of the following from the inputted summary statistics file:
FRQ, EAF, FREQUENCY, FRQ_U, F_U, MAF, FREQ, FREQ_TESTED_ALLELE, FRQ_TESTED_ALLELE, FREQ_EFFECT_ALLELE, FRQ_EFFECT_ALLELE, EFFECT_ALLELE_FREQUENCY, EFFECT_ALLELE_FREQ, EFFECT_ALLELE_FRQ, A1FREQ, A1FRQ, A2FREQ, A2FRQ, ALLELE_FREQUENCY, ALLELE_FREQ, ALLELE_FRQ, AF, MINOR_AF, EFFECT_AF, A2_AF, EFF_AF, ALT_AF, ALTERNATIVE_AF, INC_AF, A_2_AF, TESTED_AF, AF1, ALLELEFREQ, ALT_FREQ, EAF_HRC, EFFECTALLELEFREQ, FREQ.A1.1000G.EUR, FREQ.A1.ESP.EUR, FREQ.ALLELE1.HAPMAPCEU, FREQ.B, FREQ1, FREQ1.HAPMAP, FREQ_EUROPEAN_1000GENOMES, FREQ_HAPMAP, FREQ_TESTED_ALLELE_IN_HRS, FRQ_A1, FRQ_U_113154, FRQ_U_31358, FRQ_U_344901, FRQ_U_43456, POOLED_ALT_AF, AF_ALT, AF.ALT, AF-ALT, ALT.AF, ALT-AF, A2.AF, A2-AF, AF.EFF, AF_EFF, AF_EFF
As frq_is_maf=TRUE, the FRQ column will not be renamed. If the FRQ values were intended to represent major allele frequency, set frq_is_maf=FALSE to rename the column as MAJOR_ALLELE_FRQ and differentiate it from minor/effect allele frequency.
Sorting coordinates with 'data.table'.
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /rds/general/project/neurogenomics-lab/ephemeral/MAGMA_Files_Public/data/GWAS_munged/ieu-a-1025/ieu-a-1025.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
Summary statistics report:
   - 141,788 rows (92% of original 154,065 rows)
   - 141,788 unique variants
   - 1,446 genome-wide significant variants (P<5e-8)
   - 22 chromosomes
Done munging in 2.497 minutes.
Successfully finished preparing sumstats file, preview:
Reading header.
          SNP CHR      BP A1 A2     END FILTER    FRQ        BETA        LP
1: rs61733845   1 1118275  C  T 1118275   PASS 0.0414 -0.01192860 0.1180450
2:  rs9729550   1 1135242  A  C 1135242   PASS 0.2744  0.00099950 0.0141246
3:  rs1815606   1 1140435  G  T 1140435   PASS 0.3286  0.00299551 0.0584886
4:  rs7515488   1 1163804  C  T 1163804   PASS 0.1597  0.01980260 0.4190750
5: rs11260562   1 1165310  G  A 1165310   PASS 0.0606  0.01093990 0.1174750
          SE     N         P           Z
1: 0.0405296 38589 0.7620001 -0.30285541
2: 0.0186028 38589 0.9680001  0.04011669
3: 0.0175476 38589 0.8739999  0.15857981
4: 0.0227407 38589 0.3810000  0.87605521
5: 0.0353122 38589 0.7630008  0.30154255
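
The log reports two derived columns: P, converted from the VCF's -log10 P-values (LP), and Z, computed as sign(BETA)*sqrt(qchisq(P, 1, lower=FALSE)). The following is a minimal Python sketch of those two conversions, not the tool's own code. The function names `p_from_lp` and `z_from_p` are hypothetical. It assumes the identity that, for one degree of freedom, the square root of the upper-tail chi-square quantile at P equals the absolute standard-normal quantile at P/2, so only the standard library is needed.

```python
from statistics import NormalDist


def p_from_lp(lp: float) -> float:
    """Convert a -log10 p-value (the LP column) to an unadjusted p-value."""
    return 10.0 ** (-lp)


def z_from_p(beta: float, p: float) -> float:
    """Signed Z-score from a two-sided p-value and the effect direction.

    Mirrors sign(BETA) * sqrt(qchisq(P, 1, lower=FALSE)) via the identity
    sqrt(qchisq(p, 1, lower=FALSE)) == -qnorm(p / 2) for 0 < p <= 1.
    Raises statistics.StatisticsError if p is exactly 0.
    """
    if beta == 0:
        return 0.0  # sign(0) is 0 in R, so the product is 0
    magnitude = -NormalDist().inv_cdf(p / 2.0)
    return magnitude if beta > 0 else -magnitude


# (BETA, LP) pairs copied from the first three rows of the preview table.
preview_rows = [
    (-0.01192860, 0.1180450),
    (0.00099950, 0.0141246),
    (0.00299551, 0.0584886),
]

for beta, lp in preview_rows:
    p = p_from_lp(lp)
    print(f"BETA={beta:+.8f}  P={p:.7f}  Z={z_from_p(beta, p):+.8f}")
```

The printed P and Z values can be compared against the P and Z columns of the preview; small differences in the last digits are possible because the log shows rounded inputs.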