SeQuiLa: An elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals

Marek Wiewiórka; Anna Leśniewska; Agnieszka Szmurło; Kacper Stępień; Mateusz Borowiak; Michał J. Okoniewski; Tomasz Gambin

doi:10.1093/bioinformatics/bty940

Scientific Information System of the Poznań University of Technology

Article

Authors

Marek Wiewiórka
Anna Leśniewska (WI) ^{[ 1 ][ 2.3 ][ P ]}
Agnieszka Szmurło
Kacper Stępień (WI) ^{[ 1 ][ S ]}
Mateusz Borowiak (WI) ^{[ 1 ][ S ]}
Michał J. Okoniewski ^{[ 2 ]}
Tomasz Gambin

^{[ 1 ]} Institute of Computing Science, Faculty of Computing (WI), Poznań University of Technology | ^{[ 2 ]} ETH Zurich | ^{[ P ]} employee | ^{[ S ]} student

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2019

Published in

Bioinformatics

Journal year: 2019 | Journal volume: vol. 35 | Journal number: iss. 12

Article type

scientific article

Publication language

English

Abstract

Efficient processing of large-scale genomic datasets has recently become possible due to the application of ‘big data’ technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for speedy querying and processing of genomic intervals that is available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing.
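For readers unfamiliar with the term, a range join over genomic intervals pairs every interval in one set with every interval in another set that lies on the same chromosome and overlaps it. The sketch below is only an illustration of that operation in plain Python, not SeQuiLa's implementation or its API. The tuple layout `(chrom, start, end)`, the closed-interval convention, the function names, and the sample data are all assumptions made for the example. It contrasts a nested-loop join with a variant that sorts one side by start coordinate and uses binary search to cut down the candidates.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical sample data: (chromosome, start, end), closed intervals.
reads = [("chr1", 100, 200), ("chr1", 250, 300), ("chr2", 50, 80)]
targets = [("chr1", 150, 260), ("chr1", 400, 500), ("chr2", 10, 60), ("chr3", 1, 5)]


def naive_range_join(left, right):
    """Nested-loop join: compare every left interval with every right one."""
    return [
        (a, b)
        for a in left
        for b in right
        # Same chromosome, and neither interval ends before the other starts.
        if a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]
    ]


def indexed_range_join(left, right):
    """Same result, but the right side is grouped by chromosome and sorted
    by start so that binary search discards intervals starting too late."""
    by_chrom = defaultdict(list)
    for b in right:
        by_chrom[b[0]].append(b)

    starts = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda b: b[1])
        starts[chrom] = [b[1] for b in items]

    pairs = []
    for a in left:
        items = by_chrom.get(a[0], [])
        # Only right intervals with start <= a.end can overlap a.
        hi = bisect_right(starts.get(a[0], []), a[2])
        # Among those, keep the ones that end at or after a.start.
        pairs.extend((a, b) for b in items[:hi] if b[2] >= a[1])
    return pairs


if __name__ == "__main__":
    for a, b in indexed_range_join(reads, targets):
        print(a, "overlaps", b)
```

Both functions return the same pairs. The second one still scans some non-overlapping candidates, so it should be read as a simple illustration of why indexing one side helps, not as an optimal algorithm.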

Date of online publication

14.11.2018

Pages (from - to)

2156 - 2158

DOI

10.1093/bioinformatics/bty940

URL

https://academic.oup.com/bioinformatics/article/35/12/2156/5182295

Ministry points / journal

200

Ministry points / journal in years 2017-2021

200