Article

Title

SeQuiLa: An elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki, Politechnika Poznańska | [ 2 ] ETH Zurich | [ P ] employee | [ S ] student

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2019

Published in

Bioinformatics

Journal year: 2019 | Journal volume: vol. 35 | Journal number: iss. 12

Article type

scientific article

Publication language

English

Abstract

Efficient processing of large-scale genomic datasets has recently become possible due to the application of ‘big data’ technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for speedy querying and processing of genomic intervals that is available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing.

Date of online publication

14.11.2018

Pages (from - to)

2156 - 2158

DOI

10.1093/bioinformatics/bty940

URL

https://academic.oup.com/bioinformatics/article/35/12/2156/5182295

Ministry points / journal

200

Ministry points / journal in years 2017-2021

200

Impact Factor

5.61
