Depending on the amount of data to process, file generation may take longer.

If it takes too long to generate, you can limit the data by, for example, reducing the range of years.


Download BibTeX


DBFE: distribution-based feature extraction from structural variants in whole-genome data


[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication


Published in


Journal year: 2022 | Journal volume: vol. 38 | Journal number: iss. 19

Article type

scientific article

Publication language



EN Motivation: Whole-genome sequencing has revolutionized biosciences by providing tools for constructing complete DNA sequences of individuals. With entire genomes at hand, scientists can pinpoint DNA fragments responsible for oncogenesis and predict patient responses to cancer treatments. Machine learning plays a paramount role in this process. However, the sheer volume of whole-genome data makes it difficult to encode the characteristics of genomic variants as features for learning algorithms. Results: In this article, we propose three feature extraction methods that facilitate classifier learning from sets of genomic variants. The core contributions of this work include: (i) strategies for determining features using variant length binning, clustering and density estimation; (ii) a programing library for automating distribution-based feature extraction in machine learning pipelines. The proposed methods have been validated on five real-world datasets using four different classification algorithms and a clustering approach. Experiments on genomes of 219 ovarian, 61 lung and 929 breast cancer patients show that the proposed approaches automatically identify genomic biomarkers associated with cancer subtypes and clinical response to oncological treatment. Finally, we show that the extracted features can be used alongside unsupervised learning methods to analyze genomic samples.

Date of online publication


Pages (from - to)

4466 - 4473




Ministry points / journal


Impact Factor


This website uses cookies to remember the authenticated session of the user. For more information, read about Cookies and Privacy Policy.