Article

Title

Subpopulation discovery in epidemiological data with subspace clustering

Authors

Year of publication

2014

Published in

Foundations of Computing and Decision Sciences

Journal year: 2014 | Journal volume: vol. 39 | Journal number: no. 4

Article type

scientific article

Publication language

English

Abstract

A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors for an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment before cluster discovery and quality assessment after learning the clusters. Epidemiological data usually do not have a ground truth for the verification of clusters found in subspaces. Hence, we introduce quality assessment through juxtaposition of the learned models to “models-of-randomness”, i.e. models that do not reflect a true cluster structure. On the basis of this workflow, we select subspace clustering methods and compare and discuss their performance. We use a dataset with hepatic steatosis as outcome, but our findings apply to arbitrary epidemiological cohort data that have tens of variables and exhibit class skew.

Pages (from - to)

271 - 300

DOI

10.2478/fcds-2014-0015

URL

https://www.sciendo.com/article/10.2478/fcds-2014-0015

License type

CC BY-NC-ND (attribution - noncommercial - no derivatives)

Access level to full text

public

Ministry points / journal

15
