Get started

Genotype and phenotype data

Advances in molecular biology analysis technology and the explosive growth of the resulting data are ushering in a new era in the understanding of life phenomena. Going forward, it will become increasingly important to integrate the vast amount of accumulated data and find meaning in it.

High-density SNP (single nucleotide polymorphism) arrays and next-generation sequencing (NGS) are changing the very concept of classical "genotyping". Rather than identifying a handful of genetic loci, you can now genotype tens or hundreds of thousands of loci, or even determine the whole genome sequence, to obtain the genetic information of an individual or a group. This high-resolution genetic information allows you to describe the characteristics of a particular individual in genetic terms.

If you have high-resolution genetic information and phenotypic information for a large number of individuals, you can use statistical methods to identify the genetic loci associated with a phenotype. This approach is called a genome-wide association study (GWAS), and many such studies have been carried out to determine which parts of the genome influence which phenotypes. These studies help us understand how the genome works functionally and enable a variety of applications.
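As a rough illustration of the statistical idea, a single-marker association scan can be sketched in Python as follows. This is a minimal sketch and not the IncoGWAS implementation: it assumes genotypes are coded as 0/1/2 allele counts, tests each SNP with the score statistic n·r² (compared against a chi-square distribution with one degree of freedom), and ignores covariates, population structure, and multiple-testing correction. The function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def association_scan(genotypes, phenotype):
    """Test every SNP against the phenotype, one SNP at a time.

    genotypes: dict mapping SNP id -> list of 0/1/2 allele counts
    phenotype: list of trait values, same individual order as the genotypes
    Returns a dict mapping SNP id -> (chi-square statistic, p-value).
    """
    n = len(phenotype)
    results = {}
    for snp_id, dosages in genotypes.items():
        r = pearson_r(dosages, phenotype)
        chi2 = n * r * r  # score-test statistic, 1 degree of freedom
        # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results[snp_id] = (chi2, p_value)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 8 individuals, 3 SNPs.
    phenotype = [1.0, 2.0, 3.0, 1.1, 2.1, 2.9, 1.0, 3.0]
    genotypes = {
        "snp_a": [0, 1, 2, 0, 1, 2, 0, 2],  # tracks the trait closely
        "snp_b": [0, 2, 1, 1, 0, 2, 1, 1],  # weaker relationship
        "snp_c": [1, 1, 1, 1, 1, 1, 1, 1],  # monomorphic, carries no signal
    }
    scan = association_scan(genotypes, phenotype)
    for snp_id, (chi2, p_value) in sorted(scan.items(), key=lambda kv: kv[1][1]):
        print(snp_id, round(chi2, 3), p_value)
```

Plotting the negative log of each p-value against genomic position gives the familiar Manhattan plot.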

GWAS is not the only option. If enough phenotypic information is associated with the genotypes, the genetic cause of a complex trait can be identified through machine learning and similar methods, or the phenotype can be predicted systematically. In addition, other analytical techniques, such as characterizing the genetics of a population or confirming kinship in the absence of pedigree information, are rapidly being introduced, and the fields of application are broadening.

Figure 1. High-density SNP array (Illumina CanineHD 170K) and NGS machine (Illumina HiSeq)

Figure 2. Genotype-phenotype-environment relationship from Borevitz Lab

Figure 3. Example of classification using various machine learning algorithms

What is IncoGWAS?

IncoGWAS is a research platform for integrating genotype and phenotype data. On this platform, researchers can accumulate and integrate their genotype-phenotype data and run various association analyses on it. Anyone can create an account, register data, and analyze it. If you have data worth releasing, you can share it for public use.
Data registration and analysis on the IncoGWAS research platform consist of three steps: "Web Folder -> Dataset -> App".

  • Web folder

    You can upload files from your PC as you would to an online storage drive. You are free to create directories and to upload, rename, and delete files.

  • Dataset

    Use the dataset registration function to specify a genotype file and a phenotype file in your web folder. Supported file formats include Excel (XLSX), PLINK (PED), and VCF. After you assign groups, analyses are performed per dataset. You can merge multiple datasets using the dataset merge feature, and you can share a dataset for public use if you wish.

  • App

    IncoGWAS provides a variety of analysis modules in the form of Apps. Each App performs a specific analysis on a specific dataset and visualizes the results in the web interface. Researchers can also share the results of a specific App externally.
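To make the dataset step more concrete, the sketch below reads lines in the PLINK PED layout (six leading columns for family ID, individual ID, paternal ID, maternal ID, sex, and phenotype, followed by two allele columns per marker) and combines two sets of samples. It is only an illustration: the `PedSample` class and the `parse_ped_line` and `merge_datasets` functions are hypothetical names, and the merge rule shown here (a union of samples typed on the same number of markers, with duplicate IDs rejected) is an assumption, not a description of how IncoGWAS merges datasets internally.

```python
from dataclasses import dataclass


@dataclass
class PedSample:
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: list  # one (allele1, allele2) tuple per marker


def parse_ped_line(line):
    """Parse one whitespace-delimited PED line into a PedSample."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("malformed PED line: expected 6 columns plus allele pairs")
    alleles = fields[6:]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedSample(*fields[:6], genotypes)


def merge_datasets(first, second):
    """Assumed merge rule: union of samples, same marker count, no duplicate IDs."""
    merged = {}
    for sample in list(first) + list(second):
        key = (sample.family_id, sample.individual_id)
        if key in merged:
            raise ValueError(f"duplicate sample {key}")
        merged[key] = sample
    marker_counts = {len(sample.genotypes) for sample in merged.values()}
    if len(marker_counts) > 1:
        raise ValueError("datasets were typed on different numbers of markers")
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical toy lines: two markers per sample.
    experiment_1 = [parse_ped_line("FAM1 IND1 0 0 1 2 A A G T")]
    experiment_2 = [parse_ped_line("FAM2 IND7 0 0 2 1 A C G G")]
    combined = merge_datasets(experiment_1, experiment_2)
    print(len(combined), "samples after merge")
```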

Figure 4. Process of IncoGWAS from data upload to analysis

Dataset registration requires information about the experimental platform that produced the genotypes, which may be a commercial high-density SNP chip or NGS. Experimental platform information is registered directly by the IncoGWAS development team, so if there is a platform you want to use, please contact the team. IncoGWAS targets all species, and a platform that is registered in the GEO database can be used immediately.

IncoGWAS features

  • Data accumulation

    Researchers can manage their experimental results better in "Dataset" units. For example, if 1000 samples are obtained in a first experiment and 1000 more in a second, each experiment is stored as its own Dataset. Later, the "Dataset merge" feature lets you run an integrated analysis on the merged data from both experiments. High-resolution genotype information is managed systematically in a relational database and can be conveniently searched or browsed through the web interface. Accumulating data in IncoGWAS provides reliable management and backup, and data shared through IncoGWAS can be pooled to discover new knowledge.

  • Convenient Web Folder

    IncoGWAS provides a free web folder with a fixed capacity for each account. You are free to create directories and upload files. It can also be used as storage space for large amounts of data.

  • GWAS Analysis

    Running a GWAS on large-scale genotype and phenotype data on a researcher's own PC is a hassle. Until now, you had to clean the data and run the statistical analysis yourself in R, Python, or a similar tool, or convert the data by hand into the input format of the PLINK program. IncoGWAS performs GWAS analysis in a few clicks. Analysis results can also be stored in a relational database for easy retrieval and browsing in the web environment.

  • Population Genetic Characterization

    If you have high-resolution genetic data for each group, you can see how similar the groups are to one another and what characteristics each has within the overall population. IncoGWAS offers a variety of statistical methods for analyzing population genetic characteristics.

  • Phenotype Prediction (Breeding Value Estimation)

    Machine learning models can be trained on genotype-phenotype data to predict the phenotype for any genotype. You can try various machine learning algorithms, compare their accuracy, and choose an optimal prediction model. You can also estimate breeding values with genomic best linear unbiased prediction (GBLUP), the method used in the field of breeding to estimate genome-based breeding values.
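The breeding value estimation mentioned in the last item can be sketched in a few lines of plain Python. The example builds a genomic relationship matrix G from 0/1/2 genotypes using VanRaden's first method and then computes breeding values as G (G + λI)⁻¹ (y − ȳ). It rests on simplifying assumptions that are not taken from IncoGWAS: the variance ratio λ = σ²e / σ²g is supplied by hand (in practice it is usually estimated, for example by REML), the only fixed effect is an overall mean approximated by the sample mean, and all function names and data are hypothetical.

```python
def genomic_relationship(M):
    """VanRaden method 1: G = Z Z' / (2 * sum(p * (1 - p))), with Z = M - 2p."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / denom
             for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gblup(M, y, lam):
    """Estimated breeding values: G (G + lam * I)^-1 (y - mean(y))."""
    n = len(y)
    G = genomic_relationship(M)
    mean_y = sum(y) / n
    centered = [v - mean_y for v in y]
    A = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, centered)
    return [sum(G[i][j] * alpha[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical toy data: 6 individuals, 3 SNPs coded as allele counts.
    M = [[0, 0, 1], [1, 0, 2], [2, 1, 0], [2, 2, 1], [0, 1, 1], [1, 2, 0]]
    y = [1.0, 2.1, 3.2, 3.0, 0.9, 2.0]
    for individual, value in enumerate(gblup(M, y, lam=1.0)):
        print(individual, round(value, 3))
```

A larger λ (more residual variance relative to genetic variance) shrinks the estimates more strongly toward zero.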

Figure 5. Concept of dataset in IncoGWAS

Figure 6. General visualization of GWAS - Manhattan plot

Figure 7. Visualization of ethnic genetic differences by PCA
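The kind of PCA view shown in Figure 7 can be approximated with the sketch below. It is a minimal illustration under stated assumptions, not the method IncoGWAS uses: genotypes are coded as 0/1/2 allele counts, columns are mean-centered without variance scaling, and only the first principal component is extracted, by power iteration on the sample-by-sample Gram matrix. The function names and the two toy groups are hypothetical.

```python
import math


def center_columns(M):
    """Subtract each marker's mean allele count from its column."""
    n, m = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in M]


def first_principal_component(M, iterations=200):
    """Return the unit-length first principal axis over samples.

    Power iteration on K = Z Z^T, where Z is the column-centered genotype
    matrix; samples with similar genotypes receive similar values.
    """
    Z = center_columns(M)
    n = len(Z)
    K = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # fixed, deterministic start vector
    for _ in range(iterations):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n  # no genetic variation at all
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Hypothetical toy data: two groups of three samples, four markers.
    group_a = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1]]
    group_b = [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
    pc1 = first_principal_component(group_a + group_b)
    for label, score in zip("AAABBB", pc1):
        print(label, round(score, 3))
```

Samples from the same group end up on the same side of zero on this axis, which is what separates the clusters in a PCA scatter plot.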