Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 1357230 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 62.1 MiB |
| Average record size in memory | 48.0 B |
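The memory figures fit a shallow count of 8 bytes per cell. That accounting is an assumption here: 64-bit floats for the three numeric columns, 8-byte object pointers for the three string columns, and the string contents themselves not counted. The arithmetic below checks the assumption against the reported 62.1 MiB total, the 48.0 B average record size and the 10.4 MiB per-column memory size.

```python
# Assumption: shallow memory accounting, 8 bytes per cell
# (float64 values and 8-byte object pointers; string contents are not included).
ROWS = 1_357_230
COLUMNS = 6
BYTES_PER_CELL = 8
MIB = 2 ** 20

total_bytes = ROWS * COLUMNS * BYTES_PER_CELL
per_column_bytes = ROWS * BYTES_PER_CELL

print(f"Total size in memory: {total_bytes / MIB:.1f} MiB")         # 62.1 MiB
print(f"Average record size: {total_bytes / ROWS:.1f} B")           # 48.0 B
print(f"Memory size per column: {per_column_bytes / MIB:.1f} MiB")  # 10.4 MiB
```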
Variable types
| Categorical | 3 |
|---|---|
| Numeric | 3 |
Warnings
| Warning | Type |
|---|---|
| Gene has a high cardinality: 19670 distinct values | High cardinality |
| Gene name has a high cardinality: 19651 distinct values | High cardinality |
| Cell line has a high cardinality: 69 distinct values | High cardinality |
| TPM is highly correlated with pTPM | High correlation |
| pTPM is highly correlated with TPM | High correlation |
| TPM is highly skewed (γ1 = 32.03325779) | Skewed |
| pTPM is highly skewed (γ1 = 31.89212424) | Skewed |
| Gene is uniformly distributed | Uniform |
| Gene name is uniformly distributed | Uniform |
| Cell line is uniformly distributed | Uniform |
| TPM has 388709 (28.6%) zeros | Zeros |
| pTPM has 371434 (27.4%) zeros | Zeros |
| NX has 374658 (27.6%) zeros | Zeros |
Reproduction
| Analysis started | 2021-04-30 11:01:36.755441 |
|---|---|
| Analysis finished | 2021-04-30 11:02:07.814285 |
| Duration | 31.06 seconds |
| Software version | pandas-profiling v2.11.0 |
Gene
| Distinct | 19670 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.4 MiB |
Length
| Max length | 15 |
|---|---|
| Median length | 15 |
| Mean length | 15 |
| Min length | 15 |
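The Length table summarises the character length of each value, and Mean length is simply total characters divided by rows (for Gene name, 7539768 / 1357230 ≈ 5.555). A minimal standard-library sketch, assuming the values are already loaded as a list of strings; `length_stats` is a hypothetical helper, not a pandas-profiling function.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min character length of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


# Ensembl gene IDs are fixed-width, so all four statistics agree.
print(length_stats(["ENSG00000000003", "ENSG00000285509"]))
# Gene names vary in length.
print(length_stats(["TSPAN6", "AP000646.1", "IGF2"]))
```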
Characters and Unicode
| Total characters | 20358450 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
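Python's `unicodedata.category` returns the two-letter general category code of a character (for example `Nd` for a decimal digit), which is the property behind the category tables in this report. The sketch below tallies characters by category. The code-to-name mapping covers only the categories that occur in this report, and the standard library has no script lookup, so the script tables are not reproduced.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count every character in `values` by its Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["BJ hTERT+", "U-266/70"]))
```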
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
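Distinct counts the different values in a column, whereas Unique counts values that occur exactly once. Every gene ID appears 69 times (the number of cell lines), so Distinct is 19670 but Unique is 0, and Distinct (%) is 19670 / 1357230 = 1/69 ≈ 1.4%. A sketch with a hypothetical helper `distinct_and_unique` and placeholder IDs:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for a list of values."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)  # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Hypothetical miniature: three placeholder gene IDs, each repeated 69 times.
ids = ["gene_1", "gene_2", "gene_3"] * 69
print(distinct_and_unique(ids))
```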
Sample
| 1st row | ENSG00000000003 |
|---|---|
| 2nd row | ENSG00000000003 |
| 3rd row | ENSG00000000003 |
| 4th row | ENSG00000000003 |
| 5th row | ENSG00000000003 |
| Value | Count | Frequency (%) |
|---|---|---|
| ENSG00000143171 | 69 | < 0.1% |
| ENSG00000284546 | 69 | < 0.1% |
| ENSG00000148308 | 69 | < 0.1% |
| ENSG00000172568 | 69 | < 0.1% |
| ENSG00000080815 | 69 | < 0.1% |
| ENSG00000239998 | 69 | < 0.1% |
| ENSG00000198695 | 69 | < 0.1% |
| ENSG00000205089 | 69 | < 0.1% |
| ENSG00000278299 | 69 | < 0.1% |
| ENSG00000168944 | 69 | < 0.1% |
| Other values (19660) | 1356540 | 99.9% |
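Each frequency table lists the most common values and folds the remainder into an "Other values (k)" row, with percentages taken over all rows. A sketch of that construction; `frequency_table` and `fmt_percent` are hypothetical names, and printing any non-zero share below 0.05% as "< 0.1%" is an assumption about how the report rounds.

```python
from collections import Counter


def fmt_percent(fraction):
    """One-decimal percentage; tiny non-zero shares print as '< 0.1%'."""
    if 0 < fraction < 0.0005:  # assumption about the report's rounding rule
        return "< 0.1%"
    return f"{fraction * 100:.1f}%"


def frequency_table(values, top=10):
    """Top-`top` values plus an 'Other values (k)' row, as (value, count, frequency) tuples."""
    counts = Counter(values)
    n = len(values)
    ranked = counts.most_common()
    rows = [(value, count, fmt_percent(count / n)) for value, count in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, fmt_percent(other / n)))
    return rows


cell_lines = ["A-431"] * 3 + ["A549"] * 2 + ["AF22"]
for row in frequency_table(cell_lines, top=2):
    print(row)
# ('A-431', 3, '50.0%')
# ('A549', 2, '33.3%')
# ('Other values (1)', 1, '16.7%')
```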
[Figure: Histogram of lengths of the category]
Most occurring words (lowercased)
| Value | Count | Frequency (%) |
|---|---|---|
| ensg00000197345 | 69 | < 0.1% |
| ensg00000188820 | 69 | < 0.1% |
| ensg00000236334 | 69 | < 0.1% |
| ensg00000205639 | 69 | < 0.1% |
| ensg00000149596 | 69 | < 0.1% |
| ensg00000161905 | 69 | < 0.1% |
| ensg00000081377 | 69 | < 0.1% |
| ensg00000256771 | 69 | < 0.1% |
| ensg00000268750 | 69 | < 0.1% |
| ensg00000179088 | 69 | < 0.1% |
| Other values (19660) | 1356540 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7621740 | 37.4% |
| 1 | 1676424 | 8.2% |
| E | 1357230 | 6.7% |
| N | 1357230 | 6.7% |
| S | 1357230 | 6.7% |
| G | 1357230 | 6.7% |
| 2 | 814062 | 4.0% |
| 6 | 743337 | 3.7% |
| 8 | 718980 | 3.5% |
| 7 | 712908 | 3.5% |
| Other values (4) | 2642079 | 13.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 14929530 | 73.3% |
| Uppercase Letter | 5428920 | 26.7% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7621740 | 51.1% |
| 1 | 1676424 | 11.2% |
| 2 | 814062 | 5.5% |
| 6 | 743337 | 5.0% |
| 8 | 718980 | 4.8% |
| 7 | 712908 | 4.8% |
| 4 | 690138 | 4.6% |
| 3 | 687723 | 4.6% |
| 5 | 666816 | 4.5% |
| 9 | 597402 | 4.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| E | 1357230 | 25.0% |
| N | 1357230 | 25.0% |
| S | 1357230 | 25.0% |
| G | 1357230 | 25.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 14929530 | 73.3% |
| Latin | 5428920 | 26.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7621740 | 51.1% |
| 1 | 1676424 | 11.2% |
| 2 | 814062 | 5.5% |
| 6 | 743337 | 5.0% |
| 8 | 718980 | 4.8% |
| 7 | 712908 | 4.8% |
| 4 | 690138 | 4.6% |
| 3 | 687723 | 4.6% |
| 5 | 666816 | 4.5% |
| 9 | 597402 | 4.0% |
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| E | 1357230 | 25.0% |
| N | 1357230 | 25.0% |
| S | 1357230 | 25.0% |
| G | 1357230 | 25.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 20358450 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7621740 | 37.4% |
| 1 | 1676424 | 8.2% |
| E | 1357230 | 6.7% |
| N | 1357230 | 6.7% |
| S | 1357230 | 6.7% |
| G | 1357230 | 6.7% |
| 2 | 814062 | 4.0% |
| 6 | 743337 | 3.7% |
| 8 | 718980 | 3.5% |
| 7 | 712908 | 3.5% |
| Other values (4) | 2642079 | 13.0% |
Gene name
| Distinct | 19651 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.4 MiB |
Length
| Max length | 15 |
|---|---|
| Median length | 5 |
| Mean length | 5.55526182 |
| Min length | 2 |
Characters and Unicode
| Total characters | 7539768 |
|---|---|
| Distinct characters | 41 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TSPAN6 |
|---|---|
| 2nd row | TSPAN6 |
| 3rd row | TSPAN6 |
| 4th row | TSPAN6 |
| 5th row | TSPAN6 |
| Value | Count | Frequency (%) |
|---|---|---|
| ABCF2 | 138 | < 0.1% |
| DIABLO | 138 | < 0.1% |
| SCO2 | 138 | < 0.1% |
| POLR2J3 | 138 | < 0.1% |
| CCDC39 | 138 | < 0.1% |
| COG8 | 138 | < 0.1% |
| IGF2 | 138 | < 0.1% |
| ALDOA | 138 | < 0.1% |
| SOD2 | 138 | < 0.1% |
| ATXN7 | 138 | < 0.1% |
| Other values (19641) | 1355850 | 99.9% |
[Figure: Histogram of lengths of the category]
Most occurring words (lowercased)
| Value | Count | Frequency (%) |
|---|---|---|
| ccdc39 | 138 | < 0.1% |
| sco2 | 138 | < 0.1% |
| diablo | 138 | < 0.1% |
| tbce | 138 | < 0.1% |
| txnrd3nb | 138 | < 0.1% |
| pde11a | 138 | < 0.1% |
| atxn7 | 138 | < 0.1% |
| igf2 | 138 | < 0.1% |
| aldoa | 138 | < 0.1% |
| h2bfs | 138 | < 0.1% |
| Other values (19641) | 1355850 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 587604 | 7.8% |
| A | 547308 | 7.3% |
| C | 474375 | 6.3% |
| P | 447465 | 5.9% |
| R | 380742 | 5.0% |
| 2 | 378879 | 5.0% |
| T | 315675 | 4.2% |
| S | 313881 | 4.2% |
| L | 305670 | 4.1% |
| N | 286281 | 3.8% |
| Other values (31) | 3501888 | 46.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 5416776 | 71.8% |
| Decimal Number | 2006727 | 26.6% |
| Lowercase Letter | 75348 | 1.0% |
| Other Punctuation | 25599 | 0.3% |
| Dash Punctuation | 15318 | 0.2% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| A | 547308 | 10.1% |
| C | 474375 | 8.8% |
| P | 447465 | 8.3% |
| R | 380742 | 7.0% |
| T | 315675 | 5.8% |
| S | 313881 | 5.8% |
| L | 305670 | 5.6% |
| N | 286281 | 5.3% |
| M | 275724 | 5.1% |
| D | 273102 | 5.0% |
| Other values (16) | 1796553 | 33.2% |
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 587604 | 29.3% |
| 2 | 378879 | 18.9% |
| 3 | 235911 | 11.8% |
| 4 | 172569 | 8.6% |
| 5 | 143796 | 7.2% |
| 6 | 118611 | 5.9% |
| 0 | 104535 | 5.2% |
| 7 | 100395 | 5.0% |
| 8 | 89700 | 4.5% |
| 9 | 74727 | 3.7% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| o | 25116 | 33.3% |
| r | 25116 | 33.3% |
| f | 25116 | 33.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 15318 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 25599 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 5492124 | 72.8% |
| Common | 2047644 | 27.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| A | 547308 | 10.0% |
| C | 474375 | 8.6% |
| P | 447465 | 8.1% |
| R | 380742 | 6.9% |
| T | 315675 | 5.7% |
| S | 313881 | 5.7% |
| L | 305670 | 5.6% |
| N | 286281 | 5.2% |
| M | 275724 | 5.0% |
| D | 273102 | 5.0% |
| Other values (19) | 1871901 | 34.1% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 587604 | 28.7% |
| 2 | 378879 | 18.5% |
| 3 | 235911 | 11.5% |
| 4 | 172569 | 8.4% |
| 5 | 143796 | 7.0% |
| 6 | 118611 | 5.8% |
| 0 | 104535 | 5.1% |
| 7 | 100395 | 4.9% |
| 8 | 89700 | 4.4% |
| 9 | 74727 | 3.6% |
| Other values (2) | 40917 | 2.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 7539768 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 587604 | 7.8% |
| A | 547308 | 7.3% |
| C | 474375 | 6.3% |
| P | 447465 | 5.9% |
| R | 380742 | 5.0% |
| 2 | 378879 | 5.0% |
| T | 315675 | 4.2% |
| S | 313881 | 4.2% |
| L | 305670 | 4.1% |
| N | 286281 | 3.8% |
| Other values (31) | 3501888 | 46.4% |
Cell line
| Distinct | 69 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.4 MiB |
Length
| Max length | 31 |
|---|---|
| Median length | 6 |
| Mean length | 6.942028986 |
| Min length | 2 |
Characters and Unicode
| Total characters | 9421930 |
|---|---|
| Distinct characters | 50 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | A-431 |
|---|---|
| 2nd row | A549 |
| 3rd row | AF22 |
| 4th row | AN3-CA |
| 5th row | ASC diff |
| Value | Count | Frequency (%) |
|---|---|---|
| U-251 MG | 19670 | 1.4% |
| SiHa | 19670 | 1.4% |
| HAP1 | 19670 | 1.4% |
| AN3-CA | 19670 | 1.4% |
| U-2197 | 19670 | 1.4% |
| MOLT-4 | 19670 | 1.4% |
| BJ hTERT+ | 19670 | 1.4% |
| ASC TERT1 | 19670 | 1.4% |
| EFO-21 | 19670 | 1.4% |
| HaCaT | 19670 | 1.4% |
| Other values (59) | 1160530 | 85.5% |
[Figure: Histogram of lengths of the category]
Most occurring words (lowercased)
| Value | Count | Frequency (%) |
|---|---|---|
| bj | 78680 | 4.4% |
| mg | 59010 | 3.3% |
| htert | 59010 | 3.3% |
| sv40 | 39340 | 2.2% |
| large | 39340 | 2.2% |
| t | 39340 | 2.2% |
| asc | 39340 | 2.2% |
| tert1 | 39340 | 2.2% |
| htcepi | 19670 | 1.1% |
| u-2 | 19670 | 1.1% |
| Other values (68) | 1337560 | 75.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| - | 767130 | 8.1% |
| T | 708120 | 7.5% |
| E | 531090 | 5.6% |
| H | 472080 | 5.0% |
| 2 | 432740 | 4.6% |
| (space) | 413070 | 4.4% |
| R | 413070 | 4.4% |
| C | 393400 | 4.2% |
| 1 | 354060 | 3.8% |
| S | 314720 | 3.3% |
| Other values (40) | 4622450 | 49.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 5114200 | 54.3% |
| Decimal Number | 2026010 | 21.5% |
| Lowercase Letter | 924490 | 9.8% |
| Dash Punctuation | 767130 | 8.1% |
| Space Separator | 413070 | 4.4% |
| Math Symbol | 98350 | 1.0% |
| Other Punctuation | 78680 | 0.8% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| T | 708120 | 13.8% |
| E | 531090 | 10.4% |
| H | 472080 | 9.2% |
| R | 413070 | 8.1% |
| C | 393400 | 7.7% |
| S | 314720 | 6.2% |
| M | 295050 | 5.8% |
| A | 275380 | 5.4% |
| U | 216370 | 4.2% |
| B | 196700 | 3.8% |
| Other values (13) | 1298220 | 25.4% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 216370 | 23.4% |
| h | 137690 | 14.9% |
| e | 98350 | 10.6% |
| i | 78680 | 8.5% |
| d | 59010 | 6.4% |
| f | 59010 | 6.4% |
| r | 59010 | 6.4% |
| p | 59010 | 6.4% |
| g | 39340 | 4.3% |
| s | 39340 | 4.3% |
| Other values (3) | 78680 | 8.5% |
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 432740 | 21.4% |
| 1 | 354060 | 17.5% |
| 4 | 196700 | 9.7% |
| 3 | 196700 | 9.7% |
| 6 | 196700 | 9.7% |
| 7 | 157360 | 7.8% |
| 0 | 137690 | 6.8% |
| 8 | 137690 | 6.8% |
| 9 | 118020 | 5.8% |
| 5 | 98350 | 4.9% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 767130 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 413070 | 100.0% |
Math Symbol
| Value | Count | Frequency (%) |
|---|---|---|
| + | 98350 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| / | 78680 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6038690 | 64.1% |
| Common | 3383240 | 35.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| T | 708120 | 11.7% |
| E | 531090 | 8.8% |
| H | 472080 | 7.8% |
| R | 413070 | 6.8% |
| C | 393400 | 6.5% |
| S | 314720 | 5.2% |
| M | 295050 | 4.9% |
| A | 275380 | 4.6% |
| a | 216370 | 3.6% |
| U | 216370 | 3.6% |
| Other values (26) | 2203040 | 36.5% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| - | 767130 | 22.7% |
| 2 | 432740 | 12.8% |
| (space) | 413070 | 12.2% |
| 1 | 354060 | 10.5% |
| 4 | 196700 | 5.8% |
| 3 | 196700 | 5.8% |
| 6 | 196700 | 5.8% |
| 7 | 157360 | 4.7% |
| 0 | 137690 | 4.1% |
| 8 | 137690 | 4.1% |
| Other values (4) | 393400 | 11.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 9421930 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| - | 767130 | 8.1% |
| T | 708120 | 7.5% |
| E | 531090 | 5.6% |
| H | 472080 | 5.0% |
| 2 | 432740 | 4.6% |
| (space) | 413070 | 4.4% |
| R | 413070 | 4.4% |
| C | 393400 | 4.2% |
| 1 | 354060 | 3.8% |
| S | 314720 | 3.3% |
| Other values (40) | 4622450 | 49.1% |
TPM
| Distinct | 15670 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.02075669 |
|---|---|
| Minimum | 0 |
| Maximum | 42716.3 |
| Zeros | 388709 |
| Zeros (%) | 28.6% |
| Memory size | 10.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 3.4 |
| Q3 | 21.7 |
| 95th percentile | 122.5 |
| Maximum | 42716.3 |
| Range | 42716.3 |
| Interquartile range (IQR) | 21.7 |
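A sketch of the quantile statistics, assuming linear interpolation between order statistics (the default method in NumPy and pandas); that this report used exactly that method is an assumption. Because more than 25% of TPM values are zero, Q1 and the 5th percentile are both 0, so the IQR equals Q3. The helper names are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1).

    Assumption: this matches the default method used for the report.
    """
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def quantile_statistics(values):
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "5th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "Median": quantile(xs, 0.5),
        "Q3": q3,
        "95th percentile": quantile(xs, 0.95),
        "Range": xs[-1] - xs[0],
        "Interquartile range (IQR)": q3 - q1,
    }


# TPM values of the first ten rows of this dataset.
tpm = [27.8, 37.6, 108.1, 51.8, 32.3, 17.7, 42.7, 14.9, 22.3, 31.8]
print(quantile_statistics(tpm))
```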
Descriptive statistics
| Standard deviation | 281.4168207 |
|---|---|
| Coefficient of variation (CV) | 6.860351768 |
| Kurtosis | 1981.124696 |
| Mean | 41.02075669 |
| Median Absolute Deviation (MAD) | 3.4 |
| Skewness | 32.03325779 |
| Sum | 55674601.6 |
| Variance | 79195.42696 |
| Monotonicity | Not monotonic |
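The descriptive statistics follow from their textbook definitions. This sketch assumes the sample (bias-corrected) estimators that pandas uses for the standard deviation, skewness and excess kurtosis. It takes MAD as the median of absolute deviations from the median, which is consistent with the reported MAD equalling the median in these zero-heavy columns. It needs at least four values that are not all equal; `descriptive_statistics` is a hypothetical name.

```python
import math
import statistics


def descriptive_statistics(values):
    """Sample statistics for a list of at least four numbers that are not all equal."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Central moments with a 1/n denominator, used by the skewness and kurtosis formulas.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        # Assumption: bias-corrected excess kurtosis, as pandas computes it.
        "Kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median([abs(x - median) for x in values]),
        # Assumption: adjusted Fisher-Pearson skewness, as pandas computes it.
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": sum(values),
        "Variance": variance,
    }


# TPM values of the first ten rows of this dataset.
tpm = [27.8, 37.6, 108.1, 51.8, 32.3, 17.7, 42.7, 14.9, 22.3, 31.8]
for name, value in descriptive_statistics(tpm).items():
    print(f"{name}: {value:.4g}")
```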
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 388709 | 28.6% |
| 0.1 | 55925 | 4.1% |
| 0.2 | 32537 | 2.4% |
| 0.3 | 22283 | 1.6% |
| 0.4 | 16956 | 1.2% |
| 0.5 | 13525 | 1.0% |
| 0.6 | 11470 | 0.8% |
| 0.7 | 9903 | 0.7% |
| 0.8 | 8899 | 0.7% |
| 0.9 | 7922 | 0.6% |
| Other values (15660) | 789101 | 58.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 388709 | 28.6% |
| 0.1 | 55925 | 4.1% |
| 0.2 | 32537 | 2.4% |
| 0.3 | 22283 | 1.6% |
| 0.4 | 16956 | 1.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 42716.3 | 1 | < 0.1% |
| 39185.3 | 1 | < 0.1% |
| 28687.5 | 1 | < 0.1% |
| 28333.9 | 1 | < 0.1% |
| 27057.4 | 1 | < 0.1% |
pTPM
| Distinct | 17361 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.79897158 |
|---|---|
| Minimum | 0 |
| Maximum | 53349.2 |
| Zeros | 371434 |
| Zeros (%) | 27.4% |
| Memory size | 10.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 4.2 |
| Q3 | 26.9 |
| 95th percentile | 151.9 |
| Maximum | 53349.2 |
| Range | 53349.2 |
| Interquartile range (IQR) | 26.9 |
Descriptive statistics
| Standard deviation | 347.020905 |
|---|---|
| Coefficient of variation (CV) | 6.831258473 |
| Kurtosis | 1988.872797 |
| Mean | 50.79897158 |
| Median Absolute Deviation (MAD) | 4.2 |
| Skewness | 31.89212424 |
| Sum | 68945888.2 |
| Variance | 120423.5085 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371434 | 27.4% |
| 0.1 | 55282 | 4.1% |
| 0.2 | 32796 | 2.4% |
| 0.3 | 23006 | 1.7% |
| 0.4 | 17234 | 1.3% |
| 0.5 | 13998 | 1.0% |
| 0.6 | 11399 | 0.8% |
| 0.7 | 10041 | 0.7% |
| 0.8 | 8760 | 0.6% |
| 0.9 | 7885 | 0.6% |
| Other values (17351) | 805395 | 59.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371434 | 27.4% |
| 0.1 | 55282 | 4.1% |
| 0.2 | 32796 | 2.4% |
| 0.3 | 23006 | 1.7% |
| 0.4 | 17234 | 1.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 53349.2 | 1 | < 0.1% |
| 48942.7 | 1 | < 0.1% |
| 35831.9 | 1 | < 0.1% |
| 34597.9 | 1 | < 0.1% |
| 33796.9 | 1 | < 0.1% |
NX
| Distinct | 2757 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.447667897 |
|---|---|
| Minimum | 0 |
| Maximum | 1186.6 |
| Zeros | 374658 |
| Zeros (%) | 27.6% |
| Memory size | 10.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 2.6 |
| Q3 | 11.1 |
| 95th percentile | 32.2 |
| Maximum | 1186.6 |
| Range | 1186.6 |
| Interquartile range (IQR) | 11.1 |
Descriptive statistics
| Standard deviation | 16.73002829 |
|---|---|
| Coefficient of variation (CV) | 1.980431581 |
| Kurtosis | 177.0465664 |
| Mean | 8.447667897 |
| Median Absolute Deviation (MAD) | 2.6 |
| Skewness | 8.389758854 |
| Sum | 11465428.3 |
| Variance | 279.8938467 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 374658 | 27.6% |
| 0.1 | 52810 | 3.9% |
| 0.2 | 33142 | 2.4% |
| 0.3 | 24551 | 1.8% |
| 0.4 | 19423 | 1.4% |
| 0.5 | 16181 | 1.2% |
| 0.6 | 13963 | 1.0% |
| 0.7 | 12342 | 0.9% |
| 0.8 | 11089 | 0.8% |
| 0.9 | 9967 | 0.7% |
| Other values (2747) | 789104 | 58.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 374658 | 27.6% |
| 0.1 | 52810 | 3.9% |
| 0.2 | 33142 | 2.4% |
| 0.3 | 24551 | 1.8% |
| 0.4 | 19423 | 1.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1186.6 | 1 | < 0.1% |
| 1073.7 | 1 | < 0.1% |
| 959.6 | 1 | < 0.1% |
| 885.9 | 1 | < 0.1% |
| 872.7 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope affects only the sign of r, not its magnitude. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
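A direct translation of that definition. The common 1/n factors in the covariance and the two standard deviations cancel, so plain sums are used. The sample pairs are the TPM and pTPM values of the first ten rows of this dataset; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The shared 1/n factors of covariance and standard deviations cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# TPM and pTPM of the first ten rows of this dataset.
tpm = [27.8, 37.6, 108.1, 51.8, 32.3, 17.7, 42.7, 14.9, 22.3, 31.8]
ptpm = [33.9, 45.5, 134.5, 64.4, 37.4, 20.8, 53.5, 18.4, 26.5, 38.4]
print(round(pearson_r(tpm, ptpm), 4))
# Shifting and positively rescaling one variable leaves r unchanged.
print(round(pearson_r(tpm, [2 * y + 5 for y in ptpm]), 4))
```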
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, compute Pearson's r on the ranks.
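A sketch of Spearman's ρ as Pearson's r computed on ranks. Tied values get the average of the ranks they span, which is the usual convention; the report's exact tie handling is an assumption. Ties matter here because over a quarter of the TPM, pTPM and NX values are exactly 0. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```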
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
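A literal sketch of the pair-counting definition above. This is the τ-a variant; SciPy's `scipy.stats.kendalltau`, which pandas calls for `method="kendall"`, defaults to τ-b, which adjusts for ties. It checks all n(n - 1)/2 pairs, so it suits small samples only, not 1357230 rows. `kendall_tau_a` is a hypothetical name.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y: the pair is neither concordant nor discordant.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # two concordant pairs, one discordant: 1/3
```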
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Missing values
[Figure: nullity count, a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion]
First rows
| | Gene | Gene name | Cell line | TPM | pTPM | NX |
|---|---|---|---|---|---|---|
| 0 | ENSG00000000003 | TSPAN6 | A-431 | 27.8 | 33.9 | 7.9 |
| 1 | ENSG00000000003 | TSPAN6 | A549 | 37.6 | 45.5 | 10.6 |
| 2 | ENSG00000000003 | TSPAN6 | AF22 | 108.1 | 134.5 | 28.7 |
| 3 | ENSG00000000003 | TSPAN6 | AN3-CA | 51.8 | 64.4 | 14.5 |
| 4 | ENSG00000000003 | TSPAN6 | ASC diff | 32.3 | 37.4 | 12.6 |
| 5 | ENSG00000000003 | TSPAN6 | ASC TERT1 | 17.7 | 20.8 | 6.8 |
| 6 | ENSG00000000003 | TSPAN6 | BEWO | 42.7 | 53.5 | 11.6 |
| 7 | ENSG00000000003 | TSPAN6 | BJ | 14.9 | 18.4 | 4.5 |
| 8 | ENSG00000000003 | TSPAN6 | BJ hTERT+ | 22.3 | 26.5 | 7.1 |
| 9 | ENSG00000000003 | TSPAN6 | BJ hTERT+ SV40 Large T+ | 31.8 | 38.4 | 9.0 |
Last rows
| | Gene | Gene name | Cell line | TPM | pTPM | NX |
|---|---|---|---|---|---|---|
| 1357220 | ENSG00000285509 | AP000646.1 | U-138 MG | 0.6 | 0.7 | 2.0 |
| 1357221 | ENSG00000285509 | AP000646.1 | U-2 OS | 0.0 | 0.0 | 0.2 |
| 1357222 | ENSG00000285509 | AP000646.1 | U-2197 | 0.1 | 0.1 | 0.5 |
| 1357223 | ENSG00000285509 | AP000646.1 | U-251 MG | 0.2 | 0.3 | 0.9 |
| 1357224 | ENSG00000285509 | AP000646.1 | U-266/70 | 0.0 | 0.1 | 0.2 |
| 1357225 | ENSG00000285509 | AP000646.1 | U-266/84 | 0.0 | 0.0 | 0.1 |
| 1357226 | ENSG00000285509 | AP000646.1 | U-698 | 0.0 | 0.0 | 0.1 |
| 1357227 | ENSG00000285509 | AP000646.1 | U-87 MG | 0.0 | 0.0 | 0.1 |
| 1357228 | ENSG00000285509 | AP000646.1 | U-937 | 0.0 | 0.0 | 0.0 |
| 1357229 | ENSG00000285509 | AP000646.1 | WM-115 | 0.1 | 0.1 | 0.4 |