Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 845810 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 38.7 MiB |
Average record size in memory | 48.0 B |
Variable types
Categorical | 3 |
---|---|
Numeric | 3 |
Warnings
Gene has a high cardinality: 19670 distinct values | High cardinality |
Gene name has a high cardinality: 19651 distinct values | High cardinality |
TPM is highly correlated with pTPM | High correlation |
pTPM is highly correlated with TPM | High correlation |
TPM is highly skewed (γ1 = 73.13227558) | Skewed |
pTPM is highly skewed (γ1 = 71.26023242) | Skewed |
NX is highly skewed (γ1 = 41.08755053) | Skewed |
Gene is uniformly distributed | Uniform |
Gene name is uniformly distributed | Uniform |
Tissue is uniformly distributed | Uniform |
TPM has 168391 (19.9%) zeros | Zeros |
pTPM has 156534 (18.5%) zeros | Zeros |
NX has 144803 (17.1%) zeros | Zeros |
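The skewness and zeros warnings above can be checked on any numeric column with a few lines of standard-library Python. This is a sketch under stated assumptions. `sample_skewness` uses the bias-corrected (adjusted Fisher-Pearson) estimator, which is what pandas `Series.skew` computes. The cut-offs `SKEW_THRESHOLD` and `ZEROS_THRESHOLD` are hypothetical values chosen for illustration, not settings read from this report's configuration.

```python
import math

# Hypothetical cut-offs for illustration; the report's real settings may differ.
SKEW_THRESHOLD = 20.0
ZEROS_THRESHOLD = 0.10


def sample_skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-constant data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample bias correction


def numeric_warnings(name: str, xs: list[float]) -> list[str]:
    """Return skewness and zeros warnings worded like the report's."""
    warnings = []
    skew = sample_skewness(xs)
    if abs(skew) > SKEW_THRESHOLD:
        warnings.append(f"{name} is highly skewed (γ1 = {skew:.8f})")
    zeros = sum(1 for x in xs if x == 0)
    if zeros / len(xs) > ZEROS_THRESHOLD:
        warnings.append(f"{name} has {zeros} ({zeros / len(xs):.1%}) zeros")
    return warnings


# Toy column: many zeros, a bulk of small values and one extreme outlier.
column = [0.0] * 200 + [1.0] * 799 + [1000.0]
for line in numeric_warnings("TPM", column):
    print(line)
```

A single large outlier among many near-constant values is enough to push G1 well past 20. This is the same pattern behind the TPM, pTPM and NX warnings, where a handful of very large values sit on top of a mass of zeros.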
Reproduction
Analysis started | 2021-04-30 11:11:23.203440 |
---|---|
Analysis finished | 2021-04-30 11:11:43.978099 |
Duration | 20.77 seconds |
Software version | pandas-profiling v2.11.0 |
Gene (categorical)
Distinct | 19670 |
---|---|
Distinct (%) | 2.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.5 MiB |
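The overview figures for Gene are linked by simple arithmetic. The sketch below makes two assumptions rather than relying on facts stated in the report. It assumes a long-format table with one row per gene and tissue, as the first and last rows suggest. It also assumes that "Memory size" is a shallow estimate of one 8-byte object reference per row.

```python
rows = 845_810
genes = 19_670   # distinct values of Gene
tissues = 43     # distinct values of Tissue
columns = 6

# One row per (gene, tissue) pair explains why every gene ID occurs 43 times.
assert genes * tissues == rows

distinct_pct = 100 * genes / rows
column_mib = rows * 8 / 2**20  # assumed 8 bytes per row

print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Memory size: {column_mib:.1f} MiB")
print(f"Total size in memory: {columns * column_mib:.1f} MiB")
print(f"Average record size: {columns * 8:.1f} B")
```

Running it prints 2.3%, 6.5 MiB, 38.7 MiB and 48.0 B, matching the dataset statistics and the Gene overview.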
ENSG00000132854 | 43 |
---|---|
ENSG00000183354 | 43 |
ENSG00000172782 | 43 |
ENSG00000141696 | 43 |
ENSG00000177283 | 43 |
Other values (19665) | 845595 |
Length
Max length | 15 |
---|---|
Median length | 15 |
Mean length | 15 |
Min length | 15 |
Characters and Unicode
Total characters | 12687150 |
---|---|
Distinct characters | 14 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
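The sketch below shows how category counts like these can be derived with Python's standard `unicodedata` module. The mapping from two-letter category codes to the names used in this report is written out by hand. `unicodedata` does not expose script or block properties, so those are left out.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes mapped to the labels used in the report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["ENSG00000000003"]))
# Counter({'Decimal Number': 11, 'Uppercase Letter': 4})
```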
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | ENSG00000000003 |
---|---|
2nd row | ENSG00000000003 |
3rd row | ENSG00000000003 |
4th row | ENSG00000000003 |
5th row | ENSG00000000003 |
Common values
Value | Count | Frequency (%) |
ENSG00000132854 | 43 | < 0.1% |
ENSG00000183354 | 43 | < 0.1% |
ENSG00000172782 | 43 | < 0.1% |
ENSG00000141696 | 43 | < 0.1% |
ENSG00000177283 | 43 | < 0.1% |
ENSG00000206052 | 43 | < 0.1% |
ENSG00000133247 | 43 | < 0.1% |
ENSG00000136999 | 43 | < 0.1% |
ENSG00000117152 | 43 | < 0.1% |
ENSG00000242110 | 43 | < 0.1% |
Other values (19660) | 845380 | 99.9% |
[Figure: Histogram of lengths of the category]
Most common words (lowercased)
Value | Count | Frequency (%) |
ensg00000197345 | 43 | < 0.1% |
ensg00000180999 | 43 | < 0.1% |
ensg00000179431 | 43 | < 0.1% |
ensg00000137078 | 43 | < 0.1% |
ensg00000196743 | 43 | < 0.1% |
ensg00000113328 | 43 | < 0.1% |
ensg00000124196 | 43 | < 0.1% |
ensg00000206069 | 43 | < 0.1% |
ensg00000164199 | 43 | < 0.1% |
ensg00000172785 | 43 | < 0.1% |
Other values (19660) | 845380 | 99.9% |
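The common-values tables list the ten most frequent entries and fold everything else into an "Other values (k)" row. The sketch below reproduces that layout using `collections.Counter`. The function name `frequency_table` is illustrative and is not part of pandas-profiling.

```python
from collections import Counter


def frequency_table(values: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Top `top` values with counts and percentages, plus an 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    rows = [(value, count, 100 * count / n) for value, count in counts.most_common(top)]
    remaining = len(counts) - len(rows)  # distinct values not listed
    if remaining > 0:
        other = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({remaining})", other, 100 * other / n))
    return rows


for value, count, pct in frequency_table(["a", "a", "a", "b", "b", "c", "d"], top=2):
    print(f"{value} | {count} | {pct:.1f}% |")
```

In Gene, every one of the 19670 IDs occurs exactly 43 times, so all counts tie and the choice of which ten IDs are listed is arbitrary. The "Other values (19660)" row simply holds the remaining 845810 - 10 × 43 = 845380 rows.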
Most occurring characters
Value | Count | Frequency (%) |
0 | 4749780 | 37.4% |
1 | 1044728 | 8.2% |
E | 845810 | 6.7% |
N | 845810 | 6.7% |
S | 845810 | 6.7% |
G | 845810 | 6.7% |
2 | 507314 | 4.0% |
6 | 463239 | 3.7% |
8 | 448060 | 3.5% |
7 | 444276 | 3.5% |
Other values (4) | 1646513 | 13.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9303910 | 73.3% |
Uppercase Letter | 3383240 | 26.7% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 4749780 | 51.1% |
1 | 1044728 | 11.2% |
2 | 507314 | 5.5% |
6 | 463239 | 5.0% |
8 | 448060 | 4.8% |
7 | 444276 | 4.8% |
4 | 430086 | 4.6% |
3 | 428581 | 4.6% |
5 | 415552 | 4.5% |
9 | 372294 | 4.0% |
Uppercase Letter
Value | Count | Frequency (%) |
E | 845810 | 25.0% |
N | 845810 | 25.0% |
S | 845810 | 25.0% |
G | 845810 | 25.0% |
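The same character carries different percentages in different tables because the denominator changes. "Most occurring characters" divides by every character in the column, while "Most frequent character per category" divides only by the characters in that category. Here is a worked check using the Gene counts reported above:

```python
total_chars = 12_687_150     # all characters in Gene
decimal_chars = 9_303_910    # characters in the Decimal Number category
uppercase_chars = 3_383_240  # characters in the Uppercase Letter category
count_1 = 1_044_728          # occurrences of the digit "1"

# The two categories partition the column: 15 characters x 845810 rows.
assert decimal_chars + uppercase_chars == total_chars == 15 * 845_810

print(f"share of all characters: {100 * count_1 / total_chars:.1f}%")
print(f"share of Decimal Number: {100 * count_1 / decimal_chars:.1f}%")
```

This prints 8.2% and 11.2%, the two figures shown for "1" in the tables above.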
Most occurring scripts
Value | Count | Frequency (%) |
Common | 9303910 | 73.3% |
Latin | 3383240 | 26.7% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 4749780 | 51.1% |
1 | 1044728 | 11.2% |
2 | 507314 | 5.5% |
6 | 463239 | 5.0% |
8 | 448060 | 4.8% |
7 | 444276 | 4.8% |
4 | 430086 | 4.6% |
3 | 428581 | 4.6% |
5 | 415552 | 4.5% |
9 | 372294 | 4.0% |
Latin
Value | Count | Frequency (%) |
E | 845810 | 25.0% |
N | 845810 | 25.0% |
S | 845810 | 25.0% |
G | 845810 | 25.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 12687150 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 4749780 | 37.4% |
1 | 1044728 | 8.2% |
E | 845810 | 6.7% |
N | 845810 | 6.7% |
S | 845810 | 6.7% |
G | 845810 | 6.7% |
2 | 507314 | 4.0% |
6 | 463239 | 3.7% |
8 | 448060 | 3.5% |
7 | 444276 | 3.5% |
Other values (4) | 1646513 | 13.0% |
Gene name (categorical)
Distinct | 19651 |
---|---|
Distinct (%) | 2.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.5 MiB |
PRSS50 | 86 |
---|---|
PDE11A | 86 |
DIABLO | 86 |
COG8 | 86 |
CCDC39 | 86 |
Other values (19646) | 845380 |
Length
Max length | 15 |
---|---|
Median length | 5 |
Mean length | 5.55526182 |
Min length | 2 |
Characters and Unicode
Total characters | 4698696 |
---|---|
Distinct characters | 41 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
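Mean length equals total characters divided by the number of rows. That gives a quick consistency check across the three categorical columns, using the totals reported in their "Characters and Unicode" tables:

```python
rows = 845_810
total_characters = {"Gene": 12_687_150, "Gene name": 4_698_696, "Tissue": 8_379_420}

mean_length = {column: chars / rows for column, chars in total_characters.items()}
for column, length in mean_length.items():
    print(f"{column}: {length:.8f}")
```

The results agree with the reported mean lengths of 15, 5.55526182 and 9.906976744.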
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | TSPAN6 |
---|---|
2nd row | TSPAN6 |
3rd row | TSPAN6 |
4th row | TSPAN6 |
5th row | TSPAN6 |
Common values
Value | Count | Frequency (%) |
PRSS50 | 86 | < 0.1% |
PDE11A | 86 | < 0.1% |
DIABLO | 86 | < 0.1% |
COG8 | 86 | < 0.1% |
CCDC39 | 86 | < 0.1% |
TBCE | 86 | < 0.1% |
ATXN7 | 86 | < 0.1% |
ALDOA | 86 | < 0.1% |
POLR2J3 | 86 | < 0.1% |
H2BFS | 86 | < 0.1% |
Other values (19641) | 844950 | 99.9% |
[Figure: Histogram of lengths of the category]
Most common words (lowercased)
Value | Count | Frequency (%) |
igf2 | 86 | < 0.1% |
tmsb15b | 86 | < 0.1% |
txnrd3nb | 86 | < 0.1% |
hspa14 | 86 | < 0.1% |
fam212b | 86 | < 0.1% |
atxn7 | 86 | < 0.1% |
abcf2 | 86 | < 0.1% |
cog8 | 86 | < 0.1% |
ccdc39 | 86 | < 0.1% |
pde11a | 86 | < 0.1% |
Other values (19641) | 844950 | 99.9% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 366188 | 7.8% |
A | 341076 | 7.3% |
C | 295625 | 6.3% |
P | 278855 | 5.9% |
R | 237274 | 5.0% |
2 | 236113 | 5.0% |
T | 196725 | 4.2% |
S | 195607 | 4.2% |
L | 190490 | 4.1% |
N | 178407 | 3.8% |
Other values (31) | 2182336 | 46.4% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 3375672 | 71.8% |
Decimal Number | 1250569 | 26.6% |
Lowercase Letter | 46956 | 1.0% |
Other Punctuation | 15953 | 0.3% |
Dash Punctuation | 9546 | 0.2% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
A | 341076 | 10.1% |
C | 295625 | 8.8% |
P | 278855 | 8.3% |
R | 237274 | 7.0% |
T | 196725 | 5.8% |
S | 195607 | 5.8% |
L | 190490 | 5.6% |
N | 178407 | 5.3% |
M | 171828 | 5.1% |
D | 170194 | 5.0% |
Other values (16) | 1119591 | 33.2% |
Decimal Number
Value | Count | Frequency (%) |
1 | 366188 | 29.3% |
2 | 236113 | 18.9% |
3 | 147017 | 11.8% |
4 | 107543 | 8.6% |
5 | 89612 | 7.2% |
6 | 73917 | 5.9% |
0 | 65145 | 5.2% |
7 | 62565 | 5.0% |
8 | 55900 | 4.5% |
9 | 46569 | 3.7% |
Lowercase Letter
Value | Count | Frequency (%) |
o | 15652 | 33.3% |
r | 15652 | 33.3% |
f | 15652 | 33.3% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 9546 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
. | 15953 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 3422628 | 72.8% |
Common | 1276068 | 27.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 341076 | 10.0% |
C | 295625 | 8.6% |
P | 278855 | 8.1% |
R | 237274 | 6.9% |
T | 196725 | 5.7% |
S | 195607 | 5.7% |
L | 190490 | 5.6% |
N | 178407 | 5.2% |
M | 171828 | 5.0% |
D | 170194 | 5.0% |
Other values (19) | 1166547 | 34.1% |
Common
Value | Count | Frequency (%) |
1 | 366188 | 28.7% |
2 | 236113 | 18.5% |
3 | 147017 | 11.5% |
4 | 107543 | 8.4% |
5 | 89612 | 7.0% |
6 | 73917 | 5.8% |
0 | 65145 | 5.1% |
7 | 62565 | 4.9% |
8 | 55900 | 4.4% |
9 | 46569 | 3.6% |
Other values (2) | 25499 | 2.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4698696 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 366188 | 7.8% |
A | 341076 | 7.3% |
C | 295625 | 6.3% |
P | 278855 | 5.9% |
R | 237274 | 5.0% |
2 | 236113 | 5.0% |
T | 196725 | 4.2% |
S | 195607 | 4.2% |
L | 190490 | 4.1% |
N | 178407 | 3.8% |
Other values (31) | 2182336 | 46.4% |
Tissue (categorical)
Distinct | 43 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.5 MiB |
stomach | 19670 |
---|---|
appendix | 19670 |
lung | 19670 |
adipose tissue | 19670 |
epididymis | 19670 |
Other values (38) | 747460 |
Length
Max length | 17 |
---|---|
Median length | 9 |
Mean length | 9.906976744 |
Min length | 4 |
Characters and Unicode
Total characters | 8379420 |
---|---|
Distinct characters | 30 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | adipose tissue |
---|---|
2nd row | adrenal gland |
3rd row | appendix |
4th row | B-cells |
5th row | bone marrow |
Common values
Value | Count | Frequency (%) |
stomach | 19670 | 2.3% |
appendix | 19670 | 2.3% |
lung | 19670 | 2.3% |
adipose tissue | 19670 | 2.3% |
epididymis | 19670 | 2.3% |
small intestine | 19670 | 2.3% |
monocytes | 19670 | 2.3% |
dendritic cells | 19670 | 2.3% |
gallbladder | 19670 | 2.3% |
testis | 19670 | 2.3% |
Other values (33) | 649110 | 76.7% |
[Figure: Histogram of lengths of the category]
Most common words (lowercased)
Value | Count | Frequency (%) |
gland | 78680 | 6.7% |
muscle | 59010 | 5.0% |
parathyroid | 19670 | 1.7% |
epididymis | 19670 | 1.7% |
vesicle | 19670 | 1.7% |
tissue | 19670 | 1.7% |
bladder | 19670 | 1.7% |
smooth | 19670 | 1.7% |
small | 19670 | 1.7% |
lung | 19670 | 1.7% |
Other values (45) | 885150 | 75.0% |
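The word table splits each tissue name into lowercase words, so a word shared by several tissue names accumulates their row counts. Each tissue occurs 19670 times, so "gland" at 78680 = 4 × 19670 corresponds to four tissue names containing that word. The tokenizer below is a sketch that assumes whitespace splitting with leading and trailing punctuation stripped. The report's exact tokenization rules may differ.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    counts: Counter = Counter()
    for value in values:
        for word in value.lower().split():
            word = word.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


# Tissue names that appear in the first and last rows of this dataset.
tissues = ["adrenal gland", "thyroid gland", "smooth muscle", "cervix, uterine", "B-cells"]
print(word_counts(tissues).most_common(3))
```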
Most occurring characters
Value | Count | Frequency (%) |
e | 944160 | 11.3% |
l | 747460 | 8.9% |
a | 668780 | 8.0% |
s | 609770 | 7.3% |
n | 531090 | 6.3% |
r | 511420 | 6.1% |
i | 491750 | 5.9% |
t | 472080 | 5.6% |
d | 432740 | 5.2% |
o | 432740 | 5.2% |
Other values (20) | 2537430 | 30.3% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 7887670 | 94.1% |
Space Separator | 334390 | 4.0% |
Uppercase Letter | 78680 | 0.9% |
Dash Punctuation | 59010 | 0.7% |
Other Punctuation | 19670 | 0.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 944160 | 12.0% |
l | 747460 | 9.5% |
a | 668780 | 8.5% |
s | 609770 | 7.7% |
n | 531090 | 6.7% |
r | 511420 | 6.5% |
i | 491750 | 6.2% |
t | 472080 | 6.0% |
d | 432740 | 5.5% |
o | 432740 | 5.5% |
Other values (13) | 2045680 | 25.9% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 19670 | 25.0% |
N | 19670 | 25.0% |
K | 19670 | 25.0% |
T | 19670 | 25.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 334390 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 59010 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 19670 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 7966350 | 95.1% |
Common | 413070 | 4.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 944160 | 11.9% |
l | 747460 | 9.4% |
a | 668780 | 8.4% |
s | 609770 | 7.7% |
n | 531090 | 6.7% |
r | 511420 | 6.4% |
i | 491750 | 6.2% |
t | 472080 | 5.9% |
d | 432740 | 5.4% |
o | 432740 | 5.4% |
Other values (17) | 2124360 | 26.7% |
Common
Value | Count | Frequency (%) |
(space) | 334390 | 81.0% |
- | 59010 | 14.3% |
, | 19670 | 4.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 8379420 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 944160 | 11.3% |
l | 747460 | 8.9% |
a | 668780 | 8.0% |
s | 609770 | 7.3% |
n | 531090 | 6.3% |
r | 511420 | 6.1% |
i | 491750 | 5.9% |
t | 472080 | 5.6% |
d | 432740 | 5.2% |
o | 432740 | 5.2% |
Other values (20) | 2537430 | 30.3% |
TPM (numeric)
Distinct | 10993 |
---|---|
Distinct (%) | 1.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 40.17736099 |
---|---|
Minimum | 0 |
Maximum | 105008.5 |
Zeros | 168391 |
Zeros (%) | 19.9% |
Memory size | 6.5 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 0 |
Q1 | 0.2 |
median | 4 |
Q3 | 17.3 |
95th percentile | 89.2 |
Maximum | 105008.5 |
Range | 105008.5 |
Interquartile range (IQR) | 17.1 |
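Quartiles and the interquartile range can be reproduced with linear interpolation between order statistics. The sketch uses `statistics.quantiles(..., method="inclusive")`, which implements that rule. Whether the report used the same interpolation (the pandas default) is an assumption. For TPM, IQR = Q3 - Q1 = 17.3 - 0.2 = 17.1, as reported.

```python
import statistics


def quartile_summary(xs: list[float]) -> dict[str, float]:
    """Q1, median, Q3 and IQR using linear interpolation (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {"Q1": q1, "median": median, "Q3": q3, "IQR": q3 - q1}


# Toy, zero-heavy sample: the lower quartile collapses onto 0.
print(quartile_summary([0, 0, 0, 1, 10, 20]))
```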
Descriptive statistics
Standard deviation | 551.1248134 |
---|---|
Coefficient of variation (CV) | 13.7172975 |
Kurtosis | 8332.463025 |
Mean | 40.17736099 |
Median Absolute Deviation (MAD) | 4 |
Skewness | 73.13227558 |
Sum | 33982413.7 |
Variance | 303738.5599 |
Monotonicity | Not monotonic |
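The descriptive statistics follow from a handful of definitions. This sketch makes two assumptions: the standard deviation uses the sample (n - 1) denominator, which is the pandas default, and MAD is the median of absolute deviations from the median. The assertions at the end confirm that the reported TPM coefficient of variation, variance and sum agree with the reported mean and standard deviation.

```python
import statistics


def describe(xs: list[float]) -> dict[str, float]:
    """Mean, sample std, variance, CV, median absolute deviation and sum."""
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation, n - 1 denominator
    median = statistics.median(xs)
    mad = statistics.median([abs(x - median) for x in xs])
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        "mad": mad,
        "sum": sum(xs),
    }


# Consistency checks against the TPM figures reported above.
tpm_mean, tpm_std, n = 40.17736099, 551.1248134, 845_810
assert abs(tpm_std / tpm_mean - 13.7172975) < 1e-5  # coefficient of variation
assert abs(tpm_std ** 2 - 303738.5599) < 1e-2       # variance
assert abs(tpm_mean * n - 33982413.7) < 1.0         # sum
```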
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%) |
0 | 168391 | 19.9% |
0.1 | 33328 | 3.9% |
0.2 | 21579 | 2.6% |
0.3 | 16337 | 1.9% |
0.4 | 13191 | 1.6% |
0.5 | 10937 | 1.3% |
0.6 | 9675 | 1.1% |
0.7 | 8686 | 1.0% |
0.8 | 7994 | 0.9% |
0.9 | 7333 | 0.9% |
Other values (10983) | 548359 | 64.8% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 168391 | 19.9% |
0.1 | 33328 | 3.9% |
0.2 | 21579 | 2.6% |
0.3 | 16337 | 1.9% |
0.4 | 13191 | 1.6% |
Maximum 5 values
Value | Count | Frequency (%) |
105008.5 | 1 | < 0.1% |
93205.5 | 1 | < 0.1% |
85341.6 | 1 | < 0.1% |
81278.3 | 1 | < 0.1% |
79765.2 | 1 | < 0.1% |
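The minimum and maximum tables list the five smallest and largest distinct values with their counts. That is why 0 appears once with a count of 168391 rather than filling all five rows. The sketch below does the same using `heapq` over the distinct values. The function name `extreme_values` is illustrative.

```python
import heapq
from collections import Counter


def extreme_values(xs: list[float], k: int = 5):
    """k smallest and k largest distinct values as (value, count, percent) rows."""
    counts = Counter(xs)
    n = len(xs)

    def row(value: float) -> tuple[float, int, float]:
        return value, counts[value], 100 * counts[value] / n

    smallest = [row(v) for v in heapq.nsmallest(k, counts)]
    largest = [row(v) for v in heapq.nlargest(k, counts)]
    return smallest, largest


low, high = extreme_values([0, 0, 0, 0.1, 0.2, 5, 7, 9, 100], k=2)
print(low)   # smallest distinct values first
print(high)  # largest distinct values first
```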
pTPM (numeric)
Distinct | 12507 |
---|---|
Distinct (%) | 1.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 52.58360885 |
---|---|
Minimum | 0 |
Maximum | 130893 |
Zeros | 156534 |
Zeros (%) | 18.5% |
Memory size | 6.5 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 0 |
Q1 | 0.3 |
median | 5.3 |
Q3 | 22.9 |
95th percentile | 117.5 |
Maximum | 130893 |
Range | 130893 |
Interquartile range (IQR) | 22.6 |
Descriptive statistics
Standard deviation | 698.6151603 |
---|---|
Coefficient of variation (CV) | 13.28579714 |
Kurtosis | 8085.80981 |
Mean | 52.58360885 |
Median Absolute Deviation (MAD) | 5.3 |
Skewness | 71.26023242 |
Sum | 44475742.2 |
Variance | 488063.1422 |
Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%) |
0 | 156534 | 18.5% |
0.1 | 31311 | 3.7% |
0.2 | 20680 | 2.4% |
0.3 | 15443 | 1.8% |
0.4 | 12782 | 1.5% |
0.5 | 10540 | 1.2% |
0.6 | 9155 | 1.1% |
0.7 | 8198 | 1.0% |
0.8 | 7436 | 0.9% |
0.9 | 6807 | 0.8% |
Other values (12497) | 566924 | 67.0% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 156534 | 18.5% |
0.1 | 31311 | 3.7% |
0.2 | 20680 | 2.4% |
0.3 | 15443 | 1.8% |
0.4 | 12782 | 1.5% |
Maximum 5 values
Value | Count | Frequency (%) |
130893 | 1 | < 0.1% |
118266.5 | 1 | < 0.1% |
115132.4 | 1 | < 0.1% |
106351.6 | 1 | < 0.1% |
102991.6 | 1 | < 0.1% |
NX (numeric)
Distinct | 2506 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9.227443279 |
---|---|
Minimum | 0 |
Maximum | 3646.7 |
Zeros | 144803 |
Zeros (%) | 17.1% |
Memory size | 6.5 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 0 |
Q1 | 0.4 |
median | 4.3 |
Q3 | 12.3 |
95th percentile | 31.3 |
Maximum | 3646.7 |
Range | 3646.7 |
Interquartile range (IQR) | 11.9 |
Descriptive statistics
Standard deviation | 21.11759345 |
---|---|
Coefficient of variation (CV) | 2.288563886 |
Kurtosis | 4194.428102 |
Mean | 9.227443279 |
Median Absolute Deviation (MAD) | 4.3 |
Skewness | 41.08755053 |
Sum | 7804663.8 |
Variance | 445.9527531 |
Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%) |
0 | 144803 | 17.1% |
0.1 | 22776 | 2.7% |
0.2 | 17337 | 2.0% |
0.3 | 14591 | 1.7% |
0.4 | 12992 | 1.5% |
0.5 | 11714 | 1.4% |
0.6 | 10842 | 1.3% |
0.7 | 10129 | 1.2% |
0.8 | 9266 | 1.1% |
0.9 | 9021 | 1.1% |
Other values (2496) | 582339 | 68.8% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 144803 | 17.1% |
0.1 | 22776 | 2.7% |
0.2 | 17337 | 2.0% |
0.3 | 14591 | 1.7% |
0.4 | 12992 | 1.5% |
Maximum 5 values
Value | Count | Frequency (%) |
3646.7 | 1 | < 0.1% |
3207.8 | 1 | < 0.1% |
3057.4 | 1 | < 0.1% |
2541.2 | 1 | < 0.1% |
2537.1 | 1 | < 0.1% |
Missing values
[Figure: A simple visualization of nullity by column]
[Figure: Nullity matrix, a data-dense display that makes patterns in data completion easy to spot]
First rows
Index | Gene | Gene name | Tissue | TPM | pTPM | NX |
---|---|---|---|---|---|---|
0 | ENSG00000000003 | TSPAN6 | adipose tissue | 31.5 | 37.7 | 9.8 |
1 | ENSG00000000003 | TSPAN6 | adrenal gland | 26.4 | 32.7 | 7.6 |
2 | ENSG00000000003 | TSPAN6 | appendix | 9.2 | 14.5 | 2.1 |
3 | ENSG00000000003 | TSPAN6 | B-cells | 0.1 | 0.2 | 0.3 |
4 | ENSG00000000003 | TSPAN6 | bone marrow | 0.7 | 0.9 | 0.1 |
5 | ENSG00000000003 | TSPAN6 | breast | 53.3 | 68.1 | 16.2 |
6 | ENSG00000000003 | TSPAN6 | cerebral cortex | 18.5 | 22.8 | 4.1 |
7 | ENSG00000000003 | TSPAN6 | cervix, uterine | 54.2 | 66.0 | 14.2 |
8 | ENSG00000000003 | TSPAN6 | colon | 48.5 | 70.6 | 17.8 |
9 | ENSG00000000003 | TSPAN6 | dendritic cells | 0.1 | 0.1 | 0.2 |
Last rows
Index | Gene | Gene name | Tissue | TPM | pTPM | NX |
---|---|---|---|---|---|---|
845800 | ENSG00000285509 | AP000646.1 | skin | 0.0 | 0.0 | 0.0 |
845801 | ENSG00000285509 | AP000646.1 | small intestine | 0.0 | 0.0 | 0.0 |
845802 | ENSG00000285509 | AP000646.1 | smooth muscle | 0.2 | 0.2 | 0.2 |
845803 | ENSG00000285509 | AP000646.1 | spleen | 0.1 | 0.1 | 0.0 |
845804 | ENSG00000285509 | AP000646.1 | stomach | 0.0 | 0.0 | 0.0 |
845805 | ENSG00000285509 | AP000646.1 | T-cells | 0.0 | 0.0 | 0.2 |
845806 | ENSG00000285509 | AP000646.1 | testis | 1.0 | 1.3 | 1.7 |
845807 | ENSG00000285509 | AP000646.1 | thyroid gland | 0.1 | 0.1 | 0.0 |
845808 | ENSG00000285509 | AP000646.1 | tonsil | 0.0 | 0.0 | 0.0 |
845809 | ENSG00000285509 | AP000646.1 | urinary bladder | 0.2 | 0.4 | 0.5 |
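The first and last rows show the long layout, with one row per gene and tissue. One way to confirm the "uniformly distributed" warnings is to count rows per key and inspect the set of distinct counts. The sketch below runs on a hypothetical toy table with three rows per gene. On the full data the Gene result should be {43}, since 19670 IDs × 43 = 845810 rows and no ID exceeds 43. The top Gene name count of 86 indicates that some names are shared by two Ensembl IDs.

```python
from collections import Counter


def rows_per_key(records: list[dict[str, str]], key: str) -> tuple[Counter, set[int]]:
    """Rows per distinct value of `key`, plus the set of distinct row counts."""
    per_key = Counter(record[key] for record in records)
    return per_key, set(per_key.values())


# Hypothetical toy table in the same long format (values are made up).
tissues = ["adipose tissue", "adrenal gland", "appendix"]
records = [
    {"Gene": gene, "Gene name": name, "Tissue": tissue}
    for gene, name in [("G1", "ABC1"), ("G2", "ABC1"), ("G3", "XYZ")]
    for tissue in tissues
]

_, gene_counts = rows_per_key(records, "Gene")
_, name_counts = rows_per_key(records, "Gene name")
print(gene_counts)  # {3}: every gene has one row per tissue
print(name_counts)  # {3, 6}: ABC1 is shared by two gene IDs
```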