Overview

Dataset statistics

Number of variables: 6
Number of observations: 845810
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 38.7 MiB
Average record size in memory: 48.0 B
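
The memory figures above can be cross-checked with simple arithmetic. The sketch below is only a sanity check, not how the profiler measures memory: it assumes every column costs 8 bytes per row (a float64 value or an object pointer), which is an assumption and not something the report states. Under that assumption the results agree with the reported 48.0 B per record and the 6.5 MiB shown for each variable.

```python
# Cross-check the reported memory figures from the overview numbers.
MIB = 1024 ** 2                 # bytes per mebibyte

n_rows = 845_810                # "Number of observations"
n_cols = 6                      # "Number of variables"
total_mib = 38.7                # "Total size in memory" (already rounded)

# Average record size = total bytes / number of rows.
avg_record_bytes = total_mib * MIB / n_rows
print(f"average record size: {avg_record_bytes:.1f} B")

# Assumption: 8 bytes per row per column (float64 or object pointer).
per_column_mib = n_rows * 8 / MIB
print(f"per-column size: {per_column_mib:.1f} MiB")
print(f"all {n_cols} columns: {n_cols * per_column_mib:.1f} MiB")
```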

Variable types

Categorical: 3
Numeric: 3

Warnings

Gene has a high cardinality: 19670 distinct values (High cardinality)
Gene name has a high cardinality: 19651 distinct values (High cardinality)
TPM is highly correlated with pTPM (High correlation)
pTPM is highly correlated with TPM (High correlation)
TPM is highly skewed (γ1 = 73.13227558) (Skewed)
pTPM is highly skewed (γ1 = 71.26023242) (Skewed)
NX is highly skewed (γ1 = 41.08755053) (Skewed)
Gene is uniformly distributed (Uniform)
Gene name is uniformly distributed (Uniform)
Tissue is uniformly distributed (Uniform)
TPM has 168391 (19.9%) zeros (Zeros)
pTPM has 156534 (18.5%) zeros (Zeros)
NX has 144803 (17.1%) zeros (Zeros)
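
The zero percentages in the warnings follow directly from the zero counts and the number of observations. A minimal sketch of that arithmetic, using only numbers quoted in this report (the variable names are illustrative):

```python
# Recompute the "Zeros" percentages from the counts given in the warnings.
n_rows = 845_810
zero_counts = {"TPM": 168_391, "pTPM": 156_534, "NX": 144_803}

def percent(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total

zero_pct = {name: round(percent(c, n_rows), 1) for name, c in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```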

Reproduction

Analysis started: 2021-04-30 11:11:23.203440
Analysis finished: 2021-04-30 11:11:43.978099
Duration: 20.77 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
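
A report of this kind can be regenerated with a few lines of pandas-profiling. The sketch below is a guess at the workflow, not the configuration actually used (that lives in config.yaml): the function name `build_profile`, the input path, the tab separator, and the report title are all assumptions made for illustration.

```python
from pathlib import Path


def build_profile(tsv_path, html_path="profile.html"):
    """Profile an expression table and write the result as an HTML report."""
    # Imported lazily so this module loads even without the third-party packages.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Assumption: the input is a tab-separated file holding the six variables
    # described in this report (Gene, Gene name, Tissue, TPM, pTPM, NX).
    df = pd.read_csv(tsv_path, sep="\t")

    # Default settings; the original run may have used different options.
    profile = ProfileReport(df, title="Tissue expression profile")
    profile.to_file(Path(html_path))
    return profile
```

Calling `build_profile("expression.tsv")` with a hypothetical file name would write `profile.html` to the working directory.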

Variables

Gene
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 19670
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 MiB

Value                   Count
ENSG00000132854         43
ENSG00000183354         43
ENSG00000172782         43
ENSG00000141696         43
ENSG00000177283         43
Other values (19665)    845595

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 12687150
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
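
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is a minimal sketch and not necessarily how the profiler computes its figures; `category_counts` is a hypothetical helper. The codes `Lu` and `Nd` are the Unicode abbreviations for Uppercase Letter and Decimal Number.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


# One identifier taken from the Sample rows of this variable.
counts = category_counts("ENSG00000000003")
total = sum(counts.values())
for cat, n in counts.most_common():
    print(f"{cat}: {n} ({100 * n / total:.1f}%)")
```

Every identifier here has the same shape (four letters followed by eleven digits), so the per-identifier shares of 4/15 and 11/15 carry over to the whole column.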

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ENSG00000000003
2nd row: ENSG00000000003
3rd row: ENSG00000000003
4th row: ENSG00000000003
5th row: ENSG00000000003

Common values

Value                   Count     Frequency (%)
ENSG00000132854         43        < 0.1%
ENSG00000183354         43        < 0.1%
ENSG00000172782         43        < 0.1%
ENSG00000141696         43        < 0.1%
ENSG00000177283         43        < 0.1%
ENSG00000206052         43        < 0.1%
ENSG00000133247         43        < 0.1%
ENSG00000136999         43        < 0.1%
ENSG00000117152         43        < 0.1%
ENSG00000242110         43        < 0.1%
Other values (19660)    845380    99.9%

[Figure: Histogram of lengths of the category]

Words

Value                   Count     Frequency (%)
ensg00000197345         43        < 0.1%
ensg00000180999         43        < 0.1%
ensg00000179431         43        < 0.1%
ensg00000137078         43        < 0.1%
ensg00000196743         43        < 0.1%
ensg00000113328         43        < 0.1%
ensg00000124196         43        < 0.1%
ensg00000206069         43        < 0.1%
ensg00000164199         43        < 0.1%
ensg00000172785         43        < 0.1%
Other values (19660)    845380    99.9%

Most occurring characters

Value               Count      Frequency (%)
0                   4749780    37.4%
1                   1044728    8.2%
E                   845810     6.7%
N                   845810     6.7%
S                   845810     6.7%
G                   845810     6.7%
2                   507314     4.0%
6                   463239     3.7%
8                   448060     3.5%
7                   444276     3.5%
Other values (4)    1646513    13.0%

Most occurring categories

Value               Count      Frequency (%)
Decimal Number      9303910    73.3%
Uppercase Letter    3383240    26.7%

Most frequent character per category

Decimal Number:

Value    Count      Frequency (%)
0        4749780    51.1%
1        1044728    11.2%
2        507314     5.5%
6        463239     5.0%
8        448060     4.8%
7        444276     4.8%
4        430086     4.6%
3        428581     4.6%
5        415552     4.5%
9        372294     4.0%

Uppercase Letter:

Value    Count     Frequency (%)
E        845810    25.0%
N        845810    25.0%
S        845810    25.0%
G        845810    25.0%

Most occurring scripts

Value     Count      Frequency (%)
Common    9303910    73.3%
Latin     3383240    26.7%

Most frequent character per script

Common:

Value    Count      Frequency (%)
0        4749780    51.1%
1        1044728    11.2%
2        507314     5.5%
6        463239     5.0%
8        448060     4.8%
7        444276     4.8%
4        430086     4.6%
3        428581     4.6%
5        415552     4.5%
9        372294     4.0%

Latin:

Value    Count     Frequency (%)
E        845810    25.0%
N        845810    25.0%
S        845810    25.0%
G        845810    25.0%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    12687150    100.0%

Most frequent character per block

ASCII:

Value               Count      Frequency (%)
0                   4749780    37.4%
1                   1044728    8.2%
E                   845810     6.7%
N                   845810     6.7%
S                   845810     6.7%
G                   845810     6.7%
2                   507314     4.0%
6                   463239     3.7%
8                   448060     3.5%
7                   444276     3.5%
Other values (4)    1646513    13.0%

Gene name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 19651
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 MiB

Value                   Count
PRSS50                  86
PDE11A                  86
DIABLO                  86
COG8                    86
CCDC39                  86
Other values (19646)    845380

Length

Max length: 15
Median length: 5
Mean length: 5.55526182
Min length: 2

Characters and Unicode

Total characters: 4698696
Distinct characters: 41
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TSPAN6
2nd row: TSPAN6
3rd row: TSPAN6
4th row: TSPAN6
5th row: TSPAN6

Common values

Value                   Count     Frequency (%)
PRSS50                  86        < 0.1%
PDE11A                  86        < 0.1%
DIABLO                  86        < 0.1%
COG8                    86        < 0.1%
CCDC39                  86        < 0.1%
TBCE                    86        < 0.1%
ATXN7                   86        < 0.1%
ALDOA                   86        < 0.1%
POLR2J3                 86        < 0.1%
H2BFS                   86        < 0.1%
Other values (19641)    844950    99.9%

[Figure: Histogram of lengths of the category]

Words

Value                   Count     Frequency (%)
igf2                    86        < 0.1%
tmsb15b                 86        < 0.1%
txnrd3nb                86        < 0.1%
hspa14                  86        < 0.1%
fam212b                 86        < 0.1%
atxn7                   86        < 0.1%
abcf2                   86        < 0.1%
cog8                    86        < 0.1%
ccdc39                  86        < 0.1%
pde11a                  86        < 0.1%
Other values (19641)    844950    99.9%

Most occurring characters

Value                Count      Frequency (%)
1                    366188     7.8%
A                    341076     7.3%
C                    295625     6.3%
P                    278855     5.9%
R                    237274     5.0%
2                    236113     5.0%
T                    196725     4.2%
S                    195607     4.2%
L                    190490     4.1%
N                    178407     3.8%
Other values (31)    2182336    46.4%

Most occurring categories

Value                Count      Frequency (%)
Uppercase Letter     3375672    71.8%
Decimal Number       1250569    26.6%
Lowercase Letter     46956      1.0%
Other Punctuation    15953      0.3%
Dash Punctuation     9546       0.2%

Most frequent character per category

Uppercase Letter:

Value                Count      Frequency (%)
A                    341076     10.1%
C                    295625     8.8%
P                    278855     8.3%
R                    237274     7.0%
T                    196725     5.8%
S                    195607     5.8%
L                    190490     5.6%
N                    178407     5.3%
M                    171828     5.1%
D                    170194     5.0%
Other values (16)    1119591    33.2%

Decimal Number:

Value    Count     Frequency (%)
1        366188    29.3%
2        236113    18.9%
3        147017    11.8%
4        107543    8.6%
5        89612     7.2%
6        73917     5.9%
0        65145     5.2%
7        62565     5.0%
8        55900     4.5%
9        46569     3.7%

Lowercase Letter:

Value    Count    Frequency (%)
o        15652    33.3%
r        15652    33.3%
f        15652    33.3%

Dash Punctuation:

Value    Count    Frequency (%)
-        9546     100.0%

Other Punctuation:

Value    Count    Frequency (%)
.        15953    100.0%

Most occurring scripts

Value     Count      Frequency (%)
Latin     3422628    72.8%
Common    1276068    27.2%

Most frequent character per script

Latin:

Value                Count      Frequency (%)
A                    341076     10.0%
C                    295625     8.6%
P                    278855     8.1%
R                    237274     6.9%
T                    196725     5.7%
S                    195607     5.7%
L                    190490     5.6%
N                    178407     5.2%
M                    171828     5.0%
D                    170194     5.0%
Other values (19)    1166547    34.1%

Common:

Value               Count     Frequency (%)
1                   366188    28.7%
2                   236113    18.5%
3                   147017    11.5%
4                   107543    8.4%
5                   89612     7.0%
6                   73917     5.8%
0                   65145     5.1%
7                   62565     4.9%
8                   55900     4.4%
9                   46569     3.6%
Other values (2)    25499     2.0%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    4698696    100.0%

Most frequent character per block

ASCII:

Value                Count      Frequency (%)
1                    366188     7.8%
A                    341076     7.3%
C                    295625     6.3%
P                    278855     5.9%
R                    237274     5.0%
2                    236113     5.0%
T                    196725     4.2%
S                    195607     4.2%
L                    190490     4.1%
N                    178407     3.8%
Other values (31)    2182336    46.4%

Tissue
Categorical

UNIFORM

Distinct: 43
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 MiB

Value                Count
stomach              19670
appendix             19670
lung                 19670
adipose tissue       19670
epididymis           19670
Other values (38)    747460
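
The identical count for every tissue suggests the table is a complete gene-by-tissue grid, which would also explain the Uniform flag. The sketch below checks that reading against the numbers in this report; treating the data as a full grid is an inference from the counts, not something the report states.

```python
# Check that distinct genes x distinct tissues accounts for every row.
n_genes = 19_670       # distinct values of Gene
n_tissues = 43         # distinct values of Tissue
n_rows = 845_810       # number of observations

is_full_grid = n_genes * n_tissues == n_rows
print("full gene-by-tissue grid:", is_full_grid)

# Each tissue then appears once per gene, giving every tissue the same share.
rows_per_tissue = n_rows // n_tissues
share_pct = 100 * rows_per_tissue / n_rows
print(f"rows per tissue: {rows_per_tissue} ({share_pct:.1f}%)")
```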

Length

Max length: 17
Median length: 9
Mean length: 9.906976744
Min length: 4

Characters and Unicode

Total characters: 8379420
Distinct characters: 30
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: adipose tissue
2nd row: adrenal gland
3rd row: appendix
4th row: B-cells
5th row: bone marrow

Common values

Value                Count     Frequency (%)
stomach              19670     2.3%
appendix             19670     2.3%
lung                 19670     2.3%
adipose tissue       19670     2.3%
epididymis           19670     2.3%
small intestine      19670     2.3%
monocytes            19670     2.3%
dendritic cells      19670     2.3%
gallbladder          19670     2.3%
testis               19670     2.3%
Other values (33)    649110    76.7%

[Figure: Histogram of lengths of the category]

Words

Value                Count     Frequency (%)
gland                78680     6.7%
muscle               59010     5.0%
parathyroid          19670     1.7%
epididymis           19670     1.7%
vesicle              19670     1.7%
tissue               19670     1.7%
bladder              19670     1.7%
smooth               19670     1.7%
small                19670     1.7%
lung                 19670     1.7%
Other values (45)    885150    75.0%

Most occurring characters

Value                Count      Frequency (%)
e                    944160     11.3%
l                    747460     8.9%
a                    668780     8.0%
s                    609770     7.3%
n                    531090     6.3%
r                    511420     6.1%
i                    491750     5.9%
t                    472080     5.6%
d                    432740     5.2%
o                    432740     5.2%
Other values (20)    2537430    30.3%

Most occurring categories

Value                Count      Frequency (%)
Lowercase Letter     7887670    94.1%
Space Separator      334390     4.0%
Uppercase Letter     78680      0.9%
Dash Punctuation     59010      0.7%
Other Punctuation    19670      0.2%

Most frequent character per category

Lowercase Letter:

Value                Count      Frequency (%)
e                    944160     12.0%
l                    747460     9.5%
a                    668780     8.5%
s                    609770     7.7%
n                    531090     6.7%
r                    511420     6.5%
i                    491750     6.2%
t                    472080     6.0%
d                    432740     5.5%
o                    432740     5.5%
Other values (13)    2045680    25.9%

Uppercase Letter:

Value    Count    Frequency (%)
B        19670    25.0%
N        19670    25.0%
K        19670    25.0%
T        19670    25.0%

Space Separator:

Value      Count     Frequency (%)
(space)    334390    100.0%

Dash Punctuation:

Value    Count    Frequency (%)
-        59010    100.0%

Other Punctuation:

Value    Count    Frequency (%)
,        19670    100.0%

Most occurring scripts

Value     Count      Frequency (%)
Latin     7966350    95.1%
Common    413070     4.9%

Most frequent character per script

Latin:

Value                Count      Frequency (%)
e                    944160     11.9%
l                    747460     9.4%
a                    668780     8.4%
s                    609770     7.7%
n                    531090     6.7%
r                    511420     6.4%
i                    491750     6.2%
t                    472080     5.9%
d                    432740     5.4%
o                    432740     5.4%
Other values (17)    2124360    26.7%

Common:

Value      Count     Frequency (%)
(space)    334390    81.0%
-          59010     14.3%
,          19670     4.8%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    8379420    100.0%

Most frequent character per block

ASCII:

Value                Count      Frequency (%)
e                    944160     11.3%
l                    747460     8.9%
a                    668780     8.0%
s                    609770     7.3%
n                    531090     6.3%
r                    511420     6.1%
i                    491750     5.9%
t                    472080     5.6%
d                    432740     5.2%
o                    432740     5.2%
Other values (20)    2537430    30.3%

TPM
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 10993
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.17736099
Minimum: 0
Maximum: 105008.5
Zeros: 168391
Zeros (%): 19.9%
Memory size: 6.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.2
Median: 4
Q3: 17.3
95-th percentile: 89.2
Maximum: 105008.5
Range: 105008.5
Interquartile range (IQR): 17.1

Descriptive statistics

Standard deviation: 551.1248134
Coefficient of variation (CV): 13.7172975
Kurtosis: 8332.463025
Mean: 40.17736099
Median Absolute Deviation (MAD): 4
Skewness: 73.13227558
Sum: 33982413.7
Variance: 303738.5599
Monotonicity: Not monotonic
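
Several of these statistics are derived from one another, which makes them easy to cross-check. The sketch below recomputes the coefficient of variation, the variance, and the interquartile range for TPM from the mean, standard deviation, and quartiles reported here; the variable names are illustrative only.

```python
# Derived statistics for TPM, recomputed from the reported values.
mean = 40.17736099
std = 551.1248134
q1, q3 = 0.2, 17.3

cv = std / mean          # coefficient of variation = std / mean
variance = std ** 2      # variance = std squared
iqr = q3 - q1            # interquartile range = Q3 - Q1

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.2f}")
print(f"IQR      = {iqr:.1f}")
```

A CV well above 1 together with a mean (about 40.2) ten times the median (4) is consistent with the Skewed warning: a few very large values dominate the mean.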
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                   Count     Frequency (%)
0                       168391    19.9%
0.1                     33328     3.9%
0.2                     21579     2.6%
0.3                     16337     1.9%
0.4                     13191     1.6%
0.5                     10937     1.3%
0.6                     9675      1.1%
0.7                     8686      1.0%
0.8                     7994      0.9%
0.9                     7333      0.9%
Other values (10983)    548359    64.8%

Minimum 5 values

Value    Count     Frequency (%)
0        168391    19.9%
0.1      33328     3.9%
0.2      21579     2.6%
0.3      16337     1.9%
0.4      13191     1.6%

Maximum 5 values

Value       Count    Frequency (%)
105008.5    1        < 0.1%
93205.5     1        < 0.1%
85341.6     1        < 0.1%
81278.3     1        < 0.1%
79765.2     1        < 0.1%

pTPM
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 12507
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.58360885
Minimum: 0
Maximum: 130893
Zeros: 156534
Zeros (%): 18.5%
Memory size: 6.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.3
Median: 5.3
Q3: 22.9
95-th percentile: 117.5
Maximum: 130893
Range: 130893
Interquartile range (IQR): 22.6

Descriptive statistics

Standard deviation: 698.6151603
Coefficient of variation (CV): 13.28579714
Kurtosis: 8085.80981
Mean: 52.58360885
Median Absolute Deviation (MAD): 5.3
Skewness: 71.26023242
Sum: 44475742.2
Variance: 488063.1422
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                   Count     Frequency (%)
0                       156534    18.5%
0.1                     31311     3.7%
0.2                     20680     2.4%
0.3                     15443     1.8%
0.4                     12782     1.5%
0.5                     10540     1.2%
0.6                     9155      1.1%
0.7                     8198      1.0%
0.8                     7436      0.9%
0.9                     6807      0.8%
Other values (12497)    566924    67.0%

Minimum 5 values

Value    Count     Frequency (%)
0        156534    18.5%
0.1      31311     3.7%
0.2      20680     2.4%
0.3      15443     1.8%
0.4      12782     1.5%

Maximum 5 values

Value       Count    Frequency (%)
130893      1        < 0.1%
118266.5    1        < 0.1%
115132.4    1        < 0.1%
106351.6    1        < 0.1%
102991.6    1        < 0.1%

NX
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 2506
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.227443279
Minimum: 0
Maximum: 3646.7
Zeros: 144803
Zeros (%): 17.1%
Memory size: 6.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.4
Median: 4.3
Q3: 12.3
95-th percentile: 31.3
Maximum: 3646.7
Range: 3646.7
Interquartile range (IQR): 11.9

Descriptive statistics

Standard deviation: 21.11759345
Coefficient of variation (CV): 2.288563886
Kurtosis: 4194.428102
Mean: 9.227443279
Median Absolute Deviation (MAD): 4.3
Skewness: 41.08755053
Sum: 7804663.8
Variance: 445.9527531
Monotonicity: Not monotonic
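
For readers unfamiliar with the skewness coefficient γ1, the sketch below computes the plain moment-based version, g1 = m3 / m2^1.5, on a small made-up sample. The data are invented for illustration, and the profiler may apply a sample-size correction that this sketch omits; at several hundred thousand rows such a correction would be negligible.

```python
def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5


# Invented sample: mostly small values plus one large outlier (right-skewed).
toy = [0.0, 0.0, 0.1, 0.4, 4.3, 12.3, 3646.7]
print(f"toy skewness:       {skewness(toy):.2f}")

# A symmetric sample has zero skewness.
print(f"symmetric skewness: {skewness([1.0, 2.0, 3.0]):.2f}")
```

A long right tail, like the one implied by a maximum of 3646.7 against a median of 4.3, pushes the coefficient far above zero.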
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                  Count     Frequency (%)
0                      144803    17.1%
0.1                    22776     2.7%
0.2                    17337     2.0%
0.3                    14591     1.7%
0.4                    12992     1.5%
0.5                    11714     1.4%
0.6                    10842     1.3%
0.7                    10129     1.2%
0.8                    9266      1.1%
0.9                    9021      1.1%
Other values (2496)    582339    68.8%

Minimum 5 values

Value    Count     Frequency (%)
0        144803    17.1%
0.1      22776     2.7%
0.2      17337     2.0%
0.3      14591     1.7%
0.4      12992     1.5%

Maximum 5 values

Value     Count    Frequency (%)
3646.7    1        < 0.1%
3207.8    1        < 0.1%
3057.4    1        < 0.1%
2541.2    1        < 0.1%
2537.1    1        < 0.1%

Interactions


Correlations


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     Gene               Gene name    Tissue             TPM     pTPM    NX
0    ENSG00000000003    TSPAN6       adipose tissue     31.5    37.7    9.8
1    ENSG00000000003    TSPAN6       adrenal gland      26.4    32.7    7.6
2    ENSG00000000003    TSPAN6       appendix           9.2     14.5    2.1
3    ENSG00000000003    TSPAN6       B-cells            0.1     0.2     0.3
4    ENSG00000000003    TSPAN6       bone marrow        0.7     0.9     0.1
5    ENSG00000000003    TSPAN6       breast             53.3    68.1    16.2
6    ENSG00000000003    TSPAN6       cerebral cortex    18.5    22.8    4.1
7    ENSG00000000003    TSPAN6       cervix, uterine    54.2    66.0    14.2
8    ENSG00000000003    TSPAN6       colon              48.5    70.6    17.8
9    ENSG00000000003    TSPAN6       dendritic cells    0.1     0.1     0.2
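
The ten rows above are enough to illustrate the High correlation warning between TPM and pTPM. The sketch below computes Pearson's r by hand on just these rows; the `pearson` helper is illustrative, and ten rows from a single gene are not representative of the full dataset.

```python
import math

# TPM and pTPM from the ten "First rows" of the sample.
tpm = [31.5, 26.4, 9.2, 0.1, 0.7, 53.3, 18.5, 54.2, 48.5, 0.1]
ptpm = [37.7, 32.7, 14.5, 0.2, 0.9, 68.1, 22.8, 66.0, 70.6, 0.1]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(tpm, ptpm)
print(f"Pearson r (TPM vs pTPM, 10 rows) = {r:.3f}")
```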

Last rows

          Gene               Gene name     Tissue             TPM    pTPM    NX
845800    ENSG00000285509    AP000646.1    skin               0.0    0.0     0.0
845801    ENSG00000285509    AP000646.1    small intestine    0.0    0.0     0.0
845802    ENSG00000285509    AP000646.1    smooth muscle      0.2    0.2     0.2
845803    ENSG00000285509    AP000646.1    spleen             0.1    0.1     0.0
845804    ENSG00000285509    AP000646.1    stomach            0.0    0.0     0.0
845805    ENSG00000285509    AP000646.1    T-cells            0.0    0.0     0.2
845806    ENSG00000285509    AP000646.1    testis             1.0    1.3     1.7
845807    ENSG00000285509    AP000646.1    thyroid gland      0.1    0.1     0.0
845808    ENSG00000285509    AP000646.1    tonsil             0.0    0.0     0.0
845809    ENSG00000285509    AP000646.1    urinary bladder    0.2    0.4     0.5