variantplaner ¶
VariantPlaner is a toolkit for managing large numbers of variants without requiring much CPU or RAM.
It converts VCF files to Parquet, converts annotations to Parquet, and converts Parquet back to VCF.
It also builds a file structure that keeps variant database query times short.
Modules:
-
cli
–Module containing the command-line entry point functions.
-
exception
–Exceptions that can be raised by VariantPlaner.
-
extract
–Extract information from a polars.LazyFrame produced by parsing a raw vcf file.
-
generate
–Functions to generate information.
-
io
–Module that manages input parsing and output serialization.
-
normalization
–Functions used to normalize data.
-
objects
–Module that stores variantplaner objects.
-
struct
–Generated data structures for easy integration.
Classes:
-
Annotations
–Object to manage lazyframe as Annotations.
-
ContigsLength
–Store contigs -> length information.
-
Genotypes
–Object to manage lazyframe as Genotypes.
-
Pedigree
–Object to manage lazyframe as Pedigree.
-
Variants
–Object to manage lazyframe as Variants.
-
Vcf
–Object to manage lazyframe as Vcf.
-
VcfHeader
–Object that parses and stores vcf information.
-
VcfParsingBehavior
–Enumeration used to control the behavior of IntoLazyFrame.
Annotations ¶
Annotations()
Bases: LazyFrame
Object to manage lazyframe as Annotations.
Methods:
-
minimal_schema
–Get minimal schema of annotations polars.LazyFrame.
Source code in src/variantplaner/objects/annotations.py
minimal_schema classmethod ¶
Get minimal schema of annotations polars.LazyFrame.
Source code in src/variantplaner/objects/annotations.py
ContigsLength ¶
ContigsLength()
Store contigs -> length information.
Methods:
-
from_path
–Fill the object from the file pointed to by a pathlib.Path.
-
from_vcf_header
–Fill the object from a VcfHeader.
Source code in src/variantplaner/objects/contigs_length.py
from_path ¶
Fill the object from the file pointed to by a pathlib.Path.
Argument: path: Path of the input file.
Returns: Number of contig lines seen.
Source code in src/variantplaner/objects/contigs_length.py
from_vcf_header ¶
Fill the object from a VcfHeader.
Argument: header: VcfHeader
Returns: Number of contig lines seen.
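As an illustration of the idea, the sketch below builds a contig -> length mapping from `##contig` header lines and returns how many contig lines were seen. It is not variantplaner's implementation: `contig_lengths` and `CONTIG_RE` are hypothetical names, and the sketch assumes the usual `##contig=<ID=...,length=...>` layout of VCF headers.

```python
import re

# Hypothetical helper, not part of variantplaner: it only mirrors the
# documented behaviour (fill a contig -> length mapping and report the
# number of contig lines seen).
CONTIG_RE = re.compile(r"^##contig=<(?P<body>.*)>\s*$")


def contig_lengths(header_lines):
    """Return (mapping, number_of_contig_lines) from VCF header lines."""
    lengths = {}
    seen = 0
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match is None:
            continue
        seen += 1
        # Split "ID=chr1,length=248956422" into key/value pairs.
        fields = dict(
            item.split("=", 1) for item in match["body"].split(",") if "=" in item
        )
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths, seen


header = [
    "##fileformat=VCFv4.3",
    "##contig=<ID=chr1,length=248956422>",
    "##contig=<ID=chr2,length=242193529>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
lengths, seen = contig_lengths(header)
print(lengths, seen)
```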
Source code in src/variantplaner/objects/contigs_length.py
Genotypes ¶
Genotypes(data: LazyFrame | None = None)
Bases: LazyFrame
Object to manage lazyframe as Genotypes.
Methods:
-
minimal_schema
–Get minimal schema of genotypes polars.LazyFrame.
-
samples_names
–Get the list of sample names.
Source code in src/variantplaner/objects/genotypes.py
minimal_schema classmethod ¶
Get minimal schema of genotypes polars.LazyFrame.
Source code in src/variantplaner/objects/genotypes.py
samples_names ¶
Get the list of sample names.
Source code in src/variantplaner/objects/genotypes.py
Pedigree ¶
Pedigree()
Bases: LazyFrame
Object to manage lazyframe as Pedigree.
Methods:
-
from_path
–Read a pedigree file into a polars.LazyFrame.
-
minimal_schema
–Get schema of pedigree polars.LazyFrame.
-
to_path
–Write the pedigree polars.LazyFrame in ped format.
Source code in src/variantplaner/objects/pedigree.py
from_path ¶
from_path(input_path: Path) -> None
Read a pedigree file into a polars.LazyFrame.
Parameters:
-
input_path
(Path) –Path to pedigree file.
Returns:
-
None
–Nothing is returned. The object is filled with the ped information ('family_id', 'personal_id', 'father_id', 'mother_id', 'sex', 'affected').
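To make the six-column layout concrete, here is a minimal standard-library sketch that reads a ped file into dictionaries. It assumes a tab-separated file without a header line; `read_ped` and `PED_COLUMNS` are hypothetical names, and the real method fills a polars.LazyFrame instead of yielding dictionaries.

```python
import csv
import io

# Column names as listed in the documentation above.
PED_COLUMNS = ["family_id", "personal_id", "father_id", "mother_id", "sex", "affected"]


def read_ped(handle):
    """Yield one dict per pedigree row from a tab-separated, header-less file."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if not row:  # skip blank lines
            continue
        yield dict(zip(PED_COLUMNS, row))


# In-memory stand-in for a file on disk (the sample data is invented).
ped_text = "fam1\tchild\tdad\tmom\t1\t2\nfam1\tdad\t0\t0\t1\t1\n"
rows = list(read_ped(io.StringIO(ped_text)))
print(rows[0]["personal_id"])
```

Values stay as strings in this sketch; any typing of the columns is left to the caller.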
Source code in src/variantplaner/objects/pedigree.py
minimal_schema classmethod ¶
Get schema of pedigree polars.LazyFrame.
Source code in src/variantplaner/objects/pedigree.py
to_path ¶
to_path(output_path: Path) -> None
Write the pedigree polars.LazyFrame in ped format.
Warning: This function calls polars.LazyFrame.collect before writing the csv, which can have a significant impact on memory usage.
Parameters:
-
lf
–LazyFrame containing pedigree information.
-
output_path
(Path) –Path where the pedigree information is written.
Returns:
-
None
–None
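The writing side can be sketched the same way with the standard library. This is an assumption-laden illustration, not the library's code: `write_ped` is a hypothetical helper, the output is assumed to be tab-separated without a header, and all rows are held in memory before writing, which is the same cost the warning above describes for `collect`.

```python
import csv
import io

PED_COLUMNS = ["family_id", "personal_id", "father_id", "mother_id", "sex", "affected"]


def write_ped(rows, handle):
    """Write pedigree rows as tab-separated lines without a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Emit the columns in the fixed ped order, whatever the dict order is.
        writer.writerow([row[name] for name in PED_COLUMNS])


# Invented sample row; a real caller would pass data read from a pedigree.
rows = [
    {
        "family_id": "fam1",
        "personal_id": "child",
        "father_id": "dad",
        "mother_id": "mom",
        "sex": "1",
        "affected": "2",
    }
]
buffer = io.StringIO()  # stand-in for an open output file
write_ped(rows, buffer)
print(buffer.getvalue(), end="")
```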
Source code in src/variantplaner/objects/pedigree.py
Variants ¶
Variants(data: LazyFrame | None = None)
Bases: LazyFrame
Object to manage lazyframe as Variants.
Methods:
-
minimal_schema
–Get schema of variants polars.LazyFrame.
Source code in src/variantplaner/objects/variants.py
minimal_schema classmethod ¶
Get schema of variants polars.LazyFrame.
Source code in src/variantplaner/objects/variants.py
Vcf ¶
Vcf()
Object to manage lazyframe as Vcf.
Methods:
-
add_genotypes
–Add genotypes information to the vcf.
-
annotations
–Get annotations of vcf.
-
from_path
–Populate Vcf object with vcf file.
-
genotypes
–Get genotypes of vcf.
-
schema
–Get schema of Vcf polars.LazyFrame.
-
set_variants
–Set variants of vcf.
-
variants
–Get variants of vcf.
Source code in src/variantplaner/objects/vcf.py
add_genotypes ¶
add_genotypes(genotypes_lf: Genotypes) -> None
Add genotypes information to the vcf.
Source code in src/variantplaner/objects/vcf.py
annotations ¶
annotations(
select_info: set[str] | None = None,
) -> Annotations
Get annotations of vcf.
Source code in src/variantplaner/objects/vcf.py
from_path ¶
from_path(
path: Path,
chr2len_path: Path | None,
behavior: VcfParsingBehavior = NOTHING,
) -> None
Populate Vcf object with vcf file.
Source code in src/variantplaner/objects/vcf.py
genotypes ¶
Get genotypes of vcf.
Source code in src/variantplaner/objects/vcf.py
schema classmethod ¶
Get schema of Vcf polars.LazyFrame.
Source code in src/variantplaner/objects/vcf.py
VcfHeader ¶
VcfHeader()
Object that parses and stores vcf information.
Methods:
-
column_name
–Get an iterator over the correct column names.
-
format_parser
–Generate a list of polars.Expr to extract genotypes information.
-
from_files
–Populate the VcfHeader object with the content of a header-only file.
-
from_lines
–Extract all header information from vcf lines.
-
info_parser
–Generate a list of polars.Expr to extract variants information.
Attributes:
-
contigs
(Iterator[str]) –Get an iterator over the lines that contain chromosome information.
-
samples_index
(dict[str, int] | None) –Read vcf header to generate an association map between sample name and index.
Source code in src/variantplaner/objects/vcf_header.py
contigs cached property ¶
Get an iterator over the lines that contain chromosome information.
Returns: String iterator
samples_index cached property ¶
Read vcf header to generate an association map between sample name and index.
Args: header: Header string.
Returns: Map that associates each sample name with its sample index.
Raises: NotVcfHeaderError: If no line starts with '#CHR'.
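The mapping can be pictured with the following sketch, which looks for the '#CHR' line and numbers the sample columns that follow the nine fixed VCF columns. The function name, the local exception class, and the choice to return `None` when the header has no sample column are assumptions made for the example; only the return type `dict[str, int] | None` comes from the documentation.

```python
class NotVcfHeaderError(Exception):
    """Local stand-in for the exception named in the documentation."""


def samples_index(header_lines):
    """Map each sample name to its position among the sample columns."""
    for line in header_lines:
        if line.startswith("#CHR"):
            columns = line.rstrip("\n").split("\t")
            # The first nine columns (CHROM to FORMAT) are fixed; samples follow.
            samples = columns[9:]
            if not samples:
                return None  # assumption: no sample columns means no map
            return {name: index for index, name in enumerate(samples)}
    raise NotVcfHeaderError("no line starts with '#CHR'")


header = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_a\tsample_b",
]
index = samples_index(header)
print(index)
```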
column_name ¶
Get an iterator over the correct column names.
Returns: String iterator
Source code in src/variantplaner/objects/vcf_header.py
format_parser ¶
Generate a list of polars.Expr to extract genotypes information.
Warning: Float values cannot be converted for the moment; they are stored as String to preserve the information.
Args: header: Line of vcf header. input_path: Path to vcf file. select_format: List of target format fields.
Returns: A dict that links each format id to a pipeable function built on polars.Expr.
Raises: NotVcfHeaderError: If no line starts with '#CHR'.
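The shape of that result, a mapping from format id to something that converts the raw value, can be sketched without polars. The example below substitutes plain Python callables for polars.Expr, so it is an analogy and not the library's code; `format_converters` and `FORMAT_RE` are hypothetical names. In line with the warning above, Float fields are deliberately left as strings.

```python
import re

# Matches the start of a "##FORMAT=<ID=...,Number=...,Type=...,...>" line.
FORMAT_RE = re.compile(
    r"^##FORMAT=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),Type=(?P<type>[^,>]+)"
)


def format_converters(header_lines, select_format=None):
    """Map each FORMAT id to a callable that converts its raw string value."""
    converters = {}
    for line in header_lines:
        match = FORMAT_RE.match(line)
        if match is None:
            continue
        if select_format is not None and match["id"] not in select_format:
            continue
        # Integers are converted; everything else, floats included, stays a string.
        converters[match["id"]] = int if match["type"] == "Integer" else str
    return converters


header = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
    '##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele fraction">',
]
converters = format_converters(header)

# Apply the converters to one sample column of a record (invented values).
keys = "GT:DP:AF".split(":")
values = "0/1:12:0.45".split(":")
parsed = {key: converters[key](value) for key, value in zip(keys, values)}
print(parsed)
```

Passing `select_format={"DP"}` restricts the mapping to the requested ids, which mirrors the role of the `select_format` argument described above.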
Source code in src/variantplaner/objects/vcf_header.py
from_files ¶
from_files(path: Path) -> None
Populate the VcfHeader object with the content of a header-only file.
Args: path: Path of file
Returns: None
Source code in src/variantplaner/objects/vcf_header.py
from_lines ¶
Extract all header information from vcf lines.
The header consists of the lines between the start of the file and the first line that starts with '#CHROM' or does not start with '#'.
Args: lines: Iterator of lines.
Returns: None
Raises: NotAVcfHeader: If a line does not start with '#'. NotAVcfHeader: If no line starts with '#CHROM'.
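A minimal sketch of this scanning rule, using only the standard library, is shown below. It is an interpretation of the description above, not the library's code: `collect_header` is a hypothetical name and the exception class is a local stand-in. The sketch stops consuming the iterator right after the '#CHROM' line, so the records that follow remain available to the caller.

```python
class NotAVcfHeader(Exception):
    """Local stand-in for the exception named in the documentation."""


def collect_header(lines):
    """Collect header lines up to and including the '#CHROM' line."""
    header = []
    for line in lines:
        if not line.startswith("#"):
            raise NotAVcfHeader("a line does not start with '#'")
        header.append(line.rstrip("\n"))
        if line.startswith("#CHROM"):
            return header
    raise NotAVcfHeader("no line starts with '#CHROM'")


# Invented three-line input: two header lines, then one record.
source = iter(["##fileformat=VCFv4.3\n", "#CHROM\tPOS\n", "chr1\t10\n"])
header = collect_header(source)
remaining = next(source)  # the first record is still unread
print(header)
```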
Source code in src/variantplaner/objects/vcf_header.py
info_parser ¶
Generate a list of polars.Expr to extract variants information.
Args: header: Line of vcf header. input_path: Path to vcf file. select_info: List of target info fields.
Returns: List of polars.Expr to parse info columns.
Raises: NotVcfHeaderError: If no line starts with '#CHR'.
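To show what parsing the info column involves, the sketch below reads the `##INFO` declarations and uses them to type the `key=value` pairs of one INFO string. It uses plain Python in place of polars.Expr, so it illustrates the idea only; `info_types`, `parse_info`, `INFO_RE`, and `CASTS` are hypothetical names, and unlike the FORMAT handling described earlier this sketch does convert floats.

```python
import re

# Matches the start of a "##INFO=<ID=...,Number=...,Type=...,...>" line.
INFO_RE = re.compile(
    r"^##INFO=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),Type=(?P<type>[^,>]+)"
)
CASTS = {"Integer": int, "Float": float, "Flag": bool, "String": str, "Character": str}


def info_types(header_lines, select_info=None):
    """Map each declared INFO id to its (Number, cast) pair."""
    types = {}
    for line in header_lines:
        match = INFO_RE.match(line)
        if match is None:
            continue
        if select_info is not None and match["id"] not in select_info:
            continue
        types[match["id"]] = (match["number"], CASTS.get(match["type"], str))
    return types


def parse_info(info, types):
    """Convert one INFO string into a dict of typed values."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key not in types:
            continue  # field not declared in the header, or not selected
        number, cast = types[key]
        if cast is bool:
            parsed[key] = True  # a flag is true simply by being present
        elif number == "1":
            parsed[key] = cast(value)
        else:
            parsed[key] = [cast(part) for part in value.split(",")]
    return parsed


header = [
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    '##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">',
]
types = info_types(header)
record = parse_info("DP=10;AF=0.5,0.25;DB;XX=1", types)
print(record)
```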
Source code in src/variantplaner/objects/vcf_header.py
VcfParsingBehavior ¶
Bases: IntFlag
Enumeration used to control the behavior of IntoLazyFrame.
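Because the class derives from IntFlag, its members can be combined with `|` and tested with `&`. The sketch below shows that mechanism on a stand-in enumeration: `ParsingBehavior`, `SKIP_SYMBOLIC`, and `KEEP_MISSING` are invented for the example and are not the members of VcfParsingBehavior; only `NOTHING` appears in this documentation, as the default of `Vcf.from_path`, and its value of 0 is an assumption.

```python
from enum import IntFlag


class ParsingBehavior(IntFlag):
    """Stand-in enumeration; member names other than NOTHING are invented."""

    NOTHING = 0  # assumed value: no optional behavior enabled
    SKIP_SYMBOLIC = 1  # hypothetical flag
    KEEP_MISSING = 2  # hypothetical flag


# Flags combine with bitwise OR and are tested with bitwise AND.
behavior = ParsingBehavior.SKIP_SYMBOLIC | ParsingBehavior.KEEP_MISSING
skip_enabled = bool(behavior & ParsingBehavior.SKIP_SYMBOLIC)
default_skip = bool(ParsingBehavior.NOTHING & ParsingBehavior.SKIP_SYMBOLIC)
print(int(behavior), skip_enabled, default_skip)
```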
Attributes: