Class DataFrame

java.lang.Object
org.snpsift.annotate.mem.dataFrame.DataFrame
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
DataFrameDel, DataFrameIns, DataFrameMixed, DataFrameMnp, DataFrameOther, DataFrameSnp

public class DataFrame extends Object implements Serializable
A set of DataColumns, indexed by position. This class is used to store data for a chromosome.

The DataFrame class manages a collection of data columns, each represented by a DataFrameColumn object. It provides methods to add rows, retrieve rows, and perform various operations on the data. The data is indexed by position using a PosIndex object, and can optionally include reference and alternative alleles. The class also supports creating columns based on VCF header information and resizing the data for memory optimization.

The main components of the DataFrame class are:
- VariantTypeCounter variantTypeCounter: Keeps track of variant types and their counts.
- VariantCategory variantCategory: Represents the category of variants.
- int currentIdx: The current index for adding new rows.
- PosIndex posIndex: Indexes the data by chromosome position.
- StringArray refs: Stores reference alleles.
- StringArray alts: Stores alternative alleles.
- Map<String, DataFrameColumn<?>> columns: A map of column names to DataFrameColumn objects.
- Fields fields: Represents the fields to create or annotate.

The class provides the following key methods:
- add(String name, DataFrameColumn<?> column): Adds a column to the DataFrame.
- add(DataFrameRow row): Adds a row to the DataFrame.
- check(): Checks the integrity of the data.
- columnNames(): Returns an iterable of column names.
- createColumn(VcfHeaderInfo vcfHeaderInfo): Creates a column based on VCF header information.
- createColumns(): Creates columns based on the fields.
- eq(int idx, int pos, String ref, String alt): Checks if the entry at the given index matches the specified position, reference, and alternative alleles.
- get(String columnName, int idx): Retrieves data from a column by index.
- getColumn(String name): Retrieves a column by name.
- getRow(int pos, String ref, String alt): Retrieves a row based on position, reference, and alternative alleles.
- find(int pos, String ref, String alt): Finds the index of a row based on position, reference, and alternative alleles.
- hasEntry(int pos, String ref, String alt): Checks if an entry exists for the specified position, reference, and alternative alleles.
- resize(): Resizes and optimizes the memory usage of the data.
- set(String columnName, int idx, Object value): Sets data in a column.
- sizeBytes(): Returns the memory size of the DataFrame.
- stringArrayMemSize(VariantCategory variantCategory, String field): Calculates the memory size for a string array.
- toString(): Returns a string representation of the DataFrame.
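The column-per-field layout described above can be illustrated with a minimal, self-contained sketch. It is not the snpSift implementation: the class `ColumnarFrameSketch` and its plain `int[]`, `String[]` and `Object[]` arrays are hypothetical stand-ins for `PosIndex`, `StringArray` and `DataFrameColumn`. The assumption is that a row index `idx` addresses the same slot in every array, so a value is reached by column name plus row index, in the spirit of `get(String columnName, int idx)` and `set(String columnName, int idx, Object value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical column-oriented frame: every array has one slot per row index. */
class ColumnarFrameSketch {
    final int[] positions;   // stand-in for PosIndex
    final String[] refs;     // stand-in for StringArray refs
    final String[] alts;     // stand-in for StringArray alts
    final Map<String, Object[]> columns = new LinkedHashMap<>(); // column name -> values

    ColumnarFrameSketch(int capacity, String... columnNames) {
        positions = new int[capacity];
        refs = new String[capacity];
        alts = new String[capacity];
        for (String name : columnNames) columns.put(name, new Object[capacity]);
    }

    /** Value of column 'columnName' at row 'idx'; may be null if never set. */
    Object get(String columnName, int idx) {
        return column(columnName)[idx];
    }

    /** Store 'value' in column 'columnName' at row 'idx'. */
    void set(String columnName, int idx, Object value) {
        column(columnName)[idx] = value;
    }

    /** Column names in insertion order. */
    Iterable<String> columnNames() {
        return columns.keySet();
    }

    private Object[] column(String name) {
        Object[] col = columns.get(name);
        if (col == null) throw new IllegalArgumentException("No such column: " + name);
        return col;
    }
}
```

Keeping each field in its own array, rather than one object per row, is presumably what lets a class like this specialize column types and trim memory per column; the source does not spell out the exact layout.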
  • Method Details

    • add

      public void add(DataFrameRow row)
      Add a row to the data frame
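The source does not describe how `add` manages storage. One possible approach, assumed here purely for illustration, is append-only parallel arrays that write each new row at `currentIdx` and double their capacity when full. `AppendSketch` is hypothetical, and it takes a bare (pos, ref, alt) triple, whereas the real method takes a `DataFrameRow`.

```java
import java.util.Arrays;

/** Hypothetical append-only storage: rows go to slot currentIdx, arrays double when full. */
class AppendSketch {
    int[] positions = new int[2];
    String[] refs = new String[2];
    String[] alts = new String[2];
    int currentIdx = 0; // next free row index, mirroring the 'currentIdx' field

    /** Append one (pos, ref, alt) row and return the row index it was stored at. */
    int add(int pos, String ref, String alt) {
        if (currentIdx == positions.length) {
            // Out of room: grow all parallel arrays together so indexes stay aligned
            int newCapacity = positions.length * 2;
            positions = Arrays.copyOf(positions, newCapacity);
            refs = Arrays.copyOf(refs, newCapacity);
            alts = Arrays.copyOf(alts, newCapacity);
        }
        positions[currentIdx] = pos;
        refs[currentIdx] = ref;
        alts[currentIdx] = alt;
        return currentIdx++;
    }
}
```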
    • check

      public void check()
      Check the integrity of the data
    • columnNames

      public Iterable<String> columnNames()
      Return the names of the columns
    • createColumn

      protected DataFrameColumn<?> createColumn(org.snpeff.vcf.VcfHeaderInfo vcfHeaderInfo)
      Create a column whose type is given by the VCF header information
    • createColumns

      protected void createColumns()
      Create columns based on fields
    • eq

      protected boolean eq(int idx, int pos, String ref, String alt)
      Does the entry at position 'idx' match the given (pos, ref, alt) values?
    • get

      protected Object get(String columnName, int idx)
      Get data from column 'columnName' at row index 'idx'. Note: The value can be null
    • getColumn

      public DataFrameColumn<?> getColumn(String name)
      Get a column by name
    • getRow

      public DataFrameRow getRow(int pos, String ref, String alt)
      Get a 'row' from the data frame.
      Parameters:
      pos - Position
      ref - Reference allele
      alt - Alternative allele
      Returns:
      A data frame row if found, or null if not found
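A minimal sketch of how a lookup like `getRow` can be built: find the matching row index, return null when there is none, otherwise read every column at that index. `RowLookupSketch` is hypothetical; it uses a plain linear scan for simplicity and represents a row as a `Map` rather than a `DataFrameRow`. Under the same assumptions, a boolean check such as `hasEntry` reduces to `find(...) >= 0`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical row lookup: locate the row index, then read each column at that index. */
class RowLookupSketch {
    final int[] positions;
    final String[] refs;
    final String[] alts;
    final Map<String, Object[]> columns = new LinkedHashMap<>(); // column name -> values

    RowLookupSketch(int[] positions, String[] refs, String[] alts) {
        this.positions = positions;
        this.refs = refs;
        this.alts = alts;
    }

    /** Linear scan for simplicity; returns -1 when no row matches. */
    int find(int pos, String ref, String alt) {
        for (int i = 0; i < positions.length; i++) {
            if (positions[i] == pos && Objects.equals(refs[i], ref) && Objects.equals(alts[i], alt)) return i;
        }
        return -1;
    }

    /** Column name -> value for the matching row, or null if (pos, ref, alt) is absent. */
    Map<String, Object> getRow(int pos, String ref, String alt) {
        int idx = find(pos, ref, alt);
        if (idx < 0) return null;
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object[]> e : columns.entrySet()) row.put(e.getKey(), e.getValue()[idx]);
        return row;
    }

    /** True when some row matches (pos, ref, alt). */
    boolean hasEntry(int pos, String ref, String alt) {
        return find(pos, ref, alt) >= 0;
    }
}
```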
    • find

      protected int find(int pos, String ref, String alt)
      Find the index of the row matching the given position, reference and alternative alleles.
      Returns:
      The index of the row in the data frame, or -1 if not found
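A sketch of how `find` and `eq` could work together, assuming (this is not stated in the source) that rows are stored sorted by position and that several rows may share a position, for example at multi-allelic sites. A binary search locates the first row at `pos`, then an `eq`-style comparison scans the rows at that position. `FindSketch` is hypothetical and stands in for the real `PosIndex`-based lookup.

```java
import java.util.Objects;

/** Hypothetical lookup over rows sorted by position; several rows may share one position. */
class FindSketch {
    final int[] positions; // assumed sorted ascending
    final String[] refs;
    final String[] alts;

    FindSketch(int[] positions, String[] refs, String[] alts) {
        this.positions = positions;
        this.refs = refs;
        this.alts = alts;
    }

    /** Does row 'idx' match (pos, ref, alt)? Plays the role of DataFrame.eq. */
    boolean eq(int idx, int pos, String ref, String alt) {
        return positions[idx] == pos && Objects.equals(refs[idx], ref) && Objects.equals(alts[idx], alt);
    }

    /** Row index matching (pos, ref, alt), or -1 if there is none. */
    int find(int pos, String ref, String alt) {
        // Binary search for the first row whose position is >= pos
        int lo = 0, hi = positions.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (positions[mid] < pos) lo = mid + 1;
            else hi = mid;
        }
        // Scan only the rows that share this position
        for (int i = lo; i < positions.length && positions[i] == pos; i++) {
            if (eq(i, pos, ref, alt)) return i;
        }
        return -1;
    }
}
```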
    • hasEntry

      public boolean hasEntry(int pos, String ref, String alt)
      Check whether an entry exists for the given position, reference and alternative alleles
    • resize

      public void resize()
      Resize and memory optimize the data
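The source does not say what `resize` does internally. One common way to "resize and memory optimize" append-only storage, assumed here for illustration, is to trim every array to the number of rows actually used once loading is finished. `ResizeSketch` is hypothetical and uses a fixed initial capacity to keep the example short.

```java
import java.util.Arrays;

/** Hypothetical trim step: drop unused capacity once all rows have been added. */
class ResizeSketch {
    int[] positions;
    String[] refs;
    int currentIdx; // number of rows actually in use

    ResizeSketch(int capacity) {
        positions = new int[capacity];
        refs = new String[capacity];
    }

    /** Append a row; no growth logic here, so exceeding capacity throws. */
    void add(int pos, String ref) {
        positions[currentIdx] = pos;
        refs[currentIdx] = ref;
        currentIdx++;
    }

    /** Shrink every array to exactly currentIdx slots. */
    void resize() {
        positions = Arrays.copyOf(positions, currentIdx);
        refs = Arrays.copyOf(refs, currentIdx);
    }
}
```

Trimming after loading trades one final copy for not carrying spare capacity for the rest of the run.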
    • set

      protected void set(String columnName, int idx, Object value)
      Set the value of column 'columnName' at row index 'idx'
    • sizeBytes

      public long sizeBytes()
      Memory size of this object, in bytes
    • toString

      public String toString()
      Overrides:
      toString in class Object