Pipelines

Data Preprocessing

The preprocessing pipeline is responsible for transforming raw input data (a pycognaize.Snapshot) into processed data suitable for training our models. It performs the following tasks, sketched in the example after this list:

  • Data ingestion: Loading raw data from various sources, such as databases, files, or APIs.
  • Data cleaning: Handling missing values, outliers, and inconsistencies in the data.
  • Feature engineering: Deriving model-ready features from the data, including scaling and normalizing numeric values and encoding categorical variables.
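
A minimal sketch of these steps, assuming the snapshot's fields have already been exported to a pandas DataFrame; the export step and the column handling below are illustrative, not our exact implementation:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw frame and derive model-ready features."""
    # Data cleaning: drop duplicate rows and fill missing numeric values
    df = df.drop_duplicates()
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Clip extreme outliers to the 1st/99th percentiles (illustrative cutoffs)
    for col in numeric_cols:
        low, high = df[col].quantile([0.01, 0.99])
        df[col] = df[col].clip(low, high)

    # Feature engineering: scale numeric columns, one-hot encode categoricals
    df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
    categorical_cols = df.select_dtypes(include="object").columns.tolist()
    return pd.get_dummies(df, columns=categorical_cols)
```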

Training

The training pipeline is responsible for training and optimizing our machine learning models. It performs the following tasks (see the sketch after the list):

  • Model training: Iteratively updating the model's parameters using training data to minimize the loss function.
  • Hyperparameter tuning: Optimizing the model's hyperparameters to achieve better performance.
  • Model versioning: Tracking and comparing different versions of the trained models for iterative improvement.
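
One way these steps fit together, sketched here with scikit-learn; the estimator, parameter grid, and file-based versioning scheme are illustrative assumptions, not our production configuration:

```python
import joblib
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def train(X_train, y_train, version: str):
    # Hyperparameter tuning: cross-validated search over an illustrative grid
    param_grid = {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}
    search = GridSearchCV(
        GradientBoostingClassifier(),  # model training happens inside each fit
        param_grid,
        scoring="neg_log_loss",  # select the candidate that minimizes log loss
        cv=5,
    )
    search.fit(X_train, y_train)

    # Model versioning: persist the best estimator under a version tag
    joblib.dump(search.best_estimator_, f"model-{version}.joblib")
    return search.best_estimator_, search.best_params_
```

Persisting each run under its own version tag, together with the best parameters the search reports, is what makes later comparison between model versions possible.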

Data Postprocessing

The postprocessing pipeline is responsible for refining the outputs generated by our trained models.

The pipeline performs the following tasks, with a sketch following the list:

  • Result filtering: Identifying and excluding irrelevant or low-confidence predictions.
  • Normalization: Scaling or transforming predictions to a common range or format.
  • Aggregation: Combining multiple predictions or model outputs to generate a consolidated result.
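
A minimal sketch of these steps, assuming each model emits a dict with label, confidence, and score fields; the field names and the 0.5 cutoff are illustrative:

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.5  # illustrative cutoff

def postprocess(predictions: list[dict]) -> dict:
    # Result filtering: drop low-confidence predictions
    kept = [p for p in predictions if p["confidence"] >= CONFIDENCE_THRESHOLD]
    if not kept:
        return {"label": None, "score": 0.0}

    # Normalization: rescale raw scores into [0, 1] across the kept predictions
    scores = [p["score"] for p in kept]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    for p in kept:
        p["score"] = (p["score"] - lo) / span

    # Aggregation: majority vote on labels, mean of normalized scores
    label, _ = Counter(p["label"] for p in kept).most_common(1)[0]
    score = sum(p["score"] for p in kept) / len(kept)
    return {"label": label, "score": score}
```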