Practitioners Guide to MLOps (Google, 2021)

by bents 2022. 7. 29.

## Distilled summary: only the information that fits startup realities


## What is an ML-enabled system?

An ML-enabled system is composed of data engineering, ML engineering, and application engineering tasks.

  • Data engineering involves ingesting, integrating, curating, and refining data.
  • ML engineering provides an automated, streamlined ML process that handles the unique complexities of practical ML applications: building, deploying, and operationalizing ML systems.
  • MLOps is a methodology for ML engineering that unifies ML system development (the ML element) with ML system operations (the Ops element).

## MLOps workflow (process)

  • It is not a waterfall workflow that has to pass sequentially through every process. Processes can be skipped, and the flow can repeat a given phase or a subsequence of processes.
  • Many organizations start by focusing on the processes for ML development, model deployment, and prediction serving.
    • Model development: from data preparation and transformation to model training and evaluation
      • creating labeled datasets, using features and other reusable ML artifacts
      • output: data preprocessing, model architecture, and model training settings
    • Model deployment: packaging, testing, and deploying a model to a serving environment for production serving
      • annotating, reviewing, and approving registered models for release, and deploying them to a production environment
      • output: serving package
    • Prediction serving: serving the model that is deployed in production for inference
      • serving predictions using the deployment pattern that you have specified: online, batch, or streaming predictions
      • output: model explanations and serving logs
    • Data & model management: a central, cross-cutting function for governing ML artifacts
      • Feature management: a central definition of features that avoids training-serving skew and provides a way of defining and sharing new entities and features
      • Dataset management: maintaining scripts for creating datasets and splits; dataset definition and realization, including splits (train/eval/test) and filtering conditions; metadata and annotations (labels); providing reproducibility and lineage tracking
      • ML metadata tracking: producing and managing artifacts, such as experimentation parameters and pipeline (run) configurations, for tracking and comparing models and understanding issues
      • Model governance: managing whether ML models are ready to go to production; registering, validating, reviewing, and approving models for deployment, and reporting on performance
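The dataset management item above asks for reproducible splits and lineage tracking. A minimal sketch of one way to get both, assuming records carry a stable ID: the split is derived from a hash of that ID, and the dataset definition itself is hashed into a version key. `DatasetDefinition`, `assign_split`, and the field names are hypothetical and do not come from the whitepaper.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetDefinition:
    """Hypothetical dataset definition: source, filter, and split fractions."""
    source: str
    filter_condition: str
    train_fraction: float = 0.8
    eval_fraction: float = 0.1  # whatever remains goes to the test split

    def fingerprint(self) -> str:
        """Stable short hash of the definition, usable as a lineage/version key."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def assign_split(definition: DatasetDefinition, record_id: str) -> str:
    """Deterministically map a record ID to 'train', 'eval', or 'test'."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Scale the first 8 bytes of the hash to a value between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < definition.train_fraction:
        return "train"
    if bucket < definition.train_fraction + definition.eval_fraction:
        return "eval"
    return "test"


if __name__ == "__main__":
    # Assumed example values; any source name and filter string would do.
    definition = DatasetDefinition(
        source="events_table", filter_condition="country = 'KR'"
    )
    print("definition version:", definition.fingerprint())
    print("user-42 goes to:", assign_split(definition, "user-42"))
```

Because the split depends only on the record ID, re-running the script on a later snapshot keeps existing records in the same split, and storing the fingerprint next to a trained model records which dataset definition produced it.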

## Core MLOps technical capabilities

  • Foundational capabilities
    1. Supporting IT workloads: infra, security, privacy
    2. Source & artifact repositories + CI/CD
  • Data area that can be integrated or kept separate
    1. Dataset & feature repository [Model Dev.] [Data&Model]
      1. Unify the definition and the storage of the ML data assets.
      2. Provide data consistency (plus freshness and high quality) for training and inference
      3. Enable shareability, discoverability, reusability, and versioning of data assets.
    2. ML metadata & artifact repository [Model Dev.] [Data&Model]
      1. Provide traceability and lineage tracking of ML artifacts.
      2. Share and track experimentation and pipeline parameter configurations.
      3. Store, access, investigate, visualize, download, and archive ML artifacts.
      4. Integrate with all other MLOps capabilities.
  • Core MLOps capabilities
    1. Experimentation [Model Dev.]
    2. Data processing [Model Dev.]
    3. Model training [Model Dev.]
    4. Model evaluation
    5. (omitted) Model serving, online experimentation, model monitoring
    6. ML pipelines
    7. Model registry [Model Dev.] [Data&Model]
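To make the "traceability and lineage tracking" role of the ML metadata & artifact repository more concrete, here is a minimal in-memory sketch. `Artifact` and `MetadataStore` are hypothetical names invented for illustration; a real setup would rely on a dedicated metadata service rather than a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical record for one ML artifact (dataset, model, metrics, ...)."""
    name: str
    kind: str
    params: dict = field(default_factory=dict)   # experiment/pipeline settings
    parents: list = field(default_factory=list)  # names of upstream artifacts


class MetadataStore:
    """Toy metadata repository: stores artifacts and answers lineage queries."""

    def __init__(self):
        self._artifacts = {}

    def log(self, artifact: Artifact) -> None:
        # Refuse dangling references so every lineage chain stays traceable.
        for parent in artifact.parents:
            if parent not in self._artifacts:
                raise KeyError(f"unknown parent artifact: {parent}")
        self._artifacts[artifact.name] = artifact

    def lineage(self, name: str) -> list:
        """Return the names of all upstream artifacts, nearest first."""
        seen = []
        queue = list(self._artifacts[name].parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            queue.extend(self._artifacts[current].parents)
        return seen


if __name__ == "__main__":
    store = MetadataStore()
    store.log(Artifact("raw_events", "dataset"))
    store.log(Artifact("features_v1", "dataset", parents=["raw_events"]))
    store.log(Artifact("model_v1", "model",
                       params={"learning_rate": 0.01},
                       parents=["features_v1"]))
    print(store.lineage("model_v1"))
```

Storing parameters next to parent links is what lets the same store cover both capabilities in the list: comparing experiment configurations and tracing a model back to the data it was trained on.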

source: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf