## Distilled summary: only the information that fits startup realities
## What is an ML-enabled system?
: composed of data engineering, ML engineering, and application engineering tasks
- Data engineering involves ingesting, integrating, curating, and refining data.
- ML engineering provides an automated, streamlined ML process that handles the unique complexities of practical ML applications: building, deploying, and operationalizing ML systems.
- MLOps is a methodology for ML engineering that unifies ML system development (the ML element) with ML system operations (the Ops element).
## MLOps workflow (process)
- It is not a waterfall workflow that must pass sequentially through every process. A process can be skipped, and the flow can repeat a given phase or a subsequence of processes.
- Many organizations start by focusing on the processes for ML development, model deployment, and prediction serving.
- Model development : from data preparation and transformation to model training and evaluation
  - creating labeled datasets, using features and other reusable ML artifacts
  - output : data preprocessing, model architecture, and model training settings
- Model deployment : packaging, testing, and deploying a model to a serving environment for production serving
  - annotating, reviewing, and approving registered models for release, and deploying them to a production environment
  - output : serving package
- Prediction serving : serving the model that is deployed in production for inference
  - serving predictions using the deployment pattern that you have specified : online, batch, or streaming predictions
  - output : model explanations and serving logs
- Data & model management : a central, cross-cutting function for governing ML artifacts
  - Feature management : a central definition of features that avoids training-serving skew and provides a way to define and share new entities and features
  - Dataset management : maintaining scripts for creating datasets and splits; dataset definition and realization (train-eval-test splits and filtering conditions); metadata and annotations (labels); reproducibility and lineage tracking
  - ML metadata tracking : producing and managing artifacts, such as experimentation parameters and pipeline (run) configurations, for tracking and comparing models and understanding issues
  - Model governance : managing whether ML models are ready to go to production by registering, validating, reviewing, and approving models for deployment and reporting on their performance
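
The dataset management point above (a dataset definition that is later realized into reproducible train-eval-test splits with filtering conditions) can be sketched roughly as follows. This is only an illustration, not something taken from the whitepaper: `DatasetDefinition`, `assign_split`, and `realize` are hypothetical names, and hashing record IDs is just one assumed way to keep splits stable across runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DatasetDefinition:
    """Hypothetical dataset definition: a filtering condition plus split ratios."""
    name: str
    version: str
    keep: Callable[[dict], bool]   # filtering condition applied to each record
    train_frac: float = 0.8
    eval_frac: float = 0.1         # whatever remains goes to the test split


def assign_split(definition: DatasetDefinition, record_id: str) -> str:
    """Map a record ID to a split deterministically using a stable hash."""
    key = f"{definition.name}:{definition.version}:{record_id}".encode()
    # Interpret the first 8 bytes of SHA-256 as a fraction in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    if bucket < definition.train_frac:
        return "train"
    if bucket < definition.train_frac + definition.eval_frac:
        return "eval"
    return "test"


def realize(definition: DatasetDefinition,
            records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Realize the definition: filter records, then bucket them into splits."""
    splits: Dict[str, List[dict]] = {"train": [], "eval": [], "test": []}
    for record in records:
        if definition.keep(record):
            splits[assign_split(definition, record["id"])].append(record)
    return splits


if __name__ == "__main__":
    # Toy records; the "valid" flag stands in for a real filtering condition.
    rows = [{"id": str(i), "valid": i % 5 != 0} for i in range(20)]
    demo = DatasetDefinition(name="demo", version="v1", keep=lambda r: r["valid"])
    for split_name, members in realize(demo, rows).items():
        print(split_name, [r["id"] for r in members])
```

Because the split depends only on the definition's name, version, and the record ID, re-running the script or reordering the input yields the same membership, which is the reproducibility property the notes mention. Bumping `version` is the assumed way to get a fresh split on purpose.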
## Core MLOps technical capabilities
- Foundational capabilities
  - Supporting IT workload - infra, security, privacy
  - Source & artifact repositories + CI/CD
- Data areas that can be integrated or separated
  - Dataset & feature repository [Model Dev.] [Data&Model]
    - Unify the definition and the storage of the ML data assets.
    - Provide data consistency (plus freshness and high quality) for training and inference.
    - Enable shareability, discoverability, reusability, and versioning of data assets.
  - ML metadata & artifact repository [Model Dev.] [Data&Model]
    - Provide traceability and lineage tracking of ML artifacts.
    - Share and track experimentation and pipeline parameter configurations.
    - Store, access, investigate, visualize, download, and archive ML artifacts.
    - Integrate with all other MLOps capabilities.
- Core MLOps capabilities
  - Experimentation [Model Dev.]
  - Data processing [Model Dev.]
  - Model training [Model Dev.]
  - Model evaluation
  - (omitted) Model serving, Online experimentation, Model monitoring
  - ML pipelines
  - Model registry [Model Dev.] [Data&Model]
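
To make the ML metadata & artifact repository idea more concrete (tracking run parameters, comparing models, and tracing artifact lineage), here is a minimal in-memory sketch. It is an assumption-laden illustration rather than the design of any real product: `Run`, `MetadataStore`, `log_run`, `lineage`, and `best_run` are hypothetical names, and a real repository would persist to a database and integrate with the other capabilities.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    """One recorded pipeline or experiment run (hypothetical schema)."""
    run_id: str
    params: Dict[str, object]
    metrics: Dict[str, float] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)    # artifact IDs consumed
    outputs: List[str] = field(default_factory=list)   # artifact IDs produced


class MetadataStore:
    """Hypothetical in-memory store; nothing is persisted."""

    def __init__(self) -> None:
        self.runs: Dict[str, Run] = {}
        self.produced_by: Dict[str, str] = {}   # artifact ID -> producing run ID

    def log_run(self, params: Dict[str, object], metrics: Dict[str, float],
                inputs: List[str], outputs: List[str]) -> Run:
        """Record a run and remember which artifacts it produced."""
        run = Run(f"run-{len(self.runs) + 1}", dict(params), dict(metrics),
                  list(inputs), list(outputs))
        self.runs[run.run_id] = run
        for artifact_id in outputs:
            self.produced_by[artifact_id] = run.run_id
        return run

    def lineage(self, artifact_id: str) -> List[str]:
        """Return upstream artifact IDs, nearest first (breadth-first walk)."""
        seen: List[str] = []
        frontier = [artifact_id]
        while frontier:
            current = frontier.pop(0)
            run_id = self.produced_by.get(current)
            if run_id is None:
                continue  # raw input with no recorded producer
            for parent in self.runs[run_id].inputs:
                if parent not in seen:
                    seen.append(parent)
                    frontier.append(parent)
        return seen

    def best_run(self, metric: str) -> Optional[Run]:
        """Compare runs on one metric; higher is assumed to be better."""
        candidates = [r for r in self.runs.values() if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric], default=None)


if __name__ == "__main__":
    store = MetadataStore()
    store.log_run({"step": "preprocess"}, {}, ["raw-data:v1"], ["dataset:v1"])
    store.log_run({"lr": 0.1}, {"accuracy": 0.81}, ["dataset:v1"], ["model:a"])
    store.log_run({"lr": 0.01}, {"accuracy": 0.86}, ["dataset:v1"], ["model:b"])
    print(store.lineage("model:b"))
    print(store.best_run("accuracy").params)
```

The artifact IDs and metric values in the usage section are made up for the demo. The point is only that recording inputs and outputs per run is enough to answer "which data produced this model?" and "which configuration scored best?".
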
source: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf