added dockerfile and pip dependencies for OSD and LLM model compression workloads

Lanxiang Hu requested to merge llm_specd into main

Added Dockerfiles and pip dependencies for:

  1. Speculative decoding workload for efficient LLM inference.
  2. Post-training model compression workload (quantization for now) for LLM deployment.
