📋 Model Description


license: other
language:
  • en
pipeline_tag: text-generation
inference: false
tags:
  • transformers
  • gguf
  • imatrix
  • Phi-4-mini-instruct

Quantizations of https://huggingface.co/microsoft/Phi-4-mini-instruct

Note: you will need llama.cpp b4792 or later to run the model.

Inference Clients/UIs


From original readme

Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K-token context length. It underwent an enhancement process that combined supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

📰 Phi-4-mini Microsoft Blog

📖 Phi-4-mini Technical Report

👩‍🍳 Phi Cookbook

🏡 Phi Portal

🖥️ Try It: Azure, Hugging Face

Phi-4:
[mini-instruct | onnx];
multimodal-instruct;

Usage

Tokenizer

Phi-4-mini-instruct supports a vocabulary size of up to 200064 tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.

Input Formats

Given the nature of the training data, the Phi-4-mini-instruct
model is best suited for prompts using specific formats.
Below are the two primary formats:

#### Chat format

This format is used for general conversation and instructions:

<|system|>Insert System Message<|end|><|user|>Insert User Message<|end|><|assistant|>
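As a rough illustration of how this layout is assembled, the sketch below joins a list of role/content messages into a single prompt string. The helper name `build_chat_prompt` is hypothetical and not part of any library; in practice the tokenizer's own chat template remains the authoritative source for the format.

```python
from typing import Dict, List


def build_chat_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages in the <|role|>content<|end|> layout shown above.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in {"system", "user", "assistant"}:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>{message['content']}<|end|>")
    # A trailing <|assistant|> tag cues the model to write its reply.
    parts.append("<|assistant|>")
    return "".join(parts)


prompt = build_chat_prompt([
    {"role": "system", "content": "Insert System Message"},
    {"role": "user", "content": "Insert User Message"},
])
print(prompt)
```

With these two messages the helper reproduces the template line shown above character for character.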

#### Tool-enabled function-calling format

This format is used when the user wants the model to return function calls based on a given set of tools. The available tools go in the system prompt, wrapped in <|tool|> and <|/tool|> tokens, and are specified as a JSON-serialized list of tool definitions. Example:

<|system|>You are a helpful assistant with some tools.<|tool|>[{"name": "getweatherupdates", "description": "Fetches weather updates for a given city using the RapidAPI Weather API.", "parameters": {"city": {"description": "The name of the city for which to retrieve weather information.", "type": "str", "default": "London"}}}]<|/tool|><|end|><|user|>What is the weather like in Paris today?<|end|><|assistant|>
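One way to produce such a prompt is to serialize the tool list with Python's standard `json.dumps` and place it between the tool tokens. The sketch below does this for the weather example; the helper name `build_tool_system_prompt` is hypothetical, and the tool definition is copied from the example above.

```python
import json
from typing import Any, Dict, List


def build_tool_system_prompt(system_message: str, tools: List[Dict[str, Any]]) -> str:
    """Wrap a JSON dump of the tools in <|tool|> ... <|/tool|> inside the system turn.

    Hypothetical helper for illustration only.
    """
    tools_json = json.dumps(tools)
    return f"<|system|>{system_message}<|tool|>{tools_json}<|/tool|><|end|>"


tools = [{
    "name": "getweatherupdates",
    "description": "Fetches weather updates for a given city using the RapidAPI Weather API.",
    "parameters": {
        "city": {
            "description": "The name of the city for which to retrieve weather information.",
            "type": "str",
            "default": "London",
        }
    },
}]

prompt = (
    build_tool_system_prompt("You are a helpful assistant with some tools.", tools)
    + "<|user|>What is the weather like in Paris today?<|end|><|assistant|>"
)
print(prompt)
```

Because `json.dumps` keeps dictionary insertion order and uses `", "` and `": "` as default separators, the serialized list has the same shape as the hand-written example.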

Inference with vLLM

#### Requirements

List of required packages:

flash_attn==2.7.4.post1
torch==2.6.0
vllm>=0.7.2

#### Example

To perform inference using vLLM, you can use the following code snippet:

from vllm import LLM, SamplingParams

# Load the model; trust_remote_code allows custom model code from the repository.
llm = LLM(model="microsoft/Phi-4-mini-instruct", trust_remote_code=True)

# Multi-turn conversation ending with the user turn the model should answer.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

# temperature=0.0 selects greedy decoding.
sampling_params = SamplingParams(
    max_tokens=500,
    temperature=0.0,
)

output = llm.chat(messages=messages, sampling_params=sampling_params)
print(output[0].outputs[0].text)

Inference with Transformers

#### Requirements

The Phi-4 family has been integrated into version 4.49.0 of transformers. The installed transformers version can be checked with: pip list | grep transformers.
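The same check can be made from Python with the standard library's `importlib.metadata`. The sketch below assumes that any release at or above 4.49.0 is acceptable, which is an interpretation of the sentence above rather than something the original states; the helper names `parse_version` and `installed_version` are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.49.0' into (4, 49, 0); suffixes such as '.dev0' are ignored."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


MINIMUM = "4.49.0"  # assumption: treated here as a lower bound
found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif parse_version(found) >= parse_version(MINIMUM):
    print(f"transformers {found} includes Phi-4 support")
else:
    print(f"transformers {found} is older than {MINIMUM}")
```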

List of required packages:

flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.49.0
accelerate==1.3.0

Phi-4-mini-instruct is also available in Azure AI Studio.

#### Example

After obtaining the Phi-4-mini-instruct model checkpoints, users can use this sample code for inference.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Fix the seed so repeated runs behave the same way.
torch.random.manual_seed(0)

model_path = "microsoft/Phi-4-mini-instruct"

# Load weights and tokenizer; device_map="auto" places the model on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Multi-turn conversation ending with the user turn the model should answer.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# do_sample=False selects greedy decoding; return_full_text=False returns only the new reply.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

# Unpack the dictionary so each entry is passed as a keyword argument.
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

📂 GGUF File List

| Filename | Quant class | Size | Notes |
|---|---|---|---|
| Phi-4-mini-instruct-IQ1_M.gguf | | 1.08 GB | |
| Phi-4-mini-instruct-IQ1_S.gguf | | 1.02 GB | |
| Phi-4-mini-instruct-IQ2_M.gguf | Q2 | 1.4 GB | |
| Phi-4-mini-instruct-IQ2_S.gguf | Q2 | 1.32 GB | |
| Phi-4-mini-instruct-IQ2_XS.gguf | Q2 | 1.27 GB | |
| Phi-4-mini-instruct-IQ2_XXS.gguf | Q2 | 1.18 GB | |
| Phi-4-mini-instruct-IQ3_M.gguf | Q3 | 1.88 GB | |
| Phi-4-mini-instruct-IQ3_S.gguf | Q3 | 1.77 GB | |
| Phi-4-mini-instruct-IQ3_XS.gguf | Q3 | 1.71 GB | |
| Phi-4-mini-instruct-IQ3_XXS.gguf | Q3 | 1.56 GB | |
| Phi-4-mini-instruct-IQ4_NL.gguf | Q4 | 2.17 GB | |
| Phi-4-mini-instruct-IQ4_XS.gguf | Q4 | 2.07 GB | |
| Phi-4-mini-instruct-Q2_K.gguf | Q2 | 1.57 GB | |
| Phi-4-mini-instruct-Q2_K_S.gguf | Q2 | 1.48 GB | |
| Phi-4-mini-instruct-Q3_K_L.gguf | Q3 | 2.1 GB | |
| Phi-4-mini-instruct-Q3_K_M.gguf | Q3 | 1.97 GB | |
| Phi-4-mini-instruct-Q3_K_S.gguf | Q3 | 1.77 GB | |
| Phi-4-mini-instruct-Q4_0.gguf | Q4 | 2.17 GB | Recommended |
| Phi-4-mini-instruct-Q4_1.gguf | Q4 | 2.35 GB | |
| Phi-4-mini-instruct-Q4_K_M.gguf | Q4 | 2.32 GB | |
| Phi-4-mini-instruct-Q4_K_S.gguf | Q4 | 2.18 GB | |
| Phi-4-mini-instruct-Q5_0.gguf | Q5 | 2.55 GB | |
| Phi-4-mini-instruct-Q5_1.gguf | Q5 | 2.73 GB | |
| Phi-4-mini-instruct-Q5_K_M.gguf | Q5 | 2.65 GB | |
| Phi-4-mini-instruct-Q5_K_S.gguf | Q5 | 2.54 GB | |
| Phi-4-mini-instruct-Q6_K.gguf | Q6 | 2.94 GB | |
| Phi-4-mini-instruct-Q8_0.gguf | Q8 | 3.8 GB | |
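When choosing among these files, a simple starting point is to take the largest one that fits a given size budget. The sketch below does that for a handful of entries copied from the list above. The helper name `largest_file_within` is hypothetical, and the heuristic rests on two assumptions not stated in the original: that a larger file generally preserves more of the model's quality, and that the budget refers to file size only (runtime memory use will be higher once context buffers are allocated).

```python
from typing import Dict, Optional

# Sizes in GB, copied from a few rows of the file list above.
GGUF_SIZES_GB: Dict[str, float] = {
    "Phi-4-mini-instruct-IQ2_M.gguf": 1.4,
    "Phi-4-mini-instruct-Q3_K_M.gguf": 1.97,
    "Phi-4-mini-instruct-Q4_0.gguf": 2.17,
    "Phi-4-mini-instruct-Q4_K_M.gguf": 2.32,
    "Phi-4-mini-instruct-Q5_K_M.gguf": 2.65,
    "Phi-4-mini-instruct-Q6_K.gguf": 2.94,
    "Phi-4-mini-instruct-Q8_0.gguf": 3.8,
}


def largest_file_within(
    budget_gb: float, sizes: Dict[str, float] = GGUF_SIZES_GB
) -> Optional[str]:
    """Return the biggest file whose size does not exceed budget_gb, or None.

    Hypothetical helper for illustration only.
    """
    candidates = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(largest_file_within(2.5))  # Phi-4-mini-instruct-Q4_K_M.gguf
print(largest_file_within(1.0))  # None: nothing in this subset is that small
```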