# embeddinggemma-300M GGUF

Base model: google/embeddinggemma-300M

Recommended way to run this model:

```sh
llama-server -hf ggml-org/embeddinggemma-300M-GGUF --embeddings
```

Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using curl:

```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello embeddings"}' \
    --silent
```
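
The same request can be made from code. A minimal Python sketch, assuming the server started above is listening on localhost:8080 (the exact response layout can vary between llama.cpp versions, so it is printed as-is):

```python
import json
import urllib.request

# Build the same request as the curl example above.
req = urllib.request.Request(
    "http://localhost:8080/embedding",
    data=json.dumps({"input": "Hello embeddings"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Send the request and print the parsed JSON response.
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```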

Alternatively, the llama-embedding command line tool can be used:

```sh
llama-embedding -hf ggml-org/embeddinggemma-300M-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize
When the model uses pooling, or a pooling method is specified with --pooling,
normalization of the resulting embeddings can be controlled with the
embd_normalize parameter.

The default value is 2, which normalizes the embeddings using the Euclidean
(L2) norm. The other options are (see the sketch after this list):

- -1: no normalization
- 0: max absolute
- 1: taxicab (L1)
- 2: Euclidean (L2)
- \>2: p-norm
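
To make these concrete, here is an illustrative sketch of what each option computes, using NumPy as a stand-in. This is not llama.cpp's actual implementation (in particular, the real max-absolute mode additionally rescales the result, which is omitted here):

```python
import numpy as np

def embd_normalize(v: np.ndarray, norm: int) -> np.ndarray:
    """Simplified illustration of the embd_normalize options."""
    if norm == -1:                 # -1: no normalization
        return v
    if norm == 0:                  # 0: max absolute (largest |component| becomes 1)
        return v / np.abs(v).max()
    # 1: taxicab (L1), 2: Euclidean (L2), >2: general p-norm
    return v / np.linalg.norm(v, ord=norm)

v = np.array([3.0, -4.0])
print(embd_normalize(v, 2))        # [ 0.6 -0.8] -- unit Euclidean length
```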

This can be passed in the request body to llama-server, for example:

```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello embeddings", "embd_normalize": -1}' \
    --silent
```

And for llama-embedding, by passing the --embd-normalize flag, for example:

```sh
llama-embedding -hf ggml-org/embeddinggemma-300M-GGUF --embd-normalize -1 -p "Hello embeddings"
```
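
One practical note on the default: because embd_normalize 2 yields unit-length vectors, the dot product of two embeddings is already their cosine similarity. A minimal sketch in plain NumPy, independent of how the vectors were retrieved:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # General form; for L2-normalized embeddings the denominator is 1,
    # so this reduces to a plain dot product.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```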

## GGUF File List

| Filename | Quantization | Size |
| --- | --- | --- |
| embeddinggemma-300M-Q8_0.gguf (recommended) | Q8_0 | 313.36 MB |