AI on demand: Qwen/Qwen3.5-35B-A3B-FP8

Calling the model

# Set your personal key:
STONEY_KEY=sk-...

# Set the desired model:
MODEL=Qwen/Qwen3.5-35B-A3B-FP8

# Set your prompt:
PROMPT='Hello.'

# Set the maximum number of tokens to generate:
MAX_TOKENS=100

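# Build the JSON body with jq so that quotes or newlines in $PROMPT are escaped correctly:
curl https://llm.stoney-cloud.com/v1/chat/completions \
        --silent --fail --show-error \
        --header "Authorization: Bearer $STONEY_KEY" \
        --header 'Content-Type: application/json' \
        --data "$(jq --null-input \
                --arg model "$MODEL" \
                --arg prompt "$PROMPT" \
                --argjson max_tokens "$MAX_TOKENS" \
                '{model: $model, messages: [{role: "user", content: $prompt}], max_tokens: $max_tokens}')" \
        | jq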

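Example output (the reply stops after 100 tokens, as shown by "finish_reason": "length", because MAX_TOKENS was reached while the model was still writing out its thinking process):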

{
  "id": "chatcmpl-a8634d242bc04923",
  "object": "chat.completion",
  "created": 1778156754,
  "model": "Qwen/Qwen3.5-35B-A3B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Thinking Process:\n\n1.  **Analyze the Input:**\n    *   Input: \"Hello.\"\n    *   Context: This is a greeting.\n    *   Intent: The user is initiating a conversation.\n    *   Tone: Friendly, polite, neutral.\n\n2.  **Determine the Appropriate Response:**\n    *   Acknowledge the greeting.\n    *   Offer assistance.\n    *   Keep it friendly and open-ended.\n\n3.  **Drafting Options:**\n",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 112,
    "completion_tokens": 100,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
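
In a script, usually only the reply text is of interest, together with whether it was cut off. The following is a minimal sketch of one way to extract both from a response like the one above. The function name print_reply is hypothetical and not part of the service; the sketch assumes jq is installed and that the response has the same layout as the example output.

```shell
# Hypothetical helper (not part of the service): reads a chat completion
# response on stdin, prints the assistant's text on stdout, and prints a
# warning on stderr if the reply was cut off by the token limit.
print_reply() {
        # Read the whole response once so that it can be queried twice.
        _response=$(cat)

        # --raw-output prints the string without JSON quoting.
        printf '%s\n' "$_response" | jq --raw-output '.choices[0].message.content'

        _reason=$(printf '%s\n' "$_response" | jq --raw-output '.choices[0].finish_reason')
        if [ "$_reason" = "length" ]; then
                echo 'warning: reply was cut off; consider raising MAX_TOKENS' >&2
        fi
}
```

To try it, replace the final "| jq" of the call with "| print_reply". With the example output above, this would print the truncated thinking process followed by the warning.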