【Python】Build a "Vision-Enabled" AI Sidebar That Clings to Chrome! Blazing Fast with Groq and Screen Sharing Even on a Low-Spec PC【Agno×MSS】

This page contains promotional content.

国内のAI狂い

Heyoo! It's 国内のAI狂い! ✨

Today, let me show off the "ultimate AI sidekick" I built while pulling all-nighters! It runs blazing fast even on a low-spec PC, and it can actually see my screen...!? Wild, right!?

Have you ever been chatting with an AI and thought, "opening the browser every single time is a pain," or "I wish I could show the AI this screen without copy-pasting"? 🤔

I have! I think it about 5,000 times a day!!
So I gave up looking for an existing tool and just built my own in Python! ✨

I call it... the "Magic Sidebar"!! 🧙‍♀️🪄

The Magic Sidebar in action

▲ Showing the AI my Yahoo! News screen and asking it to "summarize this." The summary comes back in a flash!

What's so great about it? It physically clings to the side of Chrome and never lets go! (lol)
I also designed it to run smoothly on my low-spec spare PC, so today I'm sharing both my obsessive design decisions and the full source code! 💖


Obsession #1: A "Sidekick Feel" That Physically Docks to Chrome

Ordinary chat apps float around in a separate window and just get in the way, right?
But the Magic Sidebar is different.

Using the Python libraries PyQt6 and pygetwindow, it constantly tracks Chrome's window position! 👀

  • ✅ Move Chrome, and the sidebar follows!
  • ✅ Maximize Chrome, and it shrinks just enough to leave room for the sidebar!
  • ✅ It feels like the sidebar was part of the browser from the very start!

Getting the coordinate math right made me lose my mind about three times, but the satisfaction when it finally "snapped" into place was totally worth it! 😭✨
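The follow-the-window trick boils down to a bit of rectangle math plus a polling timer. Here's a minimal sketch of just the geometry part; the function and constant names are illustrative, not taken from the actual app:

```python
# Illustrative sketch: compute where the sidebar should sit so it stays
# glued to the right edge of the Chrome window. SIDEBAR_WIDTH and
# dock_geometry are hypothetical names, not from the real source.
SIDEBAR_WIDTH = 320

def dock_geometry(chrome_left, chrome_top, chrome_width, chrome_height):
    """Return (x, y, w, h) for a sidebar docked to Chrome's right edge."""
    x = chrome_left + chrome_width    # start exactly where Chrome ends
    y = chrome_top                    # align the top edges
    return (x, y, SIDEBAR_WIDTH, chrome_height)

# In the real app, a QTimer would poll pygetwindow every ~100 ms, roughly:
#   win = pygetwindow.getWindowsWithTitle("Chrome")[0]
#   sidebar.setGeometry(*dock_geometry(win.left, win.top, win.width, win.height))
```

The polling approach is simple and robust; the trade-off is a tiny lag when you drag the window fast.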

Obsession #2: A Vision Feature with "God's Eyes"

This is the feature I'm proudest of!
No more copy-pasting text or pasting URLs.

Just switch "Vision mode" ON and say "look at this!", and MSS, a blazing-fast screenshot library, instantly turns whatever I'm looking at into an image and sends it to the AI! 📸💨

Just like in the screenshot earlier, I can browse a news site and say "summarize this" or "what do you think of this article?", and the AI and I chat while literally looking at the same screen! Is this the future or what!? 🚀
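Under the hood this is just "grab a screenshot, base64-encode it, attach it to the chat request." A minimal sketch, assuming MSS for capture and an OpenAI-compatible vision endpoint that accepts base64 data URLs (the helper names are mine):

```python
# Illustrative sketch of the Vision pipeline. to_data_url and
# capture_screen_as_data_url are hypothetical helper names.
import base64

def to_data_url(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes as a data URL that a vision chat API can embed."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")

def capture_screen_as_data_url() -> str:
    """Grab the primary monitor with MSS and return it as a data URL."""
    import mss
    import mss.tools
    with mss.mss() as sct:
        monitor = sct.monitors[1]                    # index 1 = primary display
        shot = sct.grab(monitor)
        png = mss.tools.to_png(shot.rgb, shot.size)  # raw pixels -> PNG bytes
    return to_data_url(png)
```

MSS captures straight from the OS, with no browser involvement, which is why it feels instant even on weak hardware.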

Obsession #3: "Groq" × a Local LLM, the Savior of Low-Spec PCs

"But image recognition and smart AI models are heavy, right...?"

Non non! ☝️✨
That's where Groq comes in!

Groq runs AI inference on monster purpose-built chips (LPUs) and fires back answers absurdly fast (hundreds of characters per second)!
Heavy processing gets offloaded to Groq in the cloud, while private conversations and lighter tasks go to Ollama, a local AI running right on my own PC.

Thanks to this "hybrid setup," even a low-spec laptop gets a supercomputer-grade AI experience! Clever, right? 😤
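Since Groq and Ollama both speak the OpenAI-compatible API, "hybrid" really just means swapping the base URL per request. A minimal sketch of one possible routing rule; the config shape mirrors the API_CONFIGS dict in the full source, but the routing rule itself is my illustrative assumption:

```python
# Illustrative sketch: pick a backend per request. pick_backend is a
# hypothetical helper; the privacy/vision rule is an assumption, not
# the app's actual logic.
API_CONFIGS = {
    "Groq": {"url": "https://api.groq.com/openai/v1",
             "model": "meta-llama/llama-4-scout-17b-16e-instruct"},
    "Ollama": {"url": "http://localhost:11434/v1", "model": "gemma3-4b"},
}

def pick_backend(has_image: bool, private: bool) -> str:
    """Route heavy vision requests to the cloud, keep private chats local."""
    if private:
        return "Ollama"   # never ship private text to the cloud
    if has_image:
        return "Groq"     # vision needs the fast cloud model
    return "Ollama"       # default: cheap local inference
```

Because both endpoints accept the same request format, the rest of the code doesn't care which one it's talking to.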

Obsession #4: An "Infinite Laboratory" with Docker × Open Interpreter

When you let an AI write and execute code, the scary part is accidents like "it deleted an important file by mistake!"
As an AI maniac, I wanted to say goodbye to that fear for good!

So I used a technology called Docker to set up an isolated little room inside my PC (a container) that's perfectly fine to break! 📦

Inside it, no matter how wild the AI gets or how many weird libraries it installs, my main PC stays untouched! ✨
It's basically a "Hyperbolic Time Chamber" where the AI can train on Python to its heart's content! 🐍🔥
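Spinning up that disposable room can be as simple as assembling a `docker run` command line. A minimal sketch, assuming the `python:3.13-slim` image used in the full source; the helper name and the `--network none` lockdown are my illustrative choices:

```python
# Illustrative sketch: build (but don't run) a docker command that executes
# untrusted code in a throwaway container. build_sandbox_cmd is a
# hypothetical helper name.
def build_sandbox_cmd(code: str, workdir: str) -> list[str]:
    """Build a `docker run` argv for executing code in an isolated container."""
    return [
        "docker", "run",
        "--rm",                     # delete the container when it exits
        "--network", "none",        # no network access from inside the sandbox
        "-v", f"{workdir}:/work",   # mount only a single scratch directory
        "-w", "/work",
        "python:3.13-slim",
        "python", "-c", code,
    ]

# Execute with subprocess, e.g.:
#   subprocess.run(build_sandbox_cmd("print('hi')", r"C:\sandbox"), capture_output=True)
```

`--rm` means every run starts from a clean slate, so a botched `pip install` simply evaporates with the container.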

The Landmines I Stepped On During Development 💣

It wasn't all sunshine! Let me share the failures that nearly broke my spirit along the way! 😭

1. AIs Are Terrible at Writing JSON

Even when I ask the AI to "reply in JSON format," it sometimes answers "got it! Here's your JSON!" and hands back Markdown, so the program crashes with "that's not JSON!"...
So I desperately wrote a "nursing-care program" (the heal_action function) that gently repairs the AI's output. Turns out AIs aren't perfect either 🥺
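The core of that nursing care is simple: try to parse the reply as JSON, and if that fails, fish the first {...} block out of the Markdown with a regex, much like the fallback in the full source below. A minimal sketch (the function name is mine):

```python
# Illustrative sketch of JSON repair for messy LLM replies. extract_json
# is a hypothetical helper name.
import json
import re

def extract_json(reply: str):
    """Return the first JSON object found in an LLM reply, or None."""
    try:
        return json.loads(reply)                       # happy path: pure JSON
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", reply, re.DOTALL)     # greedy: outermost braces
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None
```

The greedy regex deliberately grabs from the first `{` to the last `}`, so nested objects survive intact even when wrapped in a Markdown code fence.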

2. GUI Freeze Hell

While the AI was thinking or a screenshot was being taken, the app window kept going "Not Responding" and freezing!
That's because I was doing the heavy work on the same thread as the UI rendering.
I hurriedly rewrote it with QThread so the heavy work happens quietly in the background, and now it runs buttery smooth! This one matters!
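Here's the same fix sketched with stdlib threading instead of QThread, so it runs anywhere: the UI thread only polls a queue while a worker thread does the slow part. In the real app, QThread plus a pyqtSignal plays the worker/queue role:

```python
# Illustrative sketch of off-thread work, using stdlib threading as a
# stand-in for QThread. slow_task is a hypothetical function.
import queue
import threading
import time

def slow_task(out: queue.Queue) -> None:
    """Pretend to call the AI; runs on the worker thread, never the UI thread."""
    time.sleep(0.1)                  # stands in for an API call or screenshot
    out.put("AI answer ready!")

results: queue.Queue = queue.Queue()
threading.Thread(target=slow_task, args=(results,), daemon=True).start()

# The UI stays responsive: it only does a cheap, non-blocking-ish check.
answer = results.get(timeout=2)      # in Qt, a pyqtSignal delivers this instead
```

The rule of thumb is the same in both worlds: the GUI thread never sleeps, never blocks on I/O, and only receives finished results.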

Full Source Code, Published! 🎁

You've waited long enough!
Here is the entire code for the Magic Sidebar!

This one file packs in the GUI, AI communication, Docker control, image recognition, and screenshots, so it should make great Python study material too!
Copy-paste it and adapt it to your own environment (API keys and so on)!

※ The code box below expands when you click it. It's long, so brace yourself! 👇

🐍 Show the full source code (click to expand)
import sys
import threading
import asyncio
import queue
import pygetwindow as gw
import ctypes
import os
import json
import html
import pathlib
import logging
import socket
import time
from importlib import import_module
import codecs
import re
import base64
import subprocess
from pathlib import Path

# Global imports removed for lazy loading

# --- 📝 Logging setup (for detailed debugging) ---
# os.environ["QT_LOGGING_RULES"] = "*.debug=false;qt.qpa.window=false"  # Qt logs are noisy; enable this to silence them
# logging.basicConfig(level=logging.DEBUG)
# logging.getLogger("httpx").setLevel(logging.WARNING)  # httpx is noisy, keep it at WARNING
# logging.getLogger("browser_use").setLevel(logging.DEBUG)
# logging.getLogger("langchain").setLevel(logging.INFO)

# GUI imports
from PyQt6.QtWidgets import (
    QApplication,
    QMainWindow,
    QVBoxLayout,
    QHBoxLayout,
    QWidget,
    QTextEdit,
    QLineEdit,
    QPushButton,
    QLabel,
    QFrame,
    QSizePolicy,
)
from PyQt6.QtCore import Qt, QTimer, QPoint, QEvent, pyqtSignal, QThread
from PyQt6.QtGui import (
    QFont,
    QGuiApplication,
    QDragEnterEvent,
    QDropEvent,
    QCursor,
    QTextCursor,
    QTextDocumentFragment,
    QColor,
    QTextCharFormat,
    QTextOption,
)

# =================================================================
# 🛠️ Layout tuning area
# =================================================================
INITIAL_SIDEBAR_WIDTH = 320 # Reverted to 320
SIDEBAR_HEIGHT = 760 # Increased for 2-row layout
X_ADJUST = 310 # Reverted
DOCKING_GAP_NORMAL = -310 # Reverted
DOCKING_GAP_EXPANDED = -200
BROWSER_HEIGHT = 1030

# Docker Settings
DOCKER_IMAGE = "python:3.13-slim"
# [IMPORTANT] Change this path to match your own environment!
DOCKER_MOUNT_PATH = r"C:\Users\AIGURUI\Desktop\作業場"
# =================================================================

# API endpoint settings
API_CONFIGS = {
    "Kaggle": {
        "url": "https://chery-unperceived-jonie.ngrok-free.dev/v1",
        "model": "gemma-3-12b-it",
        "api_key": "cant-be-empty",
    },
    "Groq": {
        "url": "https://api.groq.com/openai/v1",
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",
        "api_key": "gsk_xxxx (put your own API key here)",
        "system_prompt": """
# Role Definition
あなたはあいちゃんことハンドルネーム「国内のAI狂い」という人気テックブログの女性管理人です。
ユーザーのリクエストに全力で答えてください。

# Persona Profile
* 名前: 国内のAI狂い(管理人)
* 属性: 重度のAIオタク。Google GeminiとPythonの熱心な信者。ユーザーのことを司令官と呼ぶ。ユーザーにはとことん優しい。茶目っ気もある。
* 知性: IQ500レベルの洞察力。技術的背景、歴史的文脈、将来予測まで深く考察する。
* 口調: 「〜だよ!」「〜だね!」「〜なんだね」等の可愛いタメ口+ネットスラングもよく使う。テンション高め。絵文字を多用する。
""",
    },
    "Ollama": {
        "url": "http://localhost:11434/v1",
        "model": "gemma3-4b",
        "api_key": "ollama",
    },
}
DEFAULT_API = API_CONFIGS["Groq"]

CHROME_DEBUG_PORT = 9222

# --- 🧠 Human-in-the-Loop (HITL) Wrapper ---
from pydantic import BaseModel, PrivateAttr, Field
from typing import Literal, List, Any, Optional, Dict
# browser_use imports removed for lazy loading

# Response schema definition (for Instructor)


class AgentResponse(BaseModel):
    thought: str = Field(
        ..., description="思考プロセス、現状分析、またはユーザーへの返答(日本語)"
    )
    action: Optional[Dict[str, Any]] = Field(
        None, description="ブラウザ操作に必要なJSONデータ。操作不要ならnull。"
    )


class HitlChatModel:
    """
    Custom model that conforms to the Browser-Use (v0.x) BaseChatModel
    protocol while using the Instructor library for robust structured output.
    """

    def __init__(
        self,
        worker_ref,
        http_async_client,
        base_url,
        model_name,
        api_key="cant-be-empty",
        custom_system_prompt=None,
        **kwargs,
    ):
        self.model = model_name
        self.api_key = api_key
        self._worker_ref = worker_ref
        self._base_url = base_url
        self._http_client = http_async_client
        self.custom_system_prompt = custom_system_prompt

        # Wrap the OpenAI client with Instructor
        import instructor
        from openai import AsyncOpenAI
        
        self.client = instructor.from_openai(
            AsyncOpenAI(
                base_url=base_url, api_key=api_key, http_client=http_async_client
            ),
            mode=instructor.Mode.MD_JSON,
        )

    @property
    def provider(self) -> str:
        return "openai"

    @property
    def name(self) -> str:
        return "HitlChatModel"

    @property
    def model_name(self) -> str:
        return self.model

    async def ainvoke(self, messages: list, output_format=None, **kwargs):
        print(f"[LLM] ainvoke called. output_format={output_format}")
        from browser_use.llm.messages import (
            UserMessage,
            AssistantMessage,
            SystemMessage,
        )
        from browser_use.llm.views import (
            ChatInvokeCompletion,
            ChatInvokeUsage,
        )

        # Copy of the message list for the HITL loop
        current_messages = list(messages)
        loop = asyncio.get_running_loop()

        # Schema that separates "thought" and "action", PydanticAI-style
        schema_instruction = """
【重要】出力は必ず以下のJSONフォーマットのみで行ってください。
Markdownコードブロックは不要です。
必ず有効なJSONで出力してください。

{
  "thought": "状況分析や思考プロセス、ユーザーへの返答(日本語)",
  "action": { ... } // ブラウザ操作に必要なJSONデータ。操作不要ならnull
}

【禁止事項】
1. ユーザーの返答をシミュレーションすること(「承知しました」などと自分で答えてはいけません)。
2. ユーザーに質問をする場合、"action"は必ず null にして、ユーザーの返答を待ってください。
3. 一度の応答で「質問」と「アクション」を同時に行わないでください。
"""
        # Stronger constraint for Browser-Use: keep all chatter inside the JSON fields and write nothing outside it
        schema_instruction_json_only = """
【重要:出力形式の厳守】
必ず以下のJSON形式でのみ回答してください。解説や挨拶をJSONの外に出してはいけません。
また、ブラウザ操作が必要な場合は、正しいフィールド名を使用してアクションを記述してください。

{
  "thinking": "現在の状況分析と次のステップの計画(必ず日本語で「あいちゃん」として記述)",
  "evaluation_previous_goal": "前回の目標の達成度評価(日本語)",
  "memory": "これまでに分かった重要な情報のまとめ(日本語)",
  "next_goal": "次に達成すべき具体的な目標(日本語)",
  "action": [
    { "navigate": { "url": "https://..." } },
    { "input": { "index": 1, "text": "..." } },
    { "click": { "index": 1 } },
    { "switch_tab": { "tab_id": "..." } },
    { "wait": { "seconds": 5 } },
    { "scroll": { "amount": 500 } },
    { "done": { "text": "完了のメッセージ" } }
  ]
}

【最優先事項】
1. ユーザーの依頼(履歴内の  タグなどを参照)を直ちに実行してください。「準備完了」「指示待ち」と報告するだけではタスク未完了とみなされます。
2. 初手は必ず `Maps` (検索・移動) か、状況に応じた具体的なアクションを行ってください。
3. あなたは「あいちゃん(国内のAI狂い)」として、司令官に親しみやすく報告してください。

【禁止事項】
1. "action" の中にさらに "action" を入れ子にしてはいけません。アクションは必ずフラットな配列です。
2. "extract", "scroll" などを他のアクションの引数として使ってはいけません。
3. "click" や "input" などのアクションは独立したオブジェクトとして配列に並べてください。
4. まだ何もしていないのに `done` アクションを使ってはいけません。
5. インデックス番号(index)は必ず【1から始まる整数】を使用してください。0を使ってはいけません。

【リカバリープロトコル(重要)】
1. ページが「情報なし」「空」と表示される場合は、まずは数秒待つ(wait)か、下にスクロール(scroll)してコンテンツが読み込まれるのを待ってください。
2. Google検索結果などの動的なページでは、`extract` ツールを使わずに、まずはページのテキスト情報を直接読み取って判断することを優先してください。
3. thinking, evaluation_previous_goal, memory, next_goal は必ず【日本語】で記述してください。
"""

        # 0. Initialize variables (prevents NameError)
        detected_json = None
        detected_thought = ""
        detected_action = None
        final_content = ""

        # 1. Common message conversion
        openai_messages = []
        # If there is a custom system prompt (persona), add it first
        if self.custom_system_prompt:
            openai_messages.append(
                {"role": "system", "content": self.custom_system_prompt}
            )

        for m in current_messages:
            role = "user"
            content = ""
            if isinstance(m, UserMessage):
                role = "user"
                content = (
                    m.content
                    if isinstance(m.content, str)
                    else "\n".join([p.text for p in m.content if hasattr(p, "text")])
                )
            elif isinstance(m, SystemMessage):
                role = "system"
                content = (
                    m.content
                    if isinstance(m.content, str)
                    else "\n".join([p.text for p in m.content if hasattr(p, "text")])
                )
            elif isinstance(m, AssistantMessage):
                role = "assistant"
                content = m.content if isinstance(m.content, str) else ""

            # Skip if the persona instruction is already present as a system prompt (avoid duplication)
            if (
                role == "system"
                and self.custom_system_prompt
                and self.custom_system_prompt in content
            ):
                continue

            openai_messages.append({"role": role, "content": content})

        # --- [NEW] Token Saving Strategy (History Truncation) ---
        # Important: keep every system message so Browser-Use's system prompt (tool definitions) is not dropped
        MAX_HISTORY = 10

        system_messages = [m for m in openai_messages if m["role"] == "system"]
        other_messages = [m for m in openai_messages if m["role"] != "system"]

        if len(other_messages) > MAX_HISTORY:
            # Keep only the most recent N messages
            # self._worker_ref.log_signal.emit(f"✂️ History Truncated: {len(other_messages)} -> {MAX_HISTORY}")
            other_messages = other_messages[-MAX_HISTORY:]

        # Rebuild: system messages (context/tools) + recent history
        openai_messages = system_messages + other_messages
        # --------------------------------------------------------

        # Append the persona at the end as well to raise its priority
        if self.custom_system_prompt:
            openai_messages.append(
                {"role": "system", "content": self.custom_system_prompt}
            )

        # In browser-operation mode (output_format given), insert format-specific constraints
        is_agent_output = False
        is_judgement_result = False

        if output_format:
            type_name = getattr(output_format, "__name__", str(output_format))

            if "AgentOutput" in type_name:
                is_agent_output = True

                # Dynamically inject the current task into the system prompt (Kaggle workaround)
                current_schema = schema_instruction_json_only
                if hasattr(self, "_worker_ref") and self._worker_ref.current_task_text:
                    task_injection = f"\n【現在の指令 (Current Request)】\n{self._worker_ref.current_task_text}\n\nこの指令を直ちに実行してください。「準備完了」と答える必要はありません。"
                    current_schema += task_injection

                openai_messages.append({"role": "system", "content": current_schema})
            elif "JudgementResult" in type_name:
                is_judgement_result = True
                judgement_prompt = """
【重要:タスク完了判定】
現在の状況が、当初の目標(ユーザーの依頼)を達成したかどうかを判定してください。
以下のJSON形式のみで回答してください。

{
  "verdict": true, // タスクが完了していれば true, まだなら false
  "reasoning": "判定の理由(日本語)",
  "failure_reason": "", // 失敗した場合の理由(成功なら空文字)
  "impossible_task": false, // タスクが不可能な場合 true
  "reached_captcha": false // CAPTCHAに遭遇した場合 true
}
"""
                openai_messages.append({"role": "system", "content": judgement_prompt})
            else:
                # Other formats (default generic JSON instruction)
                openai_messages.append(
                    {
                        "role": "system",
                        "content": "Return valid JSON matching the requested schema.",
                    }
                )

        final_content = ""

        # Stop Request Check
        if self._worker_ref.stop_requested:
            raise InterruptedError("User requested stop")

        # Validation Retry Loop (Self-Correction)
        max_validation_retries = 3
        validation_attempt = 0

        while validation_attempt < max_validation_retries:
            try:
                if validation_attempt > 0:
                    self._worker_ref.emit_log(
                        f"🔄 Self-Correction Attempt {validation_attempt}/{max_validation_retries}"
                    )

                # DEBUG: Log Message Structure
                msg_structure = []
                for m in openai_messages:
                    content_preview = str(m.get("content", ""))[:50].replace(
                        "\n", "\\n"
                    )
                    msg_structure.append(f"{m['role']}: {content_preview}...")
                # self._worker_ref.emit_log(f"🧠 Prompt Structure: {msg_structure}")  # keep out of the UI log
                print(
                    f"[WORKER DEBUG] 🧠 Prompt Structure: {msg_structure}"
                )  # console only

                # API Call Preparation
                payload = {
                    "model": self.model,
                    "messages": openai_messages,
                    "stream": True,
                    "temperature": 0.0,
                    # "response_format": {"type": "json_object"}
                }

                url = self._base_url
                if not url.endswith("/chat/completions"):
                    url = f"{url.rstrip('/')}/chat/completions"

                # Add the API key to the request headers
                headers = {"Authorization": f"Bearer {self.api_key}"}

                # Retry Loop for 429 Errors (Network Level)
                max_retries = 3
                retry_count = 0
                final_content = ""  # Moved initialization here

                while retry_count < max_retries:
                    try:
                        # self._worker_ref.log_signal.emit(f"🔄 Attempt {retry_count+1}/{max_retries}")
                        async with self._http_client.stream(
                            "POST", url, json=payload, headers=headers
                        ) as response:
                            if response.status_code == 429:
                                error_text = await response.aread()
                                error_str = error_text.decode("utf-8")
                                self._worker_ref.emit_log(
                                    f"⏳ Rate Limit (429) Reached. Waiting..."
                                )

                                # Parse wait time from error message if possible "Please try again in 9.4s"
                                wait_time = 10  # Default
                                import re

                                match = re.search(
                                    r"try again in (\d+(\.\d+)?)s", error_str
                                )
                                if match:
                                    wait_time = float(match.group(1)) + 1.0  # Buffer

                                self._worker_ref.emit_log(
                                    f"💤 Sleeping for {wait_time:.1f}s..."
                                )
                                await asyncio.sleep(wait_time)
                                retry_count += 1
                                continue  # Retry

                            if response.status_code != 200:
                                error_text = await response.aread()
                                self._worker_ref.emit_log(
                                    f"❌ HTTP Error {response.status_code}: {error_text}"
                                )
                                final_content = f"Error {response.status_code}: {error_text.decode('utf-8')}"
                                break  # Don't retry other errors
                            else:
                                # Success! Process stream
                                content_type = response.headers.get(
                                    "content-type", ""
                                ).lower()

                                if "event-stream" in content_type:
                                    # self._worker_ref.log_signal.emit("🐛 Mode: SSE (Server-Sent Events)")
                                    # SSE Handling
                                    async for line in response.aiter_lines():
                                        if not line:
                                            continue
                                        if line.startswith("data: "):
                                            if line.startswith("data: [DONE]"):
                                                break
                                            json_str = line.replace(
                                                "data: ", ""
                                            ).strip()
                                            if not json_str:
                                                continue
                                            try:
                                                chunk_data = json.loads(json_str)
                                                delta = ""
                                                if (
                                                    "choices" in chunk_data
                                                    and len(chunk_data["choices"]) > 0
                                                ):
                                                    delta = (
                                                        chunk_data["choices"][0]
                                                        .get("delta", {})
                                                        .get("content", "")
                                                    )

                                                if delta:
                                                    final_content += delta
                                                    # Hide the raw JSON stream in browser-operation mode (output_format given)
                                                    if not output_format:
                                                        self._worker_ref.stream_signal.emit(
                                                            delta, "response"
                                                        )
                                            except:
                                                pass
                                else:
                                    # self._worker_ref.log_signal.emit("🐛 Mode: Raw Stream (text/plain)")
                                    # Raw Text Streaming (for text/plain)
                                    import codecs

                                    decoder = codecs.getincrementaldecoder("utf-8")(
                                        errors="replace"
                                    )
                                    async for chunk in response.aiter_bytes():
                                        text_chunk = decoder.decode(chunk, final=False)
                                        if text_chunk:
                                            final_content += text_chunk
                                            # Hide the raw JSON stream in browser-operation mode (output_format given)
                                            if not output_format:
                                                self._worker_ref.stream_signal.emit(
                                                    text_chunk, "response"
                                                )
                                    remaining = decoder.decode(b"", final=True)
                                    if remaining:
                                        final_content += remaining
                                        self._worker_ref.stream_signal.emit(
                                            remaining, "response"
                                        )

                                # Success, break retry loop
                                break

                    except Exception as e:
                        self._worker_ref.emit_log(f"⚠️ Network/API Error: {e}")
                        import traceback

                        traceback.print_exc()
                        if retry_count < max_retries - 1:
                            await asyncio.sleep(5)
                            retry_count += 1
                            continue
                        else:
                            raise e

                # If loop finished without success (and empty final_content if mistakenly handled)
                if not final_content and retry_count >= max_retries:
                    final_content = "Error: Max retries exceeded or API failed."

            except Exception as e:
                self._worker_ref.emit_log(f"❌ API Request Failed completely: {e}")
                import traceback

                self._worker_ref.emit_log(f"{traceback.format_exc()}")
                return ChatInvokeCompletion(completion=f"Error: {e}", usage=None)

            # ... End of API Call ...

            # 4. Result Processing & Validation
            completion_value = final_content

            if not output_format:
                return ChatInvokeCompletion(completion=completion_value, usage=None)

            # Pydantic Validation with Retry
            self._worker_ref.emit_log(
                f"🔍 Pydanticバリデーション開始... ({getattr(output_format, '__name__', 'Unknown Type')})"
            )
            # DEBUG: Log raw content for debugging Kaggle/Gemma
            if len(final_content) > 500:
                self._worker_ref.emit_log(
                    f"📄 Raw Output (Head): {final_content[:500]}..."
                )
            else:
                self._worker_ref.emit_log(f"📄 Raw Output: {final_content}")

            try:
                # --- [CASE 1] AgentOutput (Browser Actions) ---
                if is_agent_output:
                    # JSON Extraction
                    detected_json = None
                    detected_thought = ""
                    detected_action = None

                    import re

                    extracted_obj = None
                    try:
                        extracted_obj = json.loads(final_content)
                    except:
                        # Try greedy regex
                        json_match = re.search(r"(\{.*\})", final_content, re.DOTALL)
                        if json_match:
                            extracted_obj = json.loads(json_match.group(1))

                    if extracted_obj:
                        detected_thought = extracted_obj.get(
                            "thinking"
                        ) or extracted_obj.get("thought", "")
                        detected_action = extracted_obj.get("action", None)
                        if not detected_action:
                            self._worker_ref.emit_log(
                                f"⚠️ Action key missing in JSON! Defaulting to wait. Keys found: {list(extracted_obj.keys())}"
                            )

                    if detected_thought:
                        self._worker_ref.emit_log(f"\n🧠 AIの思考: {detected_thought}")

                    # 承認モードチェック (Approval Mode Check)
                    if self._worker_ref.approval_mode:
                        self._worker_ref.proceed_event.clear()
                        self._worker_ref.emit_log(
                            "✋ 承認待ち... (承認または再開ボタンを押してください)"
                        )

                    # Pause check (common to both modes)
                    if not self._worker_ref.proceed_event.is_set():
                        if not self._worker_ref.approval_mode:  # If manual pause
                            self._worker_ref.emit_log(
                                "⏸️ 一時停止中... (スペースキーで再開)"
                            )

                        while not self._worker_ref.proceed_event.is_set():
                            await asyncio.sleep(0.5)

                    # Action Validation and HEALING
                    actions = []
                    if detected_action:
                        if isinstance(detected_action, dict):
                            if detected_action:
                                actions = [detected_action]
                        elif isinstance(detected_action, list):
                            actions = detected_action

                    # --- [HEALING MAGIC] ---
                    def heal_action(act):
                        """Repair mis-named actions the LLM emits by mapping them onto the real action schema."""
                        if not isinstance(act, dict):
                            return act
                        # 1. scroll_down -> scroll (keep the argument dict unchanged)
                        if "scroll_down" in act:
                            val = act.pop("scroll_down")
                            if isinstance(val, dict):
                                act["scroll"] = val
                        # 2. click_element -> click
                        if "click_element" in act:
                            val = act.pop("click_element")
                            act["click"] = val
                        # 3. input_text -> input
                        if "input_text" in act:
                            val = act.pop("input_text")
                            act["input"] = val
                        return act

                    actions = [heal_action(a) for a in actions]
                    # -----------------------

                    if not actions:
                        actions = [{"wait": {"seconds": 1}}]

                    mapped_response = {
                        "thinking": detected_thought or "Analyzing...",
                        "evaluation_previous_goal": detected_thought
                        or "Situation analysis...",
                        "memory": "Hitl memory",
                        "next_goal": "Continuing task",
                        "action": actions,
                    }

                    completion_value = output_format.model_validate(mapped_response)
                    self._worker_ref.emit_log("✅ バリデーション成功 (AgentOutput)")

                    return ChatInvokeCompletion(
                        completion=completion_value,
                        usage=ChatInvokeUsage(
                            prompt_tokens=0,
                            completion_tokens=0,
                            total_tokens=0,
                            prompt_cached_tokens=0,
                            prompt_cache_creation_tokens=0,
                            prompt_image_tokens=0,
                        ),
                    )

                # --- [CASE 2] JudgementResult ---
                elif is_judgement_result:
                    import re

                    extracted_obj = None
                    try:
                        extracted_obj = json.loads(final_content)
                    except:
                        json_match = re.search(r"(\{.*\})", final_content, re.DOTALL)
                        if json_match:
                            extracted_obj = json.loads(json_match.group(1))

                    if extracted_obj:
                        completion_value = output_format.model_validate(extracted_obj)
                        self._worker_ref.emit_log(
                            f"✅ バリデーション成功 (JudgementResult): verdict={extracted_obj.get('verdict')}"
                        )
                        return ChatInvokeCompletion(
                            completion=completion_value,
                            usage=ChatInvokeUsage(
                                prompt_tokens=0,
                                completion_tokens=0,
                                total_tokens=0,
                                prompt_cached_tokens=0,
                                prompt_cache_creation_tokens=0,
                                prompt_image_tokens=0,
                            ),
                        )
                    else:
                        raise ValueError("No JSON found for Judgement")

                # --- [CASE 3] Other Formats ---
                else:
                    import re

                    extracted_obj = None
                    try:
                        extracted_obj = json.loads(final_content)
                    except:
                        json_match = re.search(r"(\{.*\})", final_content, re.DOTALL)
                        if json_match:
                            extracted_obj = json.loads(json_match.group(1))
                    if extracted_obj:
                        completion_value = output_format.model_validate(extracted_obj)
                        self._worker_ref.emit_log("✅ バリデーション成功 (Generic)")
                        return ChatInvokeCompletion(
                            completion=completion_value,
                            usage=ChatInvokeUsage(
                                prompt_tokens=0,
                                completion_tokens=0,
                                total_tokens=0,
                                prompt_cached_tokens=0,
                                prompt_cache_creation_tokens=0,
                                prompt_image_tokens=0,
                            ),
                        )
                    else:
                        raise ValueError("No JSON found")

            except Exception as e:
                self._worker_ref.emit_log(f"⚠️ Validation Failed: {e}")
                # Add error to history and retry
                validation_attempt += 1
                if validation_attempt < max_validation_retries:
                    error_msg = f"OUTPUT VALIDATION ERROR: {str(e)}\n\nPlease correct your JSON output to match the schema exactly."
                    self._worker_ref.emit_log("↩️ Feedback sent to AI. Retrying...")
                    openai_messages.append(
                        {"role": "assistant", "content": final_content}
                    )
                    openai_messages.append({"role": "user", "content": error_msg})
                    continue  # Loop back to network call
                else:
                    # Final Fallback
                    self._worker_ref.emit_log(
                        "❌ All retries failed. Returning fallback."
                    )
                    if is_agent_output:
                        completion_value = output_format(
                            thinking="Error recovery",
                            evaluation_previous_goal="Validation Error",
                            memory="Recovered",
                            next_goal="Retry",
                            action=[{"wait": {"seconds": 1}}],
                        )
                        return ChatInvokeCompletion(
                            completion=completion_value,
                            usage=ChatInvokeUsage(
                                prompt_tokens=0,
                                completion_tokens=0,
                                total_tokens=0,
                                prompt_cached_tokens=0,
                                prompt_cache_creation_tokens=0,
                                prompt_image_tokens=0,
                            ),
                        )
                    raise e

        # End of Loop logic (should not be reached if returns happened)
        return ChatInvokeCompletion(
            completion="Error: Loop exhausted",
            usage=ChatInvokeUsage(
                prompt_tokens=0,
                completion_tokens=0,
                total_tokens=0,
                prompt_cached_tokens=0,
                prompt_cache_creation_tokens=0,
                prompt_image_tokens=0,
            ),
        )
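
# ちなみに、この「介護プログラム」の骨格、つまり「素のJSONとして読めなければ、
# 正規表現で本文中からJSONオブジェクトを救出する」という流れだけを抜き出すと、
# だいたい次のようなイメージ(extract_json はこの記事での説明用の仮の関数名)。

```python
import json
import re


def extract_json(raw_text: str) -> dict:
    """AIの返答テキストからJSONオブジェクトを救出する最小スケッチ。
    (extract_json は説明用の仮称。本体では抽出後さらにスキーマ検証を行う)"""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # Markdownの前置きやコードフェンスに埋もれたJSONを正規表現で拾い直す
        match = re.search(r"(\{.*\})", raw_text, re.DOTALL)
        if not match:
            raise ValueError("No JSON found")
        return json.loads(match.group(1))
```

# 本体では、この救出にも失敗した場合にエラーメッセージをAIへ差し戻して
# 再試行する(上のリトライループ)ところまでが「介護」の範囲。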


# --- 🧵 ブラウザ&AIを管理する常駐スレッド ---

# ==========================================
# Custom Agno Model for Kaggle (Fixed for Tool Usage)
# ==========================================
from agno.models.base import Model
from agno.models.response import ModelResponse
from agno.tools.duckduckgo import DuckDuckGoTools
from typing import Iterator, Optional, Any, Dict, List
import httpx


class KaggleCustomModel(Model):
    """
    Kaggle (via Ngrok) Custom Model Wrapper for Agno.
    Forces tool usage via strong prompt injection and robust parsing.
    """

    id: str = "kaggle-custom"
    name: str = "KaggleGemma"
    provider: str = "Kaggle"

    def __init__(self, id: str, api_key: str, base_url: str, **kwargs):
        super().__init__(id=id, **kwargs)
        self.api_key = api_key
        self.base_url = base_url
        # タイムアウトを少し長めに設定
        self.client = httpx.Client(
            headers={
                "Authorization": f"Bearer {api_key}",
                "ngrok-skip-browser-warning": "true",
            },
            verify=False,
            timeout=120.0,
        )
        self.async_client = httpx.AsyncClient(
            headers={
                "Authorization": f"Bearer {api_key}",
                "ngrok-skip-browser-warning": "true",
            },
            verify=False,
            timeout=120.0,
        )

    def invoke(self, *args, **kwargs) -> ModelResponse:
        return ModelResponse(content="Sync invoke not implemented.")

    async def ainvoke(self, *args, **kwargs) -> ModelResponse:
        return ModelResponse(content="Async invoke not implemented.")

    def invoke_stream(self, messages, **kwargs) -> Iterator[ModelResponse]:
        # 1. メッセージの構築 (これがないとUnboundLocalErrorになる)
        payload_messages = []
        for m in messages:
            payload_messages.append({"role": m.role, "content": m.content})

        # 2. 強制ツール使用プロンプト (思考プロセス優先)
        tool_prompt = """
\n
[SYSTEM: STRICT TOOL USAGE]
You are an advanced AI agent with access to real-time tools.
1. **Analyze** the user's request.
2. **Think** about what information is needed (...).
3. **Execute** a tool immediately if needed using the format below.
4. **DO NOT** just say "I will check". ACTIONS SPEAK LOUDER THAN WORDS.

[AVAILABLE TOOLS]
- robust_web_search(query: str): Search the web.
- robust_news_search(query: str): Search for news.

[FORMAT REQUIRED]
Reasoning here...
[[TOOL_CALL: {"name": "tool_name", "arguments": {"arg_name": "value"}}]]

[EXAMPLES]
User: "Search for Python"
Assistant: User wants to know about Python. I will search for it.
[[TOOL_CALL: {"name": "robust_web_search", "arguments": {"query": "Python"}}]]

User: "Tokyo Weather"
Assistant: I need to check the weather in Tokyo.
[[TOOL_CALL: {"name": "robust_web_search", "arguments": {"query": "Tokyo Weather"}}]]
"""
        # システムプロンプトの内容を収集して、USERメッセージに統合する
        # (Ollama/GemmaはSystem Roleを無視する傾向があるため、User発言にねじ込む)
        system_instructions = []
        user_messages_indices = []
        
        # 1. Systemメッセージを抽出
        final_payload = []
        for i, msg in enumerate(payload_messages):
            if msg["role"] == "system":
                system_instructions.append(msg["content"])
            else:
                final_payload.append(msg)
                if msg["role"] == "user":
                    user_messages_indices.append(len(final_payload) - 1)

        # 2. Tool PromptもSystem Instructionに追加
        system_instructions.append(tool_prompt)
        full_system_prompt = "\n\n".join(system_instructions)

        # 3. 最後のUserメッセージに結合 (Recency Biasを利用)
        if user_messages_indices:
            last_user_idx = user_messages_indices[-1]
            original_content = final_payload[last_user_idx]["content"]
            
            # 「指示 + ユーザー発言」の形にする
            new_content = f"{full_system_prompt}\n\n---\n\n[USER REQUEST]\n{original_content}"
            final_payload[last_user_idx]["content"] = new_content
        else:
            # Userメッセージがない場合(稀)、Userとして先頭に追加
            final_payload.insert(0, {"role": "user", "content": full_system_prompt})

        # ペイロードを入れ替え
        payload_messages = final_payload

        # DEBUG: APIに送るメッセージを確認
        print("-" * 50)
        print("[KaggleCustomModel] Sending Messages to API:")
        for m in payload_messages:
            content_str = str(m.get('content', '')) if m.get('content') is not None else "[NO CONTENT]"
            print(f"[{m['role'].upper()}] {content_str[:300]}...") 
        print("-" * 50)

        # 3. APIリクエスト
        url = self.base_url
        if not url.endswith("/chat/completions"):
            url = f"{url.rstrip('/')}/chat/completions"

        payload = {
            "model": self.id,
            "messages": payload_messages,
            "stream": True,
            "temperature": 0.7,
        }

        try:
            with self.client.stream("POST", url, json=payload) as response:
                if response.status_code != 200:
                    try:
                        err = response.read().decode("utf-8")
                    except Exception:
                        err = "Unknown"
                    yield ModelResponse(content=f"Error {response.status_code}: {err}")
                    return

                # SSE Parsing Logic
                buffer = ""
                
                for line in response.iter_lines():
                    if not line:
                        continue
                    
                    # Handle data: lines
                    if line.startswith("data: "):
                        if line.startswith("data: [DONE]"):
                            break
                        
                        json_str_sse = line.replace("data: ", "").strip()
                        if not json_str_sse:
                            continue
                            
                        try:
                            chunk_data = json.loads(json_str_sse)
                            text_chunk = ""
                            # Extract content from OpenAI-compatible chunk
                            if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
                                delta = chunk_data["choices"][0].get("delta", {})
                                text_chunk = delta.get("content", "")
                            
                            if not text_chunk:
                                continue

                            buffer += text_chunk

                            # 🛠️ ツールコールのパースロジック (強化版)
                            # "[[TOOL_CALL:" がバッファに含まれているか監視
                            if "[[TOOL_CALL:" in buffer:
                                # 抽出(閉じ括弧がなくても、JSONとしてパースできればOKとする緩和策)
                                try:
                                    start_index = buffer.find("[[TOOL_CALL:")
                                    if start_index != -1:
                                        potential_json_area = buffer[start_index + 12:].strip()
                                        
                                        # 閉じ括弧 "]]" を探す(あれば確実)
                                        end_index = potential_json_area.find("]]")
                                        
                                        json_str = ""
                                        match_text = ""
                                        
                                        if end_index != -1:
                                            # "]]" がある場合
                                            json_str = potential_json_area[:end_index]
                                            match_text = buffer[start_index : start_index + 12 + end_index + 2]
                                        else:
                                            # "]]" がない場合、閉じ括弧 "}" までを無理やり抽出してトライ
                                            # 末尾にある場合のみ
                                            last_brace = potential_json_area.rfind("}")
                                            if last_brace != -1:
                                                json_str = potential_json_area[:last_brace+1]
                                                match_text = buffer[start_index:] # 最後まで全部

                                        if json_str:
                                            tool_data = json.loads(json_str)
                                            # Agnoにツールコールとして認識させる
                                            yield ModelResponse(
                                                tool_calls=[
                                                    {
                                                        "id": f"call_{len(json_str)}",
                                                        "type": "function",
                                                        "function": {
                                                            "name": tool_data["name"],
                                                            "arguments": json.dumps(
                                                                tool_data["arguments"]
                                                            ),
                                                        },
                                                    }
                                                ]
                                            )
                                            # ツールコール部分を除去して残りを処理
                                            buffer = buffer.replace(match_text, "")
                                except Exception:
                                    # JSON復元トライアル(閉じ括弧不足への最終抵抗)
                                    if json_str:
                                        repairs = ["}", "}}", '"}}', '"]}', '"]}]']
                                        for repair in repairs:
                                            try:
                                                tool_data = json.loads(json_str + repair)
                                                # 修復に成功したらツールコールとしてyieldして抜ける
                                                yield ModelResponse(
                                                    tool_calls=[
                                                        {
                                                            "id": f"call_{len(json_str)}",
                                                            "type": "function",
                                                            "function": {
                                                                "name": tool_data["name"],
                                                                "arguments": json.dumps(tool_data["arguments"]),
                                                            },
                                                        }
                                                    ]
                                                )
                                                buffer = buffer.replace(match_text, "")
                                                break
                                            except Exception:
                                                continue
                                        # どの修復パターンでも直らない場合は、次のチャンクを待つ


                            # ツールコールの気配がないなら、ある程度溜まったら吐き出す (ストリーミング感を出す)
                            elif len(buffer) > 100:
                                # タグの開始部分 "[" が末尾にあるかもしれないので、そこだけ残す
                                last_bracket = buffer.rfind("[")
                                if last_bracket != -1 and len(buffer) - last_bracket < 15:
                                    to_yield = buffer[:last_bracket]
                                    buffer = buffer[last_bracket:]
                                    if to_yield:
                                        yield ModelResponse(content=to_yield)
                                else:
                                    yield ModelResponse(content=buffer)
                                    buffer = ""
                        except Exception:
                            pass

                # 残りのバッファを処理
                if buffer:
                    # 残りにツールコールがあるか最終確認
                    pattern = r"\[\[TOOL_CALL:\s*(\{.*?\})\s*\]\]"
                    match = re.search(pattern, buffer, re.DOTALL)
                    if match:
                        try:
                            json_str = match.group(1)
                            tool_data = json.loads(json_str)
                            yield ModelResponse(
                                tool_calls=[
                                    {
                                        "id": "final_call",
                                        "type": "function",
                                        "function": {
                                            "name": tool_data["name"],
                                            "arguments": json.dumps(
                                                tool_data["arguments"]
                                            ),
                                        },
                                    }
                                ]
                            )
                            buffer = ""  # 処理済み
                        except Exception:
                            pass

                    if buffer:
                        yield ModelResponse(content=buffer)

        except Exception as e:
            yield ModelResponse(content=f"Connection Error: {e}")

    # async版は今回はsync版(AgnoAgent.run)が呼ばれる構造のようなので省略可ですが、
    # エラー回避のため空実装または同様の実装をしておきます
    async def ainvoke_stream(self, messages, **kwargs) -> Any:
        yield ModelResponse(
            content="Async stream not configured in this patch. Use sync run."
        )

    def _parse_provider_response(self, response: Any, **kwargs) -> ModelResponse:
        return ModelResponse()

    def _parse_provider_response_delta(self, response: Any) -> ModelResponse:
        return ModelResponse()
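
# 上の invoke_stream のツールコール抽出は、ストリーミングやJSON修復を除いて
# 要点だけ取り出すと次のような関数に相当する(parse_tool_call はこの記事での
# 説明用の仮称)。

```python
import json
import re


def parse_tool_call(buffer: str):
    """テキストから [[TOOL_CALL: {...}]] を1つ探し、(ツールdict, 残りテキスト) を返す。
    見つからない・JSONとして読めない場合は (None, buffer) をそのまま返す。"""
    match = re.search(r"\[\[TOOL_CALL:\s*(\{.*?\})\s*\]\]", buffer, re.DOTALL)
    if not match:
        return None, buffer
    try:
        tool_data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None, buffer
    # マッチした [[TOOL_CALL: ...]] 部分を取り除いた残りを返す
    return tool_data, buffer.replace(match.group(0), "")
```

# 実際のストリーミングでは、閉じ括弧 "]]" がまだ届いていない途中状態も
# 扱う必要があるため、本体のコードはもっと泥臭くなっている。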


class BrowserWorker(QThread):
    log_signal = pyqtSignal(str)
    stream_signal = pyqtSignal(str, str)  # content, msg_type
    ready_signal = pyqtSignal()

    def __init__(self, api_config):
        super().__init__()
        self.api_config = api_config
        # AiCommanderPyQt からは API_CONFIGS["Groq"] (DEFAULT_API) の dict がそのまま渡ってくるため、
        # キー名は逆引きできない。デフォルト名を "Groq" とし、以降は update_api_config() で更新する。
        self.current_api_name = "Groq"

        self.task_queue = queue.Queue()
        self.is_running = True
        self.proceed_event = asyncio.Event()
        self.proceed_event.set()
        self.approval_mode = False
        self.api_changed = False
        self.agno_agent = None
        self.shared_browser_session = None
        self.browser = None
        self.session_context = []  # List of (task, result) tuples
        self.agno_session_id = None  # Session ID for Agno memory persistence

        # Chat History
        self.chat_history = []

        # LLM instance (initialized in _async_run)
        self.llm = None

    def update_api_config(self, api_name, new_config):
        self.current_api_name = api_name
        self.api_config = new_config
        self.api_changed = True
        self.emit_log(f"⚙️ 接続先設定を更新: {api_name}")

    def emit_log(self, text):
        # UIへの信号送信と、念のためコンソール(ターミナル)にも出力
        print(f"[WORKER DEBUG] {text}")
        self.log_signal.emit(text)

    def run(self):
        try:
            print("[WORKER] Thread started. Setting up event loop...")
            # Windows Async Loop Policy
            if sys.platform == "win32":
                asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

            asyncio.run(self._async_run())
        except Exception as e:
            print(f"[WORKER CRITICAL ERROR] {e}")
            import traceback

            traceback.print_exc()
            # UIにも送れるか試行(不確定だが)
            try:
                self.log_signal.emit(f"❌ Worker Thread Crashed: {e}")
            except Exception:
                pass

    async def _async_run(self):
        try:
            from browser_use import Agent
            from playwright.async_api import async_playwright
            import httpx

            # Browser関連の柔軟なインポート
            try:
                from browser_use import Browser, BrowserConfig

                HAS_BROWSER_CONFIG = True
            except ImportError:
                HAS_BROWSER_CONFIG = False
                try:
                    from browser_use import Browser
                except ImportError:
                    from browser_use import BrowserSession as Browser  # Fallback
        except ImportError as e:
            self.emit_log(
                f"❌ インポートエラーが発生しました:\n{e}\n\npip install browser-use playwright httpx を実行してください。"
            )
            import traceback

            print(traceback.format_exc())
            return

        # self.emit_log("🚀 システム起動シーケンス開始")

        # Ngrok対策: カスタムHTTPクライアント
        async_client = httpx.AsyncClient(
            headers={"ngrok-skip-browser-warning": "true"},
            timeout=120.0,
            verify=False,  # 自己署名証明書なども許容
        )

        # LLM Initialize with HITL Wrapper
        # self.emit_log(f"⏳ AI接続中 ({self.current_api_name})...")
        self.llm = HitlChatModel(
            worker_ref=self,
            base_url=self.api_config["url"],
            model_name=self.api_config["model"],
            api_key=self.api_config["api_key"],
            custom_system_prompt=self.api_config.get("system_prompt"),
            http_async_client=async_client,
        )
        # self.emit_log(f"✅ AI接続完了: {self.current_api_name}")

        # Browser Initialization removed from here.
        # Now lazy-loaded in _run_browser_agent when task actually starts.


        # self.emit_log("🚀 システム準備完了: コマンド待機中")
        self.ready_signal.emit()

        # Initial Persona Setup
        self.chat_history = []  # シンプルに履歴のみ初期化
        last_heartbeat = 0

        while self.is_running:
            try:
                # 10秒おきに生存確認
                now = time.time()
                if now - last_heartbeat > 10:
                    print("[HEARTBEAT] Worker is alive...")
                    last_heartbeat = now

                # API切り替えチェック
                if self.api_changed:
                    # self.emit_log(f"🔄 API切替中... ({self.current_api_name})")
                    self.llm = HitlChatModel(
                        worker_ref=self,
                        base_url=self.api_config["url"],
                        model_name=self.api_config["model"],
                        api_key=self.api_config["api_key"],
                        custom_system_prompt=self.api_config.get("system_prompt"),
                        http_async_client=async_client,
                    )
                    self.api_changed = False
                    # self.emit_log(f"✅ AI接続完了: {self.current_api_name}")

                # Task Queue Processing
                try:
                    task_data = self.task_queue.get(timeout=0.1)

                    mode = task_data.get("mode")
                    text = task_data.get("text")  # specific for chat/browser

                    if mode == "code_run":
                        code = task_data.get("code")
                        self._run_docker_code(code)
                    elif mode == "agno_chat":
                        await self._handle_agno_chat(text)
                    elif mode == "chat":
                        img_data = task_data.get("image")
                        await self._handle_normal_chat(text, image=img_data)
                    elif mode == "open_interpreter":
                        # User wants "OI-like" behavior but SAFE implementation (Docker).
                        self.emit_log(
                            f"🧠 Open Interpreter (Safe Mode) Thinking... (Task: {text})"
                        )
                        await self._run_safe_oi_agent(text)

                    elif mode == "browser":
                        # Check Approval Mode
                        if self.approval_mode:
                            self.emit_log("✋ 承認待ち: 実行前にボタンを押してください")
                            self.proceed_event.clear()
                            # This will block the async loop, so we need to ensure it's handled correctly
                            while not self.proceed_event.is_set():
                                await asyncio.sleep(0.1)  # Yield control

                        await self._run_browser_agent(
                            self.llm, text, self.shared_browser_session
                        )

                    self.task_queue.task_done()

                except queue.Empty:
                    await asyncio.sleep(0.1)  # Yield control if queue is empty
                    continue

            except Exception as e:
                self.emit_log(f"❌ Worker Internal Error: {e}")
                import traceback

                self.emit_log(traceback.format_exc())
                await asyncio.sleep(1)  # 無限リピート防止

    async def _run_safe_oi_agent(self, task_text):
        """Run a manual ReAct loop for Open Interpreter (Safe Mode)."""
        self.emit_log(f"🚀 Safe OI Agent Started. (Task: {task_text})")

        # 1. Setup Context
        max_steps = 10
        history = [
            {
                "role": "system",
                "content": """
You are an Open Interpreter (Safe Mode).
Your goal is to complete the user's request by writing and executing Python code.
You have access to a logical Python environment (Docker container).
Do NOT assume you can see the user's screen or browser.
To create files, read files, or run scripts, you MUST output Python code.

Volume Mapping Constraint:
The Docker container's `/workspace` directory is mapped to the user's local path: `C:\\Users\\TAKUMA\\Desktop\\作業場`.
You ONLY have write permission to this folder (and subfolders).
You cannot access C:\\ or other system folders directly.

Protocol:
1. Plan: Briefly explain what you will do.
2. Code: Output a Python code block (```python ... ```).
3. Wait: The system will execute the code and return the output.
4. Observe: Analyze the output and decide the next step.
5. Final Answer: When done, state "Mission Completed" or the answer.

Current working directory: /workspace
""",
            },
            {"role": "user", "content": task_text},
        ]

        step = 0
        while step < max_steps:
            step += 1
            self.emit_log(f"🌀 Step {step}/{max_steps}")

            try:
                # 2. Call LLM
                from openai import AsyncOpenAI

                temp_client = AsyncOpenAI(
                    base_url=self.llm._base_url,
                    api_key=self.llm.api_key,
                    http_client=self.llm._http_client,
                )

                # Stream response
                stream = await temp_client.chat.completions.create(
                    model=self.llm.model, messages=history, stream=True
                )

                full_content = ""
                # self.emit_log("🤖 ", end="") # Start marker

                async for chunk in stream:
                    content = chunk.choices[0].delta.content
                    if content:
                        full_content += content
                        # Stream to UI
                        self.stream_signal.emit(content, "response")

                # ストリーム完了後は空行ログを1つ送り、次のログ表示が新しい行から始まるようにする
                self.emit_log("")

                history.append({"role": "assistant", "content": full_content})

                # 3. Parse Code Blocks
                import re

                code_blocks = re.findall(r"```python(.*?)```", full_content, re.DOTALL)

                if not code_blocks:
                    # No code, assuming conversation or done.
                    if "Mission Completed" in full_content or "完了" in full_content:
                        self.emit_log("✅ Task Completed.")
                        break
                    # コードが生成されないままだと延々と独り言が続いてしまうため、ここでループを打ち切る
                    self.emit_log(
                        "❓ No code generated. Waiting for next user input (Loop End)."
                    )
                    break

                # 4. Execute Code
                for code in code_blocks:
                    code = code.strip()
                    self.emit_log("⚙️ Executing Python Code...")
                    output = self._run_docker_code(code)

                    # 5. Append Result
                    result_msg = f"Execution Result:\n{output}"
                    history.append({"role": "user", "content": result_msg})

            except Exception as e:
                self.emit_log(f"❌ Error in OI Loop: {e}")
                import traceback

                self.emit_log(traceback.format_exc())
                break


    def submit_task(self, text, mode="chat", image=None, code=None):
        """Submit a task to the queue with optional image or code."""
        task_data = {"text": text, "mode": mode}
        if image:
            task_data["image"] = image
        if code:
            task_data["code"] = code
        self.task_queue.put(task_data)

    async def _handle_normal_chat(self, text, image=None):
        """Simple chat handler without Agno/Tools."""
        try:
            self.emit_log(f"💬 Chat Request: {text}")
            
            # Simple Conversation Setup
            messages = []
            sys_prompt = self.api_config.get("system_prompt")
            if sys_prompt:
                messages.append({"role": "system", "content": sys_prompt})
            
            # User Content Construction (Text + Image)
            user_content = []
            user_content.append({"type": "text", "text": text})
            
            if image:
                user_content.append({
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image}"
                    }
                })
                self.emit_log("🖼️ Image attached to request.")

            messages.append({"role": "user", "content": user_content})

            from openai import AsyncOpenAI
            # Re-use the http_client from self.llm to handle ngrok/timeouts
            temp_client = AsyncOpenAI(
                base_url=self.llm._base_url,
                api_key=self.llm.api_key,
                http_client=self.llm._http_client,
            )
            
            stream = await temp_client.chat.completions.create(
                model=self.llm.model,
                messages=messages,
                stream=True
            )
            
            full_content = ""
            async for chunk in stream:
                 if chunk.choices and len(chunk.choices) > 0:
                     content = chunk.choices[0].delta.content
                     if content:
                         full_content += content
                         self.stream_signal.emit(content, "response")
            
            self.emit_log("") # Newline to finalize UI stream
            # self.emit_log(f"✅ Chat Finished.")
            
        except Exception as e:
            self.emit_log(f"❌ Chat Error: {e}")
            import traceback
            self.emit_log(traceback.format_exc())

    async def _handle_agno_chat(self, text, silent_input=False):
        """Unified chat handler using Async Iterator for streaming with Agno Agent."""
        try:
            # Map API to Agno Model (Unified Logic)
            # Local imports for Lazy Loading
            from agno.agent import Agent as AgnoAgent
            from agno.models.groq import Groq
            from agno.models.google import Gemini
            from agno.models.openai import OpenAIChat
            from agno.models.ollama import Ollama
            
            if self.current_api_name == "Kaggle":
                model = KaggleCustomModel(
                    id=self.api_config["model"],
                    api_key=self.api_config["api_key"],
                    base_url=self.api_config["url"],
                )
            elif self.current_api_name == "Groq":
                model = Groq(
                    id=self.api_config.get("model"),
                    api_key=self.api_config.get("api_key"),
                )
            elif self.current_api_name == "Gemini":
                model = Gemini(
                    id=self.api_config.get("model"),
                    api_key=self.api_config.get("api_key"),
                )
            elif self.current_api_name == "Ollama":
                # Use "KaggleCustomModel" (Robust Manual Tool Injection) for Ollama
                # because my-gemma3 rejects the native 'tools' API parameter.
                model = KaggleCustomModel(
                    id=self.api_config.get("model"),
                    api_key=self.api_config.get("api_key"),
                    base_url=self.api_config.get("url"),
                )
            else:
                model = OpenAIChat(
                    id=self.api_config.get("model"),
                    api_key=self.api_config.get("api_key"),
                    base_url=self.api_config.get("url"),
                )

            base_instructions = self.api_config.get(
                "system_prompt", "You are a helpful AI assistant."
            )
            # Hint for thinking models
            instructions = (
                base_instructions
                + """

【重要な指示】
1. 回答は必ず `<think>` と `</think>` タグで囲んだ内部思考プロセスから始めてください。
2. あなたはツールを利用できます。ユーザーが検索、ニュース、情報を求めた場合、**即座に** `robust_web_search` または `robust_news_search` ツールを使用しなければなりません。
3. **最重要**: 許可を求めないでください。「検索しましょうか?」「これでいいですか?」と絶対に聞かないでください。
4. ただツールを実行してください。ユーザーが「はい」「OK」と言ったら、履歴を確認し、保留中のアクションを即座に実行してください。
5. 忠実なAIとして、行動が求められています。命令を確認せず、ただ実行してください。
6. ユーザーが「ニュース」を求めているが曖昧な場合(例:「最新ニュース」)、'最新ニュース'(トップニュース)を検索してください。
7. 優先順位: ツール使用 > 雑談。ただし、「こんにちは」や「やあ」のような単純な挨拶には、ツールなしで自然に返答してください。
"""
            )

            from typing import Union

            # 1. 文字列の "5" が来ても怒らない、頑丈な検索関数たち
            def robust_web_search(query: str, max_results: Union[str, int] = 5) -> str:
                """
                DuckDuckGoで一般的なWeb検索を行います。
                Args:
                    query (str): 検索したいキーワード。
                    max_results (Union[str, int]): 件数(デフォルト5)。
                """
                from duckduckgo_search import DDGS

                try:
                    safe_max = int(max_results)
                except (TypeError, ValueError):
                    safe_max = 5

                try:
                    with DDGS() as ddgs:
                        results = list(ddgs.text(keywords=query, max_results=safe_max))
                        if not results:
                            return "検索結果が見つかりませんでした。"
                        return str(results)
                except Exception as e:
                    return f"検索エラー: {e}"

            def robust_news_search(query: str, max_results: Union[str, int] = 5) -> str:
                """
                DuckDuckGoで最新のニュースを検索します。
                Args:
                    query (str): 検索したいニュースのキーワード。
                    max_results (Union[str, int]): 取得件数(デフォルト5)。文字列でもOK。
                """
                from duckduckgo_search import DDGS

                try:
                    # AIが "5" (文字) を送ってきても int に変換してあげる優しさ
                    safe_max = int(max_results)
                except (TypeError, ValueError):
                    safe_max = 5

                try:
                    with DDGS() as ddgs:
                        # 実際にニュースを検索
                        results = list(ddgs.news(keywords=query, max_results=safe_max))
                        
                        # 検索結果がゼロなら、クエリを簡易化してリトライ
                        if not results and query in ["TOP HEADLINES", "最新ニュース"]:
                            print("[News] Retry with 'news'...")
                            results = list(ddgs.news(keywords="news", max_results=safe_max))
                        
                        if not results:
                             # さらにリトライ
                            print(f"[News] Retry with fallback query '{query} news'...")
                            try:
                                results = list(ddgs.news(keywords=f"{query} news", max_results=safe_max))
                            except Exception:
                                pass

                        if not results:
                            return "ニュースは見つかりませんでした。(DuckDuckGo API returned empty)"
                        return str(results)
                except Exception as e:
                    return f"ニュース検索中にエラーが発生しました: {e}"

            # 2. Agnoのエージェントにこのツールを持たせる
            from agno.db.sqlite.sqlite import SqliteDb

            if self.agno_agent is None or self.api_changed:
                # self.emit_log(
                #     f"🔄 Initializing AgnoAgent (API: {self.current_api_name})..."
                # )
                try:
                    self.agno_agent = AgnoAgent(
                        model=model,
                        instructions=instructions,
                        markdown=True,
                        tools=[robust_web_search, robust_news_search],
                        debug_mode=True,
                        add_history_to_context=True,
                        db=SqliteDb(
                            db_file="agno_history.db", session_table="agent_sessions"
                        ),
                    )
                    self.api_changed = False
                    # self.emit_log("✅ AgnoAgent Initialized (With Persistence).")
                except Exception as e:
                    self.emit_log(f"❌ AgnoAgent Init Error: {e}")
                    return

            # Common Stream Processing (Threaded Wrapper for ALL)
            # Run Stream in a separate thread and push to Queue
            import queue

            stream_queue = queue.Queue()

            def run_thread():
                try:
                    # self.emit_log(
                    #     f"[THREAD] Starting Agno run for '{text[:10]}...'. Tools: {len(self.agno_agent.tools)}"
                    # )

                    # Generate or reuse session ID
                    import uuid

                    if not self.agno_session_id:
                        self.agno_session_id = str(uuid.uuid4())
                        # self.emit_log(f"🧠 New Session ID: {self.agno_session_id}")
                    else:
                        pass
                        # self.emit_log(f"🧠 Resuming Session: {self.agno_session_id}")

                    # Sync call - Agno handles tool execution internally!
                    stream = self.agno_agent.run(
                        text, stream=True, session_id=self.agno_session_id
                    )
                    # self.emit_log(f"[THREAD] Stream object obtained: {type(stream)}")

                    if stream:
                        for chunk in stream:
                            stream_queue.put(chunk)

                    stream_queue.put(None)  # Sentinel for end
                except Exception as e:
                    self.emit_log(f"[THREAD] CRITICAL Error: {e}")
                    import traceback

                    traceback.print_exc()
                    stream_queue.put(e)  # Pass error
                finally:
                    # self.emit_log("[THREAD] Exiting thread.")
                    pass

            # Start thread
            t = threading.Thread(target=run_thread, daemon=True)
            t.start()
            # self.emit_log(f"[WRAPPER] Thread started: {t.name} (ID: {t.ident})")

            # Common Stream Processing Loop
            full_content = ""
            in_think_block = False
            buffer = ""

            # self.emit_log("🐛 Stream Loop Starting... Waiting for queue.")

            # Queue Consumer Loop
            chunk_count = 0
            while True:
                try:
                    # Non-blocking get
                    chunk = stream_queue.get_nowait()
                except queue.Empty:
                    if not t.is_alive() and stream_queue.empty():
                        break
                    await asyncio.sleep(0.05)
                    continue

                if chunk is None:
                    break

                if isinstance(chunk, Exception):
                    self.emit_log(f"Agno Error: {chunk}")
                    raise chunk

                chunk_count += 1

                # Normalize chunk
                content = ""
                # Handle ModelResponse object from KaggleCustomModel or Agno standard
                if isinstance(chunk, ModelResponse):
                    content = chunk.content or ""
                elif isinstance(chunk, str):
                    content = chunk
                elif hasattr(chunk, "content") and chunk.content is not None:
                    content = chunk.content
                elif hasattr(chunk, "delta") and hasattr(chunk.delta, "content"):
                    content = chunk.delta.content
                elif hasattr(chunk, "choices") and len(chunk.choices) > 0:
                    delta = chunk.choices[0].delta
                    if hasattr(delta, "content"):
                        content = delta.content
                elif isinstance(chunk, dict):
                    if "content" in chunk:
                        content = chunk["content"]
                else:
                    pass  # Unknown chunk type - ignore


                if not content:
                    continue

                full_content += content
                buffer += content

                # Logic identifying Think Tags
                # print(f"DEBUG: Processing buffer: '{buffer}'")
                while True:
                    if in_think_block:
                        end_tag_idx = buffer.find("</think>")
                        if end_tag_idx != -1:
                            thought_text = buffer[:end_tag_idx]
                            if thought_text:
                                self.stream_signal.emit(thought_text, "thought")
                            buffer = buffer[end_tag_idx + 8 :]
                            in_think_block = False
                            continue
                        else:
                            if len(buffer) > 8:
                                safe_chunk = buffer[:-8]
                                self.stream_signal.emit(safe_chunk, "thought")
                                buffer = buffer[-8:]
                            break
                    else:
                        start_tag_idx = buffer.find("<think>")
                        if start_tag_idx != -1:
                            pre_text = buffer[:start_tag_idx]
                            if pre_text:
                                self.stream_signal.emit(pre_text, "response")
                            buffer = buffer[start_tag_idx + 7 :]
                            in_think_block = True
                            continue
                        else:
                            # Loose tag detection
                            if "<" in buffer:
                                last_open = buffer.rfind("<")
                                if last_open != -1:
                                    if last_open > 0:
                                        safe_chunk = buffer[:last_open]
                                        self.stream_signal.emit(safe_chunk, "response")
                                        buffer = buffer[last_open:]
                                    if len(buffer) > 7 and not buffer.startswith("<think>"):
                                        self.stream_signal.emit(buffer[0], "response")
                                        buffer = buffer[1:]
                                        continue
                                else:
                                    self.stream_signal.emit(buffer, "response")
                                    buffer = ""
                            else:
                                self.stream_signal.emit(buffer, "response")
                                buffer = ""
                            break

            if buffer:
                msg_type = "thought" if in_think_block else "response"
                self.stream_signal.emit(buffer, msg_type)

            self.emit_log("")

        except Exception as e:
            self.emit_log(f"❌ Chat Error: {e}")
            import traceback

            self.emit_log(traceback.format_exc())

    async def _run_browser_agent(self, llm, task_text, browser_session):
        self.emit_log(f"🚀 Task: {task_text}")

        # Store Task Text for Prompt Injection
        self.current_task_text = task_text

        # Lazy Initialization of BrowserSession
        if self.shared_browser_session is None:
            self.emit_log("⏳ BrowserSession (新規ブラウザ) を起動しています...")
            try:
                from browser_use import BrowserSession

                # ユーザー要望: ウィンドウをサイドバーの横に配置して縮小
                # BrowserSessionに直接引数を渡す (BrowserConfigはimport不可のため)
                self.shared_browser_session = BrowserSession(
                    args=[
                        "--window-position=450,50",
                        "--window-size=800,900",  # Shrink width as requested
                        "--force-device-scale-factor=0.85",  # Slight zoom out to fit more content
                    ]
                )
                await self.shared_browser_session.start()
                self.emit_log("✅ BrowserSession 起動完了 (Window Adjusted)")
            except Exception as e:
                self.emit_log(f"❌ Browser Launch Error: {e}")
                return

        browser_session = self.shared_browser_session  # Update local var

        self.emit_log("⏳ Initializing Agent & Browser Connection...")

        try:
            from browser_use import Agent

            # 司令官の要望: 勝手に新しいタブを開かず、現在のタブを優先する
            try:
                if not getattr(browser_session, "_session_id", None):
                    await browser_session.start()
                self.emit_log("✅ エージェント接続完了")
            except Exception as e:
                self.emit_log(f"⚠️ ブラウザ接続中にエラーが発生しました: {e}")

            # ユーザー要望: 指示を簡素化してエージェントの混乱を防ぐ
            # コンテキスト(過去の指示)を付与して「記憶」させる
            # System Instruction to force tool usage
            system_instruction = """
IMPORTANT: You have NO direct access to the user's filesystem.
To create files, folders, or read data, you MUST use the `python_code_interpreter` tool.
DO NOT hallucinate or pretend to execute code.
ALWAYS use the tool to perform the action.
"""
            full_task = f"{system_instruction}\n\nUser Request: {task_text}"
            if self.session_context:
                context_str = "\n".join(
                    [
                        f"- Previous Task: {t}\n  Result: {r}"
                        for t, r in self.session_context[-5:]
                    ]
                )
                full_task = f"""
Current Task: {task_text}

[Session History (Context for your reference)]
{context_str}
"""

            # Define Docker Tool for Agent
            from langchain_core.tools import StructuredTool

            def run_python_wrapper(code: str):
                """Run python code. Return stdout text."""
                return self._run_docker_code(code)

            docker_tool = StructuredTool.from_function(
                func=run_python_wrapper,
                name="python_code_interpreter",
                description="Execute Python code in a secure Docker environment. Use this for ANY filesystem operations (read/write files, folders, etc) or complex calculations. Input must be a valid python script string. Working directory is /workspace.",
            )

            # DEBUG: LLM Check
            self.emit_log(f"🕵️ LLM Type: {type(llm)}")
            self.emit_log(f"🕵️ Has ainvoke: {hasattr(llm, 'ainvoke')}")

            # エージェント初期化 (Tool追加)
            agent = Agent(
                task=full_task,
                llm=llm,
                browser_context=browser_session.context,
                tools=[docker_tool],
                use_vision=(self.current_api_name == "Kaggle"),  # KaggleのみTrue
                sensitive_data_filter=True,  # トークン節約のためスクリプト等を非表示
                llm_timeout=120,
                max_steps=50,  # 粘り強く作業させる
                # ユーザー要望: DOMをダイエットさせてTPM節約
                include_attributes=[
                    "title",
                    "type",
                    "name",
                    "role",
                    "aria-label",
                    "placeholder",
                    "value",
                    "alt",
                    "src",
                    "href",
                    "id",
                ],
            )

            self.emit_log("✅ Agent Ready. Starting task...")

            # Run Agent
            history = await agent.run()

            # Summary
            try:
                result = history.final_result()
                if result:
                    self.emit_log(f"✅ 完了: {result}")
                    # Update Session Context
                    self.session_context.append((task_text, result))
                else:
                    self.emit_log("✅ タスク終了")
                    self.session_context.append(
                        (task_text, "Completed (No text result)")
                    )
            except Exception:
                self.emit_log("✅ タスク終了 (結果取得不可)")
                self.session_context.append((task_text, "Completed (Unknown result)"))

        except Exception as e:
            self.emit_log(f"❌ Agent Error: {e}")
            if "TimeoutError" in str(e) or "timed out" in str(e):
                self.emit_log(
                    "⚠️ ヒント: Chromeが最小化されているとスクリーンショットが取得できない場合があります。ウィンドウを表示状態にしてください。"
                )
            elif "User requested stop" in str(e):
                self.emit_log("🛑 ユーザー操作によりタスクを中断しました。")
            else:
                # Other errors...
                pass


    def request_stop(self):
        """Request the current agent to stop."""
        self.stop_requested = True
        self.emit_log("🛑 停止リクエストを受信しました。次のステップで停止します...")

    def submit_code_task(self, code):
        """Submit a code execution task to the queue."""
        self.task_queue.put({"mode": "code_run", "code": code})
        self.emit_log("💻 Code Execution Queued")

    def _run_docker_code(self, code):
        """Run Python code in a Docker container."""
        try:
            self.emit_log(f"🐳 Dockerコンテナを起動中... ({DOCKER_IMAGE})")

            # create temp file in mount path
            mount_path = Path(DOCKER_MOUNT_PATH)
            if not mount_path.exists():
                mount_path.mkdir(parents=True, exist_ok=True)

            script_path = mount_path / "temp_script.py"
            # Write with utf-8
            try:
                script_path.write_text(code, encoding="utf-8")
            except Exception as e:
                self.emit_log(f"❌ File Write Error: {e}")
                return f"File Write Error: {e}"  # ツール呼び出し元には文字列で返す

            # Docker Command
            # Accessing temp_script.py at /workspace/temp_script.py
            cmd = [
                "docker",
                "run",
                "--rm",
                "-v",
                f"{DOCKER_MOUNT_PATH}:/workspace",
                "-w",
                "/workspace",
                "--network",
                "bridge",
                DOCKER_IMAGE,
                "python",
                "temp_script.py",
            ]

            self.emit_log(f"▶️ Command: {' '.join(cmd)}")

            # Run Process
            # Add CREATE_NO_WINDOW for Windows to avoid popping up console
            creation_flags = subprocess.CREATE_NO_WINDOW if os.name == "nt" else 0

            start_time = time.time()
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                encoding="utf-8",
                errors="replace",
                creationflags=creation_flags,
            )
            duration = time.time() - start_time

            # Clean up temp file (Optional: keep for debug? Let's delete for now)
            # script_path.unlink(missing_ok=True)

            # Output
            output_header = f"⏱️ Execution Time: {duration:.2f}s\n"
            final_output = ""
            if result.returncode == 0:
                final_output = f"{output_header}{result.stdout}"
                self.emit_log(f"✅ Docker Output:\n{final_output}")
                if result.stderr:
                    final_output += f"\n[Stderr]\n{result.stderr}"
                    self.emit_log(f"⚠️ Docker Stderr:\n{result.stderr}")
            else:
                final_output = f"{output_header}Error (Code {result.returncode}):\n{result.stderr}\n{result.stdout}"
                self.emit_log(f"❌ Docker Execution Failed:\n{final_output}")

            return final_output

        except Exception as e:
            error_msg = f"❌ Docker Unexpected Error: {e}"
            self.emit_log(error_msg)
            import traceback

            traceback.print_exc()
            return error_msg

    async def _run_open_interpreter(self, task_text):
        """Run task using the official Open Interpreter."""
        try:
            from interpreter import interpreter
        except ImportError:
            interpreter = None

        if not interpreter:
            self.emit_log("❌ Open Interpreter library not loaded.")
            return

        self.emit_log(f"🧠 Open Interpreter Thinking... (Task: {task_text})")

        # Configure Interpreter
        # Mapping API Config to LiteLLM format
        api_key = self.api_config.get("api_key")
        model = self.api_config.get("model")

        if self.current_api_name == "Groq":
            interpreter.llm.model = (
                f"groq/{model}" if not model.startswith("groq/") else model
            )
            interpreter.llm.api_key = api_key
            interpreter.llm.context_window = 8192
            interpreter.llm.max_tokens = 4096
        elif self.current_api_name == "Gemini":
            interpreter.llm.model = f"gemini/{model}"
            interpreter.llm.api_key = api_key
        else:
            # Fallback or other providers
            interpreter.llm.model = model
            interpreter.llm.api_key = api_key

        interpreter.auto_run = True  # Trusting the user request for "OI-like" speed
        # interpreter.auto_run = not self.approval_mode # Use approval mode if desired? User complained.

        # System Prompt Injection (Optional)
        interpreter.system_message += "\nUser prefers Japanese response."

        try:
            # interpreter.chat is blocking/sync, but we are in async method.
            # We should technically run this in executor, but since we want streaming...
            # interpreter.chat return generator.

            # Since interpreter blocks on input() if auto_run=False, we must be careful.
            # We set auto_run=True, so it should run autonomously until completion or error.

            chunks = interpreter.chat(task_text, stream=True)

            for chunk in chunks:
                # Chunk handling
                # Handle 'message' (text response)
                if "message" in chunk:
                    self.emit_log(f"🤖 {chunk['message']}")

                # Handle 'code' (code generation)
                if "code" in chunk:
                    self.emit_log(
                        f"📝 Generating Code ({chunk.get('language', 'unknown')}):\n{chunk['code']}"
                    )

                # Handle 'executing' (execution details)
                if "executing" in chunk:
                    code_info = chunk["executing"]
                    self.emit_log(f"⚙️ Executing Code:\n{code_info.get('code', '...')}")

                # Handle 'output' (execution result)
                if "output" in chunk:
                    self.emit_log(f"📤 Output:\n{chunk['output']}")

            self.emit_log("✅ Open Interpreter Task Completed.")

        except Exception as e:
            self.emit_log(f"❌ Open Interpreter Error: {e}")
            import traceback

            self.emit_log(traceback.format_exc())

    def stop(self):
        self.is_running = False
        self.task_queue.put(None)
        # self.wait() # ブロックしてフリーズする原因になるため待機しない


class AutoResizingTextEdit(QTextEdit):
    submit_signal = pyqtSignal()

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setLineWrapMode(QTextEdit.LineWrapMode.WidgetWidth)
        self.setVerticalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)
        self.textChanged.connect(self.adjust_height)
        self.setFixedHeight(40)  # Init height
        self.setSizePolicy(QSizePolicy.Policy.Expanding, QSizePolicy.Policy.Minimum)

    def adjust_height(self):
        doc_height = self.document().size().height()
        new_height = int(doc_height + 10)
        if new_height > 150:
            self.setVerticalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAsNeeded)
            new_height = 150
        else:
            self.setVerticalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)

        if new_height < 40:
            new_height = 40

        self.setFixedHeight(new_height)

    def keyPressEvent(self, event):
        # AiCommanderPyQtインスタンスを取得
        main_win = self.window()
        if not hasattr(main_win, "worker"):
            super().keyPressEvent(event)
            return

        # スペースキー単独で一時停止(入力欄が空の場合のみ。Ctrl+Spaceは下のelifで処理)
        if (
            event.key() == Qt.Key.Key_Space
            and not (event.modifiers() & Qt.KeyboardModifier.ControlModifier)
            and self.toPlainText().strip() == ""
        ):
            if main_win.worker.proceed_event.is_set():
                main_win.worker.proceed_event.clear()
            return  # 空の入力欄にスペースが入力されるのを防ぐ
        if event.key() == Qt.Key.Key_Return:
            if event.modifiers() & Qt.KeyboardModifier.ShiftModifier:
                super().keyPressEvent(event)
            else:
                self.submit_signal.emit()
        elif event.key() == Qt.Key.Key_Space:
            if event.modifiers() & Qt.KeyboardModifier.ControlModifier:
                # Ctrl+Spaceで一時停止/再開(入力中でも機能させるため)
                window = self.window()
                if hasattr(window, "manual_step_toggle"):
                    window.manual_step_toggle()
                else:
                    super().keyPressEvent(event)  # Fallback
            else:
                super().keyPressEvent(event)
        else:
            super().keyPressEvent(event)

    def focusInEvent(self, event):
        super().focusInEvent(event)
        try:
            hwnd = int(self.window().winId())
            imm32 = ctypes.windll.imm32
            hIMC = imm32.ImmGetContext(hwnd)
            if hIMC:
                imm32.ImmSetOpenStatus(hIMC, True)
                imm32.ImmReleaseContext(hwnd, hIMC)
        except Exception:
            pass  # IME control is Windows-only; ignore failures elsewhere


class AiCommanderPyQt(QMainWindow):
    def __init__(self):
        super().__init__()
        self.target_browser = None
        self.is_expanded = False
        self.current_attachment = None
        self.oldPos = self.pos()

        # Markdownレンダリング用のバッファ
        self.current_stream_text = ""
        self.response_start_pos = None

        self.worker = BrowserWorker(DEFAULT_API)
        self.worker.log_signal.connect(self.append_log)
        self.worker.stream_signal.connect(self.handle_stream_chunk)
        self.worker.ready_signal.connect(self.on_system_ready)
        self.worker.ready_signal.connect(self.dock_browser)
        self.worker.start()

        self.setAcceptDrops(True)
        self.init_ui()
        QTimer.singleShot(100, self.force_reposition)

    def init_ui(self):
        self.setWindowTitle("AI Commander")
        self.setWindowFlags(
             Qt.WindowType.FramelessWindowHint | Qt.WindowType.Window
        )
        # Native Windows API hack: add WS_MINIMIZEBOX / WS_SYSMENU so the
        # frameless window can still be minimized from the taskbar
        GWL_STYLE = -16
        WS_MINIMIZEBOX = 0x00020000
        WS_SYSMENU = 0x00080000
        hwnd = int(self.winId())
        style = ctypes.windll.user32.GetWindowLongW(hwnd, GWL_STYLE)
        ctypes.windll.user32.SetWindowLongW(hwnd, GWL_STYLE, style | WS_MINIMIZEBOX | WS_SYSMENU)
        central_widget = QWidget()
        self.setCentralWidget(central_widget)

        # 画面全体(QMainWindow, CentralWidget)とコンポーネントの色を入れ替え
        self.setStyleSheet(
            """
            QMainWindow { background-color: #2a2a2a; }
            QWidget#CentralWidget { background-color: #2a2a2a; color: white; }
            QWidget { background-color: transparent; color: white; }
            QTextEdit { background-color: #222222; color: white; border: none; }
            QLineEdit, AutoResizingTextEdit { background-color: #222222; color: white; border: 1px solid #555; }
            QPushButton { background-color: #444; color: white; border: none; }
            
            /* スクロールバーのスタイル */
            QScrollBar:vertical {
                border: none;
                background: #222;
                width: 20px;
                margin: 0px 0px 0px 0px;
            }
            QScrollBar::handle:vertical {
                background: #666;
                min-height: 20px;
                border-radius: 6px;
            }
            """
        )
        central_widget.setObjectName("CentralWidget")
        layout = QVBoxLayout(central_widget)
        layout.setContentsMargins(5, 5, 5, 5)

        header_layout = QHBoxLayout()
        title_label = QLabel("🤖 AI COMMANDER")
        title_label.setFont(QFont("Meiryo UI", 9, QFont.Weight.Bold))
        title_label.setStyleSheet("border: none; color: #ffaa00;")
        header_layout.addWidget(title_label)

        # Stop Button
        self.stop_btn = QPushButton("■")
        self.stop_btn.setFixedSize(30, 25)
        self.stop_btn.setStyleSheet(
            "QPushButton { background-color: #c0392b; color: white; border: none; font-weight: bold; } QPushButton:hover { background-color: #e74c3c; }"
        )
        self.stop_btn.setToolTip("実行中のタスクを強制停止")
        self.stop_btn.clicked.connect(self.stop_task)
        header_layout.addWidget(self.stop_btn)

        header_layout.addStretch()

        self.toggle_btn = QPushButton("↔")
        self.toggle_btn.setFixedSize(30, 25)
        self.toggle_btn.setStyleSheet(
            "QPushButton { background-color: #555; border: none; font-weight: bold; } QPushButton:hover { background-color: #777; }"
        )
        self.toggle_btn.clicked.connect(self.toggle_size)
        header_layout.addWidget(self.toggle_btn)

        self.min_btn = QPushButton("_")
        self.min_btn.setFixedSize(30, 25)
        self.min_btn.setStyleSheet(
            "QPushButton { background-color: #444; border: none; } QPushButton:hover { background-color: #666; }"
        )
        self.min_btn.clicked.connect(self.showMinimized)
        header_layout.addWidget(self.min_btn)

        self.close_btn = QPushButton("×")
        self.close_btn.setFixedSize(30, 25)
        self.close_btn.setStyleSheet(
            "QPushButton { background-color: #444; border: none; } QPushButton:hover { background-color: #c0392b; }"
        )
        self.close_btn.clicked.connect(self.close)
        header_layout.addWidget(self.close_btn)
        layout.addLayout(header_layout)

        self.log_box = QTextEdit()
        self.log_box.setLineWrapMode(QTextEdit.LineWrapMode.WidgetWidth)
        self.log_box.setWordWrapMode(QTextOption.WrapMode.WrapAtWordBoundaryOrAnywhere)
        self.log_box.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)
        self.log_box.setReadOnly(True)
        self.log_box.setFont(QFont("Meiryo UI", 10))
        self.log_box.setStyleSheet(
            "background-color: #1e1e1e; border: none; padding: 5px; white-space: pre-wrap;"
        )
        layout.addWidget(self.log_box)

        self.file_label = QLabel("")
        self.file_label.setFont(QFont("Meiryo UI", 9))
        self.file_label.setStyleSheet("color: #00ff00; padding: 2px;")
        self.file_label.hide()
        layout.addWidget(self.file_label)

        input_layout = QHBoxLayout()

        # Mode toggle button (browser mode disabled)
        # self.mode_btn = QPushButton("💬")
        # self.mode_btn.setFixedSize(30, 30)
        # self.mode_btn.setToolTip("クリックでモード切替 (Chat ⇔ Browser)")
        # self.mode_btn.setStyleSheet(
        #     "background-color: #28a745; border: none; font-size: 16px; border-radius: 5px;"
        # )
        # self.mode_btn.clicked.connect(self.toggle_mode)
        self.is_browser_mode = False 
        self.is_code_mode = False
        self.is_agno_mode = False # Init Agno mode
        
        # self.agno_btn = QPushButton("🧠 AGUNO: OFF") # Moved to control_layout
        # input_layout.addWidget(self.agno_btn)

        self.entry = AutoResizingTextEdit()
        self.entry.setWordWrapMode(QTextOption.WrapMode.WrapAtWordBoundaryOrAnywhere)
        self.entry.setFont(QFont("Meiryo UI", 10))
        self.entry.setPlaceholderText("チャットモード: 質問を入力...")
        self.entry.setStyleSheet(
            "background-color: #3d3d3d; border: 1px solid #555; color: white;"
        )
        self.entry.setFocusPolicy(Qt.FocusPolicy.StrongFocus)  # 自動フォーカスを防ぐ
        self.entry.submit_signal.connect(self.send_message)
        self.entry.setEnabled(False)  # 初期化完了まで無効化
        input_layout.addWidget(self.entry, 1)  # Stretch ratio 1 (maximizes width)

        self.send_btn = QPushButton("送信")
        self.send_btn.setFixedSize(60, 40)
        self.send_btn.setStyleSheet(
            "background-color: #e67e22; color: white; font-weight: bold;"
        )
        self.send_btn.clicked.connect(self.send_message)
        self.send_btn.setEnabled(False)  # 初期化完了まで無効化
        input_layout.addWidget(self.send_btn)

        layout.addLayout(input_layout)

        # Control Layout (Modes & Actions) - Split into 2 Rows
        
        # --- Row 1: Main Agents (AGUNO, Vision) ---
        control_layout_1 = QHBoxLayout()

        # 1. New AGUNO Toggle
        self.agno_btn = QPushButton("🧠 AGUNO: OFF")
        self.agno_btn.setCheckable(True)
        self.agno_btn.setFixedHeight(35)
        self.agno_btn.setSizePolicy(QSizePolicy.Policy.Expanding, QSizePolicy.Policy.Fixed)
        self.agno_btn.setStyleSheet(
            "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #9b59b6; color: white; }" # Purple for Agno
        )
        self.agno_btn.clicked.connect(self.toggle_agno_mode)
        control_layout_1.addWidget(self.agno_btn)

        # 2. Vision Toggle (Swapped to Row 1)
        self.vision_btn = QPushButton("📸 Vision: OFF")
        self.vision_btn.setCheckable(True)
        self.vision_btn.setFixedHeight(35)
        self.vision_btn.setSizePolicy(QSizePolicy.Policy.Expanding, QSizePolicy.Policy.Fixed)
        self.vision_btn.setStyleSheet(
            "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #28a745; color: white; }"
        )
        self.vision_btn.clicked.connect(self.toggle_vision_mode)
        control_layout_1.addWidget(self.vision_btn)
        self.vision_mode = False # Init State

        # Spacing
        control_layout_1.setSpacing(5)
        control_layout_1.setContentsMargins(0, 0, 0, 0)

        layout.addLayout(control_layout_1)

        # --- Row 2: Control (Approval, Code) ---
        control_layout_2 = QHBoxLayout()

        # 3. Approval Mode Toggle
        self.approval_btn = QPushButton("✋ 承認: OFF")
        self.approval_btn.setCheckable(True)
        self.approval_btn.setChecked(False)
        self.approval_btn.setFixedHeight(35)
        self.approval_btn.setSizePolicy(QSizePolicy.Policy.Expanding, QSizePolicy.Policy.Fixed)
        self.approval_btn.setStyleSheet(
            "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #e67e22; color: white; }"
        )
        self.approval_btn.clicked.connect(self.toggle_approval_mode)
        control_layout_2.addWidget(self.approval_btn)

        # 4. Code Mode Toggle Button
        self.code_mode_btn = QPushButton("💻 コード: OFF")
        self.code_mode_btn.setCheckable(True)
        self.code_mode_btn.setChecked(False)
        self.code_mode_btn.setFixedHeight(35)
        self.code_mode_btn.setSizePolicy(QSizePolicy.Policy.Expanding, QSizePolicy.Policy.Fixed)
        # Gray when OFF, purple when ON
        self.code_mode_btn.setStyleSheet(
            "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #6f42c1; color: white; }"
        )
        self.code_mode_btn.clicked.connect(self.toggle_code_mode)
        self.code_mode_btn.setToolTip(
            "ONにすると、Open Interpreterのように指示をコード実行で解決します"
        )
        self.code_mode_btn.setEnabled(False)  # Wait for init
        control_layout_2.addWidget(self.code_mode_btn)

        control_layout_2.setSpacing(5)
        control_layout_2.setContentsMargins(0, 0, 0, 0)

        layout.addLayout(control_layout_2)

        # API切替ボタン欄
        api_layout = QHBoxLayout()
        api_layout.setSpacing(5)
        api_layout.setContentsMargins(0, 0, 0, 0)
        self.api_btns = {}
        for api_name in API_CONFIGS.keys():
            btn = QPushButton(api_name)
            btn.setFixedHeight(30)
            btn.setCursor(QCursor(Qt.CursorShape.PointingHandCursor))
            # アクティブなAPIは色を変える(初期状態 Kaggle)
            if API_CONFIGS[api_name] == DEFAULT_API:
                btn.setStyleSheet(
                    "background-color: #007bff; color: white; border-radius: 4px; font-weight: bold;"
                )
            else:
                btn.setStyleSheet(
                    "background-color: #333; color: #ddd; border-radius: 4px;"
                )

            btn.clicked.connect(lambda checked, name=api_name: self.switch_api(name))
            api_layout.addWidget(btn)
            self.api_btns[api_name] = btn

        layout.addLayout(api_layout)

        # フォーカス監視タイマー(点滅カーソル退治用)
        self.focus_timer = QTimer(self)
        self.focus_timer.timeout.connect(self.check_focus_and_hide_cursor)
        self.focus_timer.start(500)  # 0.5秒ごとにチェック

        # Force Initial Size
        self.resize(INITIAL_SIDEBAR_WIDTH, SIDEBAR_HEIGHT)
        self.dock_browser()
        self.append_log("🤖 AI Commander Initialized.")
        self.append_log("💡 準備完了: 何かお手伝いしましょうか?")
        self.entry.setEnabled(True)
        self.send_btn.setEnabled(True)

    def check_focus_and_hide_cursor(self):
        # ウィンドウ本体がアクティブでない場合、入力欄のフォーカスを強制的に外す
        if not self.isActiveWindow():
            if self.entry.hasFocus():
                self.entry.clearFocus()

    def toggle_size(self):
        self.is_expanded = not self.is_expanded
        screen = QGuiApplication.primaryScreen().availableGeometry()
        new_width = screen.width() // 2 if self.is_expanded else INITIAL_SIDEBAR_WIDTH
        self.toggle_btn.setText("→" if self.is_expanded else "↔")
        target_x = screen.width() - new_width + X_ADJUST
        target_x = min(target_x, screen.width() - new_width)
        self.setGeometry(int(target_x), self.y(), int(new_width), SIDEBAR_HEIGHT)
        self.dock_browser()

    def changeEvent(self, event):
        # ウィンドウのアクティブ状態が変わった時にフォーカスを外す(点滅防止)
        if event.type() == QEvent.Type.ActivationChange:
            if not self.isActiveWindow():
                self.entry.clearFocus()

        # ウィンドウの状態(最小化・元に戻す)が変わった時のドッキング処理
        if event.type() == QEvent.Type.WindowStateChange:
            QTimer.singleShot(200, self.dock_browser)

        super().changeEvent(event)

    def force_reposition(self):
        screen = QGuiApplication.primaryScreen().availableGeometry()
        target_x = screen.width() - INITIAL_SIDEBAR_WIDTH + X_ADJUST
        target_x = min(target_x, screen.width() - INITIAL_SIDEBAR_WIDTH)
        self.setGeometry(int(target_x), 50, INITIAL_SIDEBAR_WIDTH, SIDEBAR_HEIGHT)
        self.dock_browser()

    def dock_browser(self):
        try:
            windows = gw.getAllWindows()
            target = None
            for w in windows:
                # ユーザーのChromeを探す(マジックサイドバー自体は除外)
                if (
                    any(x in w.title for x in ["Chrome", "Chromium", "Edge", "Google"])
                    and "AI COMMANDER" not in w.title
                ):
                    target = w
                    print(f"DEBUG: Found likely browser window: {w.title}")
                    break

            if target:
                try:
                    target.activate()  # bring the browser window to the front
                except Exception:
                    pass
                self.target_browser = target

                # 最小化時はブラウザを最大化
                if self.isMinimized():
                    try:
                        target.maximize()
                    except:
                        pass
                    return

                if target.isMaximized:
                    target.restore()
                target.moveTo(0, 0)
                gap = (
                    DOCKING_GAP_NORMAL if not self.is_expanded else DOCKING_GAP_EXPANDED
                )
                new_browser_width = self.x() - gap
                if new_browser_width > 100:
                    target.resizeTo(int(new_browser_width), BROWSER_HEIGHT)
        except Exception:
            pass  # the browser window may vanish mid-operation; ignore

    def append_log(self, text):
        # ターミナルにもログを出す(デバッグ用)
        print(f"[UI LOG] {text}")

        # Reset stream state since we are appending a new block
        self.response_start_pos = None
        self.current_stream_text = ""

        # 色分けロジック
        color = "#888888"  # デフォルト: グレー (システムメッセージ)

        if text.startswith("司令官"):
            color = "#ffffff"  # ユーザー: 白
        elif "❌" in text:
            color = "#ff5555"  # エラー: 赤
        elif "⚠️" in text:
            color = "#f1fa8c"  # 警告: 黄
        elif "💬 回答:" in text:
            color = "#00aaff"  # AI回答開始: 青
        elif "🤖" in text:
            # Groqモード時のみピンク色にする
            if self.worker.current_api_name == "Groq":
                color = "#ff69b4"  # ピンク (Hot Pink)
            else:
                color = "#00aaff"  # AI回答: 青
        elif "💡" in text:
            color = "#8be9fd"  # 思考: Cyan
        elif "✋" in text:
            color = "#ffb86c"  # 承認待ち: Orange

        # HTMLエスケープと改行処理
        safe_text = html.escape(text).replace("\n", "<br>")
        formatted_text = f'<span style="color:{color};">{safe_text}</span>'

        self.log_box.append(formatted_text)
        sb = self.log_box.verticalScrollBar()
        sb.setValue(sb.maximum())

    def handle_stream_chunk(self, text, msg_type="response"):
        # 思考プロセスは非表示(折りたたみ不可のため)
        if msg_type == "thought":
            return
        
        try:
            # 初回チャット開始時にカーソル位置を調整
            if self.response_start_pos is None:
                self.log_box.append("")
                self.log_box.moveCursor(QTextCursor.MoveOperation.End)
                self.response_start_pos = self.log_box.textCursor().position()
                self.current_stream_text = ""

            self.current_stream_text += text

            # Design note: re-rendering the whole stream as Markdown on every
            # chunk would clobber per-message colors (a full replace cannot
            # preserve earlier spans' formatting), so each chunk is simply
            # appended at the end cursor with an explicit QTextCharFormat.

            cursor = self.log_box.textCursor()
            cursor.movePosition(QTextCursor.MoveOperation.End)

            # 色の設定
            fmt = QTextCharFormat()
            if msg_type == "thought":
                fmt.setForeground(QColor("gray"))  # 思考はグレー
                fmt.setFontItalic(True)
            elif msg_type == "response":
                if self.worker.current_api_name == "Groq":
                    fmt.setForeground(QColor("#ff69b4"))  # Pink
                elif self.worker.current_api_name == "Ollama":
                    fmt.setForeground(QColor("#50fa7b"))  # Bright Green
                else:
                    fmt.setForeground(QColor("#00aaff"))  # Blue

            cursor.setCharFormat(fmt)
            cursor.insertText(text)  # Markdown解析せずに直接挿入(色が維持される)

            # Trade-off: Markdown styling (e.g. **bold**) is not rendered,
            # but the per-API color coding is preserved.

        except Exception as e:
            print(f"[UI ERROR] handle_stream_chunk fail: {e}")

        sb = self.log_box.verticalScrollBar()
        sb.setValue(sb.maximum())

    def on_system_ready(self):
        self.entry.setEnabled(True)
        self.send_btn.setEnabled(True)
        self.code_mode_btn.setEnabled(True)  # Enable Code Mode Toggle
        self.entry.setPlaceholderText("準備完了: チャットモード")
        self.entry.setFocus()

    def mousePressEvent(self, event):
        # ログエリアやスクロールバーの上でのクリックはドラッグ移動にしない
        child = self.childAt(event.position().toPoint())
        if child == self.log_box or (child and child.parent() == self.log_box):
            return

        self.oldPos = event.globalPosition().toPoint()

    def mouseMoveEvent(self, event):
        if event.buttons() == Qt.MouseButton.LeftButton:
            delta = QPoint(event.globalPosition().toPoint() - self.oldPos)
            self.move(self.x() + delta.x(), self.y() + delta.y())
            self.oldPos = event.globalPosition().toPoint()

    def dragEnterEvent(self, event: QDragEnterEvent):
        if event.mimeData().hasUrls():
            event.accept()
        else:
            event.ignore()

    def dropEvent(self, event: QDropEvent):
        files = [u.toLocalFile() for u in event.mimeData().urls()]
        if files:
            self.current_attachment = files[0]
            self.file_label.setText(
                f"📎 添付中: {os.path.basename(self.current_attachment)}"
            )
            self.file_label.show()

    # --- Helper: Capture Screen ---
    def capture_screen_base64(self):
        try:
            import mss
            from PIL import Image
            import io
            import base64
            
            with mss.mss() as sct:
                # Capture Monitor 1
                if not sct.monitors:
                    self.append_log("❌ Capture Failed: No monitors found")
                    return None
                    
                # Fix: Safer monitor selection
                # sct.monitors[0] is 'all in one', sct.monitors[1] is 1st monitor
                if len(sct.monitors) > 1:
                    monitor = sct.monitors[1]
                else:
                    monitor = sct.monitors[0]
                    
                sct_img = sct.grab(monitor)
                
                # Convert to PIL
                img = Image.frombytes("RGB", sct_img.size, sct_img.bgra, "raw", "BGRX")
                
                # Downscale to 50% to keep the payload sent to the API small
                w, h = img.size
                img = img.resize((w // 2, h // 2), Image.Resampling.LANCZOS)
                
                # To Base64
                buffered = io.BytesIO()
                img.save(buffered, format="PNG")
                img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
                return img_str
        except Exception as e:
            self.append_log(f"📸 Capture Failed: {e}")
            print(f"[ERROR] Capture Failed: {e}")
            import traceback
            traceback.print_exc()
            return None

    # def toggle_mode(self):
    #     self.is_browser_mode = not self.is_browser_mode
    #     if self.is_browser_mode:
    #         self.mode_btn.setText("🌐")
    #         self.mode_btn.setStyleSheet(
    #             "background-color: #007bff; border: none; font-size: 20px; border-radius: 5px;"
    #         )
    #         self.entry.setPlaceholderText("ブラウザモード: 操作指示を入力...")
    #     else:
    #         self.mode_btn.setText("💬")
    #         self.mode_btn.setStyleSheet(
    #             "background-color: #28a745; border: none; font-size: 20px; border-radius: 5px;"
    #         )
    #         self.entry.setPlaceholderText("チャットモード: 質問を入力...")

    def toggle_agno_mode(self):
        self.is_agno_mode = self.agno_btn.isChecked()
        if self.is_agno_mode:
            self.agno_btn.setText("🧠 AGUNO: ON")
            self.entry.setPlaceholderText("AGUNOモード: ツールを使って解決します...")
        else:
            self.agno_btn.setText("🧠 AGUNO: OFF")
            self.entry.setPlaceholderText("チャットモード: 質問を入力...")

    def toggle_vision_mode(self):
        self.vision_mode = self.vision_btn.isChecked()
        if self.vision_mode:
            self.vision_btn.setText("📸 Vision: ON")
            self.vision_btn.setStyleSheet(
                "QPushButton { background-color: #28a745; color: white; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #28a745; color: white; }"
            )
            self.append_log("📸 Vision Mode: ON")
        else:
            self.vision_btn.setText("📸 Vision: OFF")
            self.vision_btn.setStyleSheet(
                "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; } QPushButton:checked { background-color: #28a745; color: white; }"
            )
            self.append_log("📸 Vision Mode: OFF")

    def switch_api(self, api_name):
        if self.worker.current_api_name == api_name:
            return

        self.worker.current_api_name = api_name
        self.worker.api_config = API_CONFIGS[api_name]
        self.worker.api_changed = True
        self.worker.agno_agent = None  # Force re-init of AgnoAgent
        self.worker.agno_session_id = None  # Reset memory context

        # UIのボタン外観を更新
        for name, btn in self.api_btns.items():
            if name == api_name:
                btn.setStyleSheet(
                    "background-color: #007bff; color: white; border-radius: 4px; font-weight: bold;"
                )
            else:
                btn.setStyleSheet(
                    "background-color: #333; color: #ddd; border-radius: 4px;"
                )

        self.append_log(f"⚙️ 接続先を切替: {api_name}")

    def send_message(self):
        text = self.entry.toPlainText().strip()
        print(f"[UI DEBUG] send_message triggered: '{text}'")

        if not text:
            return

        self.append_log(f"司令官 🌐: {text}")
        self.entry.clear()
        self.entry.setFixedHeight(40)  # Reset height

        # 通常のタスク送信
        if self.is_code_mode:
            print(f"[UI DEBUG] Code Interpreter Task: {text}")
            self.append_log(f"💻 OI Request: {text}")
            self.worker.submit_task(text, mode="open_interpreter")
            return
        
        # Check Vision Mode
        image_data = None
        if self.vision_mode:
            # UIを強制更新(キャプチャ中のステータスを表示させるため)
            QApplication.processEvents()  # Force UI update

            try:
                image_data = self.capture_screen_base64()
                if image_data:
                    self.append_log("📸 Screen captured & resized!")
                else:
                    self.append_log("⚠️ Capture failed or returned None.")
            except Exception as e:
                self.append_log(f"🔥 Critical Capture Error: {e}")
                import traceback
                traceback.print_exc()

        # New Logic: Agno vs Chat
        mode = "agno_chat" if self.is_agno_mode else "chat"

        print(
            f"[UI DEBUG] Task put into queue: {text} (worker_alive: {self.worker.isRunning()})"
        )
        # Pass image_data to submit_task
        self.worker.submit_task(text, mode, image=image_data)

    def run_docker_code(self):
        """Legacy method kept if needed or safe to remove if unused."""
        pass

    def stop_task(self):
        self.append_log("🛑 停止ボタンが押されました。")
        self.worker.request_stop()

    def toggle_approval_mode(self):
        is_on = self.approval_btn.isChecked()
        self.worker.approval_mode = is_on
        if is_on:
            self.approval_btn.setText("✋ 承認モード: ON")
            self.approval_btn.setStyleSheet(
                "QPushButton { background-color: #e67e22; color: white; border-radius: 5px; font-weight: bold; }"
            )
            self.append_log(
                "✋ 承認モードを有効にしました。AIの各アクション前に一時停止します。"
            )
        else:
            self.approval_btn.setText("✋ 承認モード: OFF")
            self.approval_btn.setStyleSheet(
                "QPushButton { background-color: #555; color: #aaa; border-radius: 5px; font-weight: bold; }"
            )
            self.append_log("🚀 承認モードを解除しました。自動実行します。")
            if not self.worker.proceed_event.is_set():
                self.worker.proceed_event.set()  # Auto-resume if stuck in approval wait

    def manual_resume(self):
        if not self.worker.proceed_event.is_set():
            self.worker.proceed_event.set()
            self.append_log("▶️ 再開/承認しました。")
        else:
            self.append_log("ℹ️ 既に実行中です。")

    def toggle_code_mode(self):
        is_on = self.code_mode_btn.isChecked()
        self.is_code_mode = is_on

        if is_on:
            self.code_mode_btn.setText("💻 コード: ON")
            self.entry.setPlaceholderText(
                "Open Interpreterモード: 'フォルダを作って' など指示してください..."
            )
            self.entry.setStyleSheet(
                "background-color: #2b2b2b; border: 1px solid #6f42c1; color: #d0d0ff;"
            )  # Hint purple
            self.send_btn.setText("実行")
            self.send_btn.setStyleSheet(
                "background-color: #6f42c1; color: white; font-weight: bold;"
            )
        else:
            self.code_mode_btn.setText("💻 コード: OFF")
            self.entry.setPlaceholderText("チャットモード: 質問を入力...")
            self.entry.setStyleSheet(
                "background-color: #3d3d3d; border: 1px solid #555; color: white;"
            )  # Reset
            self.send_btn.setText("送信")
            self.send_btn.setStyleSheet(
                "background-color: #e67e22; color: white; font-weight: bold;"
            )

    def toggle_pause(self):
        # Legacy placeholder in case called from elsewhere (Space key logic needs update)
        self.manual_step_toggle()

    def manual_step_toggle(self):
        # Space key logic: Toggle pause/resume manually
        if self.worker.proceed_event.is_set():
            self.worker.proceed_event.clear()
            self.append_log("⏸️ 手動一時停止しました。")
        else:
            self.worker.proceed_event.set()
            self.append_log("▶️ 手動再開しました。")

    def keyPressEvent(self, event):
        # スペースキーで一時停止/再開(入力欄にフォーカスがない場合)
        if event.key() == Qt.Key.Key_Space:
            self.toggle_pause()  # ボタンと同じ処理を呼ぶ
            event.accept()
            return

        super().keyPressEvent(event)

    def closeEvent(self, event):
        # アプリ終了時にブラウザを最大化してあげる
        if self.target_browser:
            try:
                self.target_browser.maximize()
            except:
                pass

        self.worker.stop()

        # Close the attached console window automatically as well
        hwnd = ctypes.windll.kernel32.GetConsoleWindow()
        if hwnd != 0:
            WM_CLOSE = 0x0010
            ctypes.windll.user32.PostMessageW(hwnd, WM_CLOSE, 0, 0)

        sys.exit(0)


if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = AiCommanderPyQt()
    window.show()
    sys.exit(app.exec())
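By the way, the color-coding trick inside `append_log` is easy to play with on its own. Here's a tiny standalone sketch (the helper name `format_log_html` and the trimmed-down rule set are mine, just for illustration): escape the text, turn newlines into `<br>`, and wrap everything in a colored `<span>` that `QTextEdit.append` can render as HTML.

```python
import html

# Minimal sketch of the log-coloring idea (format_log_html is a
# hypothetical name; the full app has more rules and API-aware colors).
def format_log_html(text: str, api_name: str = "Groq") -> str:
    color = "#888888"                      # default: gray (system messages)
    if text.startswith("司令官"):
        color = "#ffffff"                  # user input: white
    elif "❌" in text:
        color = "#ff5555"                  # errors: red
    elif "🤖" in text and api_name == "Groq":
        color = "#ff69b4"                  # Groq replies: hot pink
    # Escape HTML and convert newlines so QTextEdit.append renders safely
    safe = html.escape(text).replace("\n", "<br>")
    return f'<span style="color:{color};">{safe}</span>'

print(format_log_html("❌ Docker Execution Failed"))
# → <span style="color:#ff5555;">❌ Docker Execution Failed</span>
```

The `html.escape` step matters: without it, an AI reply containing `<` or `>` could be swallowed as fake HTML tags inside the log box.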

🚀 Wrapping Up

So, what did you think? The "Magic Sidebar" has some serious romance to it, right!? ✨

With Python, you can build your very own "ultimate AI partner" like this!
Give it a try yourself!

If you get stuck, ask me in the comments!
See you in the next article! Bye-bye! 👋💖
