Recognizing Speech Files as Text with the Google Speech API

2018-4-30: This thing no longer works. If you don't mind paying, see the Google Cloud Speech to Text API.

It all started with this

SpeechRecognition 3.6.5

Reference

Preparing the audio

ffmpeg -i original_file -ar 16000 output.flac

If the format is wrong, nothing gets recognized at all.

Sending the recognition request with curl
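If the conversion has to run from a script, the same ffmpeg call can be wrapped in Python. This is only a minimal sketch: the function names `build_ffmpeg_command` and `convert_to_flac` are made up for illustration, and it assumes `ffmpeg` is already on the PATH. It passes the same `-i` and `-ar 16000` options as the command above and nothing more.

```python
import subprocess


def build_ffmpeg_command(src, dst, rate=16000):
    """Build the argument list for the ffmpeg call shown above.

    src, dst and rate are illustrative parameters, not part of any library.
    """
    return ["ffmpeg", "-i", src, "-ar", str(rate), dst]


def convert_to_flac(src, dst, rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, rate), check=True)
```

Calling `convert_to_flac("original_file", "output.flac")` should then behave like typing the command by hand. The output format is picked by ffmpeg from the `.flac` extension.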

curl -X POST --data-binary @1.flac --header 'Content-Type: audio/x-flac; rate=16000;' 'https://www.google.com/speech-api/v2/recognize?client=chromium&output=json&lang=zh-TW&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw'
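For anyone who would rather not shell out to curl, the same request can be put together with Python's standard library. Treat this as a sketch under assumptions: `build_recognize_request` and `send` are hypothetical helper names, the key is whatever you pass in, and since the endpoint is reported dead in the note at the top, the code only builds the request and leaves sending it to you.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same endpoint as the curl command above.
ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_recognize_request(flac_bytes, key, language="zh-TW", rate=16000):
    """Build (but do not send) a POST request equivalent to the curl call."""
    query = urlencode({
        "client": "chromium",
        "output": "json",
        "lang": language,
        "key": key,
    })
    return Request(
        "{}?{}".format(ENDPOINT, query),
        data=flac_bytes,  # raw FLAC bytes, like curl's --data-binary
        headers={"Content-Type": "audio/x-flac; rate={}".format(rate)},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the raw response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Typical use would be reading `1.flac` in binary mode, passing the bytes and your own key to `build_recognize_request`, then handing the result to `send`. The `rate` value has to match the sample rate chosen during the ffmpeg step.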

What if it breaks someday?

Go read the Python source and just borrow whatever the new approach is.

    def recognize_google(self, audio_data, key=None, language="en-US", show_all=False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
        The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.
        To obtain your own API key, simply following the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".
        The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.
        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
        """
        assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
        assert key is None or isinstance(key, str), "``key`` must be ``None`` or a string"
        assert isinstance(language, str), "``language`` must be a string"

        flac_data = audio_data.get_flac_data(
            convert_rate=None if audio_data.sample_rate >= 8000 else 8000,  # audio samples must be at least 8 kHz
            convert_width=2  # audio samples must be 16-bit
        )
        if key is None: key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw"
        url = "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({
            "client": "chromium",
            "lang": language,
            "key": key,
        }))
        request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})
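Reading the source by eye works, but the endpoint can also be pulled out with a few lines of Python. The helper names `extract_urls` and `urls_in` below are made up for this post, and the regular expression is a rough one that simply stops at whitespace, quotes, and angle brackets, so take it as a sketch rather than a robust parser.

```python
import inspect
import re

# Rough pattern: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(source_text):
    """Return every http(s) URL found in a piece of source code, in order."""
    return URL_PATTERN.findall(source_text)


def urls_in(func):
    """Read the source of a function and list the URLs it mentions."""
    return extract_urls(inspect.getsource(func))
```

Assuming the SpeechRecognition package is installed and still exposes the method the same way, `urls_in(speech_recognition.Recognizer.recognize_google)` should list the URL the library currently talks to, along with the documentation links in the docstring.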
