Experiments with HoloLens, Bot Framework and LUIS: adding text to speech


Previously I blogged about creating a Mixed Reality 2D app that integrates with a LUIS-enabled Bot via the Direct Line channel available in the Bot Framework.

I decided to add more interactivity to the app by also enabling text to speech for the messages received from the Bot: this required adding a new MediaElement (which has no visual presence and is used only to play the synthesized audio) to the main XAML page:

<Page
    x:Class="HoloLensBotDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="Auto"/>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="*"/>
            <ColumnDefinition Width="10"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" />
        <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/>

        <Button Content="Start Recognition" Click="StartRecognitionButton_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" />

        <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" />
        <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/>

        <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" />
        <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3" 
                   VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3"
                   TextWrapping="Wrap" />
        <MediaElement x:Name="media" />
    </Grid>
</Page>

Then I initialized a new SpeechSynthesizer when the page is created:

public sealed partial class MainPage : Page
{
    private SpeechSynthesizer synthesizer;
    private SpeechRecognizer recognizer;

    public MainPage()
    {
        this.InitializeComponent();

        InitializeSpeech();
    }

    private async void InitializeSpeech()
    {
        synthesizer = new SpeechSynthesizer();
        recognizer = new SpeechRecognizer();

        // Restart recognition when speech playback ends (see Media_MediaEnded below).
        media.MediaEnded += Media_MediaEnded;
        recognizer.StateChanged += Recognizer_StateChanged;

        // Compile the dictation grammar by default.
        await recognizer.CompileConstraintsAsync();
    }

    private void Recognizer_StateChanged(SpeechRecognizer sender, SpeechRecognizerStateChangedEventArgs args)
    {
        if (args.State == SpeechRecognizerState.Idle)
        {
            SetTextStatus(string.Empty);
        }

        if (args.State == SpeechRecognizerState.Capturing)
        {
            SetTextStatus("Listening....");
        }
    }

    // ...
}
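The SetTextStatus helper used by the state handler is not part of the snippet above: since StateChanged can fire on a background thread, the update has to be marshalled back to the UI thread. A minimal sketch, assuming nothing more than a dispatcher call:

private async void SetTextStatus(string text)
{
    // StateChanged can fire off the UI thread, so dispatch the update.
    await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal,
        () => TextStatus.Text = text);
}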

And added a new Speech() method using the media element:

private async void Speech(string text)
{
    if (media.CurrentState == MediaElementState.Playing)
    {
        // If the synthesizer is already speaking, stop the current playback
        // rather than queuing the new text.
        media.Stop();
    }
    else
    {
        try
        {
            // Create a stream from the text. This will be played using a media element.
            SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);

            // Set the source and start playing the synthesized audio stream.
            media.AutoPlay = true;
            media.SetSource(synthesisStream, synthesisStream.ContentType);
            media.Play();
        }
        catch (System.IO.FileNotFoundException)
        {
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
            await messageDialog.ShowAsync();
        }
        catch (Exception)
        {
            media.AutoPlay = false;
            var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
            await messageDialog.ShowAsync();
        }
    }
}
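As a side note, the synthesizer uses the system default voice; it can optionally be switched to another installed voice before synthesizing. A small sketch (assuming a female voice is installed on the device, and using System.Linq):

// Optional: select a specific installed voice instead of the default one.
var voice = SpeechSynthesizer.AllVoices
    .FirstOrDefault(v => v.Gender == VoiceGender.Female);
if (voice != null)
{
    synthesizer.Voice = voice;
}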

When a new response is received from the Bot, the new Speech() method is called:

var result = await directLine.Conversations.GetActivitiesAsync(convId);
if (result.Activities.Count > 0)
{
    var botResponse = result
        .Activities
        .LastOrDefault(a => a.From != null && a.From.Name != null && a.From.Name.Equals("Davide Personal Bot"));
    if (botResponse != null && !string.IsNullOrEmpty(botResponse.Text))
    {
        var response = botResponse.Text;

        TextOutputBot.Text = "Bot response: " + response;
        TextStatus.Text = string.Empty;

        Speech(response);
    }
}
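The directLine client and convId conversation id used here come from the previous post. For reference, a sketch of that setup using the Direct Line v3 client library (the secret and user name below are placeholders):

// Sketch: create the Direct Line client and start a conversation.
var directLine = new DirectLineClient("YOUR_DIRECT_LINE_SECRET");
var conversation = await directLine.Conversations.StartConversationAsync();
var convId = conversation.ConversationId;

// Send the recognized text to the Bot as a message activity.
await directLine.Conversations.PostActivityAsync(convId, new Activity
{
    Type = ActivityTypes.Message,
    From = new ChannelAccount("HoloLensUser"),
    Text = TextCommand.Text
});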

Then, recognition of a new phrase is started again via the MediaEnded event, simulating a conversation between the user and the Bot:

private void Media_MediaEnded(object sender, Windows.UI.Xaml.RoutedEventArgs e)
{
    StartRecognitionButton_Click(null, null);
}
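For completeness, a simplified sketch of what the StartRecognitionButton_Click handler does (the SendToBotAsync helper is hypothetical, standing in for the Direct Line calls shown earlier):

private async void StartRecognitionButton_Click(object sender, RoutedEventArgs e)
{
    // Run a single recognition session and show the recognized phrase.
    SpeechRecognitionResult speechResult = await recognizer.RecognizeAsync();
    TextCommand.Text = speechResult.Text;

    // Forward the recognized phrase to the Bot (hypothetical helper wrapping
    // the Direct Line PostActivityAsync / GetActivitiesAsync calls above).
    await SendToBotAsync(speechResult.Text);
}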

As usual, the source code is available for download on GitHub.

Davide Zordan

Empowering people with emerging technologies. AR, VR, Mixed Reality developer and enthusiast. Opinions are my own.