HAL9000 thinking machine all the bells & whistles

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 16:44

I'm building a HAL9000-type thinking machine (called A.S.M.O.V.1): fully self-aware, with human-type emotions, self-teaching, etc.

The approach and methodology are very different... it's a lot to take on board
http://www.worldofcomputing.net/nic/nat ... uting.html
http://en.wikipedia.org/wiki/Bio-inspired_computing
http://en.wikipedia.org/wiki/Natural_computing

It would be more correct to call it an artificial lifeform than a robot.....there is little overlap between the two areas

I feel it's only right to thank everyone on this forum who has helped me with this project. I have been able to advance a long way in a very short time and get some of the key modules up and running.

You can follow the project here ( which is dedicated to large FB projects )

http://freebasic.freeforums.org/

for general overview
http://asimov1.wikispaces.com/

Robotic Ear (simple ear) completed code
http://www.freebasic.net/forum/viewtopic.php?t=15970

Computer Vision go here (simple eye) image sampler code only
http://www.freebasic.net/forum/viewtopic.php?t=15957

Capturing web cam images to hard drive completed
http://www.freebasic.net/forum/viewtopic.php?t=15973

This first post will be UPDATED as things progress so you can track things

The first SUPER HUMAN LEVEL ABILITY #1 I'm planning to give it is that of Piano Virtuoso.

Composing & Playing back Midi Files with style passion and flair. This will utilize its emotional center. (These party tricks will help it pay its way in the world as well as learning through play & aiding its acceptance in society)

Though I hope its function in society rises above that of court Musician...lol

Future ASIMOVS will no doubt specialize in many areas just as humans do

Piano ability status = under development now (your input welcome)

ASIMOV1 can now sing (play back vocal song tracks written for it )

-------------------------------------------------------------------------------------

My design skills on this topic are A+++ but my coding skills are limited so i could do with a little help. (beeb basic and a touch of qbasic is about my limit lol)

I coded the self aware "brain bit" back in the 80s on a BBC computer but the computing power just wasn't there for it to operate in real time. Now it is.

I have a very robust modular design for the system and could do with a little help with some of the modules. (most of these modules will be less than 50 lines of code long & perform simple specific tasks)
------------------------------------------------------------------
ADDED Project Outline

100 or so small modules (small stand-alone exes) running on a dozen PCs (I'm collecting dusty throwaways to increase my processing power). Spreading the brain out into digestible-size modules is a good idea with a full monty "self aware" machine.

if anyone has a keen interest in building an AI that is smart enough to hold a decent conversation with (reactobot its not) let me know and i can give you a simple module to work on

could be simply detecting loud or unusual noises in the environment (microphone and threshold trigger)

or spotting a face in a bmp or jpeg

or reading text files across a network

its a crazy fun project to work on & you will learn heaps too

(i might even learn to code decently)
-----------------------------------------------------------------

Speech output module

I need to be able to play individual *.wav files (there will be approx 1000 of these to start with...room to expand to 10,000 for specialist words)

001_hello_1secondlong
207_encyclopedia_2secondlong

that kind of thing

Advice on the best way to go about this would be appreciated

this is a great fun project and im loving every minute of it & i look forward to sharing some cool code as the project develops

-----------------------------------------------------------------------

im using cyberbuddy to create phrases & words & record them as wav files. This allows me to fudge....fuj...the words in to very human sounding speech.

Tazti is used for speech recognition

rolliebollocks · Post by **rolliebollocks** » Jun 20, 2010 17:01

TESLACOIL wrote:Speech output module

I need to be able to play individual *.wav files (there will be approx 1000 of these to start with...room to expand to 10,000 for specialist words)

001_hello_1secondlong
207_encyclopedia_2secondlong

that kind of thing

Yeah, I can definitely be of some assistance to you, here. You need to set up a parsing engine which in FB will be very easy and very fast. You can use FMOD or FBSOUND, I usually work with FMOD.

The only thing I worry about is that it would be far more efficient to create an inflection engine than it would to record all these different words.

Also, you'd want your AI to be a learning engine, so the capacity to output new wave files to mimic the speech of others...

The more I think about it, the less happy I am with wave files...

You can use SAPI:

' Speak the Clipboard! v1.0
' (C) 2008 Innova and Kristopher Windsor

#define UNICODE
#include once "disphelper/disphelper.bi"
#include once "windows.bi"

Function clipboard () As String
  Dim As Zstring Ptr s_ptr
  Dim As HANDLE hglb
  Dim As String s = ""
  
  If (IsClipboardFormatAvailable(CF_TEXT) = 0) Then Return ""
  
  If OpenClipboard( NULL ) <> 0 Then
    hglb = GetClipboardData(cf_text)
    s_ptr = GlobalLock(hglb)
    If (s_ptr <> NULL) Then
      s = *s_ptr
      GlobalUnlock(hglb)
    End If
    CloseClipboard()
  End If
  
  Return s
End Function

Sub speak (Byref text As String)
  Dim myt As Wstring * 512
  Dim As Integer isSpeaking
  Dim As HRESULT hr
  
  DISPATCH_OBJ(tts)
  
  dhInitialize(TRUE)
  dhToggleExceptions(FALSE) 'set this TRUE to get error codes
  
  myt = "Sapi.SpVoice"
  hr = dhCreateObject(@myt, NULL, @tts)
  If hr <> 0 Then Exit Sub
  
  myt = text
  dhCallMethod(tts, ".Speak(%S)", @myt)
  
  SAFE_RELEASE(tts)
End Sub

Speak "Hello"

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 17:59

some very good & valid points im inclined to agree, but i need to kick start things off on the speech output side

i had been using these commands

chain "1.bat"
run "1.bat"

from inside basic (just to try things out)

which runs individual batch files

text format i used for batch file if anyone's interested (Windows XP)
-------------------------------------------------------------
@echo off

start sndrec32 /play /close "c:\Documents and Settings\etc etc \hellodrchandra.wav"

-----------------------------------------------------

the "brain" runs on a different computer and just chucks instructions at the speech computer (which is a dumbot) that just plays wavfile x,y,z etc

Ideally the "inflection" route would be perfect...it would take hints from the "emotion engine" (im angry) and the "context engine" (but its a posh wine tasting)

so "damn" and not "F..." would be the spoken output

ideally the wav files would play without halting or delaying the FreeBASIC program that called them

if there is a heroic & human sounding inflection engine out there in freebie land im all ears....(scuse the pun)
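
One way to get playback that does not hold up the calling program on Windows is the Win32 `PlaySound` call with the `SND_ASYNC` flag, which returns as soon as playback has started. This is only a minimal sketch, not the method used in the project: the function name `PlayWavAsync` and the demo file name are made up for illustration, and starting a new sound this way cuts off whatever is still playing.

```freebasic
' Minimal sketch: fire-and-forget wav playback on Windows.
' PlayWavAsync and "hello.wav" are illustrative names, not project code.
#include once "windows.bi"
#include once "win/mmsystem.bi"

' Starts the wav file and returns immediately.
' Returns non-zero if Windows accepted the request.
Function PlayWavAsync(ByRef wavPath As String) As Integer
    Return PlaySound(StrPtr(wavPath), NULL, _
                     SND_FILENAME Or SND_ASYNC Or SND_NODEFAULT)
End Function

If PlayWavAsync("hello.wav") = 0 Then
    Print "could not start playback"
End If

' The program carries on while the sound plays; here we just wait a
' moment so the process does not exit before the wav is heard.
Sleep 1500, 1
```

`SND_NODEFAULT` stops Windows from playing its default beep when the file is missing, which suits a speech module that should fail quietly.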

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 18:22

comparison of voice types/approaches

voice im currently using (not my video btw)
http://www.youtube.com/watch?v=f1eVk9XcCb4

pre scripted actor
http://www.youtube.com/watch?v=2oUQfz0RD2U&NR=1

biomechanical
http://www.youtube.com/watch?v=oIMqxRRv ... re=related

rolliebollocks · Post by **rolliebollocks** » Jun 20, 2010 18:27

Well, I dunno. I can easily do that. You want a program that plays

<somestring>.wav

And, ideally you would have like a word bank or something. So, you would feed some function a string, it would check a directory to see if there is a corresponding .wav file and play that file if there is one.
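
The lookup described above could be sketched roughly like this, assuming one wav file per word named `<word>.wav` inside a single word-bank directory. The function name `WordToWavPath` and the directory name `voicedb` in the demo are assumptions for illustration only.

```freebasic
' Minimal sketch: map a word to a wav file in a word-bank directory.
' WordToWavPath and the "voicedb" folder are illustrative names.
#include once "file.bi"   ' provides FileExists

' Returns the path of <bankDir>/<word>.wav if that file exists,
' or an empty string if there is no recording for the word.
Function WordToWavPath(ByRef word As String, ByRef bankDir As String) As String
    Dim As String candidate = bankDir & "/" & LCase(Trim(word)) & ".wav"
    If FileExists(candidate) Then Return candidate
    Return ""
End Function

Dim As String path = WordToWavPath("Hello", "voicedb")
If Len(path) = 0 Then
    Print "no recording for that word"
Else
    Print "would play: "; path
End If
```

Returning an empty string for a missing word lets the caller decide whether to skip it or fall back to something like SAPI.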

But, I'd be less interested in the project if it's taking place on 5 different computers where I can't play around with it.

Firstly, does what I described roughly correspond with what you need? Because if so it's like a 10 minute project, no problem.

I'm interested in AI programming, neural nets, etc... So I'd like to see what you intend on doing here.

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 18:51

the programs can all reside on 1 pc

But you will need a really fast one if you want self aware in real time

the whole thing is very very modular ( the brain program just runs with what information it can get its hands on)

if it cant see...it will use its ears etc more....just like a human

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 19:10

rolliebollocks wrote:Well, I dunno. I can easily do that. You want a program that plays

<somestring>.wav

And, ideally you would have like a word bank or something. So, you would feed some function a string, it would check a directory to see if there is a corresponding .wav file and play that file if there is one.

But, I'd be less interested in the project if it's taking place on 5 different computers where I can't play around with it.

Firstly, does what I described roughly correspond with what you need? Because if so it's like a 10 minute project, no problem.

I'm interested in AI programming, neural nets, etc... So I'd like to see what you intend on doing here.

ref WORDBANK , that would be ideal for now (and every one else could use it too)

if running all the software on one machine, the wordbank must not halt any other programs. Code needs to be crash proof; the dumber and simpler it is the better, even if it looks as though it's been coded by a caveman

In essence

The brain knows what it wants to say to the world.....chucks a number at the speech output program (this could even be done by writing a text file that the speech prog sniffs n reads), like a log file

this write a txt file method has been used a lot just so we can track the data flowing about

KISS philosophy where possible
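
A rough sketch of the text-file handoff, assuming the brain writes one number into a small file and the speech program reads it back. The file name `speech_cmd.txt` and the function name `ReadSpeechCommand` are made up for illustration; a real speech program would call the function in a loop with a short `Sleep` between polls.

```freebasic
' Minimal sketch: read a single word number from a command file.
' ReadSpeechCommand and "speech_cmd.txt" are illustrative names.

' Returns the number found in the file, or -1 if the file is
' missing, cannot be opened, or is empty.
Function ReadSpeechCommand(ByRef path As String) As Integer
    Dim As Integer f = FreeFile
    Dim As String txt
    If Open(path For Input As #f) <> 0 Then Return -1
    If Not EOF(f) Then Line Input #f, txt
    Close #f
    txt = Trim(txt)
    If Len(txt) = 0 Then Return -1
    Return ValInt(txt)
End Function

' Demo: play the brain's role once, then read the command back.
Dim As Integer f = FreeFile
Open "speech_cmd.txt" For Output As #f
Print #f, 23
Close #f

Print "speech program received word number"; ReadSpeechCommand("speech_cmd.txt")
```

Because the reader treats a missing or half-written file as "nothing to say", a hiccup on the brain's side should not bring the speech program down.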

number of wav file, what its saying , howlong it plays for can be included in the name of the wavfile itself (just for readability)

its gonna take me a week just to record them all lol, ill make a webby so everyone can download /upload them

note, Windows XP REALLY doesn't like having 1000 wav files in one folder

maybe better to have 10 folders with a 100 wavfiles in each

------------------------------------------------------------------------------

if u have some tight 10 minute job code that minimizes or eliminates includes etc id be really interested to use it (copy n paste code would be ideal)

http://www.mycyberbuddy.com/

http://asimov1.wikispaces.com/

setting up a wiki as i type

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 20:21

CUNNING PLAN SO FAR

directory wav file voices

Main Folder
voicedb

Sub folders

"a" ..............to............ "z"
"phrases1" ..to............ "phrases20"
"misc1" ........to.......... "misc20"

plus

"num1to100"
"numbigger"
"reserved"
"emergency"

-----------------------------------------

123456wordphrasesaid654321

123456 = wordnumber that program uses to identify

wordphrasesaid = humans can read & understand

654321 = store other less critical data, eg duration, number of characters, intonation
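
The naming scheme above can be pulled apart with a few lines of code. This is a minimal sketch that assumes the name is leading digits, then the readable phrase, then trailing digits, with no separators; the type `WavName` and the functions `IsDigitChar` and `ParseWavName` are hypothetical names.

```freebasic
' Minimal sketch: split "123456wordphrasesaid654321" into its parts.
' WavName, IsDigitChar and ParseWavName are illustrative names.
Type WavName
    As Long   id       ' leading digits: the number the program uses
    As String phrase   ' middle part: human-readable word or phrase
    As String extra    ' trailing digits: less critical data, kept as text
End Type

Function IsDigitChar(ByVal c As Integer) As Integer
    Return c >= Asc("0") AndAlso c <= Asc("9")
End Function

Function ParseWavName(ByRef baseName As String) As WavName
    Dim As WavName r
    Dim As Integer n = Len(baseName)
    Dim As Integer head = 0, tail = n
    ' Count the leading digits.
    While head < n AndAlso IsDigitChar(baseName[head])
        head += 1
    Wend
    ' Walk backwards over the trailing digits.
    While tail > head AndAlso IsDigitChar(baseName[tail - 1])
        tail -= 1
    Wend
    r.id     = ValInt(Left(baseName, head))
    r.phrase = Mid(baseName, head + 1, tail - head)
    r.extra  = Mid(baseName, tail + 1)
    Return r
End Function

Dim As WavName w = ParseWavName("123456wordphrasesaid654321")
Print w.id, w.phrase, w.extra
```

Only the leading and trailing digit runs are stripped, so digits in the middle of a phrase stay part of the phrase.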

-----------------------------------------
with around 70 folders each with around 100 wavfiles in each folder

7000 wav files @ 100k each wav file = 700megs of hd space

(nothing to worry today's hard drives)

---------------------------------------------------------------------------------
long term plan is to get the soundcard to do the work and not the CPU (HD space is cheap)

vdecampo · Post by **vdecampo** » Jun 20, 2010 20:44

Why don't you just use text-to-speech conversion instead of wave files? It has much less storage overhead and makes your program much easier to modify and expand.

-Vince

rolliebollocks · Post by **rolliebollocks** » Jun 20, 2010 20:54

Firstly, in order to do this properly, you want to use a hash map to store the string-triggers, and the sound-responses. Without that, you are:

a) using an array which will cause an increasing lag depending on how large the vocab is.

b) pulling stuff from files which will be even slower and jerkier between words.
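
FreeBASIC has no built-in hash map, so one would have to be written or borrowed. As a rough illustration of the idea (not the code being offered in this thread), here is a minimal fixed-size table with linear probing that maps a trigger word to a wav path. All names (`Slot`, `HashOf`, `BankPut`, `BankGet`) and the table size are assumptions, and the table must never be allowed to fill up completely.

```freebasic
' Minimal sketch: word -> wav path lookup with a fixed-size hash table.
' All names and the table size are illustrative assumptions.
Const TABLE_SIZE = 16384          ' power of two, above a 10,000-word vocabulary

Type Slot
    As String word                ' trigger word, "" means the slot is free
    As String wavPath             ' wav file to play for that word
End Type

Dim Shared As Slot table(0 To TABLE_SIZE - 1)

' FNV-1a style string hash, folded into the table range.
Function HashOf(ByRef s As String) As ULong
    Dim As ULong h = 2166136261
    For i As Integer = 0 To Len(s) - 1
        h = (h Xor s[i]) * 16777619
    Next
    Return h And (TABLE_SIZE - 1)
End Function

Sub BankPut(ByRef w As String, ByRef wavPath As String)
    Dim As ULong i = HashOf(w)
    ' Linear probing: step forward until we hit the word or a free slot.
    While Len(table(i).word) > 0 AndAlso table(i).word <> w
        i = (i + 1) And (TABLE_SIZE - 1)
    Wend
    table(i).word = w
    table(i).wavPath = wavPath
End Sub

' Returns the stored wav path, or "" if the word is unknown.
Function BankGet(ByRef w As String) As String
    Dim As ULong i = HashOf(w)
    While Len(table(i).word) > 0
        If table(i).word = w Then Return table(i).wavPath
        i = (i + 1) And (TABLE_SIZE - 1)
    Wend
    Return ""
End Function

BankPut("hello", "voicedb/h/001_hello.wav")
Print BankGet("hello")
```

Lookup cost stays roughly constant as the vocabulary grows, which is the point being made about arrays that are searched from the top.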

There is another version of the SAPI code which uses threads, and it's supposed to be non-blocking.

How will the brain interface with my code? I need to know more before I pursue.

TESLACOIL · Post by **TESLACOIL** » Jun 20, 2010 20:57

vdecampo wrote:Why don't you just use text-to-speech conversion instead of wave files? It has much less storage overhead and makes your program much easier to modify and expand.

-Vince

cpu and memory are close to overload & will always be with big AI like this

wav file is simple, hopefully reliable and it is what we used to get by already (there is a lot of really really tough code that needs working on elsewhere)

speech output is the least of my worries, there are close to 100 programs running in parallel in order to make it "alive and self aware"

not worried 1 jot by hd storage....got terabytes empty
------------------------------------------------------------------------

im weak on computer hardware architecture...im assuming creating text to speech on the fly will chew precious cpu cycles & run a bigger risk of system resource conflict than playing wav files?

BasicCoder2 · Post by **BasicCoder2** » Jun 20, 2010 22:39

segin · Post by **segin** » Jun 20, 2010 22:58

rolliebollocks wrote: The more I think about it, the less happy I am with wave files...

So use FLAC instead.

TESLACOIL · Post by **TESLACOIL** » Jun 21, 2010 0:29

sentient ai is hard

its like building a nuclear bomb...you need to reach critical mass before anything really happens

but intelligence is an emergent property above that level

ants are smart, but they are just reactobots

this AI program im working on is self aware at the most fundamental level.

few teams out there are using the right approach or pushing this as a primary goal. That's why you don't see anything particularly interesting

dreaming and emotions are an essential part of my code. Humans dream and have emotions because it is simply more efficient to filter & adapt for good context based guessing than use brute force. (when you are asleep you are defragging your brain) (emotions promote action in low or high stimulus environments)

TESLACOIL · Post by **TESLACOIL** » Jun 21, 2010 0:44

rolliebollocks wrote:Firstly, in order to do this properly, you want to use a hash map to store the string-triggers, and the sound-responses. Without that, you are:

a) using an array which will cause an increasing lag depending on how large the vocab is.

b) pulling stuff from files which will be even slower and jerkier between words.

There is another version of the SAPI code which uses threads, and it's supposed to be non-blocking.

How will the brain interface with my code? I need to know more before I pursue.

the brain (is a separate exe file) it sends a numerical instruction to the speech output

eg 1 = hello , 2 = goodbye

the brain has its own internal logical language. This is interpreted into human type language (inside the brain exe)

after translation the brain outputs 1 number after the other , serial output to the speech centre

at the moment the brain just writes that single number to a text file

speech output engine is simply
if number = 1 then say hello
if number =2 then say goodbye

most common words at the top of the list
that kind of setup
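
Since the brain already sends plain numbers, the chain of `if` tests could be swapped for an array indexed by that number, so the lookup takes the same time whether the word is first or last in the list. This is a minimal sketch under that assumption; `wordBank`, `LoadDemoBank` and `WordFor` are hypothetical names, and the two entries are the ones from the example above.

```freebasic
' Minimal sketch: look up the word for a number by array index.
' wordBank, LoadDemoBank and WordFor are illustrative names.
Dim Shared As String wordBank(0 To 9999)   ' room for 10,000 entries

Sub LoadDemoBank()
    wordBank(1) = "hello"
    wordBank(2) = "goodbye"
End Sub

' Returns the word for n, or "" if n is out of range or unassigned.
Function WordFor(ByVal n As Integer) As String
    If n < 0 OrElse n > UBound(wordBank) Then Return ""
    Return wordBank(n)
End Function

LoadDemoBank()
Print WordFor(1)
Print WordFor(2)
```

In a fuller version each entry would hold the wav file name rather than the word itself, and the table could be filled from a text file instead of being hard-coded.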
------------------------------------------------------------
the brain has to be kept separate from all the other modules, otherwise if one crashes it goes down too.....kind of mission critical keeping the brain going 100% of the time

not an issue time wise with 100 or so words but will be with 1000's im sure

first 500 words (at least) will be generated from playing wav files. I need that many on tap to match the output of the brain logical language to human speech

note

the brain doesn't need to know if the speech centre is doing its job....1 way traffic from brain to speech is fine

the brain speaks & thinks logically, culture context & emotion filters (which affect the way it says things) can be applied by inserting a filter

brain output 1,2,3 etc (speech emotion filter) 1+22=23 , can change hello to hi!

if the speech output gets a 23 it will say " hi! "

filter off & the speech output gets a 1 & it will say " hello "
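
The filter idea can be written down as a simple offset added to the word number before it reaches the speech output. A minimal sketch using the numbers from the example (1 = hello, 23 = hi!); the constant and function names are made up for illustration.

```freebasic
' Minimal sketch: an emotion filter as an offset on the word number.
' FILTER_OFF, FILTER_CASUAL, Filtered and Spoken are illustrative names.
Const FILTER_OFF = 0
Const FILTER_CASUAL = 22      ' offset taken from the 1 + 22 = 23 example

' The filter sits between the brain and the speech output.
Function Filtered(ByVal wordId As Integer, ByVal filterOffset As Integer) As Integer
    Return wordId + filterOffset
End Function

' Stand-in for the speech output: returns the text it would say.
Function Spoken(ByVal n As Integer) As String
    Select Case n
    Case 1
        Return "hello"
    Case 23
        Return "hi!"
    Case Else
        Return ""
    End Select
End Function

Print Spoken(Filtered(1, FILTER_OFF))      ' prints hello
Print Spoken(Filtered(1, FILTER_CASUAL))   ' prints hi!
```

The brain itself never changes: it always sends 1, and only the filter decides which variant gets spoken.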

if you get where im coming from
-----------------------------------------------

once ive got some basic talk talk going on i can judge its moods and emotions from its speech (prerecorded wavs) (watching emotional sliders move doesn't really give u a feel for conversation flow)

( i didn't want to code 20 different ways of saying hello just for text output....tuning into sapi and using sapi itself to produce emotional nuances will come later)

the brain + emotional filter can generate 20 different numbers/ways to say hello

or construct the same sentence in 20 different ways

i need a good mechanism to play wav files in the first instance (not all audio output will be in human speech) can sapi do a wolf whistle ? or sound a klaxon ? record a dog barking and play it back etc

so there is always a core need to use wav files one way or the other

ideally as u quite rightly mentioned earlier...learning new words on the fly will be an important feature

and finding a way to generate speech that doesn't hammer cpu cycles or interrupt the other programs running on the machine

---------------------------------------------------------------
hope this helps

got about 100 words recorded today into wav files (once i reach 500 ill stop for air)

as u say i will need to use something like sapi to generate 1000s of words on the fly

but those 500 wav words/sounds will be more than enough for any typical hobby robot or basic chat bot. ill upload the files and directory tree once ive hit the 500 wav mark
