
Natural Language Processing



nickreardon
12-14-2011, 08:01 PM
Hey guys,
I am a linguist; unfortunately, I cannot confidently call myself a computer programmer. I have studied Python and am learning ways to implement code. Long story short, I want to be the one who creates a truly linguistically competent machine. I have written a few lines of code before, but nothing near a true linguistic interaction. I have come up with a theory, based on the theories of linguistic predecessors, of how to create a linguistically competent machine (you can read all about it at lingforum.com in the articles authored by me). It has occurred to me that the knowledge my machine builds would have to be statistically correlated with some form of prompt that occurs behind all processes. The DOS prompt, pretty much. I'm wondering: where can I find this prompt? What is it called? Does it change with every action performed on a computer? Can I create a program that analyzes that data?
I appreciate the help, and I hope to learn as much as possible from this forum. I will check and post often.

Nick


What will it take to make a computer able to learn any language from scratch?

Divinityfound
12-15-2011, 03:48 PM
The problem with most computers is that they are pretty much just super-advanced calculators. You can teach a computer new words and phrases, but you'd need to put them into a database and teach it through repetition how the words can be used. Pretty much, you just need to build the machine so it can adapt to every minute detail that can exist in languages and notice parallels and other fun whatnot.

Imagine how you learn languages. Computers learn pretty much the same way. Just common usage.

nickreardon
12-15-2011, 06:13 PM
It would require the analysis of a command prompt: the text that occurs with any computational action. Can I create a program with Python that will access that command prompt? I think I can take it from there.

I'd like it if someone was able to answer my questions plz thanks.

tracknut
12-15-2011, 06:24 PM
I have no idea how programmatic access to the command/DOS prompt gets you any closer to natural language processing, but if you must, the subprocess module (http://docs.python.org/library/subprocess.html#module-subprocess) appears to be the closest equivalent of the command prompt: it lets a Python program run commands and read their output.

Having computers understand language is a holy grail that's been worked on for probably 50 years now... Best of luck with your project, but I'd suggest getting a computer expert on your team.

Dave

nickreardon
12-15-2011, 10:45 PM
I've come up with a new theory of grammar. I call it Maximalism. I guarantee it will produce a talking machine. I would love to have a computer expert to work with. Please, if you're interested, contact me.

nickreardon
12-15-2011, 10:53 PM
The reason it is necessary to access the command prompt, or the subprocess module, is that it seems that there is an operation that occurs behind every lexical item (or any sound, word, phrase, sentence, conversation, etc.). Any time language is spoken, it occurs with a mental, or in this case computational, event. Consider these examples.

The dog bit my knee.
That tree is big.
He came to the party.
We left at dusk.

and the obvious ones

Take out the garbage.
Do your homework.
Clean your room.

They all perform some sort of operation. So I say that items seem to be serving as operators. Words come together to direct thought. I have a whole theory on this. If you'd like to hear more, I'd be happy to share. I hope someone can put it into code before I die so I can speak with this thing. I have many questions.

nickreardon
12-15-2011, 11:13 PM
So this subprocess module actually accesses the basic components of computation? The command prompt behind everything?...
So if I said something like

subprocess.check_output

It would check any action that has occurred computationally?

tracknut
12-16-2011, 12:19 AM
So this subprocess module actually accesses the basic components of computation? The command prompt behind everything?...


I'm not sure what *you* consider the "basic components of computation", but I would say they are the machine instructions of the processor: add, multiply, jump, stuff like that. If that's what you mean, then no, the subprocess module does not access those. You'd write an assembly language program to access those directly. Subprocess, like your initial "DOS prompt" request, provides access to the command-line execution of applications, typically operating system commands like "dir", "ls", etc. "The Command Prompt Behind Everything" sounds like it came from the Hitchhiker's Guide; I don't even know what it means.



So if I said something like

subprocess.check_output

It would check any action that has occurred computationally?

No, here's the definition:


subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False, timeout=None)

Run command with arguments and return its output as a byte string.

So it will run the command and give you the results in a string. That doesn't have much to do with "checking any action..."
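
To make that concrete, here is a minimal sketch of what check_output actually does. The child command is an arbitrary choice for illustration (it launches the running Python interpreter via sys.executable so the example works on any platform); nothing here "watches" other activity on the machine.

```python
import subprocess
import sys

# Run one child process and capture what it prints. The command below is
# only an illustrative choice; any other command line works the same way.
raw = subprocess.check_output(
    [sys.executable, "-c", "print('hello from a child process')"]
)

# check_output returns bytes, so decode before treating it as text.
text = raw.decode().strip()
print(text)
```

The call only sees the output of the one command it started itself, which is why it cannot report on arbitrary actions taken elsewhere on the computer.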

Dave

nickreardon
12-16-2011, 12:36 AM
Ok, this is what I'm trying to explain.

Does the act of taking your mouse and executing an application have a direct command prompt that occurs as a subprocess in an accessible textual environment?
And if the answer is yes, do the executed applications then have their own command prompt that is subprocessing behind what they do?

The idea is this.
I want my computer to make me some money, so I say "Genie, will you make me some money?"

From there, it performs processes that it has learned to do to analyze the information that people have made available over the internet in order to make some money. So it makes a website. But first we would have to teach it how.

It would have to know what processes are getting implemented and how to use those processes. So I put in the input "This is how to make a search." and then perform an action. Can I get it to read which action I have performed? Can I get it to save that information?

tracknut
12-16-2011, 01:00 AM
Very little of what goes on in today's applications is textual. While you might be able to associate a mouse click with a command (e.g. "Run Photoshop"), there will be no list of other commands or subprocesses subsequently executed by Photoshop that you could save away to later re-create "this is what was done in Photoshop, so next time someone requests Photoshop, see if that's what they want to do".

And no, there is no general way you can (from "outside" an application) collect everything it's doing and store it for later re-use. A user clicking on stuff, manipulating files, accessing something remotely on the internet, and even knowing when a user has something he'd call an "answer": all would be impossible (IMO) to store, connect with language, and re-use.

Perhaps I'm not quite the expert you're looking for, because I don't see your vision :)

Dave

nickreardon
12-16-2011, 01:11 AM
It's not impossible, it's not even impractical, it's just a matter of getting it done at this point. Let me at least try to bridge the gap in understanding. A lot of linguists say that the problem is that there aren't many people who understand programming and understand linguistics. I understand linguistics, and based on what I've learned (and envisioned) I've developed a theory of grammar whose goal is purely computational. It's brand new, I promise.

tracknut
12-16-2011, 01:21 AM
Could be you're right Nick. Someone had to invent the light bulb and the vacuum cleaner - maybe you're the guy to do it for natural languages and computers!

Dave

nickreardon
12-16-2011, 01:32 AM
Ok, well, I've been teaching myself Python for a little while now and my understanding is limited. I'm working on getting a program to analyze a few conversations and formulate probable responses given a textual input.

I have made several conversations and saved them as .docx. The only thing that differentiates a speaker from a hearer is the return key. I need to know how I can get a computer to "look" at this data and save every letter as a folder (an array as someone said) within a folder within a folder. This will serve as context.

The "meaning" of the string of arrays will be saved within the final array of the string of arrays as a string (or what I have called a "notepad document within a folder") that has some sort of (statistical) code within it that can be analyzed upon accessing that array.
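
One way to read the "folder within a folder" idea is as a trie: one nested dictionary per character, with a payload attached at the end of the path. The sketch below is only an interpretation of the description above, not an established implementation of it; the names store, lookup, and the "_meaning" key are hypothetical.

```python
def store(trie, text, meaning):
    """Walk one nested dict per character, then attach a payload at the end."""
    node = trie
    for ch in text:
        node = node.setdefault(ch, {})
    # Hypothetical key standing in for the final "notepad document".
    node["_meaning"] = meaning


def lookup(trie, text):
    """Return the payload stored for text, or None if none was stored."""
    node = trie
    for ch in text:
        if ch not in node:
            return None
        node = node[ch]
    return node.get("_meaning")


trie = {}
store(trie, "make me money", {"count": 1})
print(lookup(trie, "make me money"))  # the stored payload
print(lookup(trie, "make me"))        # a prefix with no payload of its own
```

Because "_meaning" is longer than one character, it cannot collide with the single-character keys used for the path.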

Are you catching my drift?

nickreardon
12-16-2011, 10:11 PM
Mannn no one?

tracknut
12-16-2011, 10:20 PM
It's just so confusing, hard to know what to say. From your description, if I feed in the sentence "the quick brown fox jumped over the lazy dog", then I will end up with a directory structure that looks like:
folder
|
+- folder
|
+ a
+ b
...
+ z

And in the "g" folder (the final one), you will have a notepad file (text file I guess) with the "meaning" of "the quick brown fox jumped over the lazy dog".

Do I have this right so far? How does this help the computer understand or speak language again?

Dave

nickreardon
12-16-2011, 10:25 PM
I see it like this. Meaning is just the implementation of code. Here's a visual way to represent the sentence "make me money"

m (item)
a (item)
k (item)
e (item)(item)

m (item)
e (item)(item)

m (item)
o (item)
n (item)
e (item)
y (item)(item)(item)

And it determines where to place the word item and how many times it gets placed by counting how often those letters or that phrase occurs.

tracknut
12-16-2011, 10:31 PM
Ok, so from what I can see you now have a way to feed text in one end, and come up with a table that will tell you the number of occurrences of various words in the text. For example the word "and" would have many (item) tags, whereas "linguistics" would have fewer.
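
As a rough sketch of that table, Python's collections.Counter can tally word occurrences in a few lines. The sample sentence and the tokenizing regular expression are assumptions made for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into words, and count each one."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text, only to exercise the function.
sample = "the dog and the cat and the bird study linguistics"
counts = word_counts(sample)
print(counts["and"], counts["linguistics"])
```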

So far so good...

Dave

nickreardon
12-16-2011, 10:34 PM
Yea, and the idea is that we can take these items, count how often they occur with other items and store that as an item, and then use items to prompt the usage of items. But where the items occur, operations also have to be performed. And the word "and" would have one item tag because it occurs more than once, but the environments (or context) that it's used in would also have item tags for operational purposes.

I have a diagram of a "linguistic string" that I've drawn out here https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=15XYBsevGKkBpj_DXsyx2Rf-4ZG6Z70dD-6gOwiLqAkQ2zB5BIW1vgiRfFhne&hl=en_US

I'm trying to apply this computationally.

tracknut
12-16-2011, 10:41 PM
So in your example, you might track how often the words "me" and "money" appear together, and find out they appear rather frequently, as opposed to "ketchup" and "money". That means something, I suppose.
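
A minimal sketch of that kind of tracking, assuming "appear together" means "appear in the same sentence". The three sample sentences are made up, and the function name cooccurrence is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how many sentences contain each unordered pair of words."""
    pairs = Counter()
    for sentence in sentences:
        # Sorting makes each pair's key order predictable (alphabetical).
        words = sorted(set(sentence.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


sentences = ["make me money", "give me money", "pass me ketchup"]
pairs = cooccurrence(sentences)
print(pairs[("me", "money")])       # how often "me" and "money" share a sentence
print(pairs[("ketchup", "money")])  # how often "ketchup" and "money" do
```

This gives counts only; it says nothing yet about what to do with them.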

But then you're saying "where the items occur, operations have to be performed", what operations are you talking about? So far, all I've done is fed in a stream of text and pulled together some figures on word counts and which ones appear nearby others. Where are you associating "operations" with this stream of text?

Dave

nickreardon
12-16-2011, 10:47 PM
It would track how often the items occur together and when it is appropriate to use the items that cooccur through making a response and/or performing an operation. The operations would exist as the statistic that the code analyzes to determine a probable response.

tracknut
12-16-2011, 10:51 PM
But that reminds me of the old cartoon with a guy standing at a blackboard. On the left side are some simple equations, 1+5=6, sqrt(9)=3, etc. On the right side it says something like "therefore cold fusion is possible!". And in the middle it says "magic happens here". I can see no connection from this histogram of word usage to "making a response and/or performing an operation".

Dave

nickreardon
12-16-2011, 10:54 PM
I edit my replies after I send them, sorry. There's no magic involved I promise. It's pretty simple. Just need access to the right resources.

The idea is that with enough conversational input, this thing could have a probable conversation. This thing won't just say anything, it'll form the most probable response to your input. It could even form sentences with no previous knowledge of the input.

I do this with something I call "prompting" in grammar.

tracknut
12-16-2011, 10:57 PM
No problem. I suspect I am not the right resource, so we're just having a pleasant discussion at this point. Perhaps someone else has a better grasp of what you're looking for and will jump in with different thoughts.


The idea is that with enough conversational input, this thing could have a probable conversation. This thing won't just say anything, it'll form the most probable response to your input. It could even form sentences with no previous knowledge of the input.
I guess I should have waited for this part. Yes, I can certainly see that from the data you've supplied (in request/response format), that given enough input, you could simulate a conversation with a computer. It could track the "most likely response to this sentence" and send you back some text that was likely to make sense. It could also default to "...and how does that make you feel?" if it doesn't have enough statistical data on the input sentence. I'm not convinced this is a computer "understanding language" more than replaying previously heard statements though. What if I ask it "how's the weather?" - it will look at previous entries and tell me "The weather is nice" (or whatever) all the time, because that's the most likely response in its dictionary?
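
Roughly, the replay behaviour I'm describing could be sketched like this. The conversation log is invented, and train and reply are hypothetical names; this is a lookup table, not a claim about how Nick's design works.

```python
from collections import Counter, defaultdict

FALLBACK = "...and how does that make you feel?"


def train(pairs):
    """Count which responses followed each (normalized) prompt."""
    table = defaultdict(Counter)
    for prompt, response in pairs:
        table[prompt.lower().strip()][response] += 1
    return table


def reply(table, prompt):
    """Return the most frequent response seen for prompt, else the fallback."""
    seen = table.get(prompt.lower().strip())
    if not seen:
        return FALLBACK
    return seen.most_common(1)[0][0]


# Invented request/response log for illustration.
log = [
    ("How's the weather?", "The weather is nice."),
    ("How's the weather?", "The weather is nice."),
    ("How's the weather?", "It is raining."),
]
table = train(log)
print(reply(table, "how's the weather?"))
print(reply(table, "Where is my car?"))
```

Note that the weather answer never changes with the actual weather, which is exactly the limitation raised above.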

Dave

nickreardon
12-16-2011, 11:02 PM
Thanks, any help is better than no help.

nickreardon
12-17-2011, 12:16 AM
omg there's so much I want to do but don't know how to put into codeeeee.

alykins
12-17-2011, 03:29 AM
No problem. I suspect I am not the right resource, so we're just having a pleasant discussion at this point. Perhaps someone else has a better grasp of what you're looking for and will jump in with different thoughts.


I am more than willing to bite at this point- I've been scanning these threads... I am sooooooooo freaking confused though at the allocations of numbers in your picture... to me it is exactly like this (http://www.youtube.com/watch?v=WESgHPk9gyo) clip... what does a bean mean?!?!?!?! what do the numbers mean? how do they get there? there doesn't seem to be any rhyme or reason, there are just random number allocations... can you explain that part better?

nickreardon
12-17-2011, 04:36 AM
OK, my grammar includes these concepts. The first is prompting, which I have described as the probable cooccurrence of two items within the same environment. Here's an example:

How are you?
I am fine.

How prompts fine.
are prompts am.
you prompts I.

These are probable cooccurrences. We can track these with statistics. Babies use this method when developing linguistic competence. The word "the" does not occur without another word (item) following it, unless used in a specific environment.
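
The statistics described here could be tracked, as a first rough sketch, by counting which response words follow each word of the preceding utterance. The exchanges below are made up, prompt_table is a hypothetical name, and treating every cue/response word pair as equally relevant is a simplifying assumption.

```python
from collections import Counter, defaultdict


def prompt_table(exchanges):
    """For each word said, count the words that appeared in the answer."""
    table = defaultdict(Counter)
    for said, answered in exchanges:
        for cue in said.lower().split():
            for word in answered.lower().split():
                table[cue][word] += 1
    return table


# Invented exchanges for illustration.
exchanges = [
    ("how are you", "i am fine"),
    ("how is she", "she is fine"),
    ("where are you", "i am here"),
]
table = prompt_table(exchanges)
print(table["how"].most_common(1))  # the word most often prompted by "how"
print(table["you"]["i"])            # how often "you" was answered with "i"
```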

We can use this function to form what I call "presumptions." I have described this as the assumption of pre-existing context that could possibly or would probably lead to a particular linguistic occurrence. We can form presumptions and use that context to form a more probable response. All the items that occur in between the most probable items are prompted linearly (as I say). Which means that phrases like this

Where is my car?
prompt the response
in the garage.
or
on the street.

The words garage and street occur most probably with the two words that occur before them. This is called linear prompting.
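
Predicting a word from the two words before it resembles what is usually called a trigram model. The sketch below uses that technique as a stand-in for linear prompting; the corpus is made up and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count which word follows each consecutive pair of words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b, c in zip(words, words[1:], words[2:]):
            model[(a, b)][c] += 1
    return model


def next_word(model, a, b):
    """Return the most frequent word seen after (a, b), or None if unseen."""
    options = model.get((a, b))
    return options.most_common(1)[0][0] if options else None


# Invented corpus for illustration.
corpus = ["it is in the garage", "the car is in the garage", "it is on the street"]
model = train_trigrams(corpus)
print(next_word(model, "in", "the"))
print(next_word(model, "on", "the"))
```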

I'll post more later

nickreardon
12-17-2011, 08:52 PM
I'm starting a business. Anyone who'd like to get involved should contact me via email. Send some sort of resume, contact information, and any questions you may have.

nreardon786@gmail.com

nickreardon
12-17-2011, 11:00 PM
I hope no one is stealing my ideas and using them for their own profit. Please involve me in some way.

nickreardon
12-18-2011, 12:09 AM
I am more than willing to bite at this point- I've been scanning these threads... I am sooooooooo freaking confused though at the allocations of numbers in your picture... to me it is exactly like this (http://www.youtube.com/watch?v=WESgHPk9gyo) clip... what does a bean mean?!?!?!?! what do the numbers mean? how do they get there? there doesn't seem to be any rhyme or reason, there are just random number allocations... can you explain that part better?

I also posted a document that shows the allocation of these numbers better. You can find it here https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=15XYBsevGKkBpj_DXsyx2Rf-4ZG6Z70dD-6gOwiLqAkQ2zB5BIW1vgiRfFhne&hl=en_US.

Where every unit of knowledge is on that string diagram, consider that a 1. So for every layer of units, there is a 1, it adds up until it reaches a higher integer. We have to consider every layer as its own statistic that gets correlated with other layers (say layer 3 "the bed" gets correlated with layer 2 "sleep") through noticing a probable cooccurrence (or that layer 3 "the bed" occurs 1846 times within the context of layer 2 "sleep")

I left out a layer in that string diagram though, the one associated with individual symbols (layer 1).

It may take a little thought

Look at it like this. "This is the context."
(0)T_(1)t<-H->i(1)h<-I->s(1)i<-S-> (1)s<- ->(1)(0)<-this-> is(2)<-I->s(1)i<-S-> (1)this <-is -> the(2)s<- ->t(1)t<-H->e(1)h<-E-> (1)e<- ->c(1)is <-the ->context (2) <-C->o(1)c<-O->n(1)o<-N->t(1)n<-T->e(1)t<-E->x(1)e<-X->t(1)x<-T->.(1)t<-.-> (1)the <-CONTEXT->.(2)is <-the context.->(0)(3)(0)<-This is the context.->(0)(4)

What comes before the arrow is the context that cues the usage of what comes in between the arrows, and what comes after the arrows is what gets prompted. These layers prompt other layers. Layer 2 prompts layer 1, but can also prompt layer 4 like in this example.

Where?
In the garage.

In this case, layer 2 prompts layer 4 because people most probably respond with the entire layer 4 phrase "in the garage" as opposed to the layer 2 "garage" or the layer 3 "the garage". But the prompting in this example would also require context. What gets me is something like this.

Hey.
Hey whats up?
Nothing really how are things?
They're good.
You still in school?
Yea still studying.
Thats good, how is your girlfriend?
She's great, she started class this week too.

In this example, the word "school" prompts the usage of "too" in the phrase "she started class this week too" which falls further down in the conversation. I think this can be simulated with the help of presumption. For every input and output, it forms presumptions to either strengthen or aid the context. Presumptions are the result of reverse prompting. Consider this.

(Start Conversation)
He left me.

How can we form a probable response for this? The computer would take the words and reverse prompt them by finding the most probable context that would prompt the usage of that particular item and using that in a probable sentence. A prototypical presumption of this example would be "I have a boyfriend."
"He" is reverse prompted to "a boyfriend," "left" is reverse prompted to "have," "me" is reverse prompted to "I". These layers combine to form a presumption that you can use to form a more probable response. (There would also be other presumptions formed before reaching the prototypical presumption).

And it seems that it may be the case that presumption occurs both forwards and backwards. Consider this example.

1. I really don't like him.
2. That's not good.
3. No I really really don't like him.
4. You're not gunna kill him...

It seems that from sentence 3, the speaker has forward prompted to form an assumption in order to respond with sentence 4.

tracknut
12-18-2011, 04:35 AM
I hope no one is stealing my ideas and using them for their own profit. Please involve me in some way.

If you are serious, and consider this the intellectual property of your new business, you might curtail posting details about it. Just a thought; certainly if I were in your shoes I wouldn't be posting something that was proprietary and fundamental to the success of my business.

Dave

nickreardon
12-18-2011, 05:52 AM
I've decided that I don't care. Starting a business means taking risks. Either way, this program will be able to make you money just by asking it. I wanna be involved in this project but ultimately I just hope that this program gets made. I hope I can make it and put it on Facebook before you guys can though bahahaha.

nickreardon
03-27-2012, 02:03 AM
I've finalized an equation for cognitive acquisition. Search for it on the internet and you'll find it.


