A quick introduction to NLP...
Jul. 30th, 2010 01:48 amNLP stands for Natural Language Processing, by the way.
In the context of this exercise, I will be using Prolog, a logic-based programming language where you don't deliberately write what the computer is supposed to do directly, but give it the input, the general structure of the problem, and some requirements for the solution and the language does the rest.
(perhaps a better description is that it takes the general structure of a problem, then tries to prove queries about the problem)
Before I show you how to recognize sentences, you need to know how Prolog works in general.
For instance, variables start with an upper-case letter, everything else is a constant (also known as an atom) or a function (predicate, from Predicate Calculus. Though there's no math involved).
Statements (or rules) end with a period or full-stop ('.'), and to 'and' things together, you use a comma (',').
Lists can be of anything, are entered as [stuff,goes in the list], and you can even mix different things together.
That should be enough to start, so the next question should be:
How do we represent a sentence in Prolog?
Fairly simply, actually, as a list of constants (atoms).
For example, the sentence "John learned Algebra" would be written as [john,learned,algebra].
Each word in the sentence is converted to lower-case and used as atoms in the list.
Punctuation is usually ignored, unless it happens to be important, which it's not.
To parse this as a sentence, you need to understand that 'john' and 'algebra' are nouns (we don't care about the different types just yet) and 'learned' is a verb.
First, the nouns:
noun(john) --> [john].
noun(algebra) --> [algebra].
In this example, nouns act similar to single-argument functions, with the 'noun' part indicating the name, the stuff between brackets indicating the single parameter, the arrow indicating that it's a "grammar function" (more on this later), the square brackets surrounding words in the sentence list, and the final period or dot indicating the end of the rule.
While this isn't really a function, you can still think of it in this way until you learn otherwise.
Next, we have the verb:
verb(Person, Subject, learned(Person, Subject)) --> [learned].
This one is more complicated, as we need to include information about who did what as well as what the 'what' is.
So, the first parameter (in this case) is the name of the person who learned something, the second parameter is the subject they learned, and the third argument puts those two together to produce a "meaning representation" that Prolog can use later.
This is where you'll probably get confused and ask what the third argument is really doing.
The answer is fairly simple: Prolog only allows predicates (function-like thing) to return True or False, which means that when you need to pass data back out, there has to be another way.
Prolog's goal is to solve logical sets of rules to prove if something is true, not blindly execute statements until the answer comes out like so many other languages.
This means that parameters aren't passed to functions in a one-way fashion, they're more "matched" with the arguments associated with it, which is what allows data to be passed back out as well as passed in as usual.
(this also means that, with no predeclared input and output arguments, you can choose to run the function "forwards", the way you'd intended it, or "backwards", the reverse direction)
And now for the final piece that puts it all together:
sentence(Meaning) --> noun(Noun1), verb(Noun1, Noun2, Meaning), noun(Noun2).
In this example, the rule is not depending directly on words found in the sentence list, but on how other rules have interpreted the words, in this case, a noun followed by a verb followed by another noun.
Running the sentence rule on the sentence above:
?- sentence(Meaning, [john,learned,algebra], []).
(where '?-' is the Prolog prompt)
The result is like this:
Meaning = learned(john, algebra).
true.
This is how Prolog indicates results to interactive queries, by specifying what would have to go in variables to make the query true.
(in this case, there is only one result to make the query true, so that's all it gives)
The result comes in a similar form to the rule definitions that Prolog uses (learned(john, algebra)).
This is not an accident, it's actually rather useful.
This means that you can use this meaning representation to ask other rules if things are true without needing huge amounts of extra rules to convert the meaning into something useful, like you'd have to do in other programming languages.
If you know a lot about a particular written language's grammar (English, for example), this is almost all you need to get started writing full language parsers in Prolog.
Granted, it's greatly simplified and I've left lots of stuff out, but if you managed to get that far, you're probably smart enough to find out what's missing.
Oh, and if you were wondering if this could be used backwards (as I mentioned above about Prolog's capacity to run almost anything backwards), it's very possible:
?- sentence(learned(john, algebra), Sentence, []).
Sentence = [john, learned, algebra].
true.
Granted, running grammars backwards doesn't always work perfectly, especially if you're converting things in a slightly unstable way, but it is very possible and fairly easy and almost free, you just have to learn a bit more about Prolog first.
Anyway, I said all that to help with this:
As easy as Prolog can be to parse written languages, how about writing a text-based adventure game with it?
TBAGs have the advantage of only needing to run the parser one way (forwards, on each command the user gives it) and usually have very simple command sets.
(it would be rather unwieldy to use the parser to generate all output the player sees, as it would cause your list of grammar rules to balloon to an enormous size and waste all sorts of space and time)
At present, I have a tiny demo actually running, and I can give it some typical commands (primarily move and look so far).
What I'll need to do next is figure out how to have objects' properties change what you can do in the room.
(ie, unlock doors, open boxes, take items, etc.)
Anyway, I've got less than a hundred lines of code (as there are quite a number of rules that span several lines) and it's an (almost) working prototype.
I'm still rather impressed...
In the context of this exercise, I will be using Prolog, a logic-based programming language where you don't deliberately write what the computer is supposed to do directly, but give it the input, the general structure of the problem, and some requirements for the solution and the language does the rest.
(perhaps a better description is that it takes the general structure of a problem, then tries to prove queries about the problem)
Before I show you how to recognize sentences, you need to know how Prolog works in general.
For instance, variables start with an upper-case letter, everything else is a constant (also known as an atom) or a function (predicate, from Predicate Calculus. Though there's no math involved).
Statements (or rules) end with a period or full-stop ('.'), and to 'and' things together, you use a comma (',').
Lists can be of anything, are entered as [stuff,goes in the list], and you can even mix different things together.
That should be enough to start, so the next question should be:
How do we represent a sentence in Prolog?
Fairly simply, actually, as a list of constants (atoms).
For example, the sentence "John learned Algebra" would be written as [john,learned,algebra].
Each word in the sentence is converted to lower-case and used as atoms in the list.
Punctuation is usually ignored, unless it happens to be important, which it's not.
To parse this as a sentence, you need to understand that 'john' and 'algebra' are nouns (we don't care about the different types just yet) and 'learned' is a verb.
First, the nouns:
noun(john) --> [john].
noun(algebra) --> [algebra].
In this example, nouns act similar to single-argument functions, with the 'noun' part indicating the name, the stuff between brackets indicating the single parameter, the arrow indicating that it's a "grammar function" (more on this later), the square brackets surrounding words in the sentence list, and the final period or dot indicating the end of the rule.
While this isn't really a function, you can still think of it in this way until you learn otherwise.
Next, we have the verb:
verb(Person, Subject, learned(Person, Subject)) --> [learned].
This one is more complicated, as we need to include information about who did what as well as what the 'what' is.
So, the first parameter (in this case) is the name of the person who learned something, the second parameter is the subject they learned, and the third argument puts those two together to produce a "meaning representation" that Prolog can use later.
This is where you'll probably get confused and ask what the third argument is really doing.
The answer is fairly simple: Prolog only allows predicates (function-like thing) to return True or False, which means that when you need to pass data back out, there has to be another way.
Prolog's goal is to solve logical sets of rules to prove if something is true, not blindly execute statements until the answer comes out like so many other languages.
This means that parameters aren't passed to functions in a one-way fashion, they're more "matched" with the arguments associated with it, which is what allows data to be passed back out as well as passed in as usual.
(this also means that, with no predeclared input and output arguments, you can choose to run the function "forwards", the way you'd intended it, or "backwards", the reverse direction)
And now for the final piece that puts it all together:
sentence(Meaning) --> noun(Noun1), verb(Noun1, Noun2, Meaning), noun(Noun2).
In this example, the rule is not depending directly on words found in the sentence list, but on how other rules have interpreted the words, in this case, a noun followed by a verb followed by another noun.
Running the sentence rule on the sentence above:
?- sentence(Meaning, [john,learned,algebra], []).
(where '?-' is the Prolog prompt)
The result is like this:
Meaning = learned(john, algebra).
true.
This is how Prolog indicates results to interactive queries, by specifying what would have to go in variables to make the query true.
(in this case, there is only one result to make the query true, so that's all it gives)
The result comes in a similar form to the rule definitions that Prolog uses (learned(john, algebra)).
This is not an accident, it's actually rather useful.
This means that you can use this meaning representation to ask other rules if things are true without needing huge amounts of extra rules to convert the meaning into something useful, like you'd have to do in other programming languages.
If you know a lot about a particular written language's grammar (English, for example), this is almost all you need to get started writing full language parsers in Prolog.
Granted, it's greatly simplified and I've left lots of stuff out, but if you managed to get that far, you're probably smart enough to find out what's missing.
Oh, and if you were wondering if this could be used backwards (as I mentioned above about Prolog's capacity to run almost anything backwards), it's very possible:
?- sentence(learned(john, algebra), Sentence, []).
Sentence = [john, learned, algebra].
true.
Granted, running grammars backwards doesn't always work perfectly, especially if you're converting things in a slightly unstable way, but it is very possible and fairly easy and almost free, you just have to learn a bit more about Prolog first.
Anyway, I said all that to help with this:
As easy as Prolog can be to parse written languages, how about writing a text-based adventure game with it?
TBAGs have the advantage of only needing to run the parser one way (forwards, on each command the user gives it) and usually have very simple command sets.
(it would be rather unwieldy to use the parser to generate all output the player sees, as it would cause your list of grammar rules to balloon to an enormous size and waste all sorts of space and time)
At present, I have a tiny demo actually running, and I can give it some typical commands (primarily move and look so far).
What I'll need to do next is figure out how to have objects' properties change what you can do in the room.
(ie, unlock doors, open boxes, take items, etc.)
Anyway, I've got less than a hundred lines of code (as there are quite a number of rules that span several lines) and it's an (almost) working prototype.
I'm still rather impressed...