Parse Analysis and Generation of Turkish Words
v1.0
N-gram tests on Turkish word morphology and generation of words from parsed words, using foma and TRmorph.
Gebze Institute of Tech. CSE 484 Introduction to NLP, HW01: n-gram tests on Turkish word morphology and generation of words from parsed words, using foma and TRmorph. To use the program, you have to download and install foma.
#include <cstdio>
#include <iostream>
#include <cstdlib>
#include <vector>
#include <string>
#include <sstream>
#include <queue>
#include <iomanip>
#include <fstream>
#include "ngram.hpp"
Functions
void initWords (char *fileName, vector<string> &vect)
bool writeGeneratorFileScript (char *fileName, string str)
bool editWordFile (char *fileName)
void printLoading (int loading)
void runScriptForEveryWord (vector<string> &vect, string argv2)
void ngramParserFromFile (NGram ngram[])
void wordGenerator (NGram ngram[], string fst, int oneG, int twoG, int threeG)
void printResults (NGram ngram[], string fileName)
void printNSize (NGram ngram[], int n)
int main (int argc, char *argv[])
Main source file of the program.
[1] Mans Hulden, Finite-State Compiler and C Library http://code.google.com/p/foma/
[2] Çağrı Çöltekin, TRmorph: A Turkish Morphological Analyzer https://github.com/coltekin/TRmorph
bool editWordFile (char *fileName)
Opens and edits the created word file. It is used to edit the list of generated words.
fileName | Name of source file |
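A minimal sketch of such a cleanup pass. The name editWordFileSketch is hypothetical, and the filter rule is an assumption: foma's flookup prints +? when it finds no output for an input, so this version drops those lines together with empty ones and rewrites the file in place.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch of a cleanup pass like editWordFile: drops empty lines
// and lines that flookup marks as failed lookups with "+?", then rewrites the
// file. What the original program treats as "inappropriate" is an assumption.
bool editWordFileSketch(const std::string &fileName)
{
    std::ifstream in(fileName);
    if (!in)
        return false;                          // file missing or unreadable
    std::vector<std::string> kept;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line.find("+?") != std::string::npos)
            continue;                          // skip blanks and failed lookups
        kept.push_back(line);
    }
    in.close();

    std::ofstream out(fileName, std::ios::trunc);   // overwrite in place
    if (!out)
        return false;
    for (const std::string &l : kept)
        out << l << '\n';
    return static_cast<bool>(out);
}
```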
void initWords (char *fileName, vector<string> &vect)
Reads words from a text file to initialize the word list.
fileName | Name of source file |
vect | Word list vector |
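A minimal sketch of reading the word list, assuming the source file holds whitespace-separated words. initWordsSketch is a hypothetical name; unlike the documented void function, it returns false when the file cannot be opened.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Minimal sketch: reads whitespace-separated words from fileName into vect.
// Punctuation is not stripped; each token is stored exactly as read.
bool initWordsSketch(const std::string &fileName, std::vector<std::string> &vect)
{
    std::ifstream in(fileName);
    if (!in)
        return false;            // file missing or unreadable
    std::string word;
    while (in >> word)           // operator>> skips spaces, tabs and newlines
        vect.push_back(word);
    return true;
}
```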
int main (int argc, char *argv[])
Main routine of the program. Use case:
* read the file and get the words
* parse the words
* analyse the parsed parts
* generate words
void ngramParserFromFile (NGram ngram[])
Analyses the morphology of the parsed words with 1-Gram, 2-Gram and 3-Gram tests.
ngram | NGram instances to get 1-Gram, 2-Gram, 3-Gram results |
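One way to count tag n-grams, sketched under two assumptions: each analysis is a root followed by tags in angle brackets (an input such as kitap<N><pl><p3s>, illustrative only), and an n-gram is a run of n consecutive tags within one analysis. extractTags and countTagNGrams are hypothetical helpers; the original NGram class may store and count differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Extracts the tags of an analysis, e.g. "kitap<N><pl><p3s>" -> {"N","pl","p3s"}.
std::vector<std::string> extractTags(const std::string &analysis)
{
    std::vector<std::string> tags;
    std::string::size_type pos = 0;
    while ((pos = analysis.find('<', pos)) != std::string::npos) {
        std::string::size_type end = analysis.find('>', pos);
        if (end == std::string::npos)
            break;                                   // unterminated tag
        tags.push_back(analysis.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return tags;
}

// Adds every contiguous n-gram of tags (joined with spaces) to counts.
void countTagNGrams(const std::vector<std::string> &tags, std::size_t n,
                    std::map<std::string, int> &counts)
{
    if (n == 0 || tags.size() < n)
        return;
    for (std::size_t i = 0; i + n <= tags.size(); ++i) {
        std::string key = tags[i];
        for (std::size_t j = 1; j < n; ++j)
            key += " " + tags[i + j];
        ++counts[key];
    }
}
```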
void printLoading (int loading)
Prints the loading state to the terminal.
loading | Percentage of the job completed |
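A minimal progress-bar sketch: loadingBar (a hypothetical helper) renders the percentage as a fixed-width bar, and printLoadingSketch redraws it on the same terminal line with a carriage return. The bar width and characters are arbitrary choices, not taken from the original program.

```cpp
#include <iostream>
#include <string>

// Builds a bar such as "[#####     ]  50%" for loading in 0..100.
// Values outside that range are clamped.
std::string loadingBar(int loading, int width = 20)
{
    if (loading < 0) loading = 0;
    if (loading > 100) loading = 100;
    int filled = loading * width / 100;
    std::string bar = "[" + std::string(filled, '#') +
                      std::string(width - filled, ' ') + "] ";
    std::string pct = std::to_string(loading);
    bar += std::string(3 - pct.size(), ' ') + pct + "%";   // right-align to 3 digits
    return bar;
}

// Rewrites the current terminal line with '\r' so the bar updates in place.
void printLoadingSketch(int loading)
{
    std::cout << '\r' << loadingBar(loading) << std::flush;
    if (loading >= 100)
        std::cout << '\n';                   // finish the line when done
}
```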
void printNSize (NGram ngram[], int n)
Prints only n results for each gram test.
ngram | NGram instances 1-Gram, 2-Gram, 3-Gram |
n | Number of POS tags to print to the terminal for each gram |
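Selecting the results to print amounts to ranking the counts and keeping the first n. This sketch assumes each gram's results are a std::map from tag sequence to count; topN and printTopN are hypothetical names, and ties are broken alphabetically only to keep the output deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the n most frequent entries, highest count first.
std::vector<std::pair<std::string, int>>
topN(const std::map<std::string, int> &counts, std::size_t n)
{
    std::vector<std::pair<std::string, int>> v(counts.begin(), counts.end());
    std::sort(v.begin(), v.end(), [](const auto &a, const auto &b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (v.size() > n)
        v.resize(n);                        // keep only the first n entries
    return v;
}

// Prints up to n results with their counts, one per line.
void printTopN(const std::map<std::string, int> &counts, std::size_t n)
{
    for (const auto &entry : topN(counts, n))
        std::cout << entry.first << '\t' << entry.second << '\n';
}
```

When n is much smaller than the number of distinct tag sequences, std::partial_sort would avoid ordering the tail that is never printed.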
void printResults (NGram ngram[], string fileName)
Writes the whole analysis result to a file.
ngram | NGram instances 1-Gram, 2-Gram, 3-Gram |
fileName | Name of file where results will be kept |
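A sketch of dumping all counts to a file, assuming the gram test results are held as one count map per gram order (index 0 for 1-Gram, index 1 for 2-Gram, and so on). writeResults and the section heading format are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Writes every n-gram count to fileName, one section per gram order.
bool writeResults(const std::vector<std::map<std::string, int>> &grams,
                  const std::string &fileName)
{
    std::ofstream out(fileName);
    if (!out)
        return false;
    for (std::size_t i = 0; i < grams.size(); ++i) {
        out << (i + 1) << "-Gram\n";                   // section heading
        for (const auto &entry : grams[i])
            out << entry.first << '\t' << entry.second << '\n';
        out << '\n';                                   // blank line between sections
    }
    return static_cast<bool>(out);
}
```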
void runScriptForEveryWord (vector<string> &vect, string argv2)
Runs the flookup script for each word in the word list and writes the parsing results to a file, which is then read by the ngramParserFromFile function.
vect | Word list vector |
argv2 | FST file name, e.g. "trmorph.fst" |
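A sketch of the per-word lookup using std::system. It assumes a POSIX shell, that flookup is on the PATH, and that it reads the words to analyse from standard input (as in echo 'word' | flookup trmorph.fst). flookupCommand, runFlookupForEveryWord and the output file name are hypothetical.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Builds a shell command that analyses one word with foma's flookup and
// appends the result to outFile, e.g.
//   echo 'kitaplar' | flookup trmorph.fst >> parsed.txt
// Assumes words contain no single quotes (no shell escaping is done).
std::string flookupCommand(const std::string &word, const std::string &fst,
                           const std::string &outFile)
{
    return "echo '" + word + "' | flookup " + fst + " >> " + outFile;
}

// Runs the command for every word; returns how many invocations failed.
int runFlookupForEveryWord(const std::vector<std::string> &words,
                           const std::string &fst, const std::string &outFile)
{
    int failures = 0;
    for (const std::string &w : words)
        if (std::system(flookupCommand(w, fst, outFile).c_str()) != 0)
            ++failures;
    return failures;
}
```

Starting one shell and one flookup process per word is slow for long lists; since flookup reads one word per line from standard input, writing the list to a file and running flookup trmorph.fst < words.txt once is a cheaper alternative.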
void wordGenerator (NGram ngram[], string fst, int oneG, int twoG, int threeG)
Generates word generator codes (root + tags) and writes them to a file, then runs a foma script over that file to generate words. Finally, edits the generated word file to delete inappropriate data.
ngram | NGram instances 1-Gram, 2-Gram, 3-Gram |
fst | FST file name, e.g. "trmorph.fst" |
oneG | Number of top POS tags from the 1-Gram results to use while generating words |
twoG | Number of top POS tags from the 2-Gram results to use while generating words |
threeG | Number of top POS tags from the 3-Gram results to use while generating words |
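A sketch of building generator codes, assuming the selected top tag sequences arrive as space-separated tag names (e.g. "N pl") and every root is combined with every sequence. makeGeneratorCodes is a hypothetical name; the original program may pick combinations differently.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Appends each tag sequence to each root, producing codes such as
// "kitap<N><pl>" that a generator can consume (one code per combination).
std::vector<std::string> makeGeneratorCodes(const std::vector<std::string> &roots,
                                            const std::vector<std::string> &tagSeqs)
{
    std::vector<std::string> codes;
    for (const std::string &root : roots) {
        for (const std::string &seq : tagSeqs) {
            std::istringstream tags(seq);     // split "N pl" into "N", "pl"
            std::string code = root, tag;
            while (tags >> tag)
                code += "<" + tag + ">";
            codes.push_back(code);
        }
    }
    return codes;
}
```

The codes could then be written to a file, one per line, and generated in a single pass with foma's flookup in the inverse (generation) direction, e.g. flookup -i trmorph.fst < codes.txt. Combinations the analyser does not accept yield +? instead of a word, so the output still needs filtering.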
bool writeGeneratorFileScript (char *fileName, string str)
Writes a string to a file. It is used to write root + tag sequences from which new words are generated.
fileName | Name of output file |
str | Content that will be written to output file |
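A minimal sketch of the file write, using std::string parameters instead of char *; writeStringToFile is a hypothetical name. It truncates any existing file and reports failure through the return value, matching the documented bool result.

```cpp
#include <fstream>
#include <string>

// Writes str to fileName, replacing any previous contents.
// Returns false if the file cannot be opened or the write fails.
bool writeStringToFile(const std::string &fileName, const std::string &str)
{
    std::ofstream out(fileName, std::ios::trunc);
    if (!out)
        return false;
    out << str;
    return static_cast<bool>(out);   // stream state after the write
}
```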