Converting Perl code to Java code

**soumti84** · 28/01/2009, 18h54

Hi,
I'm looking for a tool that converts Perl code to Java code, please.
I really need one.

Here is the code:

Code:

#!/usr/bin/perl
 
my $CMD="tokenise.pl";
my $VERSION="1.1";
my $MODIFIED="30/6/2004";
 
###########################################################################
# tokenises text for French TreeTagger
# Achim Stein <achim@ims.uni-stuttgart.de>
###########################################################################
 
# Separate characters at the beginning of words
my $begin_char='[|{(\/\´\`"»«\202\204\206\207\213\221\222\223\224\225\226\227\233°';
 
# Separate characters at the end of words
my $end_char=']|}\/\'\`\"),;:\!\?\%»«\202\204\205\206\207\211\213\221\222\223\224\225\226\227\233°|';
 
# Separate strings at the beginning of words
my $begin_string='[dcjlmnstDCJLNMST]\'|[Qq]u\'|[Jj]usqu\'|[Ll]orsqu\'';
 
# Separate strings at the end of words
my $end_string='-t-elles?|-t-ils?|-t-on|-ce|-elles?|-ils?|-je|-la|-les?|-leur|-lui|-mêmes?|-m\'|-moi|-nous|-on|-toi|-tu|-t\'|-vous|-en|-y|-ci|-là';
 
###########################################################################
#                    DO NOT MODIFY FOLLOWING CODE !
###########################################################################
 
my $HELP="
---------------------------------------------------------------------------
$CMD $VERSION (c) Achim Stein $MODIFIED
---------------------------------------------------------------------------
FUNCTION: segments text for French TreeTagger
SYNTAX:   $CMD [options] <file>
OPTIONS:
  -h        print help screen
  -w        replace whitespace by SGML-Tags (use TreeTagger -sgml option!)
EXAMPLE:
tokenise.pl test.txt | tree-tagger parfile -token -lemma -sgml > test.tgd
";
 
###########################################################################
# parse command line
###########################################################################
 
use Getopt::Std;
getopts('hw');
 
if(defined($opt_h)) {
  print STDERR "$HELP";
  exit(1);
}
 
###########################################################################
# read the file
###########################################################################
 
while (<>) {
 
# delete \r
s/\r//g;
 
# replace blanks within SGML Tags
    while (s/(<[^<> ]*)[ \t]([^<>]*>)/$1\377$2/g) {};
 
# replace whitespace by SGML-Tags
if(defined($opt_w)) {
    s/\n/<internal_NL>/g;
    s/\t/<internal_TAB>/g;
    s/ /<internal_BL>/g;
}
 
# restore SGML Tags
    tr/\377/ /;
 
# put special characters around SGML Tags for tokenisation
    s/(<[^<>]*>)/\377$1\377/g;
    s/(&[^; \t\n\r]*;)/\377$1\377/g;
    s/^\377//;
    s/\377$//;
    s/\377\377/\377/g;
 
    @S = split("\377");
    for($i=0; $i<=$#S; $i++) {
	$_ = $S[$i];
 
	# skip lines with  only SGML tags
	if (/^<.*>$/) {
	    print $_,"\n";
	}
	# normal text
	else {
	    # put spaces at beginning and end
	    $_ = ' '.$_.' ';
	    # put spaces around punctuation
	    s/(\.\.\.)/ ... /g;
	    s/([;\!\?\/])([^ ])/$1 $2/g;
	    s/([.,:])([^ 0-9.])/$1 $2/g;
 
	    @F = split;
	    for($j=0; $j<=$#F; $j++) {
		my $suffix="";
		$_ = $F[$j];
		# cut off punctuation and brackets
		do {
		    $finished = 1;
		    # preceding brackets etc.
		    if (s/^([$begin_char])(.)/$2/) {
			print $1,"\n";
			$finished = 0;
		    }
		    # following brackets etc.
		    if (s/(.)([$end_char])$/$1/) {
			$suffix = "$2\n$suffix";
			$finished = 0;
		    }
		    # cut off dot after punctuation etc.
		    if (s/([$end_char])\.$//) { 
			$suffix = ".\n$suffix";
			if ($_ eq "") {
			    $_ = $1;
			}
			else {
			    $suffix = "$1\n$suffix";
			}
			$finished = 0; 
		    }
		}
		while (!$finished);
 
		# deal with listed tokens
		if (defined($Token{$_})) {
		    print "$_\n$suffix";
		    next;
		}
 
		# deal with abbrevs like U.S.A.
		if (/^([A-Za-zÀ-ÿ]\.)+$/) {
		    print "$_\n$suffix";
		    next;
		}
 
		# ordinal numbers
		if (/^[0-9]+\.$/) {
		    print "$_\n$suffix";
		    next;
		}
 
		# deal with different types of dots
		if (/^(..*)\.$/ && $_ ne "...") {
		    $_ = $1;
		    $suffix = ".\n$suffix";
		    if (defined($Token{$_})) {
			print "$_\n$suffix";
			next;
		    }
		}
 
		# cut  clitics off
		while (s/^($begin_string)(.)/$2/) {
		    print $1,"\n";
		}
		while (s/(.)($end_string)$/$1/) {
		    $suffix = "$2\n$suffix";
		}
		print "$_\n$suffix";
	    }
	}
    }
}

If no such tool exists, is there another solution?
I look forward to your ideas.

Guest · 28/01/2009, 20h58

Please use the CODE tag (the # button); it would be much easier to read...

**thierry.chich** · 31/01/2009, 20h48

There isn't one. The script is too idiomatic Perl to be converted to Java automatically.

The way to go is to find someone who knows both languages.
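
To give an idea of what a hand port could look like, here is a minimal sketch in Java of only the last step of the script, the clitic splitting done by the two `while (s/.../.../)` loops. The class name `CliticSplitter` and the method `split` are invented for this example. The two patterns are copied from `$begin_string` and `$end_string`, with the accented letters written as Unicode escapes so the source file encoding does not matter. It is not a full port: the SGML handling, the punctuation loop and the abbreviation checks are left out, and it returns a list instead of printing one token per line.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original script.
final class CliticSplitter {
    // Same alternations as $begin_string and $end_string in the Perl script.
    private static final String BEGIN_STRING =
        "[dcjlmnstDCJLNMST]'|[Qq]u'|[Jj]usqu'|[Ll]orsqu'";
    private static final String END_STRING =
        "-t-elles?|-t-ils?|-t-on|-ce|-elles?|-ils?|-je|-la|-les?|-leur|-lui"
        + "|-m\u00eames?|-m'|-moi|-nous|-on|-toi|-tu|-t'|-vous|-en|-y|-ci|-l\u00e0";

    // Perl: s/^($begin_string)(.)/$2/  and  s/(.)($end_string)$/$1/
    private static final Pattern PREFIX = Pattern.compile("^(" + BEGIN_STRING + ")(.)");
    private static final Pattern SUFFIX = Pattern.compile("(.)(" + END_STRING + ")$");

    // Splits one whitespace-free token into leading clitics, the word itself,
    // and trailing clitics, in the order the Perl script would print them.
    static List<String> split(String token) {
        List<String> out = new ArrayList<>();
        LinkedList<String> suffixes = new LinkedList<>();
        String word = token;

        // Strip leading clitics while at least one character remains after them.
        Matcher m = PREFIX.matcher(word);
        while (m.find()) {
            out.add(m.group(1));
            word = word.substring(m.end(1));
            m = PREFIX.matcher(word);
        }

        // Strip trailing clitics; each new one goes in front of those already
        // found, like $suffix = "$2\n$suffix" in the Perl code.
        m = SUFFIX.matcher(word);
        while (m.find()) {
            suffixes.addFirst(m.group(2));
            word = word.substring(0, m.start(2));
            m = SUFFIX.matcher(word);
        }

        out.add(word);
        out.addAll(suffixes);
        return out;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"l'homme", "a-t-il", "donne-le-moi"}) {
            System.out.println(split(token));
        }
    }
}
```

The rest of the script would be ported the same way, one `s///` at a time, using `java.util.regex` in place of Perl's built-in operators. That is the part that needs someone who can read the Perl.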