Commit ca5f552

BLD initial commit
1 parent 28c37c2 commit ca5f552

3,186 files changed

+107688
-1
lines changed

.gitignore

+51
@@ -0,0 +1,51 @@
+# Compiled source #
+###################
+*.com
+*.class
+*.dll
+*.exe
+*.o
+*.out
+*.so
+*.pyc
+*.pyx
+__pycache__/
+*.gch
+*.pch
+
+# Packages #
+############
+# it's better to unpack these files and commit the raw source
+# git has its own built in compression methods
+*.7z
+*.dmg
+*.gz
+*.iso
+*.jar
+*.rar
+*.tar
+*.zip
+
+# Logs and databases #
+######################
+*.log
+*.sql
+*.sqlite
+.ipynb_checkpoints/
+
+# OS generated files #
+######################
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Temporary files #
+###################
+*~
+*.swp
+*.swo
+*.dSYM/

README.md

+3-1
@@ -1 +1,3 @@
-# textcat
+# TextCat
+
+A simple text categorizer.

data/analyze.pl

+185
@@ -0,0 +1,185 @@
+# Analyze results
+
+$ARGC = @ARGV;
+if ($ARGC != 2) {
+    print STDERR "Usage: analyze.pl PREDICTIONS ACTUAL\n";
+    print STDERR " PREDICTIONS: list of system's predictions\n";
+    print STDERR " ACTUAL: list of the actual labels\n";
+    exit 1;
+}
+
+$results = $ARGV[0];
+$actual = $ARGV[1];
+
+# Process answer file.
+
+print STDERR "Processing answer file...\n";
+
+if (!(open(ANSWER, "<$actual"))) {
+    print STDERR "Error: could not open $actual for input.\n";
+    exit 1;
+}
+
+# Load the category information
+
+# Stores size of each category
+%cat_size = ();
+# Set to true for seen categories
+%have_seen = ();
+# Stores the actual category of every document.
+%actual_cat = ();
+
+while ($line = <ANSWER>) {
+    if (!($line =~ /([^ ]*) (.*)/)) {
+        print STDERR "Error: Line in actual file not in expected format!\n";
+        exit 1;
+    }
+    $number = $1;
+    $actual_cat{$number} = $2;
+    $have_seen{$2} = 1;
+    $cat_size{$2} += 1;
+}
+
+# Stores list of categories
+@categories = keys %have_seen;
+$num_categories = @categories;
+
+print "Found $num_categories categories:";
+foreach $cat (@categories) {
+    print " $cat";
+}
+print "\n";
+
+# Process prediction file.
+
+print STDERR "Processing prediction file...\n";
+
+if (!(open(RESULTS, "<$results"))) {
+    print STDERR "Error: could not open $results for input.\n";
+    exit 1;
+}
+
+# Stores the predicted category of every document.
+%predicted_cat = ();
+# First dimension of contingency table (row) is system's prediction,
+# second dimension (column) is actual category.
+%contingency = ();
+
+while ($line = <RESULTS>) {
+    if (!($line =~ /([^ ]*) (.*)/)) {
+        print STDERR "Error: Line in results file not in expected format!\n";
+        exit 1;
+    }
+    $number = $1;
+    $predicted_cat{$number} = $2;
+
+    $contingency{$predicted_cat{$number}}{$actual_cat{$number}} += 1;
+}
+
+# Determine overall accuracy.
+$correct = $incorrect = 0;
+foreach $cat (@categories) {
+    foreach $cat2 (@categories) {
+        if ($cat eq $cat2) {
+            $correct += $contingency{$cat}{$cat2};
+        }
+        else {
+            $incorrect += $contingency{$cat}{$cat2};
+        }
+    }
+}
+if ($correct + $incorrect > 0) {
+    $ratio = $correct / ($correct + $incorrect);
+}
+else {
+    $ratio = "UNDEFINED";
+}
+print "\n$correct CORRECT, $incorrect INCORRECT, RATIO = $ratio.\n\n";
+
+# Display contingency table, calculate precisions, recalls, and F_1s.
+
+# Header row, indented by one 8-character column to line up with the row labels.
+print "CONTINGENCY TABLE:\n" . (" " x 8);
+foreach $cat (@categories) {
+    if (length($cat) > 7) {
+        $header = substr($cat, 0, 7);
+    }
+    else {
+        $header = $cat;
+    }
+    printf ("%-8s", $header);
+}
+print "PREC\n";
+
+# Category rows.
+%precision = ();
+foreach $cat (@categories) {
+    if (length($cat) > 7) {
+        $header = substr($cat, 0, 7);
+    }
+    else {
+        $header = $cat;
+    }
+    printf ("%-8s", $header);
+
+    $correct = $incorrect = 0;
+    foreach $cat2 (@categories) {
+        if ($contingency{$cat}{$cat2} eq "") {
+            $contingency{$cat}{$cat2} = 0;
+        }
+        printf ("%-8d", $contingency{$cat}{$cat2});
+        if ($cat eq $cat2) {
+            $correct += $contingency{$cat}{$cat2};
+        }
+        else {
+            $incorrect += $contingency{$cat}{$cat2};
+        }
+    }
+    if ($correct + $incorrect > 0) {
+        $prec = $correct / ($correct + $incorrect);
+    }
+    else {
+        $prec = 0;
+    }
+    $precision{$cat} = $prec;
+    printf("%.2f\n", $prec);
+}
+
+# Recall row.
+%recall = ();
+printf("%-8s", "RECALL");
+foreach $cat (@categories) {
+    $correct = $incorrect = 0;
+    foreach $cat2 (@categories) {
+        # Now $cat is the column and $cat2 is the row.
+        if ($cat eq $cat2) {
+            $correct += $contingency{$cat2}{$cat};
+        }
+        else {
+            $incorrect += $contingency{$cat2}{$cat};
+        }
+    }
+    if ($correct + $incorrect > 0) {
+        $rec = $correct / ($correct + $incorrect);
+    }
+    else {
+        $rec = 0;
+    }
+    $recall{$cat} = $rec;
+    printf("%-8.2f", $rec);
+}
+print "\n\n";
+
+# F_1 values.
+%f1 = ();
+foreach $cat (@categories) {
+    if ($precision{$cat} + $recall{$cat} > 0) {
+        $f1{$cat} = (2 * $precision{$cat} * $recall{$cat}) /
+            ($precision{$cat} + $recall{$cat});
+    }
+    else {
+        $f1{$cat} = 0;
+    }
+    print "F_1($cat) = $f1{$cat}\n";
+}
+print "\n";
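
Reviewer note: the metrics that analyze.pl prints can be summarized as the formulas below. This is a sketch of what the script appears to compute, not text from the commit. The symbol C is notation assumed here for the script's %contingency hash, with C_{p,a} the number of documents predicted as category p whose actual category is a. All sums run over the categories seen in the ACTUAL file, since the script loops over @categories only.

```latex
\begin{align*}
\text{accuracy}     &= \frac{\sum_{c} C_{c,c}}{\sum_{p}\sum_{a} C_{p,a}} \\[4pt]
\text{precision}(c) &= \frac{C_{c,c}}{\sum_{a} C_{c,a}}
                       && \text{(diagonal cell over its row total)} \\[4pt]
\text{recall}(c)    &= \frac{C_{c,c}}{\sum_{p} C_{p,c}}
                       && \text{(diagonal cell over its column total)} \\[4pt]
F_1(c)              &= \frac{2\,\text{precision}(c)\,\text{recall}(c)}
                            {\text{precision}(c) + \text{recall}(c)}
\end{align*}
```

When a denominator is zero, the script reports 0 for precision, recall, and F_1, and the string "UNDEFINED" for accuracy. As a hypothetical illustration (these numbers are not from the corpus): with two categories A and B, row A = (8, 2) and row B = (1, 9), accuracy is 17/20 = 0.85, precision(A) is 8/10 = 0.80, recall(A) is 8/9 ≈ 0.89, and F_1(A) is 16/19 ≈ 0.84.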

data/corpus1/test/03785.article

+35
@@ -0,0 +1,35 @@
+
+LONDON (Reuter) - Anti-terrorist police Saturday sifted
+through debris for clues after an explosion in a west London
+litter bin, but there was still no indication of who was
+responsible.
+No one was injured, but two cars and windows in nearby shops
+and homes were damaged in the blast just after midnight outside
+a cemetery in Old Brompton Road, near the vast Earl's Court
+exhibition hall.
+There was no advance warning of the blast, no claim of
+responsibility and no obvious target.
+London has been in a state of high alert since the Irish
+Republican Army last month ended a 17-month cease-fire in its
+campaign of violence against British rule in Northern Ireland
+and set off bombs in east London's Docklands redevelopment area
+and a double-decker bus in the city center.
+``There was no warning. We have had no reports of injuries.
+The damage was confined to a couple of cars and windows in
+buildings opposite,'' police chief superintendent Peter Rice
+told reporters after the litter bin bombing.
+Police declined to make any link with the IRA, saying their
+forensic work was continuing.
+A large area around the blast scene was cordoned off during
+the night and sniffer dogs were brought in to search for
+possible secondary devices in the area, which is well known for
+its nightclubs.
+The IRA, frustrated by the lack of progress toward Northern
+Irish peace talks, ended its cease-fire on February 9 by
+detonating a huge bomb in the eastern Docklands financial
+district. Two people died in the blast.
+Nine days later an IRA guerrilla died when the device he was
+carrying exploded prematurely on a London bus.
+The guerrillas said this week they were ready for another 25
+years of war if London and Dublin failed to come up with a ``new
+deal'' for peace in Northern Ireland.

data/corpus1/test/04043.article

+70
@@ -0,0 +1,70 @@
+
+WASHINGTON (Reuter) - The United States Tuesday gave another
+sharp warning to China against invading Taiwan, and the House of
+Representatives approved a resolution stating that Washington
+should help defend Taiwan in such an attack.
+A senior U.S. official also said Taiwan had asked the United
+States in day-long talks here to provide diesel submarines to
+Taipei for naval defense, a request rejected by Washington last
+year.
+The official, who asked not to be identified, said the
+request came in annual talks on possible new arms sales to
+Taipei as tension grew over Chinese maneuvers off the island in
+advance of Taiwan's presidential elections on Saturday.
+The Pentagon Tuesday put China on notice that ``America has
+the best damned Navy in the world'' and could sail the Taiwan
+Strait if it chose to, but the State Department hastened to say
+this was not meant to provoke Beijing.
+The defense resolution, overwhelmingly passed by the House
+despite White House opposition, is not binding on the Clinton
+administration. It was drafted in response to Chinese military
+exercises in the Taiwan Strait.
+``America has the best damned Navy in the world, and no one
+should ever forget that,'' said Defense Secretary William Perry
+in an obvious warning to Beijing not to launch an attack on
+Taiwan.
+``Beijing should know -- and this (a gathering U.S. armada)
+will remind them -- that, while they are a great military power,
+that the premier, the strongest military power, in the western
+Pacific is the United States,'' Perry said in a speech to
+current and former members of Congress.
+He referred to a U.S. naval task force gathering in the
+Taiwan region -- although well away from the strait separating
+China from what it regards as its renegade province -- to be
+centered around two aircraft carriers.
+At the State Department, where a reporter asked about the
+apparent escalation in rhetoric, spokesman Glyn Davies said
+there was no such intention on the U.S. side.
+Davies said Perry's comments were ``about his pride in the
+United States armed forces and the United States Navy.''
+``That was just a statement of fact, that it is the best
+navy in the world. But there's no ratcheting up here, from our
+standpoint, of the rhetoric at all.''
+At a separate Pentagon briefing, however, Defense spokesman
+Ken Bacon noted that the carrier Nimitz would arrive off Taiwan
+on Saturday or Sunday to join the carrier Independence -- and he
+rejected China's warning against sending U.S. warships
+through the Taiwan Strait.
+``We reserve the right to sail in international waters,''
+Bacon said. ``Whether or not we sail through the Strait of
+Taiwan has not been decided.''
+The United States and China, meanwhile, took steps that
+might help ease tensions, although not immediately.
+In Prague, where Secretary of State Warren Christopher was
+traveling, State Department spokesman Nicholas Burns said
+Christopher would meet his Chinese counterpart, Qian Qichen, in
+The Hague on April 21 to discuss Taiwan and other issues.
+At the Pentagon, Bacon said Chinese Defense Minister Chi
+Haotian was scheduled to visit for talks with Perry sometime in
+April.
+U.S. officials refused to characterize the tone of the
+one-day arms meeting between Taiwan officials and
+representatives of the Defense and State Departments. And they
+would not say when any decision could come out of the talks.
+The Washington Times reported Tuesday that Taiwan wanted six
+German-designed diesel submarines to be built in the United
+States as well as new P-3 anti-submarine aircraft and
+air-launched Harpoon anti-ship missiles.
+But the senior U.S. official told Reuters that Taiwan
+officials did not ask for the anti-submarine planes or the
+missiles.
