Another Way to Check Whether Randomly Generated Words Are Real Words

Introduction:

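    In the previous post I introduced three ways to check whether randomly generated words are real words, all of which relied on the external program spell. Now I have also found the following file on Mac OS X: /usr/share/dict/words, which holds a great many English words: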

apple@kissAir: dict$ls -ldh words
lrwxr-xr-x  1 root  wheel     4B 10 18 14:00 words -> web2
apple@kissAir: dict$ls -lh web2
-r--r--r--  1 root  wheel   2.4M  9 10 04:47 web2
apple@kissAir: dict$wc -n words
wc: illegal option -- n
usage: wc [-clmw] [file ...]
apple@kissAir: dict$wc -l words
  235886 words

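The file has one word per line, so 235,886 words in total. That means we can drop the spell program and write our own is_spell? method to check whether a word is a real word. Below is the code with way4 added. It no longer takes command-line arguments and instead uses the benchmark library to measure performance: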

#!/usr/bin/ruby
#code by hopy 2014.12.08
#randomly generate some words and check whether each one is a valid word

require 'tempfile'
require 'benchmark'

words_path = "/usr/share/dict/words"
f = File.open(words_path,"r")
$lines = f.readlines
#use chomp, not chomp!: chomp! returns nil when there is no newline to strip
$lines.map! {|word|word.chomp}
f.close

#generate n random lowercase words, each min_len..max_len letters long
def rand_words(n=10000,min_len=2,max_len=12)
	chars = ("a".."z").to_a.freeze
	words = []
	n.times do
		#the old "min_len + x % max_len" could exceed max_len
		len = rand(min_len..max_len)
		#pick len letters uniformly at random (the old chars.shuffle was a no-op)
		words << Array.new(len) {chars.sample}.join
	end
	words
end

#return word if spell accepts it, otherwise return nil. (way1)
def spell_word(word)
	#spell echoes back only the words it considers misspelled
	cmd = `echo #{word}|spell`.chomp
	cmd == word ? nil : word
end

#check all words at once by passing them to spell in a temp file. (way2)
def spell_words(words)
	puts "using spell_words..."
	f = Tempfile.new("#{$$}_spell_blablabla")
	#f = File.open("spell_test","w+")
	#f.write Marshal.dump(words)
	f.write words.join(" ")
	f.close

	cmd = `spell #{f.path}`
	no_spell_words = cmd.split("\n")
	words - no_spell_words
end

#like way2, but the output of spell is also redirected to a temp file. (way3)
def spell_words2(words)
	puts "using spell_words2..."
	f_words = Tempfile.new("#{$$}_spell_words")
	f_ret = Tempfile.new("#{$$}_spell_ret")
	f_ret.close

	f_words.write words.join(" ")
	f_words.close

	cmd = `spell #{f_words.path} > #{f_ret.path}`
	f=File.open(f_ret.path)
	no_spell_words = f.read.split("\n")
	f.close
	words - no_spell_words
end

def is_spell?(word)
	$lines.include? word
end

#use is_spell? to select the words that are real words. (way4)
def spell_words3(words)
=begin
	words.each do |word|
		printf "#{word} " if is_spell?(word)
	end
=end
	words.select {|word|is_spell?(word)}
end

def sh_each_spell_word(spell_words)
	spell_words.each {|word|printf "#{word} "}
end

words_count = 2000
$words = nil
puts "words_count is 2000,now test..."
Benchmark.bm do |bc|
	bc.report("rand_words:\n") {$words = rand_words(words_count)};puts ""
	bc.report("way1:spell_word:\n") {$words.each {|w|printf "#{w} " if spell_word(w)}};puts ""
	bc.report("way2:spell_words:\n") {sh_each_spell_word(spell_words($words))};puts ""
	bc.report("way3:spell_words2:\n") {sh_each_spell_word(spell_words2($words))};puts ""
	bc.report("way4:spell_words3:\n") {sh_each_spell_word(spell_words3($words))};puts ""
end

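However, Mac OS X does not ship with a spell program, and I do not know which package to install with brew. Meanwhile, the spell in my Ubuntu virtual machine simply refuses to upgrade. I will test on my X61 tomorrow!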

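It is now tomorrow! I found that the words file shipped with Ubuntu contains fewer words than the one on the Mac, only a bit over 90,000, so I replaced it with the Mac file. As you can see, it recognizes more words than the spell program actually does: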

wisy@wisy-ThinkPad-X61:~/src/ruby_src$ ./x.rb
words_count is 2000,now test...
       user     system      total        real
rand_words:
  0.050000   0.000000   0.050000 (  0.069850)

way1:spell_word:
ho of ts mu so or wag us to lo um ts pa pip mid hip vs no of oboe iv yr re so   0.330000   3.170000  13.480000 ( 29.903239)

way2:spell_words:
using spell_words...
ho of ts mu so or wag us to lo um ts pa pip mid hip vs no of oboe iv yr re so   0.000000   0.000000   0.080000 (  5.485613)

way3:spell_words2:
using spell_words2...
ho of ts mu so or wag us to lo um ts pa pip mid hip vs no of oboe iv yr re so   0.010000   0.010000   0.100000 (  4.854248)

way4:spell_words3:
ho of pob dob mu bo so sa or wag us jo aw to lo um li ca se pa ava bo sho pip mid til tue ya en hip no of di ug oboe io en yr re da eer so ym  36.580000   0.290000  36.870000 ( 37.444370)

Our new method (way4) turned out to be the slowest of all!!! You never know until you try, and the result was quite a shock!
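
A likely reason, though it was not measured in this post, is that is_spell? calls Array#include?, which scans the $lines array from the start on every lookup. With about 235,000 entries and 2,000 random words, most of which are not in the dictionary, that is a lot of string comparisons. The sketch below is one possible alternative: it loads the same file into a Set so each lookup is a hash probe. The names load_dict and spell_words_set are hypothetical and are not part of the original script.

```ruby
require 'set'

# Hypothetical helper: load a one-word-per-line dictionary into a Set.
# The default path is the file used in this post; pass another path if needed.
def load_dict(path = "/usr/share/dict/words")
  File.readlines(path, chomp: true).to_set
end

# Hypothetical replacement for spell_words3: keep only the words
# that appear in the dictionary Set.
def spell_words_set(words, dict)
  words.select { |word| dict.include?(word) }
end

# Small demo, run only if the dictionary file exists on this machine.
if File.exist?("/usr/share/dict/words")
  dict = load_dict
  puts spell_words_set(%w[oboe zzqx wag], dict).join(" ")
end
```

To compare it with the other ways, a report line could be added to the Benchmark.bm block in the same style as the existing ones, with the Set built once outside the block. Whether it actually beats spell on a given machine is something only a measurement can show.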
