OS Programming Lecture #4
1. BASH Programming (on a Unix system): reading one million words from text files
A more complex shell script
We extract the vocabulary and word-usage frequencies from the Brown Corpus, the first machine-readable corpus.
The script automatically walks through every file in the brown folder, extracting the words and their usage frequencies.
The program removes symbols such as ', `, [, ], and $, and builds the word counts in a hashmap data structure (also known as a "dictionary").
Once all data files have been read, the script prints the word-usage frequencies in the final for loop.
Use man sed to look up the meaning of the sed commands and try to understand them.
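Before writing the full script, it can help to try the tag-stripping substitution on a single line. A minimal sketch (the sample tokens below are made up, but follow the Brown Corpus "word/TAG" format):

```shell
# Brown Corpus tokens have the form "word/TAG"; this sed keeps the word and
# drops the tag. "_" is used as the s-command delimiter so "/" needs no
# escaping; \( \) capture the word and \1 substitutes it back.
echo 'The/at Fulton/np-tl County/nn-tl' | sed 's_\([^ ]*\)/[^ ]*_\1_g'
# prints: The Fulton County
```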
Create WordFrequencies.sh and enter the following code:
declare -A hashmap
for file in brown/*[0-9]; do
    echo "Reading $file"
    # Strip the POS tag from each "word/TAG" token, keeping only the word
    sed 's_\([^ ]*\)/[^ ]*_\1_g' "$file" > t1.txt
    # Remove the symbols ' ` [ ] $ one at a time; note that ` [ and $
    # must be escaped (or protected by single quotes) to be taken literally
    sed "s/'//g" t1.txt > t2.txt
    sed 's/`//g' t2.txt > t3.txt
    sed 's/\[//g' t3.txt > t4.txt
    sed 's/\]//g' t4.txt > t5.txt
    sed 's/\$//g' t5.txt > t6.txt
    while read -r line; do
        if [ ${#line} -gt 0 ]; then
            #echo "$line"
            for word in $line; do
                if [ ${#word} -gt 0 ]; then
                    #echo "$word"
                    if [ ${hashmap[$word]+_} ]; then
                        hashmap[$word]=$((hashmap[$word]+1))
                    else
                        hashmap[$word]=1
                    fi
                fi
            done
        fi
    done < t6.txt
done
for i in "${!hashmap[@]}"; do
    echo "$i ${hashmap[$i]}"
done
Run it! (Be patient: the script takes a long time to finish!)
Comment out some lines of the code above and observe how the output changes, to deepen your understanding.
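The counting idiom at the heart of the script can also be tried in isolation. A minimal sketch (requires Bash 4+ for associative arrays; the sample words are made up):

```shell
declare -A counts                                  # associative array: word -> count
for word in the cat sat on the mat the end; do
    counts[$word]=$(( ${counts[$word]:-0} + 1 ))   # default to 0 if unseen
done
for w in "${!counts[@]}"; do
    echo "$w ${counts[$w]}"
done
# "the" is printed with count 3; every other word with count 1
```

Here `${counts[$word]:-0}` is a compact alternative to the explicit `${hashmap[$word]+_}` existence test used in the script; both work.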
Then try the following code to answer the questions in the homework. Reference code:
declare -A hashmap
for file in brown/*[0-9]; do
    echo "Reading $file"
    sed 's_\([^ ]*\)/[^ ]*_\1_g' "$file" > t1.txt
    sed "s/'//g" t1.txt > t2.txt
    sed 's/`//g' t2.txt > t3.txt
    sed 's/\[//g' t3.txt > t4.txt
    sed 's/\]//g' t4.txt > t5.txt
    sed 's/\$//g' t5.txt > t6.txt
    while read -r line; do
        if [ ${#line} -gt 0 ]; then
            for word in $line; do
                if [ ${#word} -gt 0 ]; then
                    if [ ${hashmap[$word]+_} ]; then
                        hashmap[$word]=$((hashmap[$word]+1))
                    else
                        hashmap[$word]=1
                    fi
                fi
            done
        fi
    done < t6.txt
    #break
done
numWords=0
topWord=""
topFreq=0
sumFreq=0
for i in "${!hashmap[@]}"; do
    echo "$i ${hashmap[$i]}"
    numWords=$((numWords+1))
    if [ $topFreq -lt ${hashmap[$i]} ]; then
        topWord=$i
        topFreq=${hashmap[$i]}
    fi
    sumFreq=$((sumFreq+hashmap[$i]))
done
avgFreq=$(echo "$sumFreq/$numWords" | bc -l)
echo "What is the total number of words? Answer=$numWords"
echo "What is the most frequent word? Answer=$topWord"
echo "What is the number of hits of the most frequent word? Answer=$topFreq"
echo "Average word frequency=$avgFreq"
echo "Does the memory used grow as your script reads more data, and why? Answer=Yes, because the variable 'hashmap' grows with more data."
2. Process Management:
(1) BASH - Process execution
First, we write a script loop.sh that loops forever. Reference code:
#!/bin/bash
let num=1
while true; do
    let square=$num*$num
    echo $num $square
    let num=$num+1
done
echo "Program terminated ..."   # never reached: the loop above is infinite
Ctrl+C stops a running script.
In the first terminal, run ps aux.
Open a new terminal and run ps aux | grep bash.
Go back to the first terminal and run loop.sh.
Switch to the new terminal, run ps aux | grep bash and ps aux | awk '$8 == "R+"', and compare the results.
Go back to the first terminal and terminate the loop.sh process (Ctrl+C).
Switch to the new terminal and run ps aux | grep bash and ps aux | awk '$8 == "R+"' again.
Compare the results!
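To see what a ps row looks like for a process you control, you can inspect the current shell itself; a small sketch ($$ expands to the shell's own PID):

```shell
echo "this shell's PID is $$"
# Select our own row from "ps aux" output: field 2 is the PID, field 8 the
# state code (e.g. S = sleeping, R = running; "+" marks a foreground process)
ps aux | awk -v pid="$$" '$2 == pid'
```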
The ps aux command is a tool for monitoring the processes running on your Linux system.
A process is associated with every program running on your system, and is used to manage and monitor the program's memory usage, processor time, and I/O resources.
(2) BASH - process termination with the kill command
Use kill to terminate a running process:
Run the infinite-loop script loop.sh in the first terminal.
Switch to a second terminal, find the process running the loop.sh script, and note down its PID.
Now we use kill to terminate that process: in the second terminal, try running kill PID.
Go back to the first terminal: has the script been terminated?
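The find-and-kill sequence can also be scripted end to end. A self-contained sketch that uses a background sleep as a stand-in for loop.sh (pgrep ships in the same procps package as ps):

```shell
#!/bin/bash
sleep 987654 &                        # stand-in for ./loop.sh
target=$!                             # $! holds the PID of the last background job
pid=$(pgrep -f "sleep 987654" | head -n 1)
pid=${pid:-$target}                   # fall back to $! if pgrep is unavailable
kill "$pid"                           # sends SIGTERM (signal 15) by default
wait "$pid" 2>/dev/null
kill -0 "$pid" 2>/dev/null || echo "process $pid terminated"
```

In the exercise itself you would look the PID up with ps aux | grep loop.sh (or pgrep -f loop.sh) and pass it to kill by hand; kill -9 PID (SIGKILL) forces termination if the process ignores SIGTERM.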
3. Homework:
(1) Get familiar with the WordFrequencies.sh script. Based on the execution of the reference code above, try to answer the following questions:
- Does the memory used grow as your script reads more data? (Open the system monitor and watch the memory usage.)
- What is the total number of words?
- What is the most frequent word?
- What is the number of hits of the most frequent word?
- What is the average word frequency?
(2) Run the WordFrequencies.sh script from (1). While it is running, open another terminal to monitor the process, then go back to the terminal running the process and kill it. Paste screenshots of the procedure and of the ps aux | grep bash output into your lab report.