D-LAB Python X User Experience Design

Pyladies Taiwan

Speaker : Mars

2016/11/24

Roadmap

  • 11/17 exercise
  • Sorting
  • Functions
  • Dictionaries
  • File I/O

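Exercise

  • Find the word that appears most often in an article
    • Remove all punctuation
      • You can build a list of the punctuation marks to remove
    • Convert uppercase letters to lowercase
    • Keep track of the highest count seen so far
    • Keep track of which word has that highest count
    • In the sample article, the most frequent word is "the", which appears 10 times.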
In [ ]:
article = """
Marvel braves a stumble with each dip into its grab bag of second-tier, less-recognizable characters.
But it won't come with "Doctor Strange," an extremely entertaining and sure-footed adaptation that manages to conjure more than enough magic to easily pass its spell check.

Much of that has to do with a splendid cast, led by Benedict Cumberbatch.
Having top-flight actors -- including Chiwetel Ejiofor, Tilda Swinton and Mads Mikkelsen -- is especially helpful with this enterprise.
That's because some of the mystical-mumbo-jumbo dialogue -- as in, "Dormammu dwells in the dark dimension" -- could sound stilted or silly tripping off less talented tongues.
Largely faithful to the comics, the film tells a straightforward origin story -- in hindsight, an area where Marvel has generally excelled.

The first "Iron Man," for example, is far more satisfying than either of its sequels.
A brilliant neurosurgeon, Dr. Stephen Strange has his million-dollar hands mangled in a terrible car accident.

The closing credits might elicit the biggest laugh with a disclaimer warning about the dangers of distracted driving.
Desperate to be restored, Strange begins exploring alternative means of healing, a search that ultimately leads him to Nepal, a hidden retreat and a bald sage known as the Ancient One (Tilda Swinton, in glorious-weirdness mode).
Aided by her chief lieutenant Mordo (Ejiofor), they train him in the mystic arts.
"""
In [30]:
_symbol = [",",":",";","?","!","(",")","\"","$","[","]","{","}","\n","\t","-","."]

clean_article = article

# Remove all punctuation
for i in _symbol:
    clean_article = clean_article.replace(i," ")

# Convert uppercase to lowercase
for i in range(ord('A'),ord('Z')+1):
    clean_article = clean_article.replace(chr(i),chr(i+32))   # ord("A")-ord("a") = -32

# Split the article into words
split_article = clean_article.split(" ")

# Find the most frequent word in the article
max_count = 0
max_word = ""
for word in split_article:
    if word and split_article.count(word)>max_count:
        max_count = split_article.count(word)
        max_word = word
print (max_word,max_count)
the 10

Think

  • What if we want the second most frequent word, or the fifth...

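Sorting

sorted(data to sort, optional parameters)
  • Returns a new list; the original data is not reordered

Common parameters

  • key: a function that returns the value to compare for each item. By default the items themselves are compared; nested lists are compared element by element, starting with the first value
  • reverse: whether to sort in descending order (True or False); defaults to False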
In [36]:
simple_list = [1,5,3,4,2]
print (sorted(simple_list))
print (sorted(simple_list,reverse=True))
[1, 2, 3, 4, 5]
[5, 4, 3, 2, 1]
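
As a small sketch of the "new data" point above: sorted() returns a new list and leaves the original alone, while the list method sort() reorders the list in place. The names new_list, in_place and result are only illustrative.

```python
simple_list = [1, 5, 3, 4, 2]

# sorted() builds and returns a new list; simple_list keeps its order.
new_list = sorted(simple_list)
print(new_list)      # [1, 2, 3, 4, 5]
print(simple_list)   # [1, 5, 3, 4, 2]

# list.sort() reorders the list in place and returns None.
in_place = list(simple_list)   # copy, so simple_list stays untouched
result = in_place.sort()
print(in_place)      # [1, 2, 3, 4, 5]
print(result)        # None
```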
In [47]:
nested_list_number = [[5,4],[1,6],[3,7],[4,2]]
nested_list_string = [["Python",4],["Hello",6],["World",7],["PyLadies",2]]
nested_list_ch_string = [["中文",4],["台北",6],["高雄",7],["台中",2]]
print (sorted(nested_list_number))
print (sorted(nested_list_string))
print (sorted(nested_list_ch_string))
print (ord("中"),ord("台"),ord("中"),ord("北"),ord("高"))
[[1, 6], [3, 7], [4, 2], [5, 4]]
[['Hello', 6], ['PyLadies', 2], ['Python', 4], ['World', 7]]
[['中文', 4], ['台中', 2], ['台北', 6], ['高雄', 7]]
20013 21488 20013 21271 39640
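
The outputs above follow from how Python compares lists and strings. The sketch below uses made-up data (pairs) to show that the second value only matters when the first values tie, and that strings are compared by code point.

```python
pairs = [[3, 7], [1, 9], [3, 2], [1, 6]]

# Lists compare element by element: the first values decide,
# and the second values only break ties.
sorted_pairs = sorted(pairs)
print(sorted_pairs)        # [[1, 6], [1, 9], [3, 2], [3, 7]]

# Strings compare character by character using code points,
# so "台中" sorts before "台北" because ord("中") < ord("北").
print("台中" < "台北")      # True

# Uppercase letters have smaller code points than lowercase ones.
print("Zebra" < "apple")   # True
```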

[Coding Time]

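  • Store the article's words and their frequencies as a list

    [ ['marvel', 2], ['braves', 1], ...... ]
  • Do not store empty strings

  • Do not store the same word twice!
    • You can keep a record of the words already seen
  • Try sorting the result with sorted()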
In [4]:
_symbol = [",",":",";","?","!","(",")","\"","$","[","]","{","}","\n","\t","-","."]

clean_article = article

for i in _symbol:
    clean_article = clean_article.replace(i," ")

for i in range(ord('A'),ord('Z')+1):
    clean_article = clean_article.replace(chr(i),chr(i+32))

split_article = clean_article.split(" ")

# [Coding Time] 
my_list = []
word_list = []
for word in split_article:
    if (word) and (word not in word_list):
        word_list.append(word)
        count = split_article.count(word)
        my_list.append([word,count])
print (my_list)
[['marvel', 2], ['braves', 1], ['a', 9], ['stumble', 1], ['with', 5], ['each', 1], ['dip', 1], ['into', 1], ['its', 3], ['grab', 1], ['bag', 1], ['of', 6], ['second', 1], ['tier', 1], ['less', 2], ['recognizable', 1], ['characters', 1], ['but', 1], ['it', 1], ["won't", 1], ['come', 1], ['doctor', 1], ['strange', 3], ['an', 2], ['extremely', 1], ['entertaining', 1], ['and', 3], ['sure', 1], ['footed', 1], ['adaptation', 1], ['that', 3], ['manages', 1], ['to', 6], ['conjure', 1], ['more', 2], ['than', 2], ['enough', 1], ['magic', 1], ['easily', 1], ['pass', 1], ['spell', 1], ['check', 1], ['much', 1], ['has', 3], ['do', 1], ['splendid', 1], ['cast', 1], ['led', 1], ['by', 2], ['benedict', 1], ['cumberbatch', 1], ['having', 1], ['top', 1], ['flight', 1], ['actors', 1], ['including', 1], ['chiwetel', 1], ['ejiofor', 2], ['tilda', 2], ['swinton', 2], ['mads', 1], ['mikkelsen', 1], ['is', 2], ['especially', 1], ['helpful', 1], ['this', 1], ['enterprise', 1], ["that's", 1], ['because', 1], ['some', 1], ['the', 10], ['mystical', 1], ['mumbo', 1], ['jumbo', 1], ['dialogue', 1], ['as', 2], ['in', 6], ['dormammu', 1], ['dwells', 1], ['dark', 1], ['dimension', 1], ['could', 1], ['sound', 1], ['stilted', 1], ['or', 1], ['silly', 1], ['tripping', 1], ['off', 1], ['talented', 1], ['tongues', 1], ['largely', 1], ['faithful', 1], ['comics', 1], ['film', 1], ['tells', 1], ['straightforward', 1], ['origin', 1], ['story', 1], ['hindsight', 1], ['area', 1], ['where', 1], ['generally', 1], ['excelled', 1], ['first', 1], ['iron', 1], ['man', 1], ['for', 1], ['example', 1], ['far', 1], ['satisfying', 1], ['either', 1], ['sequels', 1], ['brilliant', 1], ['neurosurgeon', 1], ['dr', 1], ['stephen', 1], ['his', 1], ['million', 1], ['dollar', 1], ['hands', 1], ['mangled', 1], ['terrible', 1], ['car', 1], ['accident', 1], ['closing', 1], ['credits', 1], ['might', 1], ['elicit', 1], ['biggest', 1], ['laugh', 1], ['disclaimer', 1], ['warning', 1], ['about', 1], ['dangers', 1], ['distracted', 1], 
['driving', 1], ['desperate', 1], ['be', 1], ['restored', 1], ['begins', 1], ['exploring', 1], ['alternative', 1], ['means', 1], ['healing', 1], ['search', 1], ['ultimately', 1], ['leads', 1], ['him', 2], ['nepal', 1], ['hidden', 1], ['retreat', 1], ['bald', 1], ['sage', 1], ['known', 1], ['ancient', 1], ['one', 1], ['glorious', 1], ['weirdness', 1], ['mode', 1], ['aided', 1], ['her', 1], ['chief', 1], ['lieutenant', 1], ['mordo', 1], ['they', 1], ['train', 1], ['mystic', 1], ['arts', 1]]
In [5]:
print (sorted(my_list))
[['a', 9], ['about', 1], ['accident', 1], ['actors', 1], ['adaptation', 1], ['aided', 1], ['alternative', 1], ['an', 2], ['ancient', 1], ['and', 3], ['area', 1], ['arts', 1], ['as', 2], ['bag', 1], ['bald', 1], ['be', 1], ['because', 1], ['begins', 1], ['benedict', 1], ['biggest', 1], ['braves', 1], ['brilliant', 1], ['but', 1], ['by', 2], ['car', 1], ['cast', 1], ['characters', 1], ['check', 1], ['chief', 1], ['chiwetel', 1], ['closing', 1], ['come', 1], ['comics', 1], ['conjure', 1], ['could', 1], ['credits', 1], ['cumberbatch', 1], ['dangers', 1], ['dark', 1], ['desperate', 1], ['dialogue', 1], ['dimension', 1], ['dip', 1], ['disclaimer', 1], ['distracted', 1], ['do', 1], ['doctor', 1], ['dollar', 1], ['dormammu', 1], ['dr', 1], ['driving', 1], ['dwells', 1], ['each', 1], ['easily', 1], ['either', 1], ['ejiofor', 2], ['elicit', 1], ['enough', 1], ['enterprise', 1], ['entertaining', 1], ['especially', 1], ['example', 1], ['excelled', 1], ['exploring', 1], ['extremely', 1], ['faithful', 1], ['far', 1], ['film', 1], ['first', 1], ['flight', 1], ['footed', 1], ['for', 1], ['generally', 1], ['glorious', 1], ['grab', 1], ['hands', 1], ['has', 3], ['having', 1], ['healing', 1], ['helpful', 1], ['her', 1], ['hidden', 1], ['him', 2], ['hindsight', 1], ['his', 1], ['in', 6], ['including', 1], ['into', 1], ['iron', 1], ['is', 2], ['it', 1], ['its', 3], ['jumbo', 1], ['known', 1], ['largely', 1], ['laugh', 1], ['leads', 1], ['led', 1], ['less', 2], ['lieutenant', 1], ['mads', 1], ['magic', 1], ['man', 1], ['manages', 1], ['mangled', 1], ['marvel', 2], ['means', 1], ['might', 1], ['mikkelsen', 1], ['million', 1], ['mode', 1], ['mordo', 1], ['more', 2], ['much', 1], ['mumbo', 1], ['mystic', 1], ['mystical', 1], ['nepal', 1], ['neurosurgeon', 1], ['of', 6], ['off', 1], ['one', 1], ['or', 1], ['origin', 1], ['pass', 1], ['recognizable', 1], ['restored', 1], ['retreat', 1], ['sage', 1], ['satisfying', 1], ['search', 1], ['second', 1], ['sequels', 1], ['silly', 1], ['some', 1], 
['sound', 1], ['spell', 1], ['splendid', 1], ['stephen', 1], ['stilted', 1], ['story', 1], ['straightforward', 1], ['strange', 3], ['stumble', 1], ['sure', 1], ['swinton', 2], ['talented', 1], ['tells', 1], ['terrible', 1], ['than', 2], ['that', 3], ["that's", 1], ['the', 10], ['they', 1], ['this', 1], ['tier', 1], ['tilda', 2], ['to', 6], ['tongues', 1], ['top', 1], ['train', 1], ['tripping', 1], ['ultimately', 1], ['warning', 1], ['weirdness', 1], ['where', 1], ['with', 5], ["won't", 1]]

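Functions

  • Build your own tools
  • Keep the code more concise
  • Make code easy to reuse
def function_name(parameters, may be empty):
    what the function does
    what the function does
    what the function does

!Note

  • A function usually uses return to send data back to the caller.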

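Making a cup of coffee at the coffee machine

  • Turn on the machine
  • Put down a cup
  • Choose the "coffee type" and press the make button
  • The coffee machine "makes the coffee"
  • You get a cup of coffee
def make_coffee(coffee_type):
    add espresso
    if coffee_type == "Americano":
        add lots of water
    elif coffee_type == "Caffè latte" or coffee_type == "Latte":
        add lots of milk and some milk foam
    elif coffee_type == "Cappuccino":
        add some milk and some milk foam
    .
    .
    .
    return coffee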
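
The coffee-machine pseudocode above can be turned into runnable Python. This is only a sketch: the function name make_coffee, the ingredient strings, and the choice to return a list of ingredients are all assumptions made for illustration.

```python
def make_coffee(coffee_type):
    # Every drink starts from espresso; the rest depends on the type.
    ingredients = ["espresso"]
    if coffee_type == "Americano":
        ingredients.append("lots of water")
    elif coffee_type == "Latte":
        ingredients.append("lots of milk")
        ingredients.append("some milk foam")
    elif coffee_type == "Cappuccino":
        ingredients.append("some milk")
        ingredients.append("some milk foam")
    # return hands the finished result back to the caller.
    return ingredients

cup = make_coffee("Latte")
print(cup)   # ['espresso', 'lots of milk', 'some milk foam']
```

The caller only needs to know the parameter and the returned value, not the steps inside the function.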
In [58]:
# Sort by the second value
def my_value_sort(item):
    return item[1]

nested_list_number = [[5,4],[1,6],[3,7],[4,2]]
print (sorted(nested_list_number,key=my_value_sort))
[[4, 2], [5, 4], [1, 6], [3, 7]]
In [61]:
# Case-insensitive sort
def my_string_sort_old(item):
    for i in range(ord('A'),ord('Z')+1):
        item = item.replace(chr(i),chr(i+32))
    return item
    
def my_string_sort(item):
    return item.lower()

list_string = ["hi","Hello","HAPPY"]
print (sorted(list_string,key=my_string_sort_old))
print (sorted(list_string,key=my_string_sort))
['HAPPY', 'Hello', 'hi']
['HAPPY', 'Hello', 'hi']
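
The list above happens to sort the same way even without a key, so the effect is easier to see with different data. The sketch below uses a made-up list (words) and passes the built-in str.lower directly as the key, which does the same job as my_string_sort.

```python
words = ["apple", "Banana", "cherry"]

# Default comparison uses code points, so "B" (66) sorts before "a" (97).
default_order = sorted(words)
print(default_order)   # ['Banana', 'apple', 'cherry']

# str.lower already takes one argument, so it can be used as the key as-is.
ignore_case = sorted(words, key=str.lower)
print(ignore_case)     # ['apple', 'Banana', 'cherry']
```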

[Coding Time]

  • Print the article's words and frequencies, sorted by frequency in descending order
the 10
a 9
of 6
to 6
in 6
with 5
In [65]:
def my_sort(item):
    return item[1]

_symbol = [",",":",";","?","!","(",")","\"","$","[","]","{","}","\n","\t","-","."]

clean_article = article

for i in _symbol:
    clean_article = clean_article.replace(i," ")

clean_article = clean_article.lower()
split_article = clean_article.split(" ")

my_list = []
word_list = []
for word in split_article:
    if (word) and (word not in word_list):
        word_list.append(word)
        count = split_article.count(word)
        my_list.append([word,count])

# [Coding Time] 
for i in sorted(my_list,key=my_sort,reverse=True):
    print (i[0],i[1])
the 10
a 9
of 6
to 6
in 6
with 5
its 3
strange 3
and 3
that 3
has 3
marvel 2
less 2
an 2
more 2
than 2
by 2
ejiofor 2
tilda 2
swinton 2
is 2
as 2
him 2
braves 1
stumble 1
each 1
dip 1
into 1
grab 1
bag 1
second 1
tier 1
recognizable 1
characters 1
but 1
it 1
won't 1
come 1
doctor 1
extremely 1
entertaining 1
sure 1
footed 1
adaptation 1
manages 1
conjure 1
enough 1
magic 1
easily 1
pass 1
spell 1
check 1
much 1
do 1
splendid 1
cast 1
led 1
benedict 1
cumberbatch 1
having 1
top 1
flight 1
actors 1
including 1
chiwetel 1
mads 1
mikkelsen 1
especially 1
helpful 1
this 1
enterprise 1
that's 1
because 1
some 1
mystical 1
mumbo 1
jumbo 1
dialogue 1
dormammu 1
dwells 1
dark 1
dimension 1
could 1
sound 1
stilted 1
or 1
silly 1
tripping 1
off 1
talented 1
tongues 1
largely 1
faithful 1
comics 1
film 1
tells 1
straightforward 1
origin 1
story 1
hindsight 1
area 1
where 1
generally 1
excelled 1
first 1
iron 1
man 1
for 1
example 1
far 1
satisfying 1
either 1
sequels 1
brilliant 1
neurosurgeon 1
dr 1
stephen 1
his 1
million 1
dollar 1
hands 1
mangled 1
terrible 1
car 1
accident 1
closing 1
credits 1
might 1
elicit 1
biggest 1
laugh 1
disclaimer 1
warning 1
about 1
dangers 1
distracted 1
driving 1
desperate 1
be 1
restored 1
begins 1
exploring 1
alternative 1
means 1
healing 1
search 1
ultimately 1
leads 1
nepal 1
hidden 1
retreat 1
bald 1
sage 1
known 1
ancient 1
one 1
glorious 1
weirdness 1
mode 1
aided 1
her 1
chief 1
lieutenant 1
mordo 1
they 1
train 1
mystic 1
arts 1

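When frequencies tie, sort by word (with reverse=True the tied words also come out in reverse alphabetical order)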

In [66]:
def my_sort(item):
    return item[1],item[0]
# ... (code omitted) ...
# ... (code omitted) ...
for i in sorted(my_list,key=my_sort,reverse=True):
    print (i[0],i[1])
the 10
a 9
to 6
of 6
in 6
with 5
that 3
strange 3
its 3
has 3
and 3
tilda 2
than 2
swinton 2
more 2
marvel 2
less 2
is 2
him 2
ejiofor 2
by 2
as 2
an 2
won't 1
where 1
weirdness 1
warning 1
ultimately 1
tripping 1
train 1
top 1
tongues 1
tier 1
this 1
they 1
that's 1
terrible 1
tells 1
talented 1
sure 1
stumble 1
straightforward 1
story 1
stilted 1
stephen 1
splendid 1
spell 1
sound 1
some 1
silly 1
sequels 1
second 1
search 1
satisfying 1
sage 1
retreat 1
restored 1
recognizable 1
pass 1
origin 1
or 1
one 1
off 1
neurosurgeon 1
nepal 1
mystical 1
mystic 1
mumbo 1
much 1
mordo 1
mode 1
million 1
mikkelsen 1
might 1
means 1
mangled 1
manages 1
man 1
magic 1
mads 1
lieutenant 1
led 1
leads 1
laugh 1
largely 1
known 1
jumbo 1
it 1
iron 1
into 1
including 1
his 1
hindsight 1
hidden 1
her 1
helpful 1
healing 1
having 1
hands 1
grab 1
glorious 1
generally 1
for 1
footed 1
flight 1
first 1
film 1
far 1
faithful 1
extremely 1
exploring 1
excelled 1
example 1
especially 1
entertaining 1
enterprise 1
enough 1
elicit 1
either 1
easily 1
each 1
dwells 1
driving 1
dr 1
dormammu 1
dollar 1
doctor 1
do 1
distracted 1
disclaimer 1
dip 1
dimension 1
dialogue 1
desperate 1
dark 1
dangers 1
cumberbatch 1
credits 1
could 1
conjure 1
comics 1
come 1
closing 1
chiwetel 1
chief 1
check 1
characters 1
cast 1
car 1
but 1
brilliant 1
braves 1
biggest 1
benedict 1
begins 1
because 1
be 1
bald 1
bag 1
arts 1
area 1
ancient 1
alternative 1
aided 1
adaptation 1
actors 1
accident 1
about 1

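Most frequent first, with ties in alphabetical order

  • reverse simply flips the whole sorted result into descending order, so negate the frequency in the key instead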
In [67]:
def my_sort(item):
    return -item[1],item[0]
# ... (code omitted) ...
# ... (code omitted) ...
for i in sorted(my_list,key=my_sort):
    print (i[0],i[1])
the 10
a 9
in 6
of 6
to 6
with 5
and 3
has 3
its 3
strange 3
that 3
an 2
as 2
by 2
ejiofor 2
him 2
is 2
less 2
marvel 2
more 2
swinton 2
than 2
tilda 2
about 1
accident 1
actors 1
adaptation 1
aided 1
alternative 1
ancient 1
area 1
arts 1
bag 1
bald 1
be 1
because 1
begins 1
benedict 1
biggest 1
braves 1
brilliant 1
but 1
car 1
cast 1
characters 1
check 1
chief 1
chiwetel 1
closing 1
come 1
comics 1
conjure 1
could 1
credits 1
cumberbatch 1
dangers 1
dark 1
desperate 1
dialogue 1
dimension 1
dip 1
disclaimer 1
distracted 1
do 1
doctor 1
dollar 1
dormammu 1
dr 1
driving 1
dwells 1
each 1
easily 1
either 1
elicit 1
enough 1
enterprise 1
entertaining 1
especially 1
example 1
excelled 1
exploring 1
extremely 1
faithful 1
far 1
film 1
first 1
flight 1
footed 1
for 1
generally 1
glorious 1
grab 1
hands 1
having 1
healing 1
helpful 1
her 1
hidden 1
hindsight 1
his 1
including 1
into 1
iron 1
it 1
jumbo 1
known 1
largely 1
laugh 1
leads 1
led 1
lieutenant 1
mads 1
magic 1
man 1
manages 1
mangled 1
means 1
might 1
mikkelsen 1
million 1
mode 1
mordo 1
much 1
mumbo 1
mystic 1
mystical 1
nepal 1
neurosurgeon 1
off 1
one 1
or 1
origin 1
pass 1
recognizable 1
restored 1
retreat 1
sage 1
satisfying 1
search 1
second 1
sequels 1
silly 1
some 1
sound 1
spell 1
splendid 1
stephen 1
stilted 1
story 1
straightforward 1
stumble 1
sure 1
talented 1
tells 1
terrible 1
that's 1
they 1
this 1
tier 1
tongues 1
top 1
train 1
tripping 1
ultimately 1
warning 1
weirdness 1
where 1
won't 1
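
Once the list is sorted this way, the earlier question about the second or fifth most frequent word comes down to indexing. A minimal sketch, using a shortened my_list so that it runs on its own (the name ranked is illustrative):

```python
my_list = [["the", 10], ["a", 9], ["of", 6], ["to", 6], ["in", 6], ["with", 5]]

def my_sort(item):
    # Negative count puts large counts first; the word breaks ties.
    return -item[1], item[0]

ranked = sorted(my_list, key=my_sort)
print(ranked[1])    # second most frequent: ['a', 9]
print(ranked[4])    # fifth most frequent: ['to', 6]
print(ranked[:3])   # top three: [['the', 10], ['a', 9], ['in', 6]]
```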

Think

  • What if I want to find how many times the word "strange" appears in the list?
In [69]:
for i in my_list:
    if i[0]=="strange":
        print (i[0],i[1])
strange 3

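Dictionary (dict)

  • Key: works much like a list index, but it can be a string or any other hashable value
  • Set a value: dict_name[key] = value
  • Get a value: dict_name[key]
  • key:value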
    {
      "the":10 ,
      "a"  :9  ,
      "of" :6  ,
      "in" :6
    }

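!Note

  • Do not rely on a dict being sorted: before Python 3.7 its iteration order was arbitrary (as in the outputs shown here); since 3.7 it follows insertion order
  • A dict stores keys, and each value is looked up by its key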
In [81]:
my_dict = {}
my_dict["the"] = 10
my_dict["a"] = 9
my_dict["of"] = 6
my_dict["in"] = 6
my_dict2 = {"the":10,"a":9,"of":6,"in":6}
print (my_dict)
print (my_dict2)
print (my_dict["the"])
{'in': 6, 'a': 9, 'of': 6, 'the': 10}
{'in': 6, 'a': 9, 'of': 6, 'the': 10}
10

!Note

  • Looking up a key that does not exist in the dict raises an error (KeyError)!
In [86]:
my_dict = {"the":10,"a":9,"of":6,"in":6}
print ("with" in my_dict)
print (my_dict["with"])
False
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-86-bfa4e840c783> in <module>()
      1 my_dict = {"the":10,"a":9,"of":6,"in":6}
      2 print ("with" in my_dict)
----> 3 print (my_dict["with"])

KeyError: 'with'
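
Two common ways to avoid this KeyError are sketched below: check membership with in before indexing, or use dict.get(), which returns a default value instead of raising. The names message, missing, missing_zero and present are illustrative.

```python
my_dict = {"the": 10, "a": 9, "of": 6, "in": 6}

# Option 1: test membership before indexing.
if "with" in my_dict:
    message = my_dict["with"]
else:
    message = "'with' is not a key"
print(message)

# Option 2: dict.get() returns a default (None unless one is given).
missing = my_dict.get("with")
missing_zero = my_dict.get("with", 0)
present = my_dict.get("the", 0)
print(missing)        # None
print(missing_zero)   # 0
print(present)        # 10
```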

[Coding Time]

  • Store the article's words and frequencies as a dict
In [3]:
_symbol = [",",":",";","?","!","(",")","\"","$","[","]","{","}","\n","\t","-","."]

clean_article = article.lower()
for i in _symbol:
    clean_article = clean_article.replace(i," ")

word_dict = {}
split_article = clean_article.split(" ")
for word in split_article:
    if (word) and (word not in word_dict):
        word_dict[word] = split_article.count(word)
print (word_dict)
{'chief': 1, 'mikkelsen': 1, 'much': 1, 'a': 9, 'film': 1, 'into': 1, 'come': 1, 'arts': 1, 'more': 2, 'helpful': 1, 'iron': 1, 'entertaining': 1, 'desperate': 1, 'and': 3, 'less': 2, 'where': 1, 'especially': 1, 'his': 1, 'search': 1, 'enough': 1, 'recognizable': 1, 'sage': 1, 'credits': 1, 'of': 6, 'him': 2, 'means': 1, 'actors': 1, 'tilda': 2, 'glorious': 1, 'excelled': 1, 'warning': 1, 'comics': 1, 'about': 1, 'dr': 1, 'bald': 1, 'known': 1, 'or': 1, 'sequels': 1, 'strange': 3, 'jumbo': 1, 'hidden': 1, 'be': 1, 'one': 1, 'adaptation': 1, 'million': 1, 'ultimately': 1, 'splendid': 1, 'they': 1, "won't": 1, 'by': 2, 'first': 1, 'mads': 1, 'mystical': 1, 'mangled': 1, 'has': 3, 'but': 1, 'that': 3, 'conjure': 1, 'could': 1, 'driving': 1, 'its': 3, 'neurosurgeon': 1, 'brilliant': 1, 'extremely': 1, 'faithful': 1, 'led': 1, 'man': 1, 'this': 1, 'satisfying': 1, 'it': 1, 'sure': 1, 'elicit': 1, 'area': 1, 'largely': 1, 'having': 1, 'sound': 1, 'healing': 1, 'ancient': 1, 'dormammu': 1, 'top': 1, 'because': 1, "that's": 1, 'manages': 1, 'distracted': 1, 'generally': 1, 'hindsight': 1, 'bag': 1, 'origin': 1, 'footed': 1, 'in': 6, 'straightforward': 1, 'tongues': 1, 'retreat': 1, 'either': 1, 'tells': 1, 'disclaimer': 1, 'stilted': 1, 'flight': 1, 'cumberbatch': 1, 'an': 2, 'leads': 1, 'magic': 1, 'marvel': 2, 'for': 1, 'restored': 1, 'than': 2, 'hands': 1, 'check': 1, 'dialogue': 1, 'to': 6, 'alternative': 1, 'the': 10, 'her': 1, 'with': 5, 'dark': 1, 'accident': 1, 'biggest': 1, 'dip': 1, 'chiwetel': 1, 'some': 1, 'ejiofor': 2, 'is': 2, 'nepal': 1, 'story': 1, 'dimension': 1, 'braves': 1, 'mode': 1, 'mumbo': 1, 'stumble': 1, 'example': 1, 'swinton': 2, 'mordo': 1, 'lieutenant': 1, 'pass': 1, 'terrible': 1, 'doctor': 1, 'begins': 1, 'laugh': 1, 'tripping': 1, 'closing': 1, 'dangers': 1, 'silly': 1, 'grab': 1, 'each': 1, 'aided': 1, 'second': 1, 'stephen': 1, 'off': 1, 'easily': 1, 'weirdness': 1, 'spell': 1, 'including': 1, 'as': 2, 'far': 1, 'tier': 1, 'dwells': 1, 
'train': 1, 'exploring': 1, 'mystic': 1, 'dollar': 1, 'might': 1, 'do': 1, 'characters': 1, 'benedict': 1, 'talented': 1, 'enterprise': 1, 'cast': 1, 'car': 1}
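
The solution above calls split_article.count(word) once per distinct word, and each call rescans the whole list. A possible alternative, sketched here on a made-up sentence (text), counts in a single pass by adding 1 to the stored value each time a word is seen. This is a variation for comparison, not the method used in the cell above.

```python
text = "the cat and the dog and the bird"

word_dict = {}
# split() with no argument splits on any whitespace and skips empty strings.
for word in text.split():
    # get() supplies 0 the first time a word is seen.
    word_dict[word] = word_dict.get(word, 0) + 1

print(word_dict["the"])   # 3
print(word_dict["and"])   # 2
print(word_dict["cat"])   # 1
```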

Sorting a dict

In [84]:
my_dict = {"the":10,"a":9,"of":6,"in":6}
print (my_dict.items())
for i in my_dict.items():
    print (i)
for i in my_dict.items():
    print (i[0],i[1])
dict_items([('in', 6), ('a', 9), ('of', 6), ('the', 10)])
('in', 6)
('a', 9)
('of', 6)
('the', 10)
in 6
a 9
of 6
the 10
In [90]:
def my_sort(item):
    return -item[1],item[0]

my_dict = {"the":10,"a":9,"of":6,"in":6}
for i in sorted(my_dict.items(),key=my_sort):
    print (i[0],i[1])
the 10
a 9
in 6
of 6
In [91]:
def my_sort(item):
    return -item[1],item[0]

my_dict2 = {
    "the":[10,1],
    "a": {"b":1},
    "of":6,
    "in":6
}
for i in sorted(my_dict2.items(),key=my_sort):
    print (i[0],i[1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-91-e1440b69fb90> in <module>()
      2     return -item[1],item[0]
      3 my_dict2 = {"the":[10,1],"a":{"b":1},"of":6,"in":6}
----> 4 for i in sorted(my_dict2.items(),key=my_sort):
      5     print (i[0],i[1])

<ipython-input-91-e1440b69fb90> in my_sort(item)
      1 def my_sort(item):
----> 2     return -item[1],item[0]
      3 my_dict2 = {"the":[10,1],"a":{"b":1},"of":6,"in":6}
      4 for i in sorted(my_dict2.items(),key=my_sort):
      5     print (i[0],i[1])

TypeError: bad operand type for unary -: 'dict'
In [101]:
def my_sort(item):
    # Lists and dicts cannot be negated, so give them a fixed key of 0
    if isinstance(item[1], (dict, list)):
        return 0,item[0]
    else:
        return -item[1],item[0]

my_dict2 = {
    "the":[10,1],
    "a": {"b":1},
    "of":6,
    "in":6
}
for i in sorted(my_dict2.items(),key=my_sort):
    print (i[0],i[1])
in 6
of 6
a {'b': 1}
the [10, 1]

Dicts versus lists

  • List: index => value
  • Dict: key => value
In [109]:
my_dict = {"the":10,"a":9,"of":6,"in":6}
print (list(my_dict.items()))
print (type(my_dict.items()))

my_list = [10,9,6,6]
print (list(enumerate(my_list)))
print (type(enumerate(my_list)))
[('in', 6), ('a', 9), ('of', 6), ('the', 10)]
<class 'dict_items'>
[(0, 10), (1, 9), (2, 6), (3, 6)]
<class 'enumerate'>

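File input

  • open(file path): opens a file stream for reading
    • open(file path,'r'): reading is the default mode, so 'r' can be omitted
  • file_stream.read(): reads all the data in the file stream
  • Combined with file output, you can download many web pages first and parse them later

File output

  • open(file path,'w'): opens a file stream for writing (an existing file is overwritten)
  • file_stream.write(data): writes data to the file stream

!Note

  • Whether reading or writing, make a habit of closing the file stream when you are done: file_stream.close()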
In [111]:
fileout = open("test.txt",'w')
fileout.write("Hello"+"\n")
fileout.write("Python")
fileout.close()

filein = open("test.txt")
content = filein.read()
filein.close()
print (content)
Hello
Python
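
One way to make sure the stream is always closed is the with statement, which closes the file when the block ends, even if an error occurs inside it. The sketch below repeats the example above in that style; it writes into a temporary directory (an assumption made so the sketch does not touch real files).

```python
import os
import tempfile

# A temporary directory keeps this sketch away from real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")

    with open(path, "w") as fileout:
        fileout.write("Hello" + "\n")
        fileout.write("Python")
    # fileout is closed here, without calling close() by hand.

    with open(path) as filein:
        content = filein.read()

print(content)
print(fileout.closed, filein.closed)   # True True
```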

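Analyzing movies

IMDB movie dataset (registration required to download)

  • Build a dataset: use the different data fields as keys/indexes
    • Movie title (movie_title), box office gross (gross), genres (genres), budget (budget), IMDB score (imdb_score)
    • Actor name (actor_x_name), Facebook likes (actor_x_facebook_likes)
  • Try sorting by gross
    • Look at the best-selling movies
      • Which genres do they belong to (e.g. action)?
      • Do they feature stars? Stardom could be measured by likes or by the number of movies the actor has appeared in
      • Do they also get high scores?
      • Did they make money (gross - budget)?
    • For a given actor, does the gross differ by genre? Are those genres the same as the best sellers' genres?

!Note

  • CSV separates the columns of each row with commas, so you can split each line with split(",") and use list indexes to pick the columns you want (this simple approach breaks if a field itself contains a comma)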
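
Splitting on "," works for simple rows, but a quoted field that contains a comma gets cut into pieces. The standard library's csv module handles that case. The sketch below uses made-up data (raw) with two hypothetical columns, not rows from the real dataset.

```python
import csv
import io

# Made-up sample: the first title contains a comma, so it is quoted.
raw = 'movie_title,gross\n"Hello, World",100\nPython,200\n'

# Naive split: the quoted title is broken into two pieces.
naive = raw.split("\n")[1].split(",")
print(naive)     # ['"Hello', ' World"', '100']

# csv.reader understands the quotes and keeps the title whole.
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])   # ['Hello, World', '100']
print(rows[2])   # ['Python', '200']
```

For a real file, the open file stream can be passed to csv.reader in place of io.StringIO(raw).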
In [6]:
my_data = """
movie1,2
movie2,5
movie1,1
movie3,7
"""
my_dict = {}
for i in my_data.split("\n"):
    print (i)
    if i:
        temp = i.split(",")
        movie = temp[0]
        count = int(temp[1])
        if movie not in my_dict:
            my_dict[movie]=count
        else:
            my_dict[movie]+=count

print (my_dict)
movie1,2
movie2,5
movie1,1
movie3,7

{'movie1': 3, 'movie3': 7, 'movie2': 5}