Introduction to Python Programming - 07

Pyladies Taiwan

Speaker : Mars

2018/01/07

Roadmap

  • Recap: loops
  • Basic file operations
  • Handling common file types
  • Common problems in file handling - web scraping as an example
  • Extra topics: the context management protocol, pickle
  • Extra topics: advanced file operations

Recap: loops

  • Loops handle repetitive work
  • Loops make it easy to process iterable objects
    • lists
    • strings
    • dictionaries
    • ......

Read three numbers from input and append them to a list

l = []
l.append(input())
l.append(input())
l.append(input())
print(l)

With a loop, this can be rewritten as

l = []
for i in range(3):
    l.append(input())
print(l)

Accessing each element of a list

l = [1,4,7,9]
print(l[0])
print(l[1])
print(l[2])
print(l[3])

With a loop, this can be rewritten as

l = [1,4,7,9]
for i in range(0,3+1):
    print(l[i])

Or, even more concisely, as

l = [1,4,7,9]
for i in l:
    print (i)

Conceptually, the code above runs as if it were

i=1  # the 1st element of list l
print(i)
i=4  # the 2nd element of list l
print(i)
i=7  # the 3rd element of list l
print(i)
i=9  # the 4th element of list l
print(i)

Iterating with range is more verbose, but more flexible
(e.g. when you only want the elements at odd indices)

l = [1,4,7,9]
for i in range(0,3+1):
    print(l[i])

Iterating over the list directly is more concise, but it always visits every element
(e.g. to take only the elements at odd indices, you need an extra condition inside the loop)

l = [1,4,7,9]
for i in l:
    print (i)
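To make the comparison above concrete, here is a small sketch that takes only the elements at odd indices (positions 1 and 3) in both styles:

```python
l = [1, 4, 7, 9]

# range with a step of 2 visits only the odd indices (1, 3)
odd_items = []
for i in range(1, len(l), 2):
    odd_items.append(l[i])
print(odd_items)   # [4, 9]

# iterating the list directly needs an extra position check inside the loop
odd_items2 = []
for index, value in enumerate(l):
    if index % 2 == 1:
        odd_items2.append(value)
print(odd_items2)  # [4, 9]
```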
In [1]:
article = """For years, I had existed safely behind the scenes in politics 
as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
The sitting congresswoman had been in my district since 1992. 
She had never lost a race, 
and no one had really even run against her in a Democratic primary."""
for line in article.split("\n"):
    print("->",line)
-> For years, I had existed safely behind the scenes in politics 
-> as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
-> The sitting congresswoman had been in my district since 1992. 
-> She had never lost a race, 
-> and no one had really even run against her in a Democratic primary.

Basic file operations

f = open(file,mode,encoding,...)
f.close()
  • file: the file name and path, as a string
    • relative to the location of the program
  • mode: access mode; the default is text read (rt)
    • t: text (default)
      • r: read mode (default)
      • w: write mode; creates a new file or overwrites an existing one
      • a: append mode; creates a new file or appends to the end of an existing one
      • r+: update mode, readable and writable; the file must exist, and reading/writing starts from the beginning (overwriting old data)
      • w+: update mode, readable and writable; creates a new file or overwrites an existing one, reading/writing from the beginning
      • a+: update mode, readable and writable; creates a new file, or reads/writes from the end of an existing one
    • b: binary
      • conventionally written as: rb, wb, ab, rb+, wb+, ab+
  • encoding: the text encoding; defaults to None. On Windows you need to specify cp950 or utf8
    • f = open(file,encoding='cp950')

Note:

  • Every file you open must also be closed!
  • A file object is itself an "iterable"
  • Keeping reads and writes in separate file objects is recommended.
In [2]:
""" doc.txt
For years, I had existed safely behind the scenes in politics 
as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
The sitting congresswoman had been in my district since 1992. 
She had never lost a race, 
and no one had really even run against her in a Democratic primary.
"""
fi = open("doc.txt")
for line in fi:
    print("->",line,end="")
fi.close()
-> For years, I had existed safely behind the scenes in politics 
-> as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
-> The sitting congresswoman had been in my district since 1992. 
-> She had never lost a race, 
-> and no one had really even run against her in a Democratic primary.

Checkpoint:

A quick jupyter walkthrough

  • Open jupyter
  • Use New > Text File to create a text file
  • Rename it to doc.txt
  • Type some text into it (Chinese works too)
  • Use the sample code above to open the file and print its contents
    • Windows: open("doc.txt",encoding='cp950')
    • Windows: open("doc.txt",encoding='utf8')
In [3]:
""" doc.txt
For years, I had existed safely behind the scenes in politics 
as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
The sitting congresswoman had been in my district since 1992. 
She had never lost a race, 
and no one had really even run against her in a Democratic primary.
"""
fi = open("doc.txt")
for index,line in enumerate(fi):
    print("{:<5} {}".format(index+1,line),end="")
fi.close()
1     For years, I had existed safely behind the scenes in politics 
2     as a fundraiser, as an organizer, but in my heart, I always wanted to run. 
3     The sitting congresswoman had been in my district since 1992. 
4     She had never lost a race, 
5     and no one had really even run against her in a Democratic primary.

「{:<5}」 uses the string format method: the line number is printed in a field 5 characters wide, aligned to the left.

File operations

  • f.close(): flushes the buffer and closes the file; calling it again has no effect
  • f.read(n=-1): reads n bytes; with n=-1 it calls readall()
  • f.readall(): reads everything
  • f.readline(limit=-1): text mode only; reads one line by default, or up to limit characters
    • there is also f.readlines(hint=-1): reads all lines by default, or about hint lines' worth (not recommended)
  • f.write(data): writes data (must be str or bytes)
  • f.writelines(data): text mode only; writes multiple lines

Note:

read() and readall() are convenient, but because they read everything at once, the program may crash on a file that is too large.
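When a file may be too large for read(), one common alternative is to iterate the file object line by line, so only one line is in memory at a time. A minimal sketch (the small sample file big.txt is created here just so the example runs on its own):

```python
# Build a small sample file so the sketch is self-contained
fo = open("big.txt", "w")
for i in range(5):
    fo.write("line {}\n".format(i))
fo.close()

# Instead of fi.read(), iterate the file object directly:
# only one line is held in memory at a time, so even huge files are safe
line_count = 0
fi = open("big.txt")
for line in fi:
    line_count += 1
fi.close()
print(line_count)  # 5
```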

In [4]:
# Windows:open("doc2.txt","w",encoding='utf8')
fo = open("doc2.txt","w")
data = "something"
fo.write(data)
fo.close()
In [5]:
# Windows:open("doc2.txt",encoding='utf8')
fi = open("doc2.txt")
data = fi.read()
print(data)
fi.close()
something

Checkpoint:

  • Write a program that writes some text data into "doc2.txt"
  • Use the jupyter interface to check that the data was written correctly
  • Write a program that reads "doc2.txt" and prints it

Exercise - copying a file

  • First open the file to copy from, file 1 (doc2.txt), in read mode
  • Then open the file to copy to, file 2 (doc3.txt), in write mode (this creates the file)
  • Write the data read from file 1 into file 2
  • Remember to close the files

More

  • Try copying an image or another non-text file
    • remember to add "b" (rb, wb)
    • encoding applies only to text, so even on Windows you don't add it here
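A possible solution to the text-copy steps above (the contents of doc2.txt are made up here so the sketch runs on its own):

```python
# Create the source file first so the sketch is self-contained
fo = open("doc2.txt", "w")
fo.write("something")
fo.close()

# Step 1: open the source file (doc2.txt) in read mode
fi = open("doc2.txt")
# Step 2: open the destination file (doc3.txt) in write mode (creates it)
fo = open("doc3.txt", "w")
# Step 3: write the data read from file 1 into file 2
data = fi.read()
fo.write(data)
# Step 4: remember to close both files
fi.close()
fo.close()

# Check the copy
fi = open("doc3.txt")
copied = fi.read()
fi.close()
print(copied)  # something
```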

Exercise - statistics over multiple files

Please download three files: log20180105, log20180106, log20180107
and count how many times each user appears.

Hint:

  • A mutable data type such as a list or dictionary can keep the counts
    • if you use a dictionary, see the earlier exercise
    • for sorting, see the earlier slides
  • Remember to remove the newline character: the string methods replace or strip both work
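One way to approach this exercise with a dictionary. This is a sketch only: the sample file and its one-user-name-per-line format are assumptions standing in for the real log layout, which you should inspect first:

```python
# Hypothetical sample log: one user name per line (format assumed)
fo = open("log_sample.txt", "w")
fo.write("alice\nbob\nalice\n")
fo.close()

counts = {}
for filename in ["log_sample.txt"]:   # for the real data: the three log files
    fi = open(filename)
    for line in fi:
        user = line.strip()           # strip the trailing newline character
        if user:
            counts[user] = counts.get(user, 0) + 1
    fi.close()
print(counts)
```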

Handling common file types

CSV (Comma-Separated Values)

data11,data12,data13
data21,data22,data23
data31,data32,data33

Splitting on a comma , is the most common form of csv,
but some files split on a semicolon ; or on whitespace, and values may be wrapped in quotes.

import csv
csv.reader(file_object,delimiter)
csv.writer(file_object,delimiter)

Python's built-in csv module makes working with csv files easy

  • reader: converts each row of data into a list
  • writer: its writerow method writes a list to the file
  • delimiter: the separator; a comma , by default
In [6]:
import csv 
fo = open("test.csv","w")
# On Windows the csv written this way gains extra blank lines; open the file as below instead
# fo = open("test.csv","w",newline='')
cw = csv.writer(fo,delimiter=' ')
cw.writerow(["data11","data12","data13"])
cw.writerow(["data21","data22","data23"])
cw.writerow(["data31","data32","data33"])
fo.close()
In [7]:
import csv 
fi = open("test.csv")
cr = csv.reader(fi)
for row_num,row in enumerate(cr):
    print(row_num,row)
fi.close()
0 ['data11 data12 data13']
1 ['data21 data22 data23']
2 ['data31 data32 data33']

Using data downloaded from Yahoo stock market as an example:
reformat the date with / separators, keep only the opening price Open (to two decimal places) and the volume Volume, and switch the delimiter to whitespace

In [8]:
import csv
fi = open("CSV.csv")
fo = open("CSV2.csv","w")
cr = csv.reader(fi)
cw = csv.writer(fo,delimiter=' ')
for row_num,row in enumerate(cr):
    print(row_num,row)
    if row_num!=0:
        row[0]=row[0].replace("-","/")
        row[1]=round(float(row[1]),2)
    new_row = [row[0],row[1],row[6]]
    cw.writerow(new_row)
fi.close()
fo.close()
0 ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
1 ['2017-12-06', '26.100000', '26.570000', '25.709999', '26.490000', '26.490000', '100400']
2 ['2017-12-07', '26.469999', '26.670000', '26.219999', '26.520000', '26.520000', '82900']
3 ['2017-12-08', '26.530001', '26.780001', '26.340000', '26.629999', '26.629999', '535800']
4 ['2017-12-11', '26.639999', '26.660000', '26.230000', '26.340000', '26.340000', '80200']
5 ['2017-12-12', '26.360001', '26.500000', '26.200001', '26.330000', '26.330000', '78500']
6 ['2017-12-13', '26.270000', '26.540001', '26.150000', '26.299999', '26.299999', '70700']
7 ['2017-12-14', '26.290001', '26.290001', '25.830000', '25.840000', '25.840000', '58300']
8 ['2017-12-15', '25.860001', '26.320000', '25.860001', '26.209999', '26.209999', '199800']
9 ['2017-12-18', '26.379999', '26.469999', '26.139999', '26.230000', '26.230000', '107200']
10 ['2017-12-19', '26.320000', '26.500000', '26.180000', '26.270000', '26.270000', '125400']
11 ['2017-12-20', '26.350000', '26.350000', '25.900000', '26.010000', '26.010000', '75200']
12 ['2017-12-21', '26.120001', '26.120001', '25.879999', '25.900000', '25.900000', '91300']
13 ['2017-12-22', '26.070000', '26.120001', '25.530001', '25.750000', '25.750000', '81200']
14 ['2017-12-26', '25.770000', '25.840000', '25.600000', '25.650000', '25.650000', '39900']
15 ['2017-12-27', '25.590000', '25.799999', '25.590000', '25.680000', '25.680000', '44900']
16 ['2017-12-28', '25.690001', '26.170000', '25.690001', '26.080000', '26.080000', '61300']
17 ['2017-12-29', '26.129999', '26.129999', '25.690001', '25.709999', '25.709999', '56900']
18 ['2018-01-02', '25.719999', '25.980000', '25.570000', '25.959999', '25.959999', '105300']
19 ['2018-01-03', '25.959999', '26.309999', '25.639999', '26.209999', '26.209999', '119400']
20 ['2018-01-04', '26.400000', '26.709999', '26.270000', '26.459999', '26.459999', '130600']
21 ['2018-01-05', '26.520000', '26.670000', '26.270000', '26.420000', '26.420000', '69900']

Accessing values by column name

In [9]:
import csv
fi = open("CSV.csv")
cr = csv.reader(fi,delimiter=',')
col_dict={}
for row_num,row in enumerate(cr):
    if row_num==0:
        for index,key in enumerate(row):
            col_dict[key]=index
        print(col_dict)
    else:
        print(row[col_dict["Date"]],row[col_dict["Volume"]])
fi.close()
{'Open': 1, 'Adj Close': 5, 'Close': 4, 'Date': 0, 'Low': 3, 'Volume': 6, 'High': 2}
2017-12-06 100400
2017-12-07 82900
2017-12-08 535800
2017-12-11 80200
2017-12-12 78500
2017-12-13 70700
2017-12-14 58300
2017-12-15 199800
2017-12-18 107200
2017-12-19 125400
2017-12-20 75200
2017-12-21 91300
2017-12-22 81200
2017-12-26 39900
2017-12-27 44900
2017-12-28 61300
2017-12-29 56900
2018-01-02 105300
2018-01-03 119400
2018-01-04 130600
2018-01-05 69900

Exercise - data cleaning

Using the TWSE daily trading data for an individual stock as the data set,
keep only the date and closing price and save them to a separate file.

Hint:

  • This file was saved on a Windows system,
    so you need encoding='cp950' regardless of which system you are on
  • len() can be used to decide whether a row (as a list) is one we want
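A sketch of the len() filtering idea from the hint. The miniature file below only imitates the real layout: the column order (date first, closing price seventh) and the title line are assumptions, and the real file additionally needs encoding='cp950' when opened:

```python
import csv

# A made-up miniature of the TWSE daily-quote layout (column order assumed)
fo = open("twse_sample.csv", "w")
fo.write('"107年01月 2330 台積電"\n')
fo.write('"日期","成交股數","成交金額","開盤價","最高價","最低價","收盤價","漲跌價差","成交筆數"\n')
fo.write('"107/01/02","30,000","7,000,000","232.5","237","232","237","+4.5","20,000"\n')
fo.close()

kept = []
fi = open("twse_sample.csv")
for row in csv.reader(fi):
    # len() filters out the title line, which does not have 9 columns
    if len(row) == 9 and row[0] != "日期":
        kept.append([row[0], row[6]])   # keep date and closing price only
fi.close()

fo = open("twse_clean.csv", "w")
cw = csv.writer(fo)
for row in kept:
    cw.writerow(row)
fo.close()
print(kept)  # [['107/01/02', '237']]
```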

JSON (JavaScript Object Notation)

Originally a data-interchange format designed around JavaScript,
it is now widely used in web development and non-relational (NoSQL) databases.

  • json.loads(string): JSON-formatted string -> Python type
  • json.dumps(Python object): Python type -> JSON-formatted string

JSON     Python
object   dict
array    list,tuple
string   str
number   int,float
true     True
false    False
null     None
In [10]:
import json
data = {
    "user":"Mars",
    "work":["Tripresso","PyLadies"]
}
json_data = json.dumps(data)
print(json_data,type(json_data))
{"user": "Mars", "work": ["Tripresso", "PyLadies"]} <class 'str'>
In [11]:
import json
json_data = '{"work": ["Tripresso", "PyLadies"], "user": "Mars"}'
data = json.loads(json_data)
print(data)
print(data["work"])
print(data["work"][0])
{'user': 'Mars', 'work': ['Tripresso', 'PyLadies']}
['Tripresso', 'PyLadies']
Tripresso

Handy online parsing tools

  • Json Parser Online: the nicest layout;
    it makes clear whether a value should be accessed with a dictionary key or a list index,
    but it cannot handle very large data
  • Online Json Viewer: handles larger JSON files

Data set: real-time air-pollution data from the Environmental Protection Administration's air quality monitoring network

Web scraping techniques

In [12]:
import json
fi = open("aqs.json")
json_data = fi.read()
data = json.loads(json_data)
for stations in data["Data"]:
    print(stations["SiteName"],stations["AQI"])
fi.close()
富貴角 39
陽明 37
萬里 40
淡水 19
基隆 33
士林 32
林口 30
三重 37
菜寮 22
汐止 31
大同 26
中山 30
大園 26
松山 21
萬華 24
新莊 27
觀音 30
古亭 21
永和 23
板橋 26
桃園 22
土城 24
新店 26
平鎮 22
中壢 31
龍潭 23
湖口 26
新竹 22
竹東 29
頭份 25
苗栗 25
三義 24
豐原 35
沙鹿 31
西屯 45
忠明 39
線西 52
大里 46
彰化 52
埔里 41
二林 58
南投 82
竹山 101
崙背 61
麥寮 34
臺西 31
斗六 47
新港 55
朴子 31
嘉義 70
新營 60
善化 78
安南 75
臺南 74
美濃 65
橋頭 116
楠梓 112
仁武 103
左營 100
屏東 99
前金 102
鳳山 100
復興 125
前鎮 117
小港 94
大寮 94
潮州 143
林園 46
恆春 38
宜蘭 31
冬山 26
花蓮 24
關山 27
臺東 20
馬祖 43
金門 35
馬公 34
彰化(大城) 31
屏東(琉球) 86
In [13]:
import json
import requests 
resp = requests.get("https://taqm.epa.gov.tw/taqm/aqs.ashx?lang=tw&act=aqi-epa&ts=1515231549907") 
json_data = resp.text 
data = json.loads(json_data)
for stations in data["Data"]:
    print(stations["SiteName"],stations["AQI"])
富貴角 53
陽明 45
萬里 48
淡水 39
基隆 44
士林 39
林口 39
三重 34
菜寮 32
汐止 54
大同 37
中山 35
大園 44
松山 33
萬華 34
新莊 38
觀音 62
古亭 37
永和 35
板橋 40
桃園 35
土城 38
新店 38
平鎮 35
中壢 29
龍潭 39
湖口 42
新竹 41
竹東 34
頭份 41
苗栗 62
三義 51
豐原 76
沙鹿 77
西屯 76
忠明 85
線西 86
大里 89
彰化 98
埔里 87
二林 93
南投 152
竹山 163
崙背 116
麥寮 109
臺西 108
斗六 151
新港 140
朴子 132
嘉義 159
新營 162
善化 163
安南 164
臺南 162
美濃 160
橋頭 171
楠梓 178
仁武 178
左營 180
屏東 162
前金 186
鳳山 175
復興 179
前鎮 180
小港 177
大寮 164
潮州 160
林園 161
恆春 42
宜蘭 42
冬山 40
花蓮 36
關山 33
臺東 33
馬祖 87
金門 135
馬公 63
彰化(大城) 78
屏東(琉球) 176

Exercise - fetching data

Using the PChome 24h shopping search results for mac as the data set,
we want the name and price of each product.

The URL to fetch is:
https://ecshweb.pchome.com.tw/search/v3.3/all/results?q=mac&page=1&sort=rnk/dc

Hint:

More:

  • Try swapping some keywords in the URL and see what happens =v=
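A sketch of parsing the response for this exercise. The "prods", "name" and "price" field names below are assumptions about the API's JSON structure, so inspect the real response to confirm them; a made-up miniature response stands in for the network call here:

```python
import json

# Miniature stand-in for the API response (field names assumed);
# for the real data: resp = requests.get(url); json_data = resp.text
json_data = '{"totalRows": 2, "prods": [{"name": "Mac mini", "price": 15900}, {"name": "MacBook Air", "price": 32900}]}'

data = json.loads(json_data)
results = []
for prod in data["prods"]:
    results.append((prod["name"], prod["price"]))
    print(prod["name"], prod["price"])
```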

Common problems in file handling - web scraping as an example

import requests
resp = requests.get(url)
  • resp.content : bytes (binary data)
  • resp.text : unicode (string)

Fetching images and media

  • Because the data is in bytes, use resp.content together with the wb mode
  • If you save into a folder, the folder must be created first (a missing folder raises an error)
In [14]:
import requests 
resp = requests.get("http://tw.pyladies.com/img/logo2.png") 
data = resp.content
fo = open("image/logo.png","wb")
fo.write(data)
fo.close()
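Since a missing folder raises a FileNotFoundError, os.makedirs can create it before writing. A minimal sketch with placeholder bytes standing in for a real download:

```python
import os

# Create the folder first; exist_ok=True means no error if it already exists
os.makedirs("image", exist_ok=True)

# Placeholder bytes standing in for resp.content from a real download
data = b"\x89PNG fake image bytes"
fo = open("image/logo_demo.png", "wb")
fo.write(data)
fo.close()
print(os.path.exists("image/logo_demo.png"))  # True
```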

Filename Extension

The extension helps the computer decide which program should open a file,
but with or without one, the data inside the file is unaffected.

Text encodings: Utf8, Big5, and Unicode

In [15]:
import requests 
resp = requests.get("http://blog.marsw.tw/") 
data = resp.text
fo = open("blog_mars.html","w")
fo.write(data)
fo.close()
In [16]:
import requests 
resp = requests.get("https://www.csie.ntu.edu.tw/~p92005/Joel/Unicode.html") 
data = resp.text
fo = open("joel.html","w")
fo.write(data)
fo.close()
In [17]:
import requests 
resp = requests.get("https://www.csie.ntu.edu.tw/~p92005/Joel/Unicode.html") 
data = resp.content.decode("big5")
fo = open("joel_decode.html","w")
fo.write(data)
fo.close()
resp = requests.get("http://blog.marsw.tw/") 
data = resp.text

is equivalent to

resp = requests.get("http://blog.marsw.tw/") 
data = resp.content.decode("utf8")

The requests module chooses an encoding based on the Content-Type header, so when no encoding is declared, the text can be decoded incorrectly.

In Python, if a file is opened in text mode, the data passed to write (fo.write) must be a string.
What print shows depends on your computer's encoding,
so writing to a file may succeed while printing raises a message like "Unicode Error".

  • bytes -> unicode: bytes.decode(encoding="utf-8", errors="strict")
  • unicode -> bytes: unicode.encode(encoding="utf-8", errors="strict")
    • encoding defaults to "utf8"; any other codec name can be passed in
    • errors controls what happens when an error is hit; the default "strict" raises an exception,
      and "ignore", "replace", "xmlcharrefreplace", "backslashreplace" are also available - see the official documentation

The str type in Python 3 vs. Python 2

  • Python 3 str => a unicode string
    • binary data is a separate type => bytes
  • Python 2 str => doubles as an 8-bit string and binary data (bytes)
    • unicode strings are a separate type
In [18]:
my_str = "你好"
print (type(my_str))
print (my_str[0])

my_str_b = my_str.encode('utf8')
print (type(my_str_b))
print (my_str_b[0])
<class 'str'>
你
<class 'bytes'>
228
In [19]:
my_str = "範例"
print(my_str.encode("utf8"))
print(my_str.encode("big5"))
print(my_str.encode("utf8").decode("utf8"))
print(my_str.encode("big5").decode("big5"))
b'\xe7\xaf\x84\xe4\xbe\x8b'
b'\xbdd\xa8\xd2'
範例
範例
In [20]:
my_str = "範例"
print(my_str.encode("big5").decode("utf8",errors="ignore"))
print(my_str.encode("big5").decode("utf8",errors="replace"))
d
�d��

Extra topics

  • the context management protocol: with as
  • pickle

The context management protocol

It is easy to open a file and then forget to close it; the with and as syntax closes it for you

fo = open("doc2.txt","w")
data = "something"
fo.write(data)
fo.close()

can be rewritten as

with open("doc2.txt","w") as fo:
    data = "something"
    fo.write(data)

With more than one file object:

fi = open("doc2.txt")
fo = open("doc3.txt","w")
data = fi.read()
fo.write(data)
fi.close()
fo.close()

can be rewritten as

with open("doc2.txt") as fi:
    with open("doc3.txt","w") as fo:
        data = fi.read()
        fo.write(data)

or as

with open("doc2.txt") as fi, open("doc3.txt","w") as fo:
    data = fi.read()
    fo.write(data)

pickle

Sometimes a data structure (in Python everything is an object) cannot be converted to JSON;
pickle can serialize such objects to bytes for storage.

  • pickle.dump(data, file object)
  • pickle.load(file object)

Note:

  • remember to open the file in binary mode.
In [21]:
import json
data = {
    "user":"Mars",
    "work":set(["Tripresso","PyLadies"])
}
json_data = json.dumps(data)
print(json_data,type(json_data))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-2bf390fd7363> in <module>()
      4     "work":set(["Tripresso","PyLadies"])
      5 }
----> 6 json_data = json.dumps(data)
      7 print(json_data,type(json_data))

/usr/lib/python3.4/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    228         cls is None and indent is None and separators is None and
    229         default is None and not sort_keys and not kw):
--> 230         return _default_encoder.encode(obj)
    231     if cls is None:
    232         cls = JSONEncoder

/usr/lib/python3.4/json/encoder.py in encode(self, o)
    190         # exceptions aren't as detailed.  The list call should be roughly
    191         # equivalent to the PySequence_Fast that ''.join() would do.
--> 192         chunks = self.iterencode(o, _one_shot=True)
    193         if not isinstance(chunks, (list, tuple)):
    194             chunks = list(chunks)

/usr/lib/python3.4/json/encoder.py in iterencode(self, o, _one_shot)
    248                 self.key_separator, self.item_separator, self.sort_keys,
    249                 self.skipkeys, _one_shot)
--> 250         return _iterencode(o, 0)
    251 
    252 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/usr/lib/python3.4/json/encoder.py in default(self, o)
    171 
    172         """
--> 173         raise TypeError(repr(o) + " is not JSON serializable")
    174 
    175     def encode(self, o):

TypeError: {'PyLadies', 'Tripresso'} is not JSON serializable
In [22]:
import pickle
data = {
    "user":"Mars",
    "work":set(["Tripresso","PyLadies"])
}
with open("pickle","wb") as fo:
    pickle.dump(data,fo)
with open("pickle","rb") as fi:
    data2 = pickle.load(fi)
print(data2,type(data2))
{'user': 'Mars', 'work': {'PyLadies', 'Tripresso'}} <class 'dict'>

Extra topics - advanced file operations

f.tell(): returns the current position in the file

If you are not comfortable with file positions,
keeping reads and writes in separate file objects is strongly recommended!!!

In [23]:
ori_data = "123\n456\n789"
with open("test.txt","w") as fo:
    fo.write(ori_data)

with open("test.txt","r+") as fo:
    now_tell = -1
    print("start",fo.tell())
    fo.write("abc")
    print("after write",fo.tell())
    for i in range(3):
        start = fo.tell()
        data = fo.readline().rstrip("\n")
        end = fo.tell()
        print("{}->{} : {}".format(start,end,data))
start 0
after write 3
3->4 : 
4->8 : 456
8->11 : 789
In [24]:
with open("test.txt") as fi:
    print(fi.read())
abc
456
789

f.seek(offset,whence): moves the current position to offset

  • whence:
    • SEEK_SET: from the beginning
    • SEEK_CUR: from the current position
    • SEEK_END: from the end
In [25]:
ori_data = "123\n456\n789"
with open("test.txt","w") as fo:
    fo.write(ori_data)

with open("test.txt","r+") as fo:
    now_tell = -1
    print("start",fo.tell())
    fo.write("abc")
    print("after write",fo.tell())
    fo.seek(0) # position 0 from the beginning of the file
    print("reset",fo.tell())
    for i in range(3):
        start = fo.tell()
        data = fo.readline().rstrip("\n")
        end = fo.tell()
        print("{}->{} : {}".format(start,end,data))
start 0
after write 3
reset 0
0->4 : abc
4->8 : 456
8->11 : 789

f.truncate(size=None): keeps only the first size bytes of the file and discards the rest

In [26]:
ori_data = "123\n456\n789"
with open("test.txt","w") as fo:
    fo.write(ori_data)

with open("test.txt","r+") as fi:
    fi.truncate(2)   
    data = fi.read()
    print(data)
    
with open("test.txt") as fi:
    data = fi.read()
    print(data)  
    
12
12

f.flush(): flushes the buffer

When Python writes to a file, the data is not actually written out right away;
it sits in a buffer, and only when the buffer fills up, f.close() is called, or the program terminates
is the buffered data written to the file.

Try the code below with and without fo.flush() to see the difference:

import time
with open("test.txt","w") as fo:
    for i in range(9999):
        fo.write("{}\n".format(i))
        time.sleep(0.001)
        print(i)
        fo.flush()

More file operations

There are many more file methods; this session picked only the ones needed for practical work.
For the rest, please read the relevant documentation yourself:

Mapping to learning resources