iInfo 資訊交流: Excel VBA 與 Python 爬「公開資訊觀測站的重大訊息公告」

最新消息

[公告2014/05/30] 如有需要將部落格中,任何一篇文章的程式碼使用在商業用途,請與我聯繫。

[公告2015/04/26] Line版的 iInfo程式與投資應用 群組已上線想加入的朋友們,請先查看 "入群須知" 再與我聯繫 Line : aminwhite5168,加入請告知身分與回答 "入群須知" 的問題。

[公告2017/02/20] 近來有網友詢問 MultiCharts 與 Excel 畫冰火能量圖 (8)MultiCharts 與Excel 畫冰火能量圖(11) --- 更名「台股儀表板」 文章中教學檔案取得方式,有興趣的朋友可透過 Line了解詳情 (請勿以為是免費分享),Line : aminwhite5168。

[公告2018/04/22] 台北 Python + Excel VBA 金融資訊爬蟲課程,於 7/21、7/22 兩天開課,課程如網頁內容 金融資訊爬蟲班:台北班 Python 金融資訊爬蟲、EXCEL VBA 金融資訊爬蟲,5/30 前早鳥優惠票,請盡快把握機會,歡迎券商、大專院校邀約講座。

[公告2018/06/01] 台指能量儀表板教學課程 7/14 (六) 下午 1:30 ~ 4:30,15人開班,課程如網頁內容 台北班:台指能量儀表板

2017年8月27日 星期日

Excel VBA 與 Python 爬「公開資訊觀測站的重大訊息公告」

兩周8小時上完Excel VBA爬蟲入門班課程,在課程最後同學問到如何爬公開資訊觀測站的重大訊息公告,當時誤判用錯 Internet Explorer 方法,造成抓取中文變成亂碼,經過轉碼依舊無法有效(或許還有別的方法,後續再研究),後面改以XMLHttp處理即可順利完成。
Excel VBA雖然可以用不同方法去取得網頁資訊,不過在特定網頁上不是每種方法都可行,需要花時間去試驗,這點與Python只要寫幾行程式碼就能達成相同功能是不同,驗證一句常聽過的話,方法多不見得有用,有用的方法一種就好,至於那一種方法好用,在 Excel 中筆者無法給一個標準答案

以下就來看看相關的Excel VBA與Python程式碼。
公開資訊觀測站的重大訊息公告

Postman畫面。

Excel VBA 抓取網頁資料的程式碼。
Option Explicit

Sub 公開資訊觀測站公告()
    Dim tmp As String
    Dim html As Object, html2 As Object
    Dim htable As Object, htable2 As Object
    Dim i As Integer, j As Integer, year As Integer, month As Integer
    Dim Row As Integer
    Dim url As String
    Dim url2 As String
    Dim stockid As String
    
    stockid = Sheets(1).Range("A2")
    year = Sheets(1).Range("B2")
    
    url = "http://mops.twse.com.tw/mops/web/ajax_t05st01?firstin=1&TYPEK=sii&co_id=" & stockid & "&year=" & year & "&month=&b_date=&e_date="

    Set html = CreateObject("htmlFile")
    Set html2 = CreateObject("htmlFile")
    
    html.body.innerHTML = encodeData(downloadData(url))
    Set htable = html.getElementsByTagName("table")(1)
    
    With ActiveSheet
        Row = .Range("A65536").End(xlUp).Row
        If Row < 5 Then
            Row = 5
        End If
        .Range("A5:F" & Row).Clear
        
        For i = 0 To htable.Rows.Length - 1
            For j = 0 To htable.Rows(i).Cells.Length - 1
                If (htable.Rows(i).Cells.Length - 2) >= j Then
                    .Cells(i + 1 + 3, j + 1) = Trim(htable.Rows(i).Cells(j).innerText)
                End If
            Next
            
            If (htable.Rows.Length - 2) >= i Then
                html2.body.innerHTML = encodeData(downloadData(recombineURL(html.getElementsByTagName("input")(14).onclick)))
                Set htable2 = html2.getElementsByTagName("pre")(1)
                .Cells(i + 2 + 3, j) = Trim(htable2.innerHTML)
            End If
        Next
    End With
    
    Set html = Nothing
    Set htable = Nothing
    Set html2 = Nothing
    Set htable2 = Nothing
End Sub

Function downloadData(url As String)
    Dim oXMLHTTP As Object
    Set oXMLHTTP = CreateObject("Msxml2.XMLHTTP")
    With oXMLHTTP
        .Open "GET", url, False
        .send
        If .Status = 200 Then
            downloadData = oXMLHTTP.ResponseBody
        Else
            MsgBox "無法取得資料"
        End If
    End With
    Set oXMLHTTP = Nothing
End Function

Function encodeData(str As String)
    Dim objStream As Object
    Set objStream = CreateObject("ADODB.Stream")
    With objStream
        .Open
        .WriteText str
        .Position = 0
        .Type = 2
        .Charset = "UTF-8"
        encodeData = .ReadText
        .Close
    End With
    Set objStream = Nothing
End Function

Function recombineURL(str As String)
    Dim tmp As String
    tmp = Replace(Split(Split(str, "{")(1), "}")(0), Chr(10), "")
    tmp = Replace(tmp, "'", "")
    tmp = Replace(tmp, "document.t05st01_fm.seq_no.value", "seq_no")
    tmp = Replace(tmp, ";document.t05st01_fm.spoke_time.value", "&spoke_time")
    tmp = Replace(tmp, ";document.t05st01_fm.spoke_date.value", "&spoke_date")
    tmp = Replace(tmp, ";document.t05st01_fm.co_id.value", "&co_id")
    tmp = Replace(tmp, ";document.t05st01_fm.TYPEK.value", "&TYPEK")
    tmp = "http://mops.twse.com.tw/mops/web/ajax_t05st01?firstin=1&" & Split(tmp, ";")(0) & "&step=2"
    recombineURL = Trim(tmp)
End Function

Sub ClearData()
    Dim Row As Integer
    With Sheets(1)
        Row = .Range("A65536").End(xlUp).Row
        If Row < 5 Then Row = 5
        .Range("A5:F" & Row).Clear
    End With
End Sub
或第二種寫法
Option Explicit

Sub 公開資訊觀測站公告()
    Dim tmp As String
    Dim html As Object, html2 As Object
    Dim htable As Object, htable2 As Object
    Dim i As Integer, j As Integer, year As Integer, month As Integer
    Dim Row As Integer
    Dim url As String
    Dim url2 As String
    Dim stockid As String
    
    stockid = Sheets(1).Range("A2")
    year = Sheets(1).Range("B2")
    
    url = "http://mops.twse.com.tw/mops/web/ajax_t05st01?firstin=1&TYPEK=sii&co_id=" & stockid & "&year=" & year & "&month=&b_date=&e_date="

    Set html = CreateObject("htmlFile")
    Set html2 = CreateObject("htmlFile")
    
    html.body.innerHTML = downloadData(url)
    Set htable = html.getElementsByTagName("table")(1)
    
    With ActiveSheet
        Row = .Range("A65536").End(xlUp).Row
        If Row < 5 Then
            Row = 5
        End If
        .Range("A5:F" & Row).Clear
        
        For i = 0 To htable.Rows.Length - 1
            For j = 0 To htable.Rows(i).Cells.Length - 1
                If (htable.Rows(i).Cells.Length - 2) >= j Then
                    .Cells(i + 1 + 3, j + 1) = Trim(htable.Rows(i).Cells(j).innerText)
                End If
            Next
            
            If (htable.Rows.Length - 2) >= i Then
                html2.body.innerHTML = downloadData(recombineURL(html.getElementsByTagName("input")(14).onclick))
                Set htable2 = html2.getElementsByTagName("pre")(1)
                .Cells(i + 2 + 3, j) = Trim(htable2.innerHTML)
            End If
        Next
    End With
    
    Set html = Nothing
    Set htable = Nothing
    Set html2 = Nothing
    Set htable2 = Nothing
End Sub

Function downloadData(url As String)
    Dim oXMLHTTP As Object
    Set oXMLHTTP = CreateObject("Msxml2.XMLHTTP")
    With oXMLHTTP
        .Open "GET", url, False
        .send
        If .Status = 200 Then
            downloadData = .ResponseText
        Else
            MsgBox "無法取得資料"
        End If
    End With
    Set oXMLHTTP = Nothing
End Function

Function recombineURL(str As String)
    Dim tmp As String
    tmp = Replace(Split(Split(str, "{")(1), "}")(0), Chr(10), "")
    tmp = Replace(tmp, "'", "")
    tmp = Replace(tmp, "document.t05st01_fm.seq_no.value", "seq_no")
    tmp = Replace(tmp, ";document.t05st01_fm.spoke_time.value", "&spoke_time")
    tmp = Replace(tmp, ";document.t05st01_fm.spoke_date.value", "&spoke_date")
    tmp = Replace(tmp, ";document.t05st01_fm.co_id.value", "&co_id")
    tmp = Replace(tmp, ";document.t05st01_fm.TYPEK.value", "&TYPEK")
    tmp = "http://mops.twse.com.tw/mops/web/ajax_t05st01?firstin=1&" & Split(tmp, ";")(0) & "&step=2"
    recombineURL = Trim(tmp)
End Function

Sub ClearData()
    Dim Row As Integer
    With Sheets(1)
        Row = .Range("A65536").End(xlUp).Row
        If Row < 5 Then Row = 5
        .Range("A5:F" & Row).Clear
    End With
End Sub
Excel 檔案畫面。

Python 抓取網頁資料的程式碼。
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
stock = 2330
year = 106
url= "http://mops.twse.com.tw/mops/web/ajax_t05st01?firstin=1&TYPEK=sii&co_id="+ str(stock) + "&year=" + str(year) + "&month=&b_date=&e_date="
res = requests.get(url)
res.encoding='utf-8'
soup = bs(res.text,"lxml")
tb = soup.select('table')[1]
df = pd.read_html(tb.prettify('utf-8'), encoding= 'utf-8')  
df[0]

兩種語言程式碼的比較。

有興趣Python或Excel VBA,可以參考比較,或隨時注意部落格最新消息公布的Exccl VBA爬蟲入門班開課時間。