Environmental Resources Report Results Query System

Establishment of the Environmental Information Web Archive System

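Chinese abstract The purpose of this project is to build an “Environmental Information Web Archive System” for taking historical website data offline and preserving it. Besides collecting websites from past years that have significant reference value, along with records of major environmental information events, and thereby broadening the collection and integration of source data on environmental quality, the system also helps avoid information security incidents caused by websites that can no longer be updated or are poorly maintained.
The main work of this project is the analysis, design, development, and deployment of the “Environmental Information Web Archive System.” The principal approach uses a website capture and copy mechanism (Website Copier) that captures websites by web crawling in order to create a historical backup of each site. Website metadata and a full-text index of the web pages are also built so that the public can browse and search historical versions of the websites. During the project, the system was analyzed and designed with reference to web archive systems in Taiwan and abroad and through requirements interviews. Value-added development was carried out on the same website capture core used by the web archive systems of the National Central Library and National Taiwan University, and the system was designed to capture websites in a distributed manner from the Environmental Protection Administration’s northern and central data centers, so as to meet the project’s requirements.
To give future web archiving mechanisms and procedures a basis to follow, the project also drew on the National Central Library’s web archiving policies to draft the “Environmental Protection Administration Executive Yuan Environmental Information Web Archive Policies.” Besides specifying the website domains, file types, archiving methods and principles, and related licensing terms covered by the system, the policies also set out the “Environmental Information Web Archive Process” and the “Web Archive Application Form” for reference in future archiving work.
The contract for this year requires archiving at least 80 website versions. As of the end of the project, the system had archived 126 websites, capturing about 1.08 million files in total. The archived content was also used to build a thematic archive site on energy conservation and carbon reduction. However, because of inherent limitations of web archiving technology, not all website content can be preserved completely and without error. The project therefore analyzed and catalogued the website content that could not be archived effectively, as a reference for selecting future archiving targets and for website development.
With the rapid development of the Internet, web archiving has become an important part of digital archiving, and this project provides a channel and procedural mechanism for archiving websites that are taken offline. Besides establishing a platform that keeps traceable historical website data and preserves important records, it can also support knowledge-base construction and the integrated collection of environmental information, which in turn helps in observing trends. In the future, the project’s results can be extended to other agencies to help address website takedown needs arising from organizational restructuring, expiry, information security, and similar factors. Strategic alliances with other web archiving institutions could also be formed to pool resources and extend the project’s impact.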
Chinese keywords Web archiving; digital archiving; environmental resources

Basic information

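Project number EPA-99-L103-03-002 Fiscal year 099 Project budget NT$4,500 thousand
Project start date 2010/01/26 Project end date 2010/12/31 Principal investigator 潘振宇
Sponsoring unit 監資處 (Department of Environmental Monitoring and Information Management) Case officer 葉麗雲 Implementing organization 凌網科技股份有限公司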

Report downloads

Type File name File size Description
Final report 環境資源網站典藏系統期末報告(公開版).pdf 4MB

Environmental Information Web Archive System Establishment Project

English abstract The purpose of this project is to build an “Environmental Information Web Archive System” that takes old website data offline and archives it. In addition to archiving websites of significant reference value and information on major environmental events, and thereby expanding the collection and integration of source data on environmental quality, the system helps avoid information security incidents caused by websites that can no longer be updated or are poorly maintained.
The scope of this project consists of analyzing, designing, developing, and building the “Environmental Information Web Archive System.” The main approach is to use a Website Copier that collects websites by web crawling and creates a historical backup of each site. At the same time, website metadata and full-text indices of the web pages are built so that the public can browse and search previous versions of websites. The system was analyzed and designed with reference to international and domestic web archive systems and on the basis of requirements interviews. It is built, through value-added development, on the same website capture core used by the web archive systems of the National Central Library and National Taiwan University. To meet the needs of this project, the system collects websites in a distributed manner from the Environmental Protection Administration’s data centers (IDCs) in northern and central Taiwan.
In order to establish a basis for future web archiving mechanisms and processes, we referred to the National Central Library’s web archiving policies to design a set of policies: the “Environmental Protection Administration Executive Yuan Environmental Information Web Archive Policies.” These policies specify the scope of websites the Environmental Information Web Archive System will archive, the file types, the archiving methods and principles, and the relevant licensing terms. They also set out the “Environmental Information Web Archive Process” and the “Web Archive Application Form” to facilitate future archival work.
The contract requires archiving at least 80 website versions during the current year. By the end of the project, the Environmental Information Web Archive System had archived 126 websites and collected approximately 1.08 million files, and the archived content was used to build a thematic website devoted to energy conservation and carbon reduction. However, due to the inherent limitations of web archiving technology, not all website content can be saved completely and without errors. This project therefore also analyzed and catalogued the content that could not be archived effectively, to serve as a reference for selecting future archiving targets and for website development.
With the rapid development of the Internet, web archiving has become an important part of digital archiving. This project offers a channel and process for archiving websites that are taken offline, and it developed a platform that keeps traceable historical website data and preserves important records. The platform can also support the construction of knowledge bases and the integrated collection of environment-related information, which helps people observe changes in trends. The project’s results can be extended to other agencies to help them address the need to take websites offline because of organizational restructuring, expiry, information security, and similar factors. Furthermore, strategic alliances can be formed with other web archiving institutions to pool resources and increase the effectiveness of the project.
English keywords Web Archive; Digital Archive; Environmental Resources