A few raw data sources gathered for processing:
- http://viet.jnlp.org/download-du-lieu-tu-vung-corpus
- https://vi.wikipedia.org/wiki/Khai_th%C3%A1c_v%C4%83n_b%E1%BA%A3n
- https://github.com/trannguyenhan/preprocessing-data
- https://blog.luyencode.net/phan-loai-van-ban-tieng-viet/
- https://vi.wikipedia.org/wiki/Th%E1%BB%83_lo%E1%BA%A1i:Ti%E1%BA%BFng_Vi%E1%BB%87t
- https://vi.wikipedia.org/wiki/N%C3%B3i_l%C3%A1i
- https://vi.wiktionary.org/wiki/Th%E1%BB%83_lo%E1%BA%A1i:M%E1%BB%A5c_t%E1%BB%AB_ti%E1%BA%BFng_Vi%E1%BB%87t