Problem on importing data #2
Hi @SyuuGenn, there are two possibilities:
In order to check what is really happening, try

Sorry to say, but doing text analysis on Windows is really hard because of its poor support for Unicode. I use a Windows machine, but I do text analysis on Linux in VirtualBox. This is what you should do if you seriously want to analyze non-Chinese texts.
@SyuuGenn I forgot to mention that the problem might not be about importing but about displaying a dfm on Windows (it is a known bug). On my non-Japanese Windows:

data <- read.csv('C:/Users/Kohei/Desktop/asahi.csv', sep = "\t",
                 stringsAsFactors = FALSE, encoding = 'UTF-8')
head(data, 2)
# date edition section page length
# text592027 2016-01-01 <U+671D><U+520A> 3<U+7DCF><U+5408> 3 1288
# text592028 2016-01-01 <U+671D><U+520A> 3<U+7DCF><U+5408> 3 595
# head
# text592027 <U+89E3><U+6563><U+6642><U+671F><U+3001><U+653F><U+6A29><U+898B><U+6975><U+3081> <U+652F><U+6301><U+7387>·<U+682A><U+4FA1><U+3082><U+8003><U+616E><U+304B> <U+540C><U+65E5><U+9078><U+8996><U+91CE>
# text592028 <U+5927><U+7D71><U+9818><U+5E9C><U+304C><U+8AC7><U+8A71><U+3001><U+4E16><U+8AD6><U+6C88><U+9759><U+5316><U+56F3><U+308B> <U+65E5><U+97D3><U+5408><U+610F><U+53D7><U+3051>2<U+5EA6><U+76EE>
# hash year month
# text592027 8b94af77cf10b662e4728e89257d252b 2016 1
# text592028 2c974c3cdb7a2e995fda5316d1bf6961 2016 1
as.matrix(head(data, 2))
# date edition section page length head
# text592027 "2016-01-01" "朝刊" "3総合" "3" "1288" "解散時期、政権見極め 支持率・株価も考慮か 同日選視野"
# text592028 "2016-01-01" "朝刊" "3総合" "3" " 595" "大統領府が談話、世論沈静化図る 日韓合意受け2度目"
# hash year month
# text592027 "8b94af77cf10b662e4728e89257d252b" "2016" "1"
# text592028 "2c974c3cdb7a2e995fda5316d1bf6961" "2016" "1"

In short,
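The same check can be reproduced without the asahi.csv file. The sketch below is an illustration, not part of the original reply. It writes a two-line UTF-8 sample to a temporary file, reads it with the same read.csv() arguments used above, and then inspects the Unicode code points of the imported string with utf8ToInt() instead of relying on how print() displays it. The file contents and column names are hypothetical.

```r
# Hypothetical two-line sample written as raw UTF-8 bytes to a temp file
path <- tempfile(fileext = ".csv")
lines <- c("date\tedition", "2016-01-01\t\u671d\u520a")
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# Same argument pattern as in the thread
data <- read.csv(path, sep = "\t", stringsAsFactors = FALSE, encoding = "UTF-8")

# print(data) may show <U+671D><U+520A> on Windows, but the code points
# tell us whether the characters themselves were imported intact
codes <- utf8ToInt(data$edition[1])
print(sprintf("U+%04X", codes))
```

If the code points come back as U+671D and U+520A, the text was imported correctly and only its display is affected, which is consistent with the as.matrix() output above.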
@koheiw Thank you very much! Shortly after I posted this issue, I bought a MacBook. This problem does not occur on macOS.
Nice, but can you check if
I tried the
Thanks. It seems like an importing problem.
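If the misplaced values do come from the import step, one way to check (an assumption, not something tested in this thread) is to count the tab-separated fields on each line with count.fields() from the utils package and look for lines whose count differs from the header. The sample file below is hypothetical. quote = "" and comment.char = "" are chosen so that quotes and # characters inside headlines are not treated specially; note that this differs from read.csv()'s default quote = "\"".

```r
# Hypothetical sample: the third line has a stray tab inside the headline
path <- tempfile(fileext = ".csv")
lines <- c("date\tedition\thead",
           "2016-01-01\t\u671d\u520a\theadline one",
           "2016-01-01\t\u671d\u520a\theadline\ttwo")
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# Number of tab-separated fields found on each line of the file
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
print(n_fields)

# Line numbers whose field count differs from the header line
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

A line with an extra or missing field shifts every value after it into the wrong column, so listing such lines narrows down where the file, rather than the display, is at fault.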
Dear Dr. Watanabe,
I'm using your tutorial on Japanese text analysis, but I ran into a problem at the first step. I successfully imported your sample data "asahi.csv" using the code
data <- read.csv('data/asahi.csv', sep = "\t", stringsAsFactors = FALSE, encoding = 'UTF-8')
but when I use

head(data)

something goes wrong: some of the data are not in the right place. Can you help me fix this problem?
Here is my environment information: