python - Efficient way of exporting large R dataset to Excel
As the title says, I have a dataset with approximately 13000 rows and 255 columns (in fact I have more than 255 columns, but …).
I tried the RODBC and xlsx packages, and the export takes more than 5 minutes. Is there a more efficient way to do this? I know a little Python (I have used it to connect to Outlook and list the emails in a mailbox), so a way to export using Python instead is also welcome.
Update 01
Many people suggest using CSV. In my case this is not possible, because the data has a free-text field whose content I cannot control.
Update 02
Thanks for the suggestions, but I suspect the R packages are simply slow with a data frame that is relatively small and has all-character columns. Any suggestions?
There are several options:
- The xlsx package with multiple sheets (you have tried this and it is very slow, I know)
- Use write.csv, which should be fast and produces a file that Excel can read
- Use odbcConnectExcel2007 within RODBC
- Use bigmemory for large data frames, especially if you can turn the data into a sparse matrix
- XLConnect has worked for the same problem
- Write it to a SQL database with RODBC or RPostgreSQL, etc., and then create a connection to the DB in Excel. I do a lot of this.
- Create a tab-delimited text file and then import it into Excel: write.table(table, sep = "\t", quote = FALSE, row.names = FALSE, file = file.name)
- Try … (I'm not sure whether this will actually be faster, but it would be a neat solution with at least the added benefit of giving you a good way to store your data safely, and you can use Excel to query whatever you need from it)
- Finally, there is the article "A million ways to connect R and Excel", which you may find useful, although I think I have actually given you more options than the article does
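One caveat with the tab-delimited route: with quoting turned off, a tab or line break inside a free-text field would be read as a column or row boundary. Since the question welcomes Python, here is a minimal sketch of the same idea using Python's standard csv module rather than R's write.table, with quoting left on so that such fields survive. The function names, the sample rows, and the "notes" column are all hypothetical stand-ins for the real data. Excel generally honours this quoting convention, but it is worth checking on a sample of your own data.

```python
import csv
import os
import tempfile


def write_tsv(rows, path):
    """Write rows (an iterable of sequences of strings) as a tab-delimited file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # QUOTE_MINIMAL quotes only fields containing a tab, a quote
        # character, or a line break, and doubles embedded quotes.
        writer = csv.writer(handle, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)


def read_tsv(path):
    """Read the file back as a list of rows, to check the round trip."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# Hypothetical sample data: "notes" stands in for the free-text field.
sample_rows = [
    ["id", "notes"],
    ["1", "plain text"],
    ["2", "has a\ttab and a \"quote\""],
    ["3", "spans\ntwo lines"],
]

if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "export.txt")
    write_tsv(sample_rows, out_path)
    # Compare what was read back with what was written.
    print(read_tsv(out_path) == sample_rows)
```

On the R side, the corresponding change would be to leave quote at its default of TRUE in write.table instead of setting it to FALSE.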
I would start with the simplest solution, like fread, then work your way up to more complex solutions if you are still not getting the result you want. Depending on the exact nature of your project, you might also benefit from parallel or multicore processing. In most cases this will not improve your I/O speed, but it can speed up any processing or transformation of your data, so that your overall data pipeline is faster.
Python is also well equipped to handle this problem, but there are so many solutions within R that hopefully you will not need to switch languages just to write data. However, if you want to try a Python-based solution, you can try:
- openpyxl's optimized reader and writer
- XlsxWriter in constant memory mode
- the … package
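As a rough illustration of the XlsxWriter option, the sketch below streams rows into a workbook opened with the constant_memory option. It assumes the third-party XlsxWriter package is installed and that the data arrives as an iterable of rows of strings. The function names and the generated sample data are hypothetical, and no timing is claimed for the 13000 x 255 case.

```python
import os
import tempfile


def export_rows_to_xlsx(rows, path):
    """Stream rows into a single-sheet .xlsx file using XlsxWriter."""
    # Third-party package (pip install XlsxWriter). Imported here so this
    # module still loads where the package is not installed.
    import xlsxwriter

    options = {
        # Flush each row to disk once the next row is started, so rows
        # must be written in order from top to bottom.
        "constant_memory": True,
        # Keep free text as text even if it starts with "=" or "http://".
        "strings_to_formulas": False,
        "strings_to_urls": False,
    }
    workbook = xlsxwriter.Workbook(path, options)
    worksheet = workbook.add_worksheet()
    for row_index, row in enumerate(rows):
        worksheet.write_row(row_index, 0, row)
    workbook.close()


def make_sample_rows(n_rows=13000, n_cols=255):
    """Yield a header row followed by n_rows rows of placeholder strings."""
    yield ["col%d" % c for c in range(n_cols)]
    for r in range(n_rows):
        yield ["r%dc%d" % (r, c) for c in range(n_cols)]


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "export.xlsx")
    export_rows_to_xlsx(make_sample_rows(), out_path)
    print(out_path)
```

To use this with data coming from R, one option would be to dump the data frame to a quoted delimited file first and feed the parsed rows to export_rows_to_xlsx. Whether that round trip beats the R packages would need to be measured.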