************************************************************************                                           
                               README
                            Elec dataset
                             March 2015
    Electronics product review dataset for sentiment classification 
************************************************************************

Contents:
---------
1. Data Description
2. Filename Conventions
3. Experiments in [1]
4. Contact
5. References

---------------
1. Data Description 
This dataset consists of electronics product reviews associated with 
binary labels (positive/negative) for sentiment classification, used in 
[1].  The data was derived from Electronics.tar.gz at 

  http://snap.stanford.edu/data/web-Amazon.html (also see [2]) 
  
which is a large dataset of Amazon reviews.  

The data includes only the text section of each review; the summary 
section is not included.  The labels are balanced so that each set 
contains the same number of positive and negative reviews.  The products 
reviewed in the test set are disjoint from the products reviewed in the 
training sets.  Each training set is ordered so that if it is divided 
into five partitions of equal size, the products reviewed in each 
partition are disjoint from those in the other partitions.  
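
The five-partition ordering lends itself to product-disjoint 
cross-validation.  The Python sketch below is only an illustration and 
is not part of the code used in [1].  It assumes that the partitions are 
contiguous blocks in file order and that the number of reviews is 
divisible by five; the function name contiguous_folds is hypothetical. 

```python
def contiguous_folds(n_reviews, n_parts=5):
    """Yield (train_indices, held_out_indices) for each partition.

    Assumption: partitions are contiguous, equal-size blocks taken in
    file order, which is our reading of the ordering described above.
    """
    if n_reviews % n_parts != 0:
        raise ValueError("n_reviews must be divisible by n_parts")
    size = n_reviews // n_parts
    for k in range(n_parts):
        start, stop = k * size, (k + 1) * size
        held_out = list(range(start, stop))
        # Everything outside the k-th block is used for training.
        train = list(range(0, start)) + list(range(stop, n_reviews))
        yield train, held_out
```

The indices refer to line positions in a training file, so the same 
indices select matching rows in the *.txt, *.txt.tok, and *.cat files. 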

-----------------------
2. Filename Conventions

Training set: elec-${size}-train.${ext}, where ${size} is the number of 
              reviews. 
Test set:     elec-test.${ext} 
  
ext=txt     : Text data before tokenization.  One review per line. 
ext=txt.tok : Tokenized data.  Tokens are separated by blanks.  
              One review per line in the same order as *.txt. 
ext=cat     : Labels. One label per line in the same order as *.txt. 
              "1": negative, "2": positive.

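As an illustration of these conventions, the Python sketch below reads 
the reviews and labels for one file stem.  It is not part of the code 
used in [1].  The function name load_elec, the constant LABEL_NAMES, and 
the UTF-8 encoding are assumptions made for this example. 

```python
from pathlib import Path

# Label mapping as stated above: "1" is negative, "2" is positive.
LABEL_NAMES = {"1": "negative", "2": "positive"}


def load_elec(data_dir, stem, tokenized=False):
    """Return (reviews, labels) for a file stem such as "elec-test".

    With tokenized=True the *.txt.tok file is read and each review is
    returned as a list of tokens; otherwise each review is one string.
    """
    base = Path(data_dir)
    ext = "txt.tok" if tokenized else "txt"
    # Assumption: files are UTF-8 encoded; adjust if decoding fails.
    with open(base / f"{stem}.{ext}", encoding="utf-8") as f:
        reviews = [line.rstrip("\n") for line in f]
    with open(base / f"{stem}.cat", encoding="utf-8") as f:
        labels = [LABEL_NAMES[line.strip()] for line in f if line.strip()]
    if tokenized:
        # Tokens are separated by blanks, so a plain split suffices.
        reviews = [review.split() for review in reviews]
    if len(reviews) != len(labels):
        raise ValueError("review and label files differ in length")
    return reviews, labels
```

For example, load_elec(".", "elec-test") would read elec-test.txt and 
elec-test.cat from the current directory. 
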
---------------------    
3. Experiments in [1]     

To reproduce the experiments, download the code archive and use 
test2/train_elec_*.sh. 

----------
4. Contact 
riejohnson@gmail.com

-------------
5. References
[1] Rie Johnson and Tong Zhang.  Effective use of word order for text 
categorization with convolutional neural networks.  NAACL HLT 2015.  

[2] Julian McAuley and Jure Leskovec.  Hidden factors and hidden topics: 
understanding rating dimensions with review text.  RecSys 2013. 
