Do multilingual people build bridges across countries?

Today Irene Eleta from the University of Maryland visited the lab to give a seminar called “Multilingual Users of Twitter: Social Ties Across Language Borders or How a Story Could Travel the World.” Like me, she has been working a lot with Twitter for her PhD thesis. She is particularly interested in exploring the role of multilingual users on different social media platforms. Among her challenges, she aims to find solutions for:

  • Classifying tweets written in one language that quote the names of songs/books/movies in another language
  • Detecting automatic messages in different languages
  • Scripts of translators in Arabic, Hebrew, etc. (coding)

My suggestion for the first problem is to “ignore” single tweets where two languages are detected and only consider those that have a high probability of being written in a single language. The reason is that many people use English expressions without knowing English. Otherwise, users may be classified as multilingual when they are only using the names of movies in English, English expressions, etc.
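The filtering idea above can be sketched in a few lines of Python. This is only a rough illustration, not something presented at the seminar: it assumes a language detector (not shown) that returns a probability per language for each tweet, and the names `confident_language`, `user_languages`, `is_multilingual`, the 0.9 threshold and the minimum of two tweets per language are all hypothetical choices.

```python
from collections import Counter


def confident_language(probs, threshold=0.9):
    """Return the dominant language of one tweet, or None if the detector is unsure.

    `probs` maps a language code to the probability reported by some
    (hypothetical) language detector, e.g. {"es": 0.6, "en": 0.4}.
    """
    if not probs:
        return None
    lang, p = max(probs.items(), key=lambda item: item[1])
    return lang if p >= threshold else None


def user_languages(tweet_probs, threshold=0.9, min_tweets=2):
    """Languages a user writes in, counting only confidently single-language tweets."""
    counts = Counter()
    for probs in tweet_probs:
        lang = confident_language(probs, threshold)
        if lang is not None:  # mixed-language tweets are simply ignored
            counts[lang] += 1
    return {lang for lang, n in counts.items() if n >= min_tweets}


def is_multilingual(tweet_probs, threshold=0.9, min_tweets=2):
    """A user counts as multilingual only with two or more confident languages."""
    return len(user_languages(tweet_probs, threshold, min_tweets)) >= 2


# A Spanish speaker who quotes an English movie title in one tweet:
spanish_user = [
    {"es": 0.97, "en": 0.03},
    {"es": 0.95, "en": 0.05},
    {"es": 0.60, "en": 0.40},  # mixed tweet, ignored
]
print(is_multilingual(spanish_user))
```

With these made-up numbers the mixed tweet is dropped and the user stays monolingual. How strict the threshold should be is an open question that would have to be tuned on real data.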

On the other hand, what I was perhaps more interested in was the role of multilingual people in information propagation. The idea is to measure the real importance of these people during special events like protests, revolutions, crises or catastrophes such as earthquakes. Some of the questions to answer would be:

  • Are users previously classified as multilingual somehow important in propagating information during special events? My naive hypothesis would be “yes, they are.” Because they know other languages, multilingual people will care more about propagating information to the world so that the world can also understand what is going on… and in particular, the language chosen to communicate with the world will be English. To explore this, temporal analysis will be needed.

Looking from another perspective, detecting multilingual users and studying their interactions could trigger the invention of useful new features on many sites. For example, I have always had problems with language detection and spell checking in Gmail… wouldn’t it be nice if Gmail knew the “language” that you use with each of your friends and automatically changed the spell checker? It seems to me that for now Gmail just keeps the previously used spell checker… and it bothers me a lot to keep switching languages all the time to avoid those annoying red lines.

Posted in Research

Goodbye Aaron Swartz

I didn’t know much about Aaron Swartz until recently. He committed suicide while facing the fear of spending almost all his life in prison. He was a great programmer and activist, full of great ideas about free information and with many dreams. At the age of 14 he co-authored the RSS 1.0 specification. He seems to have been a great combination of good intentions, intelligence and fairness. He dreamed of freely sharing information, and of finding a way around the stupidity of patent laws (seriously, some of them are truly ridiculous). A big loss for those who fight for fairness in this world. There are two interesting blog posts I have read so far about him: one from someone who knew him personally, danah boyd, and another arguing in his defense and against the unfairness of the potential 35-year sentence. His family’s official statement is here; they blame the prosecutors and MIT for this.

“Punishment sometimes doesn’t seem to fit the crime,” and it was definitely not fair to threaten him with up to 35 years in prison just because he wanted to share with the world the scientific articles he downloaded through MIT’s network. People do that all the time. Sometimes you can get a scientific article just by asking the author. I feel saddened when these things happen, especially when they involve smart, creative and good people. Who knows the great things he could have done for humanity… RIP Aaron Swartz.

Goodbye, Aaron Swartz.

Here is an explanation of some of the ironies of intellectual property.

P.S.: I refer to a great site in Spanish for “Hacktivistas y Cultura Libre,” where a friend and former scientist at Yahoo! Labs actively participates.

Update: Now you can liberate knowledge on the JSTOR Liberator site! People like Aaron may leave this world, but they will inspire others to continue their unfinished tasks.

Posted in Uncategorized

PhD, things to keep in mind.

There is a talk I heard a couple of years ago: the IIT commencement address by Marissa Mayer (current Yahoo CEO). I can now understand and fully appreciate what she meant when she said:

“Find the smartest people you can and surround yourself with them. Working with smart people means that you will be challenged to do your best. You have to strive to keep up with them and as a result they will elevate your thinking. When there are better players around you, you get better.”

It is hard, though, to realize all the things one has to do to keep up with the people one admires. I have been so lucky to meet so many “great” people here. It was exactly what I asked of the world when I left my country. Keeping up with them is another story. I realized that my work methodology is chaotic and that it urgently needs to be improved. I am trying to do that this year.

So the things that I have learned from observing people who do “good research and still enjoy free time” are the following:

  1. Efficiency at work: no procrastination. Establish clear goals every day at work. This is a big problem for me. I tend to multitask too much. Although it has been said that women multitask a lot, I think this is not good for research. I know that different people have different methods to cope with work. There are people at the lab even on Sundays, but I guess that if one aims to have a “life,” then efficiency and no procrastination are a must.
  2. Time management: Always plan the next action, otherwise it will be hard to do everything you want to do… even on weekends.
  3. Team work: Find a team to work with. I think it is more productive and more fun. If you code a lot, it would be great to find a PhD partner who also likes coding; you can share work, discuss and motivate each other. If you plan to write two papers, one with your friend as first author and the other with you as first author, then even better.
  4. A good advisor: Advisors do not have much time, but their comments are helpful. So in order to get good feedback, you need to have results or hypotheses ready to show to your advisor. Their experience and help are always useful. I personally ask for advice from people I admire.
  5. Love: Try to love what you are doing as much as you can. This is hard sometimes… an idea that you believe will be great and fun can turn out to be the worst nightmare. I feel it has happened to me, but oh well… love “bites” sometimes, and we have to keep trying.
Posted in Uncategorized

Loving your job

When I finished high school I was convinced that I wanted to study International Affairs. I grew up in an environment where the most common topics of discussion were politics, history and sociology. On top of that, my first boyfriend was a sociologist who absolutely loved to talk about politics, laws, religion, etc.

Years later, I was lucky to do an internship at the UN with the Ecuadorian mission in New York, where I attended several conferences and meetings, including the meeting Women 2000: Gender Equality, Development and Peace, where a bunch of women got together to talk about the progress that had been made on gender equality.

Ironically, it was precisely at the UN that I became disappointed with it all. I realized with much grief that people would spend so much time talking about how to write the paper for a meeting, or how important it was for a country to show in that paper what its representatives had said… I had the impression that the majority of people there didn’t really care about solutions and actions on very important matters; they cared more about the protocols, the papers, the meetings and the connections. I also felt that many of those delegates were not the right people to be there… I mean, it was difficult for me to understand how the presence of certain people could actually help with anything.

Given that I had always had a fascination with math, logic and programming, I changed my mind and chose another major: computers. At first it was Computer Science, and then I changed to Computer Engineering when I got back to Ecuador.

The truth is that protocols, procedures and writing processes are important in almost every field. Connections as well; sometimes they are fair, sometimes they are not. I do believe that if you are brilliant at something, people will look for you. If you are brilliant at something, you are lucky! But if you are just someone who struggles hard and who can do a good job after a lot of effort, then connections always help. The sad part is when you are bad at something and still get a good position in it because of connections.

I also realized that in life you find few people who are passionate and courageous in their jobs… I have the impression that everybody is tired most of the time, regardless of the field they chose to work in. Few are the ones who really love their jobs…

Do I love my choice? I don’t know…. I want to discover it. Can I be someone who makes a bit of a difference in this?

I must confess, sometimes I do not find any meaning in what I am doing… sometimes I think I would be doing something more productive if I planted potatoes in the garden of my house in Ecuador. But maybe I am not the only one thinking that. In the end, research is about discovery… maybe I will soon find my passion in the middle of the screen among Pig scripts. What I want to discover, though, is something that could be used to help people… in anything, but to help people.

What I love about this new world (research), though, is that I find very interesting fellows… a lot of the people I have come across have different talents and interests, and almost all of them share an authenticity in their personality. In my lab, besides researchers, you find musicians, athletes, dancers and writers…

I love getting to know women in tech. Even though we are only a small percentage of this field, so far they have all made a great impression on me.

Tomorrow is the Chinese New Year, the Year of the Rabbit… please, dear rabbit, let it be my year of discovery.

Posted in Uncategorized

PIG AND HADOOP CONFIGURED!

Experience

I finally managed to configure Pig and Hadoop on my computer. I used Pig 0.7 and Hadoop 0.20.2. It took me a while, but I finally made it. Hadoop and Pig are constantly being updated, so don’t put much trust in tutorials for older versions if you are not very experienced in the matter. Nevertheless, I should mention this tutorial because it helped me a great deal in understanding how to configure Hadoop. The only major misunderstanding was with the ssh configuration, so if you are a beginner like me, be careful when messing with ssh.

Advice:

  1. Read the Apache tutorials on Pig and Hadoop, but be careful with some mistakes in the writing
  2. Use the tutorial that comes inside the Pig folder (the tutorial files mentioned in the Pig tutorial are inside the Pig folder)
  3. Get the latest stable versions

1. Read the Apache tutorials on Pig and Hadoop, but be careful with some mistakes: they do have some mistakes. For example, in this part of the tutorial, id.pig is:

A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
store B into ‘id.out’;

They forget to mention that you should use either dump or store… in my case I got errors when I used both. Second, if you copy and paste this code you will get an error for sure, because the quotes around id.out are typographic (curly) quotes; retype that last part with plain single quotes, 'id.out'. I also received an error with the following mapreduce command:

Unix:   $ java -cp pig.jar:.:$HADOOPDIR idmapreduce

It could not find the passwd file in the hdfs directory, and it did not have an output file to write the results to. Instead of figuring out the problem, I went ahead and ran another mapreduce job with another command from the next section of this tutorial (following the steps) and it worked!

$ java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig

So if this script worked fine, then the previous one must have something wrong. Tomorrow I will test whether putting passwd on the hdfs solves the problem.

2. Use the tutorial that comes inside the Pig folder (the tutorial files mentioned in the Pig tutorial are inside the Pig folder) and 3. Get the latest stable versions

This is important because there are changes between versions. I made a stupid, stupid mistake here. I did not know that the files used for testing in the Pig tutorial are actually inside a folder called “tutorial” inside my Pig folder. So I downloaded the tutorial for a previous Pig version… and of course I kept getting errors, since I was running a later version of Pig. After I made the appropriate corrections, it worked!!

The errors I was getting were scary and hard to interpret. I got, for example: “INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: file:///” and “ERROR mapReduceLayer.MapReduceLauncher: java.io.IOException: excite.log.bz2 does not exist” (posted here).

It was finally solved when I used the appropriate tutorial files. It was not easy to figure out.

Future work

Well, now that I have Pig and Hadoop running smoothly, I will start running a lot of experiments. My task is to give a “score” to tweets according to a list of words, with or without weights. So for example, if my tweet of 8 words is “Samsung Launching New Android Device on November   http://on.mash.to/9wJbGC” and my list has three words, iPhone, BlackBerry and Android, the total weight of this tweet will be 1/8. Things get more complicated when I have to filter content and use weights… I will first run my experiments on one large file containing a lot of tweets, and THEN, once I have it right, I will run it on the Yahoo cluster… which has a huge amount of data.
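To make the scoring concrete, here is a minimal Python sketch of it (plain Python rather than Pig, just to check the arithmetic). The function name `score_tweet`, the case-insensitive matching, the whitespace tokenization and the default weight of 1 are my assumptions; the real experiments may end up defining these differently.

```python
def score_tweet(tweet, keywords, weights=None):
    """Sum of the weights of matching words divided by the number of words.

    Words are split on whitespace and matched case-insensitively (an
    assumption). A keyword with no explicit weight counts as 1.
    """
    tokens = tweet.split()
    if not tokens:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    weight_of = {k.lower(): w for k, w in (weights or {}).items()}
    total = sum(
        weight_of.get(token.lower(), 1.0)
        for token in tokens
        if token.lower() in keyword_set
    )
    return total / len(tokens)


tweet = "Samsung Launching New Android Device on November   http://on.mash.to/9wJbGC"
keywords = ["Iphone", "BlackBerry", "Android"]

print(score_tweet(tweet, keywords))                    # 0.125, i.e. 1/8
print(score_tweet(tweet, keywords, {"Android": 2.0}))  # 0.25 with a made-up weight
```

Only “Android” matches, and the URL counts as the eighth word, which gives the 1/8 from the example.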

Questions

  1. Should we consider numbers and URLs as words? I was told that URLs should be counted as words, but I am a little reluctant about it. (Of course RT, via, @… will not be considered.)
  2. I am worried about the languages… how do we sort that out?
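One way to keep the first question open while experimenting is to make the counting rules switchable. The sketch below is only an illustration in Python: the function name `countable_tokens`, the two flags and the simple patterns for URLs and numbers are hypothetical, and the sample tweet is made up for the test.

```python
import re

# Deliberately simple patterns; real tweets will need something more robust.
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)
NUMBER_RE = re.compile(r"^\d+([.,]\d+)*$")
IGNORED = {"rt", "via"}


def countable_tokens(tweet, count_urls=True, count_numbers=True):
    """Return the tokens of a tweet that should count as words.

    "RT", "via" and @mentions are always dropped. URLs and numbers are
    kept or dropped depending on the two flags.
    """
    kept = []
    for token in tweet.split():
        if token.lower() in IGNORED or token.startswith("@"):
            continue
        if not count_urls and URL_RE.match(token):
            continue
        if not count_numbers and NUMBER_RE.match(token):
            continue
        kept.append(token)
    return kept


sample = ("RT @mashable Samsung Launching New Android Device on November 11 "
          "http://on.mash.to/9wJbGC via @someone")
print(len(countable_tokens(sample)))
print(len(countable_tokens(sample, count_urls=False)))
print(len(countable_tokens(sample, count_urls=False, count_numbers=False)))
```

Running the same scoring with each setting would show how much the decision about URLs and numbers actually changes the results.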

Motivation

I will remotely attend a class on Pig (an introductory two-hour course) held in LA, California, and given by Yahoo :)

Posted in Research, Uncategorized

Going to Yahoo R&D

I started my PhD with a lot of work. After two weeks of getting things ready and beautifying my desk, I was moved to Yahoo R&D, where I am supposed to stay at least half of the day… but in reality I am staying almost the whole day due to the complexity of my tasks.

I am motivated, of course, but I have to do a lot of things I have never done before… lots of learning these days.

What I can say is that the project I am getting into is very interesting because it will analyze the “diversity” of opinions and cultural differences… or at least try to capture that from what people say online.

Hadoop + Pig = me crazy.

How is that for my first post?

Posted in Research

Hello world!

Hello world, this is my new blog!

I have always liked to write about what I do and think. I have kept a diary since I was 11 years old. The frequency has decreased a great deal and I no longer keep my diary on paper; now everything I write is digital. I think that is one of the reasons for my very bad handwriting.

How does this blog differ from the previous one? In this blog, I will focus more on my PhD and everything that I discover along the way. In other words, no drama, no love stories, just science, code and of course some observations about life in general.

I will write mostly in English, but I may be tempted to write in Spanish a couple of times. English is not my native language, but I take it as a challenge and a way to practice my writing skills.

Posted in Uncategorized